Help us help you : keys to getting good answers

2012-06-21 by . 4 comments

Post to Twitter

http://www.flickr.com/photos/colinkinner/2200500024/

After having answered some 250 questions here on dba.SE, and over 1,500 questions on StackOverflow, I’ve read some good questions, and I’ve read some bad questions. I’m no Jon Skeet, but I think I can offer some perspective on how to ask effective database-related questions on this site and get solutions to your problems. So what follows is some advice about things you should be sure to think about before, during and after posting your question. Not all points will be relevant for all types of questions, but some are universal. I know this may seem like a big laundry list of rules, but please, bear with me – at the very least, read all the section titles (and the last section about reading over your question).


Try to solve exactly one problem

Don’t ask a run-on question that asks whether you should use merge or peer-to-peer replication between your two data centers in Europe, if SQL Server Express is capable of handling the back end for your beekeeping equipment store, and which MySQL engine you should use for storing lots and lots of integers. Those are three separate questions. Make sure there is a problem statement that is clear and concise, and that you really are trying to solve a specific problem. You might think you can invoke some good dialog if you ask for the “best” approach to high availability, but such a question will likely be closed because it is far too broad and will simply spur opinion, speculation and debate (never mind a stream of follow-up questions for more information).

Quoting “best” in the previous paragraph was not an accident – don’t ask what the “best” solution is for anything. Be specific about how you determine “best” – for some this is efficient (not performant :-)), and could be in terms of duration, memory, storage, or a host of other metrics; for others it is about maintainability, cost, simplicity, or something else entirely. Be specific about what factors the “best” answer will consider. If you just say “give me the best answer” then all of the people reading the question are going to be left on their own to determine what they think you mean by “best” – and this can lead to answers that don’t meet your criteria.


Post sample data and desired results

A lot of people get quite offended when asked for sample data. We don’t need your confidential data, and we don’t need your entire database. But enough sample rows to demonstrate the problem you are having, or for us to work with in order to get the output you’re expecting. Check out SQL Fiddle – this is a great place to mock up some dummy schema and sample data so that we can see the query you’re running.

The mistake most people make is they spend several minutes crafting a paragraph where they describe their issue using a word problem. I don’t know about you, but I hated word problems in school, and they aren’t any more interesting to me now. We’re data people and we speak much better with schema and sample data. For SQL Server specifically, you can easily generate scripts for both the table and some data using Management Studio. In an ideal world, you will have actual CREATE TABLE and INSERT statements that we can copy and paste into our own Management Studios without a bunch of translation. This is good:

CREATE TABLE dbo.splunge(mort INT, meld DATETIME);
INSERT dbo.splunge(mort, meld) VALUES
(1, CURRENT_TIMESTAMP), (2, '20120101'), (3, NULL);
This is not so good:
splunge has columns mort (integer), meld (date time)
mort meld
1 5/6/2012
2343 1/1/09
Not only is the latter much harder for us to reverse engineer, but another important thing that seems to escape a lot of folks: don’t use ambiguous date formats! Your audience isn’t necessarily in the same country as you or speaks your language as their first language. So their interpretation of 5/6/2012 might be different from yours. Always use yyyymmdd or yyyy-mm-dd (even though the latter is not safe in code, it is the most understandable in sample data).


Tell us what you’ve tried

Posting sample data, desired results and a query that you’ve tried goes a long way in showing us what you’ve already tried, and makes it very easy for us to take what you’ve provided and understand, fix or improve it.

You will meet some opposition (and potentially have your question closed before you get an answer) if you just ask, “tell me how to do this.” If your question shows a significant lack of research, you won’t get much sympathy from your peers. Most of the people providing you with valuable help did not become experts in their field by being spoon-fed answers – they worked at it by learning their platforms and programs, and trying to make things work (or trying to break them).

It’s okay if your question is for homework, and it’s ok if you’re in a rush, but even more so in this case, you should explain what you’ve tried (or why you couldn’t learn what to do based on the lecture or other materials, or why you’ve waited until the last minute). So along with some sample data and desired results, post the query (or queries) that you’ve tried, and why the results from those queries aren’t exactly what you wanted.

Here is an example of a recent SQL Server question, word for word, that shows very little research:

If a field’s datatype is a datetime, and does not allow nulls and there is no default value set, does the database enter the current timestamp?? I queried both here and MSDN but could not find an answer to this. Thanks.

Now, I can appreciate that the person tried searching MSDN first, before turning to a Q & A site. But how hard could it have been to simply create a table with a non-null datetime column, try an insert and see what happens? I could do it here, and it would take less time than it took to type the above question, but I don’t think I have to in order to prove the point.


Post real execution plans

If you’re asked for execution plans, don’t paste the SHOWPLAN_TEXT output, and don’t bother running an estimated plan. A screen shot of the actual graphical plan is a start, but it doesn’t contain nearly enough information to act on – we can only see the costly operators, but none of the data behind it. Ideally you can post the .SQLPlan file for an actual execution plan somewhere.

In a perfect world, you will generate the actual execution plan from within my company’s free execution plan analysis tool, SQL Sentry Plan Explorer, and save the result as a .QueryAnalysis file. In addition to including all of the information from the execution plan XML (which isn’t visible when you post a screen shot of the graphical plan, and which contains tons of information not available in SHOWPLAN_TEXT), and some runtime information (such as actual vs. estimated row counts, a comparison you can’t do with an estimated plan), you also get actual runtime metrics such as CPU and duration, which can greatly help in determining the resources being used for a particular query (or part of it).

You can post your files to github:gist or Pastebin. If you’re concerned about confidentiality (table names, etc.) and it is not convenient to try to mock up a similar case with less sensitive names, you can try to take the .SQLPlan or .QueryAnalysis file and mask your sensitive names using search and replace. Do so carefully, however – make sure you can still open the file after you’ve saved your modifications.


Don’t say “it’s broken”

If you are getting an error message, post the actual error message. Nobody knows what you mean when you say “it’s broken”, “it’s not working” or “SQL Server doesn’t seem to like it.” Leaving this information out might seem like you are doing everyone a favor, by keeping your question short, but if the reader doesn’t know what’s wrong, they’re not going to be very equipped to help.


Don’t over-simplify

I realize you don’t want to post a novel, and we certainly don’t need to see your entire 3,000 line stored procedure. But you will want to reduce the code and/or problem statement to a digestible chunk. There is a fine balance, though: don’t discard vital information. I’ve spent much valuable time solving problem A, when it turns out later that the solution only covers the simple case mentioned in the question, and not the 40 other edge cases the user had simplified away (trying to be helpful). We’re smart people; we can handle more details, as long as they’re cogent to the question.


Be open to solutions

Come into your question with an open mind. Don’t ask, “How can I accomplish x without using y?” Or “with y,” for that matter. Instead, ask, “How can I accomplish x?” If you have reasons for avoiding y, or requiring y, state them. But unless you are locked into a specific approach, don’t make that a condition of your question; relegate it to peripheral information. There might be a compelling argument for y that you don’t know about, that outweighs your reason against it (or vice-versa). The point is to state the problem you’re trying to solve, or the goal you’re trying to accomplish, rather than telling everyone you’ve already decided to solve it with y (or without y) and are just having this little problem with it.

It also helps if you reveal the motivation behind your question and give some context. For example, consider the question, “How can I read SELECTs and DML from the SQL Server transaction log?” The answer is you can’t, because SELECTs aren’t logged and, for DML, only the underlying operations SQL Server had to do in order to satisfy your DML is there. If you tell us that the reason you want to do this is because you have a table you need to retire, and you need to track down all the applications that talk to it, we can offer several alternatives that will do a much more complete job that hacking into the log (such as SQL Audit, server-side trace, etc). Initially the answer would have been that you can go spend a whole lot of money on 3rd party tools that will reverse engineer those log statements for you, but in the end that would not have been sufficient because the log does not contain information like user name, program name and host name.


Tag effectively

Don’t choose meaningless tags like column, error or query. Nobody is following tags like query because they are far too broad and cover too many database platforms. The context in your question, and title, are going to be much more useful in telling readers what your problem is about. Tags are used to narrow down your question to a reader’s field of interest. I’m not interested in errors in general, but I am interested in errors when using mirroring in sql-server.

Also, don’t tag a question as both mysql and sql-server if you are really only interested in a solution for one platform or the other. Casting a wider net in the hopes that someone following the MySQL-related tags will know the answer for SQL Server, or vice versa, is like calling the BMW shop to get a quote on a Mercedes. The BMW sales guy might have current pricing information, because he has a buddy at the Mercedes dealership, but you’re much more likely to offend the guy for wasting his time.

Finally, don’t just say you’re using SQL Server (or Oracle, or MySQL, or what have you). Tell us the version, too. Many new features have been added as new releases come out… if we know we’re using a newer version, we can take advantage of those newer features; if we know you’re using SQL Server 2000, for example, tag the question as sql-server-2000, we won’t waste time on CTEs or telling you to stop using NTEXT because NVARCHAR(MAX) is where it’s at.

(Oh, and if you tag the question with sql-server-2012, you don’t have to put that information in the title. This is considered redundant, and is going to be removed sooner or later.)


Remember your @language class

In the world of texting, emoticons and lazy shorthand, lapses in communication have become fairly pervasive. Try to use proper grammar (particularly I vs. i, which seems to be a pet peeve of many!), spelling, and avoid derogatory nonsense (like “M$”) or invalid terms (there is no such thing as “MSSQL”). Nobody is expecting you to be a perfect speller, but if you type cylinder instead of Cyrillic, there might be a problem with interpretation. Most of this is forgiven, but someone will correct it all – and if they’re spending time correcting your question, they’re not spending time answering it.


Don’t change your requirements on the fly

Don’t change requirements after your question has been posted, and solutions are being rejected because they didn’t take into account something that you should have mentioned up front. If the requirements have changed in such a way that once correct answers have since been invalidated, it is not very fair to those people who answered the original question in good faith. Accept the answer that best satisfied your initial requirements as written, up-vote any that are helpful (particularly those that helped identify edge cases or other reasons for your requirements to change), and start a new question.


Remember that you came here for help

Don’t argue with the people trying to help you. Everyone here is a volunteer, and they’re paying attention to your question because they are genuinely interested in helping another community member solve a problem. Being belligerent or disrespectful doesn’t really do anything for anyone, except that person will now think twice about trying to help you on your next question. Which may be your goal, but in general you’ll find that if you’re angry at someone who’s trying to help you, you’re wrong. Even when you’re right.


Don’t dine and dash

Once you’ve received your answer, don’t abandon your question. Respond to comments, up-vote helpful answers, accept the best answer, and thank the people who helped. It’s amazing how much impact showing a little appreciation can have.

Also, don’t accept the first answer that comes along, unless it is absolutely brain dead obvious (in which case, maybe it shouldn’t have been a question at all). Give your question time to breathe, let a wider audience consider solutions (or poke holes in the one you’ve accepted). Since questions with an accepted answer are largely ignored, you could be doing yourself a disservice by accepting an inferior answer, and implementing a sub-optimal solution, all because you didn’t wait long enough for a better answer. I wrote a much longer rant about that over on meta.stackoverflow.


Try to answer your own question

This is the absolute most important step you must take before posting your question. I’m not talking about trying to solve your problem instead of asking the question, but rather just to read over your question. Pretend you have no idea about the problem, and read the question from start to finish. Ask yourself if you’ve been provided enough information to offer a solution. If you find yourself asking questions about the problem, the answers to those questions should be part of your question. If you are not sure about something that is stated or not stated in the question, again forgetting that you wrote it, those trying to help you will likely have the same questions. Don’t make readers pull teeth to get the information that you should know will be required in order to solve the problem.


Other Resources

Writing the perfect question (Jon Skeet)

How To Ask Questions The Smart Way (Eric Steven Raymond)

How to Ask (dba.stackexchange.com)


Filed under asking questions

4 Comments

Subscribe to comments with RSS.

  • Leigh Riffel says:

    Very thorough question writing help. A bit SQL Server centric, but good none the less.

  • Viking says:

    Wonderful guidelines, Aaron. Thanks for writing this, and I am going to stick this at my desk, and share it with others in my shop.

    Thanks once again.

  • Leigh, I apologize, that’s my area of expertise. While my specific examples are based there, the spirit behind them applies universally, and translates easily, I think.

  • Evan Carroll says:

    We should update this to reference CREATE TEMPORARY TABLE AS SELECT which seems to be the best way to present sample data and schema.

    BTW, great post.

  • Comments have been closed for this post