Stack Exchange Database Administrator Blog

Data warehousing for the busy DBA, part 2

ConcernedOfTunbridgeWells — Mon, 06 Aug 2012 17:27:04 +0000

Data warehousing for the busy DBA

Part 2: What a data warehouse isn’t

Many data requirements can be fulfilled without going to the trouble of a data warehouse. In some cases the requirements are in conflict with the main requirements of a data warehouse system.

Data warehouses are not real-time systems

Real-time data warehousing is almost never a genuine requirement; real-time analytic and aggregate reporting needs are rare. There are two very good reasons not to implement a real-time system unless you really need it:

They are much more complex and harder to get right. Real-time systems can be made to work on simple data – market feeds, account transactions and balances or web server logs. Complex real-time systems are much harder to build and make stable. Unless there is a genuine requirement (see below) this is an anti-pattern.
Stable data that can be reported by an ‘as-at’ position is much easier to work with. If you have figures changing in real-time you will have the same report generating different results if run twice. This can generate a whole class of really time-wasting reconciliation bunfights and can erode user confidence in the system.
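To make the ‘as-at’ idea concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The table name, columns and figures are invented for illustration and are not taken from any particular system. The point is only that a report pinned to an as-at date gives the same answer on every run, even after later data has been loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical snapshot table: one row per account per as-at date.
conn.execute("""
    CREATE TABLE account_balance_snapshot (
        as_at_date TEXT    NOT NULL,
        account_id INTEGER NOT NULL,
        balance    REAL    NOT NULL,
        PRIMARY KEY (as_at_date, account_id)
    )
""")
conn.executemany(
    "INSERT INTO account_balance_snapshot VALUES (?, ?, ?)",
    [
        ("2012-08-01", 1, 100.0),
        ("2012-08-01", 2, 250.0),
        ("2012-08-02", 1, 120.0),
        ("2012-08-02", 2, 240.0),
    ],
)


def total_balance(as_at_date):
    """Total balance across all accounts for one as-at position."""
    row = conn.execute(
        "SELECT SUM(balance) FROM account_balance_snapshot WHERE as_at_date = ?",
        (as_at_date,),
    ).fetchone()
    return row[0]


first_run = total_balance("2012-08-01")

# A later load arrives; the earlier as-at position is not touched.
conn.execute(
    "INSERT INTO account_balance_snapshot VALUES ('2012-08-03', 1, 90.0)"
)
second_run = total_balance("2012-08-01")

print(first_run, second_run)
```

Both runs report the same total for 1 August, which is what spares you the reconciliation arguments described above.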

Data warehouses are not operational reporting systems

One commonly confused requirement is to do operational reports off the data warehouse system. This is often cited as a requirement for real-time data. Generally this requirement is in conflict with the other requirements of a data warehouse, to the extent that it is a known anti-pattern.

Operational reports tend to be exactly that – operational. They are not analytic in nature; they tend to be detail level exception reports or status reports on a process. Normally they act as to-do lists or warnings of the need to rectify something. Typically, the process around an operational report will look something like:

Run the report to check what work needs to be done, enter the work into the source system, then re-run the report to see whether anything is left that needs clearing.
Enter data into the system, then run the report straight away including that data.
Generate a to-do list or a work list for somebody.

These are typically detail reports showing lists of specific items. They are tied to specific processes and are not ad-hoc in nature. Normally the best way to fulfil an operational reporting requirement is to run the reports off a replicated copy of the production system.

Data warehouses are not tied to a single system

If you make your warehouse model too tightly coupled to a single system then you will run into major issues if you try to bring data in from other sources. The data warehouse should have a model of its own.

If your requirement is to provide reporting off a single system you can use warehouse-like architecture to implement the facility, but a stovepiped data mart is much easier to implement. If your reporting requirement is tied to a single system it might be worth taking a step back from the concept of a data warehouse and implementing something that will just fulfil that requirement. It will be much quicker and easier, and much more politically acceptable to other stakeholders.

A data warehouse is more than just a central data store

Many projects simply build relatively straightforward feeds from their source systems and drop the data into a reporting database with no conformation process and (often) relatively little transformation. This fails a key requirement of a data warehouse in that there is no ‘single source of the truth’. This type of system still requires bespoke queries and extracts to get data out of it.

A system built like this places a large burden on the reporting layer and will not be able to support an ad-hoc reporting facility. It is unlikely to have much effect on the status quo. It is often called a ‘data warehouse’, but the result is disappointing as it doesn’t materially improve the users’ access to the data.

Resist building this type of system and stick with providing ad-hoc extracts as an alternative. Don’t waste your time with a project like this. Let the business stick with their SAS and MS Access extracts. At best, use the database as a staging area for these extract processes but try to dispel any expectation that the business will get self-service access to the data.

Qlikview is not a data warehouse

Various ad-hoc reporting tools have some ability to put metadata over an arbitrary collection of data sources, or extract the data into a cube format. Examples of this type of tool include Qlikview, MS Powerpivot, Cognos TM1 and Business Objects. These tools all work well with a data warehouse but they are often sold by the vendor as a substitute for one.

The strength of such tools is that you can build a reasonable ad-hoc reporting facility within the limitations of the data. However they do not address conformed data or data cleansing issues and they have limited ability to do significant transformation on the data. They offer no facility for reconciliation controls against source data, so it can be hard to troubleshoot or prove that the data shown in the tool is correct.
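As a rough illustration of what a reconciliation control can look like, the sketch below compares simple control totals (row count and summed amount) between a source table and a warehouse table. It uses Python’s built-in sqlite3 module; the table names, columns and values are all made up, and a real control would normally be more granular (per load, per day, per account and so on).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source_sales (sale_id INTEGER PRIMARY KEY, amount INTEGER);
    CREATE TABLE fact_sales   (sale_id INTEGER PRIMARY KEY, amount INTEGER);

    INSERT INTO source_sales VALUES (1, 1000), (2, 2550), (3, 700);
    -- Sale 3 has gone missing somewhere in the (hypothetical) load.
    INSERT INTO fact_sales   VALUES (1, 1000), (2, 2550);
""")


def control_totals(table):
    """Return (row count, total amount) for one of the two known tables."""
    # The table name is interpolated only because it comes from the
    # fixed names used in this sketch, never from user input.
    return conn.execute(
        f"SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM {table}"
    ).fetchone()


source = control_totals("source_sales")
warehouse = control_totals("fact_sales")
reconciled = source == warehouse

print("source:", source, "warehouse:", warehouse, "reconciled:", reconciled)
```

When the totals differ, as they do here, you have a concrete starting point for troubleshooting, which is exactly what is missing when a tool pulls straight from source with no controls.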

The best approach with these tools is either to build a data warehouse and use these tools as a front-end, or to use the tool to extract directly from source and accept the limitations. Do enough to get that on your CV and then go pimp yourself as a B.I. consultant before the business work out how crappy the underlying data is.

The Busy DBA: Data warehousing, part 1

ConcernedOfTunbridgeWells — Wed, 04 Jul 2012 18:00:22 +0000

Data Warehousing for the Busy DBA

Part 1: What is a data warehouse?

So, one of the enterprise architects has convinced management that we need a Data Warehouse. You’re the DBA, you know about databases, right – we need you to go and design it. Can we have an estimate by next Thursday for the board meeting?

This series of articles walks through key data warehouse concepts. It is aimed at a database professional – a DBA or SQL developer – who is now getting involved in a data warehouse project. Therefore the articles assume some background in database development or administration.

The term ‘data warehouse’ gets used quite loosely at times. To start, here is a definition of what a data warehouse is (and is not) and the reasoning behind that definition. We’ll also look at some alternatives to a data warehouse if your requirements aren’t congruent with an actual data warehouse project. A data warehouse is more than just a reporting database. A full-fledged data warehouse will have most or all of these requirements:

Multiple Data Sources

The system will load data from more than one source. These data sources could contribute metric (numerical or statistical) data into the same fact table from different sources or they could be external sources of reference data not held at source but needed for reporting.

Clean and Conforming Data

All of the content in any given data item should be correct and behave the same, no matter what source it was loaded from. The data is transformed into a common format free from leaky abstractions. Failure to do this means that data in a given data item will need to be treated differently depending on the source, which creates a non-obvious leaky abstraction.
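A small sketch of what conforming can mean in practice, assuming two invented source systems that encode the same attribute differently. It uses Python’s built-in sqlite3 module, and every table, column and code value here is hypothetical. A mapping table translates each source’s codes into one conformed value, so downstream queries never need to know where a row came from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Staged rows from two hypothetical sources with different encodings.
    CREATE TABLE staging_customer (
        source_system TEXT, customer_ref TEXT, gender_code TEXT
    );
    INSERT INTO staging_customer VALUES
        ('crm',     'C-001', 'M'),
        ('crm',     'C-002', 'F'),
        ('billing', '9001',  '1'),
        ('billing', '9002',  '2');

    -- Mapping from each source's code to the conformed value.
    CREATE TABLE gender_map (
        source_system TEXT, source_code TEXT, conformed_value TEXT
    );
    INSERT INTO gender_map VALUES
        ('crm',     'M', 'Male'), ('crm',     'F', 'Female'),
        ('billing', '1', 'Male'), ('billing', '2', 'Female');
""")

# Unmapped codes fall back to 'Unknown' rather than leaking raw source values.
conformed = conn.execute("""
    SELECT s.source_system,
           s.customer_ref,
           COALESCE(m.conformed_value, 'Unknown') AS gender
    FROM staging_customer AS s
    LEFT JOIN gender_map AS m
           ON m.source_system = s.source_system
          AND m.source_code   = s.gender_code
    ORDER BY s.source_system, s.customer_ref
""").fetchall()

for row in conformed:
    print(row)
```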

It will not take many incorrect reports produced by naïve end users before the issue starts to erode the system’s credibility. ‘User training’ is not a good solution. The system will quickly get a reputation for being incorrect or difficult to use, which will largely relegate its usefulness to reporting teams. As often as not the reporting teams will be comfortable with their own stove-piped extracts anyway.

Star Schemas

Star schemas are not strictly necessary but you’re doing it the hard way if you don’t make use of them. They are effective for efficient queries over aggregate data as the fact table is typically the only large table involved in the join, and the most expensive query plan operator will be a single table scan. Partitioning works well on star schemas as there are no joins between large partitioned tables. Many DBMS platforms also have special query plan operators for efficient star schema queries.
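For readers who have not met one, here is a toy star schema and a typical aggregate query over it, run through Python’s built-in sqlite3 module. The tables, columns and rows are invented for illustration. The shape is what matters: one fact table joined to small dimension tables and grouped by dimension attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Small dimension tables (hypothetical names and contents).
    CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);

    -- The fact table is the only table expected to grow large.
    CREATE TABLE fact_sales  (date_key INTEGER, product_key INTEGER, amount INTEGER);

    INSERT INTO dim_date    VALUES (20120701, 2012, 7), (20120801, 2012, 8);
    INSERT INTO dim_product VALUES (1, 'Hives'), (2, 'Smokers');
    INSERT INTO fact_sales  VALUES
        (20120701, 1, 100), (20120701, 2, 40),
        (20120801, 1, 150), (20120801, 1, 50);
""")

# Aggregate the fact rows by attributes taken from the dimensions.
totals = conn.execute("""
    SELECT d.year, d.month, p.category, SUM(f.amount) AS total_amount
    FROM fact_sales AS f
    JOIN dim_date    AS d ON d.date_key    = f.date_key
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY d.year, d.month, p.category
    ORDER BY d.year, d.month, p.category
""").fetchall()

for row in totals:
    print(row)
```

SQLite is used here only because it runs anywhere; it says nothing about how a real warehouse platform would plan or partition such a query.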

Slowly changing dimensions (SCDs) are by far the simplest structure for capturing historical state; it is much easier to do this in a star schema with SCDs than in a normalised structure. I can see almost no reason to recommend a 3NF structure for historic data.
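The sketch below shows one common SCD variant (usually called Type 2), in which each change closes the current dimension row and opens a new one with its own validity range. It uses Python’s built-in sqlite3 module. The table layout, the '9999-12-31' open-ended marker and the helper names apply_change and region_as_at are assumptions made for this example, not a prescribed design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        customer_id  TEXT NOT NULL,                      -- business key
        region       TEXT NOT NULL,
        valid_from   TEXT NOT NULL,
        valid_to     TEXT NOT NULL DEFAULT '9999-12-31'  -- open-ended row
    )
""")


def apply_change(customer_id, region, change_date):
    """Close the current row for the customer (if any) and open a new one."""
    conn.execute(
        "UPDATE dim_customer SET valid_to = ? "
        "WHERE customer_id = ? AND valid_to = '9999-12-31'",
        (change_date, customer_id),
    )
    conn.execute(
        "INSERT INTO dim_customer (customer_id, region, valid_from) "
        "VALUES (?, ?, ?)",
        (customer_id, region, change_date),
    )


def region_as_at(customer_id, as_at_date):
    """Return the region that was in force on the given date, or None."""
    row = conn.execute(
        "SELECT region FROM dim_customer "
        "WHERE customer_id = ? AND valid_from <= ? AND ? < valid_to",
        (customer_id, as_at_date, as_at_date),
    ).fetchone()
    return row[0] if row else None


apply_change("C-001", "Kent", "2011-01-01")
apply_change("C-001", "Sussex", "2012-03-15")

print(region_as_at("C-001", "2011-06-30"))
print(region_as_at("C-001", "2012-08-06"))
```

Dates are stored as ISO yyyy-mm-dd strings so that plain string comparison orders them correctly. Fact rows would carry customer_key, so each fact stays attached to the version of the customer that was current when it was recorded.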

Historical Data

Typically a system of this sort has requirements to examine trends in data over time. A data warehouse system will have historical data and many requirements to run queries over that data. If you have requirements for real-time data it is likely that requirements for operational reporting are being conflated with the concept of a data warehouse. This is a known anti-pattern, which will be discussed in more depth later.

Analytical Queries Across Data in Aggregate

Typical data warehouse queries are in aggregate, producing statistical or financial metrics across a large volume of data. Normally a table scan is the most efficient way to do this sort of query; if a query hits more than a few percent of rows in a fact table it is usually faster to scan it (using sequential I/O) than to use bookmark or index lookup operations.

One key implication of this is that you will typically want to optimise your system for fast table scans, as opposed to fast random access operations. This has significant implications for storage architecture.

The System is Fronted by Ad-hoc Reporting Tools

One of the benefits of a data warehouse is the ability to provide a self-service query facility with clean data that is safe to be used by relatively naïve end users. This places two key requirements on the data:

The data must be well behaved – clean, conformed and consistent. The data must be ‘safe’ enough to allow non-technical staff to query it with a reasonable expectation of getting a correct result.
The data must be structured so it plays nicely with the reporting tool that is being used with the project.

Practically, this implies a data cleansing process, and (most likely) star schemas, although some tools (e.g. MS Access) are designed to use relational data sources.

The Project Will be Highly Political

Data warehouse projects tend to be highly political and characterised by systemic responsibility-without-authority issues. You will need to put fingers in a lot of pies in the course of a data warehouse project, and they may not always be welcome. The #1 failure mode of data warehouse projects is weak business sponsorship. You will find a lot of obstacles, mostly political but some technical – usually requirements for data that isn’t actually recorded anywhere, is recorded at the wrong grain or is of unusably poor quality.

A strong business sponsor is an absolute necessity to succeed in a data warehousing project of non-trivial scope. The project will be politicised and painful; the business will feel the pain. You will have many issues that you are dependent on third parties to fix as you have no access or authority on the affected systems. These blockages will frequently hold up the project or limit what can be achieved. Although they’re caused by third parties they will typically be seen as the responsibility of the data warehouse team.

If the project has any profile and budget, these delays will quickly burn through a reasonable-sounding contingency, especially if a large team or (worse yet) a consulting firm has been brought in to do the work. Data warehouse projects have a reputation for being time-consuming and expensive and the principal drivers of this cost are the zillion and one holdups caused by external stakeholders.

Help us help you: keys to getting good answers

Aaron Bertrand — Thu, 21 Jun 2012 16:22:50 +0000

After having answered some 250 questions here on dba.SE, and over 1,500 questions on StackOverflow, I’ve read some good questions, and I’ve read some bad questions. I’m no Jon Skeet, but I think I can offer some perspective on how to ask effective database-related questions on this site and get solutions to your problems. So what follows is some advice about things you should be sure to think about before, during and after posting your question. Not all points will be relevant for all types of questions, but some are universal. I know this may seem like a big laundry list of rules, but please, bear with me – at the very least, read all the section titles (and the last section about reading over your question).

Try to solve exactly one problem

Don’t ask a run-on question that asks whether you should use merge or peer-to-peer replication between your two data centers in Europe, if SQL Server Express is capable of handling the back end for your beekeeping equipment store, and which MySQL engine you should use for storing lots and lots of integers. Those are three separate questions. Make sure there is a problem statement that is clear and concise, and that you really are trying to solve a specific problem. You might think you can invoke some good dialog if you ask for the “best” approach to high availability, but such a question will likely be closed because it is far too broad and will simply spur opinion, speculation and debate (never mind a stream of follow-up questions for more information).

Quoting “best” in the previous paragraph was not an accident – don’t ask what the “best” solution is for anything. Be specific about how you determine “best” – for some this is efficient (not performant :-)), and could be in terms of duration, memory, storage, or a host of other metrics; for others it is about maintainability, cost, simplicity, or something else entirely. Be specific about what factors the “best” answer will consider. If you just say “give me the best answer” then all of the people reading the question are going to be left on their own to determine what they think you mean by “best” – and this can lead to answers that don’t meet your criteria.

Post sample data and desired results

A lot of people get quite offended when asked for sample data. We don’t need your confidential data, and we don’t need your entire database. We just need enough sample rows to demonstrate the problem you are having, or for us to work with in order to get the output you’re expecting. Check out SQL Fiddle – this is a great place to mock up some dummy schema and sample data so that we can see the query you’re running.

The mistake most people make is they spend several minutes crafting a paragraph where they describe their issue using a word problem. I don’t know about you, but I hated word problems in school, and they aren’t any more interesting to me now. We’re data people and we speak much better with schema and sample data. For SQL Server specifically, you can easily generate scripts for both the table and some data using Management Studio. In an ideal world, you will have actual CREATE TABLE and INSERT statements that we can copy and paste into our own Management Studios without a bunch of translation. This is good:

CREATE TABLE dbo.splunge(mort INT, meld DATETIME);
INSERT dbo.splunge(mort, meld) VALUES
(1, CURRENT_TIMESTAMP), (2, '20120101'), (3, NULL);

This is not so good:

splunge has columns mort (integer), meld (date time)
mort meld
1 5/6/2012
2343 1/1/09

Not only is the latter much harder for us to reverse engineer, but it also illustrates another important point that seems to escape a lot of folks: don’t use ambiguous date formats! Your audience isn’t necessarily in the same country as you, and may not speak your language as their first language, so their interpretation of 5/6/2012 might be different from yours. Always use yyyymmdd or yyyy-mm-dd (even though the latter is not safe in code, it is the most understandable in sample data).
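To see how much damage an ambiguous format can do, here is a tiny sketch using Python’s standard datetime module (Python is used purely for illustration; the same ambiguity exists in any database or language). The same string parses to two different dates depending on which convention the reader assumes, while yyyymmdd leaves nothing to guess.

```python
from datetime import datetime

ambiguous = "5/6/2012"

# A reader assuming month/day/year sees the 6th of May...
as_month_first = datetime.strptime(ambiguous, "%m/%d/%Y").date()

# ...while a reader assuming day/month/year sees the 5th of June.
as_day_first = datetime.strptime(ambiguous, "%d/%m/%Y").date()

# yyyymmdd can only be read one way.
unambiguous = datetime.strptime("20120506", "%Y%m%d").date()

print(as_month_first.isoformat())
print(as_day_first.isoformat())
print(unambiguous.isoformat())
```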

Tell us what you’ve tried

Posting sample data, desired results and a query that you’ve tried goes a long way in showing us what you’ve already tried, and makes it very easy for us to take what you’ve provided and understand, fix or improve it.

You will meet some opposition (and potentially have your question closed before you get an answer) if you just ask, “tell me how to do this.” If your question shows a significant lack of research, you won’t get much sympathy from your peers. Most of the people providing you with valuable help did not become experts in their field by being spoon-fed answers – they worked at it by learning their platforms and programs, and trying to make things work (or trying to break them).

It’s okay if your question is for homework, and it’s okay if you’re in a rush, but even more so in this case, you should explain what you’ve tried (or why you couldn’t learn what to do based on the lecture or other materials, or why you’ve waited until the last minute). So along with some sample data and desired results, post the query (or queries) that you’ve tried, and why the results from those queries aren’t exactly what you wanted.

Here is an example of a recent SQL Server question, word for word, that shows very little research:

If a field’s datatype is a datetime, and does not allow nulls and there is no default value set, does the database enter the current timestamp?? I queried both here and MSDN but could not find an answer to this. Thanks.

Now, I can appreciate that the person tried searching MSDN first, before turning to a Q & A site. But how hard could it have been to simply create a table with a non-null datetime column, try an insert and see what happens? I could do it here, and it would take less time than it took to type the above question, but I don’t think I have to in order to prove the point.

Post real execution plans

If you’re asked for execution plans, don’t paste the SHOWPLAN_TEXT output, and don’t bother running an estimated plan. A screen shot of the actual graphical plan is a start, but it doesn’t contain nearly enough information to act on – we can only see the costly operators, but none of the data behind them. Ideally you can post the .SQLPlan file for an actual execution plan somewhere.

In a perfect world, you will generate the actual execution plan from within my company’s free execution plan analysis tool, SQL Sentry Plan Explorer, and save the result as a .QueryAnalysis file. In addition to including all of the information from the execution plan XML (which isn’t visible when you post a screen shot of the graphical plan, and which contains tons of information not available in SHOWPLAN_TEXT), and some runtime information (such as actual vs. estimated row counts, a comparison you can’t do with an estimated plan), you also get actual runtime metrics such as CPU and duration, which can greatly help in determining the resources being used for a particular query (or part of it).

You can post your files to github:gist or Pastebin. If you’re concerned about confidentiality (table names, etc.) and it is not convenient to try to mock up a similar case with less sensitive names, you can try to take the .SQLPlan or .QueryAnalysis file and mask your sensitive names using search and replace. Do so carefully, however – make sure you can still open the file after you’ve saved your modifications.

Don’t say “it’s broken”

If you are getting an error message, post the actual error message. Nobody knows what you mean when you say “it’s broken”, “it’s not working” or “SQL Server doesn’t seem to like it.” Leaving this information out might seem like you are doing everyone a favor, by keeping your question short, but if the reader doesn’t know what’s wrong, they’re not going to be well equipped to help.

Don’t over-simplify

I realize you don’t want to post a novel, and we certainly don’t need to see your entire 3,000 line stored procedure, so you will want to reduce the code and/or problem statement to a digestible chunk. There is a fine balance, though: don’t discard vital information. I’ve spent much valuable time solving problem A, only for it to turn out later that the solution covers just the simple case mentioned in the question, and not the 40 other edge cases the user had simplified away (trying to be helpful). We’re smart people; we can handle more details, as long as they’re relevant to the question.

Be open to solutions

Come into your question with an open mind. Don’t ask, “How can I accomplish x without using y?” Or “with y,” for that matter. Instead, ask, “How can I accomplish x?” If you have reasons for avoiding y, or requiring y, state them. But unless you are locked into a specific approach, don’t make that a condition of your question; relegate it to peripheral information. There might be a compelling argument for y that you don’t know about, that outweighs your reason against it (or vice-versa). The point is to state the problem you’re trying to solve, or the goal you’re trying to accomplish, rather than telling everyone you’ve already decided to solve it with y (or without y) and are just having this little problem with it.

It also helps if you reveal the motivation behind your question and give some context. For example, consider the question, “How can I read SELECTs and DML from the SQL Server transaction log?” The answer is you can’t, because SELECTs aren’t logged and, for DML, only the underlying operations SQL Server had to do in order to satisfy your DML are there. If you tell us that the reason you want to do this is because you have a table you need to retire, and you need to track down all the applications that talk to it, we can offer several alternatives that will do a much more complete job than hacking into the log (such as SQL Audit, server-side trace, etc). Initially the answer would have been that you can go spend a whole lot of money on 3rd party tools that will reverse engineer those log statements for you, but in the end that would not have been sufficient because the log does not contain information like user name, program name and host name.

Tag effectively

Don’t choose meaningless tags like column, error or query. Nobody is following tags like query because they are far too broad and cover too many database platforms. The context in your question, and title, are going to be much more useful in telling readers what your problem is about. Tags are used to narrow down your question to a reader’s field of interest. I’m not interested in errors in general, but I am interested in errors when using mirroring in sql-server.

Also, don’t tag a question as both mysql and sql-server if you are really only interested in a solution for one platform or the other. Casting a wider net in the hopes that someone following the MySQL-related tags will know the answer for SQL Server, or vice versa, is like calling the BMW shop to get a quote on a Mercedes. The BMW sales guy might have current pricing information, because he has a buddy at the Mercedes dealership, but you’re much more likely to offend the guy for wasting his time.

Finally, don’t just say you’re using SQL Server (or Oracle, or MySQL, or what have you). Tell us the version, too. Many new features have been added as new releases come out; if we know you’re using a newer version, we can take advantage of those newer features. If you’re using SQL Server 2000, for example, tag the question as sql-server-2000, and we won’t waste time on CTEs or telling you to stop using NTEXT because NVARCHAR(MAX) is where it’s at.

(Oh, and if you tag the question with sql-server-2012, you don’t have to put that information in the title. This is considered redundant, and is going to be removed sooner or later.)

Remember your @language class

In the world of texting, emoticons and lazy shorthand, lapses in communication have become fairly pervasive. Try to use proper grammar (particularly I vs. i, which seems to be a pet peeve of many!) and spelling, and avoid derogatory nonsense (like “M$”) or invalid terms (there is no such thing as “MSSQL”). Nobody is expecting you to be a perfect speller, but if you type cylinder instead of Cyrillic, there might be a problem with interpretation. Most of this is forgiven, but someone will correct it all – and if they’re spending time correcting your question, they’re not spending time answering it.

Don’t change your requirements on the fly

Don’t change requirements after your question has been posted, and solutions are being rejected because they didn’t take into account something that you should have mentioned up front. If the requirements have changed in such a way that once-correct answers have since been invalidated, it is not very fair to those people who answered the original question in good faith. Accept the answer that best satisfied your initial requirements as written, up-vote any that are helpful (particularly those that helped identify edge cases or other reasons for your requirements to change), and start a new question.

Remember that you came here for help

Don’t argue with the people trying to help you. Everyone here is a volunteer, and they’re paying attention to your question because they are genuinely interested in helping another community member solve a problem. Being belligerent or disrespectful doesn’t really do anything for anyone, except that person will now think twice about trying to help you on your next question. Which may be your goal, but in general you’ll find that if you’re angry at someone who’s trying to help you, you’re wrong. Even when you’re right.

Don’t dine and dash

Once you’ve received your answer, don’t abandon your question. Respond to comments, up-vote helpful answers, accept the best answer, and thank the people who helped. It’s amazing how much impact showing a little appreciation can have.

Also, don’t accept the first answer that comes along, unless it is absolutely brain dead obvious (in which case, maybe it shouldn’t have been a question at all). Give your question time to breathe, let a wider audience consider solutions (or poke holes in the one you’ve accepted). Since questions with an accepted answer are largely ignored, you could be doing yourself a disservice by accepting an inferior answer, and implementing a sub-optimal solution, all because you didn’t wait long enough for a better answer. I wrote a much longer rant about that over on meta.stackoverflow.

Try to answer your own question

This is the absolute most important step you must take before posting your question. I’m not talking about trying to solve your problem instead of asking the question, but rather just to read over your question. Pretend you have no idea about the problem, and read the question from start to finish. Ask yourself if you’ve been provided enough information to offer a solution. If you find yourself asking questions about the problem, the answers to those questions should be part of your question. If you are not sure about something that is stated or not stated in the question, again forgetting that you wrote it, those trying to help you will likely have the same questions. Don’t make readers pull teeth to get the information that you should know will be required in order to solve the problem.

Other Resources

Writing the perfect question (Jon Skeet)

How To Ask Questions The Smart Way (Eric Steven Raymond)

How to Ask (dba.stackexchange.com)

Tag you’re it

shawn — Wed, 30 May 2012 01:43:13 +0000

Tags are something I usually see newcomers fail to use to their benefit. In case you did not notice, when you ask a new question there is a box that requires you to put in at least one tag before you can even submit the question. You are allowed a max of 5 tags, so choose wisely but please choose more than one. I don’t mean always pick 5 tags on every question you ask, but at least pick 2 or 3 that generalize what the question is regarding. With questions on DBA.SE, generally speaking, you should at least have a tag for the RDBMS you are working with (I did say generally speaking).

Tags are what get your questions noticed. In case you don’t already use it, the filter questions page is where I set a filter for a few tags that interest me. This is a page that I check frequently. I only have 3 tags because these are the subject areas that interest me the most, and keep me pretty busy.

You can click on your filter and it will show you every question that was tagged with the tag you selected either for a handful of sites or any site within the SE family. I have mine set to “all sites” but I pay close attention to those I find on DBA.SE.

These are the options you have when creating a new filter:

You can enter the first few characters of a tag like “sql” and let it sit a second and a list will be opened with all the tags on SE that match it. Once you enter a tag click in “Just these sites” and a check-box list will show with sites that have questions with that tag.

Why ‘dba.se’ and not ‘data professionals’ or something else? What’s the scope of the site?

jcolebrand — Tue, 29 May 2012 22:29:46 +0000

This question gets asked a LOT on our meta, on meta.se, and it gets asked even more frequently in our chat by new users once they really learn our scope. Here are some examples, just for fun:

And by rights, the title of “DBA” (database administrator) is not one that should be bestowed upon me. My pedigree is web-programmer, but I’ve been described by friends as a renaissance man, since I can do the entire field of web programming, from server configuration to database setup and configuration to web programming. It’s true that I know how to best utilize JSON or XML, that I know what binary transfers look like, I can decode compressed streams on the fly for debugging, and I’ve been playing with this stuff since before Apache had a version 1.0 (ok, that’s hyperbole, but I was there long before the 2.0 debut). But it’s also true that I can’t be as specialized as the three dbas that sit across the hall from me during the day. So I make up for that by keeping my ears open, and sharp, and paying attention to what they do.

And therein lies the crux. They do a LOT. So when the site was originally launched, we wanted it to be called “databases” but everyone was worried that it was going to get swamped with questions about configuration of the server before installing the database (rightly belongs on ServerFault) or that people would want to post a lot of shopping list questions (Q/A is hard, let’s go shopping). So someone (I wasn’t privy, I can’t comment) decided to call it “database administrators” when they launched the site, thinking “this is the target audience of the site, database administrators”. The first problem with that train of thought is this:

That’s the sort of face people tend to associate with a dba! He’s not very approachable is he? (Actually this is George Beech, a very valued employee with Stack Exchange http://serverfault.com/users/5880/zypher and I think most metaheads will recognize that picture ;-) ~ Thanks for being a good sport George!)

When in reality this is much more likely to be your local dba:

So we now have a cultural bias to avoid, but truth be told, if you told me to go ask the first guy to get something done, I would be scared too! Fortunately I work with these guys on a day to day basis (neither of the above two, sadly, but my dbas are just as photogenic as the second guy, Brent Ozar), so I know that they’re quite approachable.

The second problem with that train of thought is that we don’t just cater to current database administrators, but those who will be dbas, as well as those who just need the advice of a dba for a particular problem, such as database design, or recovery, but who can generally manage all day with no problem (much as I can manage most of my databases on a day-to-day basis, but sometimes I have to call in my dbas, and they prove why they are an absolute necessity on my team, improving performance in ways I didn’t know could be done!).

So we got stuck with a name that doesn’t quite work for us, which only means it falls to us to make the name dba more palatable. And that’s what we aim to do here at dba.se: dissolve the wall between you and the dbas in your life, and help you understand just WHY they want you to do those crazy things they sometimes demand of you, like changing your schema, or putting data on different servers.

So what sort of questions can you ask on dba.se? So long as they generally fall in the range of our FAQ, they’re pretty well fair game. That means NoSQL, SQL, Relational, NonRelational, BI, Data Warehousing and anything else you can think of. And if you find one you really don’t know if we’ll work with, ask on our community governance site: http://meta.dba.stackexchange.com/. We definitely love to consider enhancing our scope where it’s appropriate (it’s why we now do BI on our site). So that’s how our site got its name, and what we plan to do about it going forward. Got some questions for me to elaborate on and help make this dialog better? Feel free to hit me with them in the comments, in chat http://chat.stackexchange.com/rooms/179/the-heap or on our meta feedback post http://meta.dba.stackexchange.com/questions/699/.

exec blogoverflow.dbo.createNew(‘dba’)

jcolebrand — Tue, 29 May 2012 21:59:57 +0000

In keeping with our interested parties list we’ve launched the blog with about 10 contributors, more or less. And since we veterans know that the StackExchange Network runs on MSSQL, I thought this was a catchy little title. Don’t let that make you think that we have to only write about MSSQL content. Trust me, we’re all about data storage and retrieval [1], so if you have Oracle, Microsoft SQL Server, MySQL, PostgreSQL, couchdb, mongodb, memcached, bigtable, hadoop, or any other similar related technology, you’re welcome to bring us questions on it. We hope to devote some of this blog space to defining some of the common problems people face, giving some walkthroughs, and giving history lessons (you don’t know where you’re going till you know where you’ve been). We’re also going to talk about Business Intelligence and all those other fancy buzzwords, as our expertise covers the lot of data, analysis and retrieval (but we’re not statisticians, they have their own site at Cross Validated). We also generally turn down questions that only ask how to do front-end programming; for that you’ll want to visit Stack Overflow.

Feel free to ask us to write about something here: Request Thread! and we’ll do our best to get round to it right away (unless we get flooded with requests).

[1] but not filesystems, those are generally more O/S level and considered “simpler”, although they are also databases, and if you want to discuss the esoterics of data storage on modern filesystems, we’ll talk shop. But we don’t care about “how do I retrieve my lost contents” or “how do I format C:\” .. especially since most of us are likely to tell you how :p