Why Can't Database Tables Index Themselves?

Geoffrey_K · July 19, 2006, 12:00am

Actually, I don’t see why you couldn’t have it both ways, with the advisor configured to report “recommended indexes” and “estimated time/load required” and let the DBAs set reporting thresholds and give no/nogo/dontrecommendthisagain/remindmein30days approval and set a time/date to do it when system load is expected to be low.

Katy · July 19, 2006, 12:00am

What? Are you mad? Where would all the DBAs and consultants be in 1 years time???

Lovely idea though… if you live in the land of nod…

Eric · July 19, 2006, 12:00am

According to Microsoft’s Senior VP for Server Applications, future versions of SQL Server will be much more self-tuning than current versions are.

http://www.microsoft.com/sql/letter.mspx [dead link]

Eric · July 19, 2006, 12:00am

Oh, by the way, it’s an offhand comment in that letter, but I know I’ve read more about it somewhere else–just can’t remember where.

JohnS · July 19, 2006, 12:00am

@Daniel_Read: Statistics in SQL Server are a maintained to help the DB engine decide how to efficiently use the indexes it has. They contain information on how the data is distributed across the table.

If your query’s where clause was something like “WHERE orders.date ‘2006-1-1’” then the optimiser can look at the statistics to guestimate how many rows this will include. After it has determined roughly many rows it needs from each table the optimiser can make a better choice of what indexes to use to retreive the data, what order it should join the tables in, and what type of join it should use.

99% of the time it gets it dead right, which means you don’t need to use index hints etc.

I assume Oracle (bleh), DB2 etc have something similar.

foobar · July 19, 2006, 12:00am

But then what would DBAs do all day?

MartinM · July 19, 2006, 12:00am

But then what would DBAs do all day?

Go to Start - Programs - Microsoft SQL Server - Index Tuning Wizard

anthony2 · July 20, 2006, 12:00am

What if the query that has to be fast is only run once a month? You can anticipate that, the machine cannot.

PaulC · July 20, 2006, 12:00am

“But then what would DBAs do all day? :)”

I sometimes wonder.

I remember some of the people I’ve seen most vehemently against the use of SQL Server (which had self-tuning as a design goal) holding a lecture proudly displaying how their databases take an entire team to dynamically tune them ‘on-demand’ full-time, as they handle ‘thousands’ of transactions per day.

I’m not sure they realised how silly this looked side-by-side with their comments on how their favorite database product was superior.

It would have been funny, if not for the tax dollars spent and the fact that all data was ‘flat file’ transferred for ‘cross-platform compatibility’ and therefore had no dependable relationships.

rtaranu · July 20, 2006, 12:00am

if everything would be automatic you wouldn’t need to hire 100$ an hour database support, that would cut into the database provider’s profit. follow the money.

PaulC · July 20, 2006, 12:00am

" Tuning is an art. Why would you want to use robot?"

But if it’s not a science, is it real or imaginary?

Part of the reason why Microsoft made SQL Server more automated was because they believed making that ‘nth’ difference manually was at least ‘iffy’ if not impossible, given all the factors.

The implication was that that to believe it was even possible was a serious underestimation of the problem and that automated processes were more likely to get it right most of the time.

PhilipH · July 20, 2006, 12:00am

Hello,

I’m not sure, but I guess finding out if there is an index to use is less expensive an operation than to find out what index should be created to speed the query up by what extent.

I fear that adding a feature to find out what indexes should be created would significantly slow down every query to the database. But then again, if the feature is optimal, you can for example turn the thing on during developement (or even during end-user testing to get significant data), let it do its work and then turn it off.

The other problem is that the more indexes you have, the more expensive the operations become. That means that you can’t just analyze one query. You’d also have to take into account the amount of inserts happening on the tables you are auto-indexing.

And there are different types of indexes. Which one is best? Should the system create a unique one? And if the system provides multiple different technical types of indexes: Which type to use? All that makes implementing such a feature quite difficult I guess.

But I agree: I really think software should be smart enough by now to find a useful solution here. Too many times I’ve been bitten by suboptimal (or outright missing) indexes. And there’s more fun stuff to do than manually analyzing query plans the database is throwing at you.

At least for me, that is.

Philip

Ray_Booysen · July 20, 2006, 12:00am

Of course the database (in this scenario) won’t suddenly decide “Woops, I need an index, lets create one.”

The power would still lie with the developer or DBA. On logging in the database can notify the DBA that a potential index should be created and let the DBA decide on when to create it if they want it.

Considering the power of current database systems, I would trust a machine that has been logging queries for a week over my perceived ideas over what the indexes should be.

Tomy · July 20, 2006, 12:00am

I totally agree with the
"Start | Programs | Microsoft SQL Server 2005 | Performance Tools | Database Engine Tuning Advisor" way.
But I ALWAYS put a primary key on a table. In specially critical SQL queries I happen to add some manual indexes in the tuning process.
But how many rows should there be in a table before any secondary indexes need to be considered at all?? Say a table with 1000 rows, whatever you do to it is instantaneous so?? …

AngusM3 · July 20, 2006, 12:00am

Beware indexing every field in your table (even if you don’t have bit fields). One of the things SQL Server’s locking does when you’re writing a recordset is lock the indexes that recordset affects. Now a single record might only lock one leaf in the index, but larger recordsets obviosuly lock larger bits - in a transactional system that can easily lead to blocking and deadlocking.

ScottS · July 20, 2006, 12:00am

Because robots work harder, and are less excentric than artists

ChaseS · July 20, 2006, 12:00am

FileMaker Pro has had this feature for years. It’s not suitable for large-scale production apps (… leave it alone, crazy Filemaker zealots of the world, or you’ll get schooled by this bunch …) but I’ve always found it works quite well. I see no reason why their algorithm couldn’t be applied to other databases. Experts may be able to do better in theory, but in practice most people would underperform Filemaker’s auto-indexing.

Dean · July 20, 2006, 12:00am

Not exactly, but pretty close:

http://blogs.msdn.com/queryoptteam/archive/2006/06/01/613516.aspx

ChrisP · July 20, 2006, 12:00am

I’ve been wondering about this same question for about 2 years.

I once worked on an existing application (with a 2GB SQL 2000 database) that didn’t have any indexes because the devs that created it didn’t know what they were. A feature like this would have solved some of the performance issues it had.

andrew15 · July 20, 2006, 12:00am

even if the engine can determine where indexes should be placed it should try to determine the proper partitioning for a table as well. also i beleive that teradata uses a hashing alqorithm when placing rows in a table and has such a fast access time(due to distribution) that indexing becomes alot more meaningless. so the real question is how can we create a system that does not need indexes at all?