Deadlocked!

daghf · August 25, 2008, 12:00am

I recommend this episode of software engineering radio:

http://se-radio.net/podcast/2008-05/episode-99-transactions

This episode takes a close look at transactions from different angles, starting with their fundamental properties of Atomicity, Consistency, Isolation, Durability but also investigating advanced topics like distributed or business transactions.

codinghorror · August 25, 2008, 12:00am

If you’ve got a SELECT that operates on tables A and B and an UPDATE that operates on the same ones, then the SELECT will grab a lock on table A, the UPDATE will grab table B, and then both will sit there while they try to get a lock on the other table. Or if you’ve got a block of statements in a single transaction, you’ll get the same effect.

I get that, but I don’t understand why simple read SELECTS need to lock at all. I mean, maybe if that database was storing my bank account balance, in which case transactional orders are hugely important.

I guess what I’m saying is that Starbucks Doesn’t Use Two-Phase Commit

http://www.eaipatterns.com/ramblings/18_starbucks.html

JC111 · August 25, 2008, 12:00am

There are a couple of things you might want to look into:

First of all SQL Server locks pages by default (not rows) which makes deadlocks more likely if the offending rows lie on the same page
One aproach that migh help, is actually follow the advice from the error message, and retry the transaction.

DennisG · August 25, 2008, 12:00am

That’s ok to store number of replies for every post.
But instead of storing it into Post table, store number of replies to PostCommentCount table.
(PostId uniqueidentifier, CommentCount int)

One record in Post table would match with one or zero records in PostCommentCount table.

Another problem with locks in SQL Server is that typically SQL Server locks not just one record, but whole Page or even whole table.

P.S.: I’m really missing ability to edit my comments (or at least delete them so I can replace them with corrected ones).

codinghorror · August 25, 2008, 12:00am

is actually follow the advice from the error message, and retry the transaction.

OK, but why doesn’t SQL Server re-issue the transaction on my behalf, then? Automatic retries of some kind?

Josh_Hurley · August 25, 2008, 12:00am

Jeff:

I agree with Hoffmann, this sounds like a hack. I deployed dozens of enterprise level .NET/SQL and Classic ASP/SQL applications and have not come across this issue.

It sounds like a transaction is being created on the call to the database, either by LINQ or in code, and it not being released. I would pose the question to Scott Guthrie, I bet he can point to one of the LINQ developers that will shed more light on this subject.

Josh.

Jasmine · August 25, 2008, 12:00am

I think you misunderstand what a deadlock really is. A deadlock is not a lock that is taking a long time. It is a lock that will never be released until one of the two processes is killed. Here’s what happens in a deadlock:

Query 1 begins by locking resource A
Query 2 begins by locking resource B
Query 1, in order to continue, needs a lock on resource B, but Query 2 is locking that resource, so Query 1 starts waiting for it to release
In the meantime, Query 2 tries to finish, but it needs a lock on resource A in order to finish, but it can’t get that because Query 1 has the lock on that.

So, you have two queries fighting over SEPARATE objects, not fighting over the same object (in which case one query simply has to wait). That’s the important part - these queries will never finish. This is quite different from something that is simply taking a long time. There is no solution to the deadlock race condition other than to kill one of the processes.

For this reason, I would suggest focusing only on queries which involve multiple rows or multiple tables. Single-row, single-table queries, such as the one in your example, can never be deadlocked. They can be waiting on a locked resource which is involved in a deadlock condition involved two further queries, but in this case, killing the single-lock query still won’t fix the problem.

I do not advise using the NOLOCK hint, but you are correct that it can’t really hurt anything on website reads. Someone will see out of date information, but in most cases, that should be ok.

Jasmine · August 25, 2008, 12:00am

SQL Server does not re-issue your command because the command itself could be dependent on the data changes that were being made by the query that wasn’t killed. This gives the client application a chance to run the transaction again, including any reads which may need to be done to data that was changed and is now in a consistent state.

Tae_Kim · August 25, 2008, 12:00am

SQL Server 2005 sucks but you only find out after you’ve got several millions records and it’s too late to migrate off. I’ve lost all faith in SQL Server as an enterprise DB when performance started going in the toilets with heavy reads despite the fact that the behemoth it’s running on has almost no disk and CPU activity.

Tae_Kim · August 25, 2008, 12:00am

SQL Server 2005 sucks but you only find out after you’ve got several millions records and it’s too late to migrate off. I’ve lost all faith in SQL Server as an enterprise DB when performance started going in the toilets with heavy reads despite the fact that the behemoth it’s running on has almost no disk and CPU activity.

TM119 · August 25, 2008, 12:00am

The Stack Overflow beta started already?! How did I miss it? I hope I can still get in. hurries to send a sign up request

Joel_F · August 25, 2008, 12:00am

I get that, but I don’t understand why simple read SELECTS need to lock at all. I mean, maybe if that database was storing my bank account balance, in which case transactional orders are hugely important.

The description of the pessimistic behavior and the description of ‘read committed snapshot’ make the latter sound like multi-version concurrency. Near the beginning of the post, when you said that a SELECT was part of the problem, I was thinking, Why? If it was using MVC then a SELECT wouldn’t even lock.

Then you got to the part about snapshots. It seems SQL Server (which I do not use*) can do MVC, but not by default.

I work with MySQL/PostgreSQL/Firebird.

Justin · August 25, 2008, 12:00am

Been there, done that. Anyone using SQL Server in a large system will run into this. As a rule, when we select from a table we use with (nolock). Here’s another mostly unknown Gem you should know about:

[QUERY] with MAXDOP(4) – use a max of 4 processors

The command above tells SQLServer how many max processors it can use on a query. We ran into a query that would run MUCH MUCH faster on 1 processor than multiple. Our 8 core superserver was trying to be too smart with the query I guess.

Maximum Degree Of Parallelism (MAXDOP):
http://www.sqlmag.com/Articles/ArticleID/97044/97044.html?Ad=1

osp70 · August 25, 2008, 12:00am

I agree with Jasmine. To add to it, because you don’t show the comments in coversation style but rather order of ‘votes’ the appearance of out of date will be minimal as well.

Jacob · August 25, 2008, 12:00am

It should always be okay to be viewing out of date info on the web shouldn’t it? That’s the nature of a web page - it’s as stale as the last refresh. This web page may have had new comments added since I started viewing it two minutes ago… what’s the point in using fancy locking in that case.

Steve · August 25, 2008, 12:00am

Wasn’t the diner a drive system for some ship in one of the Hitchhiker’s Guide to the Galaxy books?

Yup. It was from Douglas Adam’s Life, The Universe, and Everything:

Bistromathics:

The most powerful computational force known to parascience. Bistromathics is a way of understanding the behavior of numbers. Just as Einstein observed that space was not an absolute, but depended on the observer’s movement in time, so it was realized that numbers are not absolute, but depend on the observer’s movement in restaurants.

The first nonabsolute number is the number of people for whom the table is reserved. This will vary during the course of the first three telephone calls to the restaurant, and then bear no apparent relation to the number of people who actually turn up, or to the number of people who subsequently join them after the show/match/party/gig, or to the number of people who leave when they see who else has turned up.

The second nonabsolute number is the given time of arrival, which is now known to be one of those most bizarre of mathematical concepts, a recipriversexclusion, a number whose existence can only be defined as being anything other than itself. In other words, the given time or arrival is the one moment of time at which it is impossible that any member of the party will arrive. Recipriversexclusions now play a vital part in many branches of math, including statistics and accountancy and also form the basic equations used to engineer the Somebody Else’s Problem field.

The third and most mysterious piece of nonabsoluteness of all lies in the relationship between the number of items on the check, the cost of each item, the number of people at the table and what they are each prepared to pay for. (The number of people who have actually brought any money is only a subphenomenon in this field.)

Numbers written on restaurant checks within the confines of restaurants do not follow the same mathematical laws as numbers written on any other pieces of paper in any other parts of the Universe…

J_Liles · August 25, 2008, 12:00am

I see JC already suggested this, but I have worked on two financial apps using SQL Server and both have had issues with deadlocks (and yes, we tried all the usual deadlock analysis. We arrived at two approaches:

NOLOCK hint. As Jeff mentions, this is rarely as bad as people make it out to be. Gee, so the data is 500 ms out of date. Tough. There are only a few cases where this actually matters.
Put in your own retry logic (detect the deadlock error and resubmit the query). This may feel like a hack, but guess what, it works. And would you rather have this hack or have your users see stupid database errors. If you have centralized database access methods, it is fairly simple to add retry logic.

You can’t fix SQL Server, so you just have to deal with the real world here and do whatever it takes to make sure your users don’t experience database errors.

Greg · August 25, 2008, 12:00am

Jeff, before you post on things like SQL Server locking you should do your homework beyond Google and blogs. Books still have a place, you know.

http://www.insidesqlserver.com/

My guess, without having access to your logs and such, is that your deadlocked transaction (the one with the SELECT) is probably part of a longer outer transaction that acquired a shared lock on the rows that are being UPDATEed by the other one.

Setting your database transactions to snapshot isolation mode can still result in problems, since it will raise an error if you try to update rows that have changed since the beginning of your transaction (version mismatch) and also takes out X locks on writes, so you can still get deadlocks.

The correct way to fix this problem beyond using snapshot isolation mode (which will help greatly, because it removes the use of shared locks) is to a) minimize the length of your transactions to minimize the number of locks and b) always set up your web server to re-run requests in the case of a update conflict or deadlock.

Matt_Davis · August 25, 2008, 12:00am

We’ve built a large service (Earth Class Mail) pretty much exclusively on LINQ to SQL. It’s worked great, except that we’re still battling all the transaction issues almost a year later (deadlocks, leaky transactions, etc). Make sure you FULLY understand the intersection between TransactionScope (if you’re using it) and SqlTransaction. We didn’t, and we’re still paying for it (with daily deadlocks just like what you describe). Make especially sure you’re aware how Read Committed Snapshot (and any other tx isolation level) can leak into your connection pool, otherwise, you’ll end up running stuff at the wrong isolation level (often causing more deadlocks and how the he*l did that happen- we’re not even in a transaction! moments. We ended up building a bunch of infrastructure into our own DataContext-extended base class to deal with it, and to ensure that the connection pool stays free of connections stuck in transactions.

Good luck!

Double_K · August 25, 2008, 12:00am

OK, but why doesn’t SQL Server re-issue the transaction on my behalf, then? Automatic retries of some kind?

Think of the banking application… you wouldn’t want to re-issue anything if you know something slipped… it IS better that SQL server team be pessimistic by default.