All Abstractions Are Failed Abstractions

Doug · June 30, 2009, 12:00am

So Jeff, with your Wide, Wide World Of Sports comment, maybe you should retitle this article to Blazing SQLs.

MatthewK · June 30, 2009, 12:00am

It seems unfair to me that you criticize abstractions because of their lack of performance optimizations. That’s like attacking an automatic transmission for not having the performance of a stick. It’s a miracle the automatic transmission even approaches the performance of a stick.

Abstractions are all about functionality. You get the functionality without having to delve into the details. They are not about performance other than not being too awful.

This is just the nature of things. You can go play some pickup basketball with friends and not have to think about the details of what you are doing. But if you want to get the best possible performance as a player, you have to get into all the little details of how you do every move and do specific exercises to tune your body for those moves.

Tolerable performance issues do not make an abstraction leaky or a failure. Suboptimal performance is not an issue that abstractions are intended to address.

tedeh · June 30, 2009, 12:00am

Abstractions are definitely the most important bit of programming, and are not, as the title of this post implies, failed in any general way whatsoever. The only way we can accomplish anything within a reasonable amount of time is by leveraging and learning various levels of abstraction. Else, you should resort to programming machine code. Oh wait, machine code is an abstraction as well. How about carefully manipulating electrical signals instead?

Andrew · June 30, 2009, 12:00am

re: MS and L2Sql.

While Microsoft hasn’t come right out and said Linq to SQL is dead, they did the next best (worst?) thing by saying that they were going to focus all their efforts on EF. After making that statement, we quickly abandoned any thought of using L2Sql on our development projects.

Nicolas · June 30, 2009, 12:00am

You are all missing the point.

LINQ doesn’t make things slower. Doing silly things in database queries makes things slower. The point is that LINQ is a failed abstraction because you STILL have to know what things you aren’t supposed to do in a relational database if you want performance; so what is it abstracting?

Srdjan · June 30, 2009, 12:00am

@Jon Galloway,

Actually, within the Rails world, there is a plugin that monitors how the ActiveRecord ORM is used by the application and then modifies the ActiveRecord calls (and thereby the SQL it outputs) to better suit that.

For example, if you start with Select * from Posts, but you only use ID, Body, Title, the plugin will modify the AR statement to only select those fields from Posts.

As for the article, n+1 queries will never be faster than one query selecting the records you want. It’s just silly saying otherwise.
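
To make the round-trip count concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQL Server. The Posts table and its Id, Title and Body columns follow the thread; the Score column, the row counts and the run() helper are assumptions made up for illustration. It only counts statements and does not measure time.

```python
import sqlite3

# In-memory stand-in database; Score is a hypothetical filter column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Posts (Id INTEGER PRIMARY KEY, Title TEXT, Body TEXT, Score INTEGER)"
)
conn.executemany(
    "INSERT INTO Posts (Id, Title, Body, Score) VALUES (?, ?, ?, ?)",
    [(i, f"title {i}", "body " * 50, i % 10) for i in range(1, 1001)],
)

query_count = 0


def run(sql, params=()):
    """Execute one statement, count it as one round trip, return all rows."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


# N+1 pattern: one query for the ids, then one more query per id.
query_count = 0
ids = [
    row[0]
    for row in run("SELECT Id FROM Posts WHERE Score = ? ORDER BY Id LIMIT 48", (3,))
]
n_plus_one_rows = [
    run("SELECT Id, Title, Body FROM Posts WHERE Id = ?", (post_id,))[0]
    for post_id in ids
]
n_plus_one_queries = query_count

# Single query: fetch the same columns for the same rows in one statement.
query_count = 0
single_rows = run(
    "SELECT Id, Title, Body FROM Posts WHERE Score = ? ORDER BY Id LIMIT 48", (3,)
)
single_queries = query_count

print(n_plus_one_queries, single_queries)
```

Both approaches return the same rows here; the difference is in how many statements are sent. Against a networked database server each statement is a round trip, which an in-process SQLite database does not reproduce.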

MarkB · June 30, 2009, 12:00am

Did you clear input buffers and procedure caches before running each of these? The results don’t make any sense, unless there’s a pathologically bad query plan for statement 1. Since that’s doubtful - the query is pretty simple - I’d guess you didn’t account for cached reads.

Common sense should tell you that
SELECT * FROM Table WHERE Condition
is faster than
SELECT * FROM Table WHERE Id IN (SELECT Id FROM Table WHERE Condition)

So much so, in fact, that SQL may even optimize the 2nd case into the 1st case.

Do me a favor, and run SET STATISTICS IO ON before running your queries. Then take a look at the number of IO’s required for each scenario.
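
As a rough illustration of the two query shapes, the sketch below runs both against SQLite through Python's sqlite3 module and checks that they return the same rows. SQLite is an assumption standing in for SQL Server: it has no SET STATISTICS IO, so the sketch prints EXPLAIN QUERY PLAN output instead, which is a different tool and says nothing about SQL Server's optimizer. The Score column and the data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Posts (Id INTEGER PRIMARY KEY, Title TEXT, Score INTEGER)")
conn.executemany(
    "INSERT INTO Posts (Id, Title, Score) VALUES (?, ?, ?)",
    [(i, f"title {i}", i % 10) for i in range(1, 501)],
)

# The direct form and the "Id IN (subquery)" form of the same request.
direct_sql = "SELECT * FROM Posts WHERE Score = 3 ORDER BY Id"
nested_sql = (
    "SELECT * FROM Posts "
    "WHERE Id IN (SELECT Id FROM Posts WHERE Score = 3) ORDER BY Id"
)

direct_rows = conn.execute(direct_sql).fetchall()
nested_rows = conn.execute(nested_sql).fetchall()


def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


print("same rows:", direct_rows == nested_rows)
for label, sql in (("direct", direct_sql), ("nested", nested_sql)):
    print(label, plan(sql))
```

The plan text varies between SQLite versions, so it is printed for inspection rather than compared against a fixed string.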

JohnS · June 30, 2009, 12:00am

Why compare the performance of SELECT * against SELECT Id? Of course SELECT Id is faster: it only has one column to deal with, and that’s probably the clustered index. Why not compare SELECT * against SELECT Id, Title, etc.? If you need to populate a collection of Post objects (and not just get the Ids), that comparison would expose whether L2S is just being lazy with SELECT * or whether it’s really no different performance-wise from selecting all the columns.

RickB · June 30, 2009, 12:00am

I think the use of the word ‘failed’ is an exaggeration. I know what you mean to say, though, and I agree.

I’ve been having a lot of fun* engineering some abstractions for typography: fonts and text rendering, oh my. I need 1 abstraction that can cover both GDI and WPF’s typography systems. It’s not fun, and it is leaky or incomplete, but it will work and be beneficial. That is, until the next version when I decide I need to pivot the topology of the whole API just to enable some important rendering feature or increase reliability/performance.

* not

JohnC · June 30, 2009, 12:00am

From reading the comments it seems that a lot of people don’t realize what it means for an abstraction to be “leaky”.

An abstraction is a simplification or reduction in the rules of an underlying system.

For an abstraction to be leaky simply means that there will be some cases where actions that are equivalent according to the abstraction (i.e. they should be exactly the same thing) behave differently. Because the abstraction defines them as equivalent but they aren’t when actually used, they demonstrate that the abstraction is not “reality”, but only a simplification of it.

If an abstraction were not leaky - if all actions that are equivalent according to the abstraction behaved identically - then it would be indistinguishable from the underlying system and thus considered to be equivalent to it (which by definition, is not abstract).

So all abstractions are leaky by definition, because if they aren’t leaky then they include all rules of the underlying system, which makes them the same as the underlying system.

Another effect of this is that the more you reduce the “leakiness” of an abstraction, the more closely it reflects the underlying system, and thus the more complicated and less useful it becomes as an abstraction, since the purpose of an abstraction is to simplify the underlying system.

Personally, I feel that the ideal solution is to understand the underlying system, and to define multiple abstractions that each handle a subset of the underlying system, but that together have 99% coverage. This results in smaller abstractions that can remain relatively simple, and the remaining 1% can be handled by directly using the underlying system (which will be complicated and a pain in the butt, but still possible).

securityhorror · June 30, 2009, 12:00am

Jeff,

one more year of shallowness on your blog and you will be bought out by Comedy Central.

Have you actually read and understood what you wrote? It is complete and utter nonsense.

First, you are comparing apples and oranges. Second, you do not provide a convincing scenario for using LINQ.

In your previous post, you praise Apple and the iPhone 3G, apparently not knowing anything about the simple and elegant abstractions across APIs on the Apple platform. Isn’t this some sort of hint, if not full proof, that simple abstractions work well?

Tim · June 30, 2009, 12:00am

You’re not comparing apples to apples. You asked LINQ for the entire object, so it selected everything. Ask it for the ID, and it selects the ID. The SQL generated in this case is identical to the hand generated SQL.

AndersB · June 30, 2009, 12:00am

Jeff,
I suggest rewriting the article with examples comparing apples to apples. It makes no sense comparing the performance of a LINQ to SQL star (*) query against a single-column raw query (which, obviously, takes the lead).

It’s so easy to do compiled, single-column and narrow queries with LINQ to SQL (and LINQ to EF, for that matter). Did you even bother to look into compiled queries? Ever heard of the First, FirstOrDefault or Single extension methods?

I generally enjoy reading your blog entries, but this one was a waste of time, and just plain wrong.

AndersB · June 30, 2009, 12:00am

I think some of the comments are wrong.

The goal of LINQ to SQL (and other ORMs) is not to keep the developer from having to learn SQL; learning SQL is just as important as learning your primary programming language (e.g. C#). Obviously, you don’t have to, but it makes sense.

LINQ to SQL etc. provides a flexible CRUD mechanism that relieves the developer from maintaining a huge chunk of plumbing; who cares about plumbing such as CRUD tiers? Clients certainly don’t care about it; they care about business value and there’s (typically) little business value in a CRUD tier.

GregorP · June 30, 2009, 12:00am

Why not make an attribute, affecting strings, that would make the compiler know the string contains an SQL query and it would check the completeness of the query, causing compile-time errors when an SQL clause goes awry?

Something like:

[SqlString(Connection="SqlConnectionString1")]
string query = "select * from foo.bar";

I’m sure we could complicate SqlStringAttribute further, providing schemas, SQL dialects, limitations and whatnot.
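
A much weaker cousin of this idea can be sketched at run time rather than compile time: ask the database to compile each query string against the live schema before the application relies on it. The sketch below is an assumption-laden illustration in Python with sqlite3, not the attribute proposed above. check_sql() is a hypothetical helper, and the bar table is made up. Prefixing a statement with EXPLAIN makes SQLite prepare it and return its bytecode program instead of executing it, so syntax errors and unknown tables or columns surface as exceptions.

```python
import sqlite3


def check_sql(conn, sql):
    """Return None if `sql` compiles against the schema, else the error text.

    Hypothetical helper: EXPLAIN prepares the statement without running it.
    """
    try:
        conn.execute("EXPLAIN " + sql)
    except sqlite3.Error as exc:
        return str(exc)
    return None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bar (id INTEGER PRIMARY KEY, name TEXT)")

ok = check_sql(conn, "select * from bar")            # compiles
typo = check_sql(conn, "selec * from bar")           # syntax error
missing = check_sql(conn, "select * from baz")       # unknown table
bad_column = check_sql(conn, "select nope from bar")  # unknown column

for label, result in (
    ("ok", ok),
    ("typo", typo),
    ("missing", missing),
    ("bad_column", bad_column),
):
    print(label, "->", result)
```

A check like this could run in a test suite or at application startup. It still depends on a reachable database with the right schema, which is the same dependency a compile-time attribute would have to resolve.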

Nosredna · June 30, 2009, 12:00am

Jeff, you should have asked on Stack Overflow how to optimize your LINQ before you wrote this blog post.

James · June 30, 2009, 12:00am

Higher-level abstractions don’t have to gloss over the lower-level details. You can have the ‘clean’ abstractions, and you can also have high-level abstractions for specifying the required lower-level details for performance.

That is, you don’t have to deal with the lower-level details in a lower-level way.

Daniel · June 30, 2009, 12:00am

To expand on the ‘select new{p.Id}’ comments, you can also avoid multiple db trips by selecting everything you need (not just the id):

(from p in db.Posts
where …
select new PostListItem{Id = p.Id, Title=p.Title, …})
.Take(48)

I like to strongly type the results, and draw a subtle distinction between the canonical domain class (i.e. Post) and a related type (PostListItem). One might even simplify with extension methods:

(from p in db.Posts
where …
select p)
.Take(48)
.AsPostListItems()
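
The same projection idea, selecting only the needed columns into a small list-item type in a single trip, can be sketched outside LINQ. This is an illustrative analogue in Python with sqlite3, not LINQ to SQL itself. PostListItem mirrors the name used above, while the Score filter, the post_list_items() function and the sample data are assumptions made for the example.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class PostListItem:
    """Narrow, strongly typed view of a post for list pages."""
    id: int
    title: str


def post_list_items(conn, min_score, limit=48):
    """Fetch only Id and Title, in one query, as PostListItem objects."""
    rows = conn.execute(
        "SELECT Id, Title FROM Posts WHERE Score >= ? ORDER BY Id LIMIT ?",
        (min_score, limit),
    )
    return [PostListItem(id=row[0], title=row[1]) for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Posts (Id INTEGER PRIMARY KEY, Title TEXT, Body TEXT, Score INTEGER)"
)
conn.executemany(
    "INSERT INTO Posts (Id, Title, Body, Score) VALUES (?, ?, ?, ?)",
    [(i, f"title {i}", "body " * 50, i % 10) for i in range(1, 201)],
)

items = post_list_items(conn, min_score=5)
print(len(items), items[0])
```

The Body column never leaves the database in this version, which is the point of projecting to a separate list-item type instead of materializing the full Post.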

Paul · June 30, 2009, 12:00am

Good post.

I won’t go on about your example, that’s been covered. I won’t go on about the fact that everything is an abstraction, that’s been covered too.

I would like to bring you up on “Like desperate citizens manning a dike in a category 5 storm …”.

May I suggest it is more like “Desperately juggling bowling balls, chainsaws and car batteries while trying to keep a dozen plates spinning on their poles during a force 10 gale of pompous criticism.”

Paul