All Abstractions Are Failed Abstractions

BrunoL · June 30, 2009, 12:00am

While I agree the database is the main suspect, I don’t think it is guilty of a performance offence.

During the second example, the database might have cached the row ids from the first query knowing you were going to ask for the full rows in the next query.

It might not have the rows already in its hands, but it knew exactly where to retrieve them from the disks.

Disclaimer: I’m not a database developer, but it is what I’d do if I was one.

Drew · June 30, 2009, 12:00am

Absolutely agree with the core statement here: Abstractions are great until you hit their limits.

Abstractions are great for what they’re there for: Making code much easier to read and write. Yet as the demands on the machine you need increasingly precise movements of your 1s and 0s (or a whole lot more/better machinery to move them with.) If your abstraction doesn’t move your 1s and 0s as precisely as you need them (as is the case here) you need to drop down a level and get closer to them.

Eventually we will hit one boundary that we won’t be able to abstract away: Electrons.

Brent · June 30, 2009, 12:00am

“Selecting all columns with the star (*) operator is expensive, and that’s what LINQ to SQL always does by default.”

Um, not so much Jeff. LINQ to SQL selects exactly what you ask it to select by default. If you change the second-to-last line of your initial query from “select p).” to “select p.Id).”, LINQ to SQL will just select the Ids.

Saying that LINQ to SQL selects all columns with the * operator by default is no different than saying SQL itself selects all columns with the * operator by default. Which is to say, it’s absurd. If you write “select p” you’ll get all columns using the * operator, just as if you write “select *” in native SQL. In neither case is it the default, it’s merely what you asked for.

Hoffmann · June 30, 2009, 12:00am

The real problem is finding where the abstraction leaks. If you use a profiler along with it you only have to change your LINQ querys with SQL in the parts of your code where it really slows down everything.

Is LINQ really wroth it? Seriously, it is whole another language to learn and to use, in the end you still have to know SQL and know it well in order to optimize the slow parts of your code.

Darren_Kopp · June 30, 2009, 12:00am

Justin Chase is correct. All you need to do is project the data. I am not sure is saying

select category.ID

will do the projection, but i know that saying

select new { category.ID }

will project the data and give you what you want. Abstractions leak the most when you don’t spend the time to understand how the system works.

MarcelL · June 30, 2009, 12:00am

@Hoffmann: “Is LINQ really wroth it? Seriously, it is whole another language to learn and to use”

It’s important to note we’re talking about LINQ-to-SQL here, not just any LINQ to variant.

I don’t think it’s accurate to describe LINQ-to-SQL as a “whole another language” - it’s a bit of syntax to the .NET languages. It’s not like, say, sitting down and learning Ruby. If you already know SQL, even a little bit, learning LINQ-to-SQL isn’t a big deal.

configurator · June 30, 2009, 12:00am

I see you do

foreach (var post in posts.ToList())

in your code. Why do you call .ToList() before running over the results?

Raj · June 30, 2009, 12:00am

@Dennis:

Fail. Microsoft is not abandoning LINQ.

What’s the point of using LINQ? When you learn to write code, you’ll appreciate the difference between compile-time and run-time errors.

FrancescoR · June 30, 2009, 12:00am

Seems appropriate to share this link on the topic of abstractions: http://cuiltheory.wikidot.com/

Ryan_Tenney · June 30, 2009, 12:00am

@Bruno you’re absolutely right, there is caching, at very least in the sense of “Oh i know where that is!”.

I wrote a DB abstraction library based on LINQ, but written for PHP, and the performance compared to straight sql is HORRENDOUS (10ms using the lib, vs 0.2ms with mysql_query($query)). I’m working on improving it and PHP 5.3.0 will go a long way to help with that. But even with that performance hit, I so greatly prefer manipulating objects and calling functions than dealing with arrays and piecing together sql query strings.

Jasmine · June 30, 2009, 12:00am

@janos erdelyi

Yeah I find it totally creepy too. Database already has an abstraction layer - stored procedures, NOT written by procedural programmers who don’t know crap about how to treat a database. I’m working with an application right now which does nothing but client-side dynamic SQL, and the performance is horrible, it is prone to errors, it is capable of screwing up data integrity, and overall it is extremely buggy. Why? Because the people who wrote all the SQL don’t know crap about SQL. LINQ can not be good for this situation.

webdev_hb · June 30, 2009, 12:00am

If you think LinqToSql is bad then what do you think about developers that make up for their poor SQL skills by using SQL CLR functions?

…not that I’d know any developers that bad or anything… uh… like clearly not me…

DavidN · June 30, 2009, 12:00am

@Dennis

Microsoft is not abandoning LINQ. Nor are they completely abandoning LINQ to SQL, at least not yet. They have stated that they will focus the majority of their efforts on the Entity Framework instead, which (in theory) will have the same capabilities and usage pattern as LINQ to SQL, plus a lot more.

As “the point of dealing with LINQ”, it is that you can write queries that the compiler understands, instead of writing an opaque blob. Yes, you will probably still have to understand SQL in order to get the most out of it, but you don’t have to mix your C# code with your SQL code, which leads to a more maintainable system.

Russ · June 30, 2009, 12:00am

I think this is a poor argument as presented, but I think there is still a point to be made. The problems you have seem (to me) to be with declarative languages in general. The problems you have in your example exist in SQL alone so it’s wrong to point the finger at LINQ (as others have stated).

To the point you could make, though, all of what you say is true regarding SQL. Since there is a “shroud” (call it abstraction if you will) of sorts in every declarative language (no way to say exactly how to do things - just what you want), you will always have the behavior mentioned above.

The reason you are more forgiving to SQL is you have learned enough about the query engine to have some say in how the query will indeed run.

If I had to guess, your beef in general, is with declarative languages.

My $.02

SimoneD · June 30, 2009, 12:00am

we programmers keep piling up these leaky abstractions, shoring up as best we can, desperately attempting to stay ahead of the endlessly rising waters of complexity

I find the use of the word “piling” dangerous here. As in, too many coders are afraid to go back and readjust the “lower level” abstraction in the name of “frozen API”, instead opting for the little cozy hack that one day will [this part removed due to excessive swearing].

It is important to always strive to make abstractions better: ofcourse not changing everything every three minutes, but keeping things up-to-date

Anonymous · June 30, 2009, 12:00am

You’ve seen Django ORM, right? If not, check it out: http://docs.djangoproject.com/en/dev/#the-model-layer

Beats LINQ big time SQL-wise.

Kearns · June 30, 2009, 12:00am

I love LINQ. I have had SQL queries that were very complex and would take quite a bit to wrap my brain around come through much more optimized with LINQ. The issues here isn’t LINQ = minscore
orderby p.LastActivityDate descending
select p.Id).
Take(maxposts);

loop over those postIds, or use some method I can’t think of right now to turn it into a list of the correct variety (again I’m not typign this in Visual Studio)

Then:

var posts = from p in DB.Posts where postIds.Contain(p.Id) select p;

DamienG · June 30, 2009, 12:00am

There is no need to put ToList() in your code samples, just foreach over the IQueryable object.

If you just want the ID’s tell LINQ to SQL and it will do that for you too.

e.g.
var posts =
(from p in DB.Posts
where
p.PostTypeId == PostTypeId.Question &&
p.DeletionDate == null &&
p.Score >= minscore
orderby p.LastActivityDate descending
select p.ID).
Take(maxposts);

As to the performance figures of issuing multiple queries instead of one in my experience this has performed poorly. I wonder what exactly is going on with your database…

[)amien

DavidN · June 30, 2009, 12:00am

@Jeff

I agree that all abstractions are leaky. However, I disagree that “leak” == “fail”. The point of abstractions is NOT to make the underlying mechanism completely irrelevant; the point is to allow us to focus on the task instead of the mechanism as much as possible, and to accomplish the task more easily and more safely than is possible with the underlying mechanism.

LINQ to SQL allows me to focus on the task I am working on instead of the mechanism by which I am getting my data out of the database and into objects. It makes it easy and relatively safe to work with data, which is the whole point. Therefore it has succeeded in its goal. The fact that it is not a perfect abstraction is irrelevant, especially given that there is no such thing, as Joel pointed out in the first quote.

SteveJ · June 30, 2009, 12:00am

Even if an abstraction is horribly leaky, it’s still useful for communication purposes. If you have 10 people on your team that speak English and one that speaks English and Hindi, then that one person will translate for the others. If you’re translating a foreign concept, some foreign words might leak out, but having 95% of that translation in a form you understand makes that leakiness easier to handle.

[Aside: No one makes the argument that everyone must speak English and Hindi. But if you make the same argument that 10 developers speak C# and one speaks C# and SQL, the masses will proclaim that everyone should speak C# and SQL.]

It’s the same argument when you talk about customer documentation: The document is an abstraction for the application and the code behind it, it’s completely inadequate and inaccurate, but easier to deal with from the customer point of view. Oftentimes developer speak leaks out - dialog, server, xml, but it’s usually tolerable, because an attempt has been made to abstract away the gritty details (i.e. Be thankful I didn’t drop the source code in your lap and run).