Maybe Normalizing Isn't Normal

ScottF · July 16, 2008, 12:00am

If you have to consistently join many tables to get related data, then just stick a stub key in there and voila … faster performance, less joins.

Dali · July 17, 2008, 12:00am

Pragmatic is just another word to thinking and honestly trying to solve a problem, not waive a flag.
I have 2 questions though:

For a second there I am questioning my understanding of the concept of DB normalized design. Why on earth is the userScreenName a separate table? if 2 tables have 1:1 relationship, and each row in X has a row in B, what is the point to separate them? (maybe I am missing the business context of what the table means…)
how did you create this diagram? doesn’t look like the SQL 2005 diagram or is it?

TooL1 · July 17, 2008, 12:00am

i can denormalise from a normalised db in 1 (long) line of sql. i can’t say that i can normalise in 1 line of sql.

but then i’m just a dark little that burns dimly next to some of you frikkin geniuses

john23 · July 17, 2008, 12:00am

Shane - of course it’s not relational. s I think they use a giant excel spreadsheet to store data from google app engine.

Mike · July 17, 2008, 12:00am

It’s a tough nut to crack. I prefer to normalize simply to avoid data inconsistancy and logically it makes sense. Data updates are also easier in the normalized tables instead of inside the denormalized tables. But I do agree that using SQL to extract the data is crazy mad! I’m a newb to SQL and trying to figure out the complex joins is rediculous! Perhaps there is some sort of hybrid or way to meet normalization in the middle.

TJ113 · July 17, 2008, 12:00am

wow - there are a lot of miss-imformed IT staff out there

Frank · July 17, 2008, 12:00am

@Michael Hanney:

I don’t agree. The table USERS contains instances of class User. Hibernate is perfectly capable of mapping a classname in single, like User to a tablename in plural, like USERS.
Next thing you are telling us is that we should use camelcase in database object names…

Simon · July 17, 2008, 12:00am

Not the best post post Jeff, I see your sentiment, but some poorly chosen wording and a silver bullet solution have turned it into a battlefield.

As one poster said, I recommend you post some actual situations with your optimisations shown and explained.

locke · July 17, 2008, 12:00am

While I lean towards

User {UserID, UserName,Active }
Affiliate {AffiliateID, AffiliateName, Active}
UserAffiliateJoin {UserID,AffiliateID} - it’s a m…n (many-many) join

!And I value surrogate keys over natural keys (mostly because natural keys are misidentified and change, and cascade update is lame)

Whatever you choose, be consistent! It allows you to write code that can infer relationships, and makes your code, queries, etc look better. There’s nothing worse than databases with a hodgepodge of intermixed styles for naming (both table and column). It makes it terribly hard to read and understand for everyone involved, and you can’t infer anything via code or in your head.

PS Dali, I think
UserScreenName

badly named. It looks like a join table.
it’s meant for 0…n external system names. MSN, AIM, Yahoo, Facebook, etc?
poor naming of fields and is misleading
What’s an ‘im’ anyway? if it’s supposed to be external site id’s, you’d think that im_service would be better replaced with another foreign key e.g. ExternalServiceTypeID.

e.g.
ExternalSystem { UserID, ExternalServiceTypeID, ExternalUserName, Password (yeah right) }
ExternalServiceType { ExternalServiceTypeID, ExternalServiceTypeName, HomePage, LoginPage, Description}

et cet er a

I’m just a coder, but I try to code databases that don’t suck.

Mauro · July 17, 2008, 12:00am

I have to admit - I am a tables are singular type dude and for one reason only.

I use a self written code generator to create my classes, collections and data access layer as well as user controls and stored procs.

Having singular table names means its just a bit easier to programmatically generate all that stuff!

stEvil · July 17, 2008, 12:00am

Where I work we use automated object persistence to store everything. On the up-side, it takes 2 minutes to create a database for the app, and it is ALWAYS correct as it is built for the classes. On the down-side, EVERTHING has a table.
A User record is linked to a user data record in the user data table. The user class has a base class, so the user record is linked to from a user base record. Ad nauseum…
If you want to manually run a SQL statement to check something (as we sometimes do during testing) you need a second brain, a gallon of caffienated drinks, and a solid hour of JOIN statements.

stEvil · July 17, 2008, 12:00am

…and if you can’t handle table names/classes/properies/variables with both plural and singular names you should go back to writing HTML.

Honestly, as long as it works and the name relates to what it is (user or users) then job done. If you are getting flustered by the grammar of your code then you are not fit to use a computer.

Use whatever you want, test as you go, and don’t whinge about bad code when you can’t understand.

JoeScylla · July 17, 2008, 12:00am

imho, if you’re need to denormalize your database for performance you’re using the wrong DBMS.

I’ve read once the story about the development and database design of flickr and they were forced to denomalize and replicate data because they were using a database engine lacking features they need.

So, denomalize if you need more performance but don’t have money for appropriate hardware and DBMS.

Michael_T · July 17, 2008, 12:00am

I’ve read stories of DB/2 databases that appeared normalized to users, but in the back end several tables were largely denormalized with the ‘normalized’ world appearing primarily as views.

Help me out: Am I remembering this correctly, or did I miss a key detail?

John_B · July 18, 2008, 12:00am

I’ve experienced many databases with varying degrees of normalization.

I’ve enountered performance issues on both sides. I’ve seen cases where overly denormalized databases caused so much redundancy we hit serious performance issues.

I’ve seen cases where normalization purists have imposed a database design on users that had a substantially negative impact on business operations because performance went in the tank compared to previous systems. (And yes, they were specific cases, and the problem was resolved by denormalizing a small piece of data.)

My general rule is to design normalized. Then, when a need comes up to stray from normalization, justify each need (with data and business requirements).

The database design presented in this article is fairly simple. I’ve only run into problems with normalization when the design is more complex, and the volume of data is large.

JohnB · July 18, 2008, 12:00am

err, anonymous-

That is kind of what makes Jeff and Dare’s fictitious example pretty strange. It’s a simple database schema for a generic social networking site, yet it doesn’t include any timestamps. Social networks tend to be designed around streams of inforamtion, not sets of user data. Isn’t that what the folks at Twitter took forever to learn, especially when they said Rails doesn’t scale when primarily it was their system eventing model that blew?

Jay_Lynch · July 18, 2008, 12:00am

An entire post dedicated to an issue that is (within the circles of anyone with a clue) ientirely/i without contention?

Way to pander to a clueless cloud for blog ‘ratings’…

Long-time fan of both yourself and Joel for flipping through on a (very) brief skim-the-post basis, but now that I’m actually putting in time to read the podcast, etc…

You really need to learn how to recognize WHEN you need to STFU and let Joel speak. OH so often he’s giving perfectly good advice based on experience and nobody can hear because you’re busy desperately scrambling to restate everything you’ve said to fit his opinion… =(

BuggyF · July 18, 2008, 12:00am

@Michael T
In the Mainframe DB2 world (as quite distinct from Linux/windoze), most use of the database is retro-fit over (under?) 1970’s era COBOL programs which are written to VSAM semantics/structure. I live in that world for the moment. Nothing is normalized.

At a lower level, DB2 on the Mainframe is a layer over VSAM, even today. The zealots will say that the DB2 VSAM files aren’t supposed to be visible to casual VSAM calls. Nevertheless, the semantics of storage is VSAM.

Neil · July 18, 2008, 12:00am

This may have been mentioned already as I’m late in reading this article and there are a tone of comments but I’m a bit suprised that more developers haven’t worked on some Business Intelligence and Data Warehouse solutions. A schema set up for reporting and analytics is highly denormalized in terms of performance because you typically end up working with large amounts of business data. Reading Kimball and knowing how to model dimensionally should be part of the developer’s tool set.

http://en.wikipedia.org/wiki/Dimensional_modeling

Grom · July 18, 2008, 12:00am

@Jeff

Unless you are doing a lot more reading then writing

Aren’t most – perhaps in fact all – websites in this class of application? Millions of reads, handful of writes.

you should denormalize as much as it makes sense to do so for read-heavy loads. Most web applications are of this type and do not even require a regular database necessarily

My point exactly!

But you don’t need to denormalize to solve the problem. Caching is (usually) the answer. There are cases where you may want to denormalize, but there are a number of options before you get to that.