A Visual Explanation of SQL Joins

thanks, this is the best Joins explanation, the diagrams HELPS ALOT!
this solved my problem, thanks again

There are sixteen possible Venn states with two variables, you forgot eleven of them :wink: Seriously, I wonder how many people have independently made the connection between SQL join and Venn. I also worked this out a while back, it being a expansion of the UNIX sort and uniq combinations I used to.

Given two lists, and given that each list has only unique entries inside itself:

sort a b b | uniq -u (gives only what is in a)
sort a b | uniq -d (gives what is in both a and b)
sort a a b | uniq -u (gives what is only in b)

I found this useful for simple lists like hostnames or IPs

Thank you so much for this explannation!!! NOW I CAN FINALLY UNDERSTAND THIS!!!

Thanks for this great visual explanation - just what I needed.

I think a venn diagram of a cross join would show a single filled circle equal to the area^2 of the previous sets of circles.

Superbly explained Jeff. I must say that I am very naive when it comes to database programming since I am an applications and networking programmer primarily and as such, have a tendency to view databases as a big black box of data, nothing more…

I liked your usage of Venn Diagrams, it helped me see joins in a completely different light, I was very narrow minded.

That being said, could Venn Diagrams be applied to three or more table join statements? I find myself running into some problems recently in my rather inferior SQL statements hence the question.

Once again, great work, really enjoy reading your posts.

Old post, and this will probably be lost among the spam, but…

While joins themselves can’t be accurately defined by Venn diagrams, the relationships between joins can be. With the left and right circles representing the left and right joins, their union is the outer join and their intersection is the inner join.

Awesome explanation, thanks!

Nice Visuals…
thanks for the explanations…

7 or so years after this was originally posted, I still refer to it all of the time & refer others to it. Nicely done!

This post is hugely misleading. There is some relationship between a join and an intersection, sure. But they are not the same thing.

In particular, an intersection is a join in the lattice of sets. But tables/relations aren’t just sets. They’re sets of tuples, and they have a richer notion of a join, based on quotients and other algebraic constructs. The join of a pair of relations is the smallest relation that is bigger than them both. The inner join is a closure operator. The (left) outer join is an adjoint operator. Etc.

The funny thing is, the blog post gave you half the story in 10 times as many words. Don’t fight math. It will always win.

Great article! According to one of the post about other articles about sql joins I recommend another with similar approach (venn diagram, the example table and query with result for each case): http://www.vertabelo.com/blog/technical-articles/sql-joins

2015 and STILL your example is the very BEST on the web!

I have been looking for this explanation for years, thank you.

If you’ve been using SQL, but not primarily (in other words, as a means to an end), for many years, you know when you need to use a JOIN. Most programmers don’t care enough about SQL or databases or set theory or discrete math to learn this theory in detail and pages like this are critical for us to get our jobs done.

That’s mathematical rigor and 99% of programmers don’t need it. Programming is about getting the math good enough, or else no programs would ever get written. We already know where the math matters, for example, in encryption code.

You missed the point. The rigor makes things easier, not harder. It takes 10 times as many words to explain things the wrong way. And then you often need to know how to un-roll the wrong explanation into the right one.

How do you figure out a query’s computational complexity with the “set” explanation? You have to unroll it into the real thing (i.e., tuples) and then do a counting argument. If you start with the real thing, the counting argument takes less than a minute. If you start with the wrong explanation, you might never even reach a valid estimate.

How about something like this for the cross join diagram?

2 Likes

Hey thanks for making JOINS visually familiar with the notion of Venn diagrams. :slight_smile: I want to thank my colleague Hengineer for pointing me to this reference too. :wink: But I have a question:

I’m getting more comfortable with SQL syntax. I use this as a resource. In your examples, your functions are:

  • INNER JOIN
  • FULL OUTER JOIN
  • LEFT OUTER JOIN

On the site I mentioned, these are the JOIN functions:

  • INNER JOIN
  • LEFT JOIN
  • RIGHT JOIN
  • FULL JOIN
  • SELF JOIN
  • CARTESIAN JOIN

I see a couple matches and similarities, but I don’t want to assume. Also, the wiki page you mentioned in your intro kinda messes me up too :frowning:

Can you or anyone walk me through the syntax (maybe semantics?) of these function names, so I can be clear about what really does what?

Thanks!

It is a nice explanation, but why you use tables with only names?
A real scenario would be better.