While working on my webslice, I wanted to add some content to let the users know what’s happening on the website. So I was thinking to provide some random posts from my blog. Most of us know how to use the WordPress default methods to get the posts, to ask for random ones, but how to do it outside the warm WordPress environment?
I started digging in MySQL help (soon I should become a guru 😉 ) and find out that it is quite simple if all you want to do is to return some random rows from a table, let’s say a collection of random posts in your blog. Easily, MySQL’s base functions like ORDER BY RAND() come to the rescue. I found lots of examples, so I asume that everybody’s doing it. At least on my search on Google, all the PHP programmers did it:
SELECT * FROM wp_posts ORDER BY RAND() LIMIT 3
I set the limit to 3 because I don’t want to bother my users with all my posts, three are more than enough.
So I did it. But wait, there’s a catch! It really does what it’s supposed to. But what will happen when you reach high levels of users? Your webserver probably will suffer a lot from this. OK, it is not the case in articles,
Your social network keeps growing and growing and after about a year and 50000 new users you realize a slow-down on the page where you show random users. You’re thinking of caching it, but what’s the point? It wouldn’t be random. You break it down with EXPLAIN and realize with horror that your fancy query doesn’t use the nice index you placed on the table ages ago. And why would it? It’s calling RAND() for every row in your table, there’s just no way around it. So what to do? One alternative is to fetch random IDs first and then join the IDs found with the USERS table to fetch the real data. How do we do that? Why, using ORDER BY RAND(), of course. Wait, didn’t I just say you’re not supposed to use it? Well, I did say that, but the difference is that we’ll run the ORDER BY RAND() on the best-indexed column there is, the primary key. Let’s break it down and get our hands dirty:
SELECT USERS.* FROM (SELECT ID FROM USERS WHERE IS_ACTIVE = 1 ORDER BY RAND() LIMIT 20) AS RANDOM_USERS JOIN USERS ON USERS.ID = RANDOM_USERS.ID
And with a little bit of thinking we got ourselves a nice and fast way to fetch random data. Most likely there are other ways out there (sometimes I do miss Oracle’s ROWID), but this one worked fairly well for me. It probably won’t scale forever though, so be prepared to get back to it every once in a while.