Sorting Result Lists
Tim Bray talks about the trick to minimizing compute overhead when sorting result sets. Not being a search engineer this is a bit over my head but being a programmer (hack) it seems reasonable...
Here’s the trick: nobody will ever look at more than the first hundred or so results. So you don’t have to sort at all, you just have to find the highest relevance values. Here’s the algorithm:
- Maintain a list of twenty most relevant entries, and their relevancy numbers. Initially it’s empty.
- Fill the list with the first twenty results in whatever order they’re in, and sort them in descending order of relevance.
- For each of the remaining items in the list:
-- Compute its relevance value.
-- If it’s less relevant than #20 in the most-relevant list, you’re done with it.
-- If it’s more relevant than #20, add it to the most-relevant list in the proper place, shuffling down and losing #20.
What do you think search geeks?