The Trick to Sorting Search Results - Tim Bray Explains

3 comments
Thread Title:
Sorting Result Lists
Thread Description:

Tim Bray talks about the trick to minimizing compute overhead when sorting result sets. Not being a search engineer this is a bit over my head but being a programmer (hack) it seems reasonable...

Here’s the trick: nobody will ever look at more than the first hundred or so results. So you don’t have to sort at all, you just have to find the highest relevance values. Here’s the algorithm:

  1. Maintain a list of twenty most relevant entries, and their relevancy numbers. Initially it’s empty.
  2. Fill the list with the first twenty results in whatever order they’re in, and sort them in descending order of relevance.
  3. For each of the remaining items in the list:
    -- Compute its relevance value.
    -- If it’s less relevant than #20 in the most-relevant list, you’re done with it.
    -- If it’s more relevant than #20, add it to the most-relevant list in the proper place, shuffling down and losing #20.

What do you think search geeks?

Comments

If its "OpenText Tim" then sit up and notice

Living legend does not do it justice, having said that utter bolloxs doesn't do this justice either http://www.antarctica.net/About_Us/

With the exception of Elvis, Larry Page, Shane Mcgowan, Martijn Koster and Kylie he is the person I would most like to have a pint with in a quietish pub.

[would have to drop the goatee in this manor though]

Makes sense to me

Another way to put it: suppose you want to know the 100 closest points to a given point, and you've got one million points to screen. As you screen each point, you only need to keep it if it's one of the top 100 so far.

Hsy!

I get it! :)

Thanks GG....

I've been following Tim's blog for a little while now, he's a very interesting chap...

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.