Peter Norvig on Semantic Web Ontologies & Search Spam

Thread Title:
Semantic Web Ontologies: What Works and What Doesn't
Thread Description:

"Humans are very good at detecting this kind of spam, and machines aren't necessarily that good."

Google's Director of Search Quality, Peter Norvig, talks about semantic web ontologies and the challenges they pose for search:
On the difficulties of using semantic ontologies with public systems:

Now imagine what it would be like if instead of using our algorithms we relied on the news suppliers to put in all the right metadata and label their stories the way they wanted to. "Is my story a story that's going to be buried on page 20, or is it a top story? I'll put my metadata in. Are the people I'm talking about terrorists or freedom fighters? What's the definition of patriot? What's the definition of marriage?"
Just defining these kinds of ontologies when you're talking about these kinds of political questions rather than about part numbers; this becomes a political statement. People get killed over less than this. These are places where ontologies are not going to work. There's going to be arguments over them. And you've got to fall back on some other kinds of approaches.

On Search Spam

The last issue is the spam issue. When you're in the lab and you're defining your ontology, everything looks nice and neat. But then you unleash it on the world, and you find out how devious some people are. This is an example; it looks like two pages here. This is actually one page. On the left is the page as Googlebot sees it, and on the right is a page as any other user agent sees it. This website—when it sees Googlebot.com, it serves up the page that it thinks will most convince us to match against it, and then when a regular user comes, it shows the page that it wants to show.
What this indicates is, one, we've got a lot of work to do to deal with this kind of thing, but also you can't trust the metadata. You can't trust what people are going to say. In general, search engines have turned away from metadata, and they try to hone in more on what's exactly perceivable to the user. For the most part we throw away the meta tags, unless there's a good reason to believe them, because they tend to be more deceptive than they are helpful. And the more there's a marketplace in which people can make money off of this deception, the more it's going to happen. Humans are very good at detecting this kind of spam, and machines aren't necessarily that good. So if more of the information flows between machines, this is something you're going to have to look out for more and more.
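
For anyone wondering how you might catch the user-agent trick Norvig describes, here is a minimal sketch, not how Google actually does it. It assumes you can fetch the same URL twice, once as a crawler and once as a browser, and it flags the page when the two versions share few words. The names `looks_cloaked` and `fake_fetch`, the example URLs, and the 0.5 threshold are all made up for illustration.

```python
import re


def word_set(text):
    """Return the lowercased set of words in a page's visible text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two sets: 1.0 means identical, 0.0 means disjoint."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def looks_cloaked(fetch, url, threshold=0.5):
    """Flag a URL whose crawler view and browser view differ a lot.

    `fetch(url, user_agent)` is a hypothetical callable returning page text;
    the threshold is an arbitrary assumption, not a known cut-off.
    """
    crawler_view = word_set(fetch(url, "Googlebot"))
    browser_view = word_set(fetch(url, "Mozilla/5.0"))
    return jaccard(crawler_view, browser_view) < threshold


def fake_fetch(url, user_agent):
    """Toy stand-in for a web server, so the sketch runs without a network."""
    if url == "http://cloaked.example/":
        if "Googlebot" in user_agent:
            return "cheap flights hotel deals cheap flights insurance loans"
        return "Welcome! Click here to win a prize"
    return "Notes on semantic web ontologies and search"


print(looks_cloaked(fake_fetch, "http://cloaked.example/"))  # True
print(looks_cloaked(fake_fetch, "http://honest.example/"))   # False
```

A real cloaker can also key on IP address rather than user agent, and honest pages legitimately vary between requests (ads, timestamps), so a word-overlap check like this would only ever be a first-pass signal.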

Interesting read all round, eh? Thanks, DG, for the link...


We won't use metadata because it's deceptive

But we think letting webmasters determine whether or not a link counts in our ranking algorithm is a great idea. Go figure.


The Solution is Folksonomies

Just let users TELL you how valuable they think the site/page is in relation to the query. It could be as simple as adding a rating bar to the toolbar, or a system that lets users classify sites/pages on their own. del.icio.us has some abuse, sure, but the more users there are, the less chance that abuse will overtake honesty. Just look at how Wikipedia manages...
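
A rough sketch of the rating-bar idea, assuming one vote per user per (query, page) pair and a 1 to 5 scale. The `RatingBoard` class and its method names are hypothetical, and using the median is just one simple way to keep a handful of abusive votes from swinging the score once honest users outnumber them.

```python
from collections import defaultdict
from statistics import median


class RatingBoard:
    """Collects user ratings of a page's usefulness for a given query."""

    def __init__(self):
        # (query, url) -> {user: rating}; a repeat vote overwrites the old one
        self._votes = defaultdict(dict)

    def rate(self, user, query, url, rating):
        """Record one user's rating (1 to 5) for a page under a query."""
        if not 1 <= rating <= 5:
            raise ValueError("rating must be between 1 and 5")
        self._votes[(query, url)][user] = rating

    def score(self, query, url):
        """Median rating, or None if nobody has rated this pair yet."""
        votes = self._votes.get((query, url))
        if not votes:
            return None
        return median(votes.values())


board = RatingBoard()
# Three spammers push a poor page to the top rating...
for i in range(3):
    board.rate(f"spammer{i}", "cheap flights", "http://spam.example/", 5)
# ...but nine honest users rate it low.
for i in range(9):
    board.rate(f"user{i}", "cheap flights", "http://spam.example/", 2)

print(board.score("cheap flights", "http://spam.example/"))  # 2.0
```

With only a few honest voters the spammers would still win, which is the "more users" caveat in a nutshell; it also does nothing against someone who registers lots of fake accounts.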