Should Search Engines Return to Hand-Checking Results?


SEs Have to Return to Directories and Hand Checking

So it is now obvious that everyone will soon be auto-generating content and auto-building links. What do the search engines do to keep their indexes clean and of good quality? Hand checking!

If I go to Google and search on "viagra", let's be honest, what I should be seeing is:

  • the manufacturer
  • the largest sellers (not affiliates)
  • some quality research papers from established sources

How does Google make this happen? Hand checks.

I propose that soon, if it is not already being done, the SEs will have teams of people whose job it is to investigate an industry and make sure that the major players all show up for the main searches in an arena. For five-term searches, let the rest fight it out.

DougS

Comments

Human editing

is going on.

If you are a search engine you don't "need" to human-review every new page that is indexed.

All you need to do is keep the first page (or even just the first result) of the most common (popular) queries free from crap.

A small group of, oh, say about 385 people in some small, yet-to-be-named Near East country could easily do this using eval.google.com...

I considered a patent applica

I considered a patent application a while ago, but decided against it because I'm not sure it would be sufficiently unique.

But here was the idea:

Think of how a spreadsheet works - imagine, for example, a spreadsheet about your sales people - names, sales figures, address, etc.

Normally, you have the results sorted by Column A, or whatever the most common listing view is. Perhaps alphabetically, perhaps sorted by number of sales, etc.

Most of the time, that works fine. But sometimes, you want to take that same list and re-sort it using different criteria - perhaps by location, or number of clients, or whatever. Sometimes you will also want to use multiple criteria - sort by city and then by sales figures.

With me so far? Fine. Now apply that to a SERP. When you re-sort something, you are in essence using a new algo. NOT a filter.

Imagine a search engine interface that had predefined (and even customizable) search algos that could be applied to a specific search.

You do a search on, to use an example from a previous post, "cars". By default, you would get the "best guess" algo listing. But look! There is a toolbar that lets you re-sort/research based on certain criteria.

Some examples could be:

Newness
Authority
Locale
Commercial (ie ready to buy)
Research
Popularity

and so forth.

You do the search on "cars" and then, if you don't see what you need, you can refine it with a click of a button. Better yet, you can actually create your own search algo and save it. Someone doing a lot of searches every day as part of their job probably has a really good idea about what criteria are important to them.
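To make that concrete, here is a minimal sketch of what a saved, user-defined "algo" might look like under the hood. The criteria names, weights, and scores are made up for illustration - they are not any engine's real signals.

    # Re-rank one result set with whatever blend of criteria the user chooses.
    # All sites, scores, and weights below are hypothetical.
    RESULTS = [
        {"url": "ford.com",           "newness": 0.2, "authority": 0.9, "commercial": 0.8},
        {"url": "carsblog.example",   "newness": 0.9, "authority": 0.3, "commercial": 0.1},
        {"url": "autotrader.example", "newness": 0.5, "authority": 0.6, "commercial": 0.9},
    ]

    def rerank(results, weights):
        """Sort results by a user-defined blend of criteria, not one fixed algo."""
        def score(r):
            return sum(w * r.get(criterion, 0.0) for criterion, w in weights.items())
        return sorted(results, key=score, reverse=True)

    # "Research" preset: favour authority and freshness, ignore buying intent.
    research = rerank(RESULTS, {"authority": 0.7, "newness": 0.3})
    # "Ready to buy" preset: the same list, re-sorted with a different blend.
    buying = rerank(RESULTS, {"commercial": 0.8, "authority": 0.2})
    print([r["url"] for r in research])
    print([r["url"] for r in buying])

The point is that the same result set gets re-sorted, spreadsheet-style, rather than just filtered.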

This is along the same lines as the Refine Search and advanced search options in some search engines. But there is a big potential difference - all those options do is redefine the search using the original algo - which may not be the best one to use for a particular search.

Maybe I don't care how many sites are linking to a site. Maybe I want to find sites with very few links, but outstanding content. Maybe newness is more important than links to me. Maybe I've discovered that for some SERPs the people with the most duplicated content are actually the most relevant. Maybe I've discovered that the sites that buy PPC in a certain SERP are more authoritative than sites with lots of links. Who knows?

Right now, almost all "advanced search" functions have many (or most) of the same limitations as the base algo, and there is no real ability to monkey around with it - all you are really doing is messing with filters. You are not changing the basic functions. If that basic function is based on a flawed premise for your particular search (ie lots of links = a good site, that's good sometimes, but not in other scenarios) then you have an issue with relevance.

I'm saying - use a whole new algo. Why restrict yourself to just one?

I really like MSN's new Search Builder function (if you haven't messed with the "Results Ranking" slider, you really should). I think it's pretty close to what I'm suggesting.

But I'd rather click a button.

Ian

Sure, why not have a Fortune 500 algo then?

>>>Imagine I type in the word "cars"; then I should see all the major players, Ford etc., in the index.

This sounds good - then why not, for some terms like Fortune 500 companies, etc., have a preset list of results for the top-tier spot? For such small sets of data, nobody is going to get offended (e.g., if you search for Goodrich Inc., a Fortune 500 company, you should find them at the top... and nope, I'm not affiliated).

Similarly for Ford, etc., and there are probably local-language variants of the same - this type of thing would prevent a lot of the issues that you're proposing hand-checking the entire set would supposedly fix, while still giving the web the collective power to determine what goes where in the result set.

The line would have to be clearly delineated, of course - but using data, instead of people, to make the call on what gets 'rigged' or not takes out the element of human bias & perhaps questionable motivation you get whilst a virtual army of folks is setting up their favorite sites to be in the top spots.

simplifying it

The directory idea may not have to be a huge one that is monitored all the time; it just means that G may have one with only certain sites in it.

Imagine I type in the word "cars"; then I should see all the major players, Ford etc., in the index. If I see some strange URL that redirects to Ford I won't click on it.

How far this concept is taken I don't know, but it would not be that hard to dig out the top 100 sites in the top 100 sectors and make sure they are in the SERPs.

The top "real" sites in any sector have similar linking structures anyway..ie the major type sites that link to Ford also link to GM. I don't mean the links are the same I mean from a high level view they have a similar pattern. This is hard to copy and will be the future of algo search, in my humble opinion. But these patterns can be replicated with enough resources and cash, but having the tick in the directory that says you are a "car" player would be the deciding factor.

Mind you... if you have the cash and resources to manipulate linking patterns to be the same as a plc, then there is an argument that you are a "car" player.

DougS

RSS

Well, RSS is cool, no doubt. It can mirror existing content. It can be used to aggregate existing content. But it can't auto-generate any new (unique) content that both SEs and humans would find tasty to munch.

>The real art is to, without

>The real art is to, without redirection or cloaking of any sort, auto-create content that:
-- Humans find useful
-- SEs like

An art, no doubt! Sounds like a job for RSS, maybe?

Priorities & FrankenContent

Seems Google would set priorities on what to hand check. My guess (rough sketch below):

-- Search term volume
-- AdWords bid price
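A back-of-the-envelope version of that guess - the terms, volumes, and bid prices are invented for illustration:

    # Rank queries for hand checking by a blend of search volume and AdWords bid price.
    queries = [
        {"term": "viagra",   "monthly_searches": 800_000,   "top_bid_usd": 4.50},
        {"term": "cars",     "monthly_searches": 1_200_000, "top_bid_usd": 1.20},
        {"term": "knitting", "monthly_searches": 90_000,    "top_bid_usd": 0.30},
    ]

    def handcheck_priority(q):
        # Crude heuristic: queries that are both popular and expensive get looked at first.
        return q["monthly_searches"] * q["top_bid_usd"]

    for q in sorted(queries, key=handcheck_priority, reverse=True):
        print(q["term"], round(handcheck_priority(q)))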

And to Auto-Gen: It's really easy to scrape, grab (grope), chop, slice, dice, mix, and mash existing content to re-create "unique" content. FrankenContent (you heard the term here first).

The real art is to, without redirection or cloaking of any sort, auto-create content that:

-- Humans find useful
-- SEs like

I'm no expert but I can't ima

I'm no expert, but I can't imagine how they could do hand-checking on any large scale either.

What I don't understand though is why they haven't been able to detect blog spam. Of course they couldn't penalize sites with links from blogs, but you would think they would be able to recognize them and discount them. Seems like that would solve a lot of their problems.

Oh come on - this is naive to the point of ridiculous

Quote:
Use this directory as a sifting process... i.e. sites within this directory linking to other sites within this directory are deemed more important by the algo.

Bait and switch!

I really don't have an issue

I really don't have an issue with the hand-filtered SERP theory, although to be honest I do sometimes wonder whether it really happens to a wide enough extent. Reason? Well, I know of a domain or two that use a consistent layout (template) in 3 (very competitive) sectors and are still there 3 years after I first spotted them and said hmmm, hidden-text KW repetition in the main, along with a mini network of IBLs towards a multitude of target KWs; simple stuff, but it's working...

'Twould be nice to think that you could build a site in an uber or semi-uber comp cat and, so long as it rocked, it would be pretty safe, but at the same time you'd be forgiven for wondering whether it's really worth the effort, especially when you see spammy knocked-up alternatives consistently outperforming your aff equivalent with extras... it almost begs the question: why bother, if it's not rewarded?

IMO, second best in terms of content/usability/unique offerings shouldn't cut it in the free SERPs; good sites offering good stuff over and above should do better.

Hand jobs are all well and good, but as has been said, it's pretty tough to stop a relentless onslaught of spam, brought about by the relative ease of implementation... (maybe that's what the buy-AdWords filter is all about).

SEOs are like the Borg: whatever countermeasures the SEs stick in, the SEM community sooner or later finds a way around them.

Listing Switches

One problem with the hand-checking idea is one of the big issues DMOZ runs into - you check the site, see that it's nice and clean, and include it. Great!

Sometime later, said listing is now owned by someone with less-than-lily-white motives (perhaps even the same person), and the site is now a link-pop generator that has little to do with the topic and is pretty useless to visitors.

You hand-edit once, you need to keep doing it all the time. You can't just "fire and forget". You'd need a checker that monitored changes (DMOZ has one, but I think it only checks whether the site is still there, not whether it has been changed).

Worse, there are really good new sites that are now being back-burnered because there is a perception that the directory topic is "full" and therefore adding more is less important than filling up areas that have fewer listings in them. Full of garbage, in some cases. DMOZ tries hard to combat this, but other directories are less concerned (and don't have the resources).

All this is a result of humans trying to do their best. It won't work. Machines are faster than people; people are better than machines. So maybe up the ante and combine the two.

I would perhaps consider a machine-assisted human instead. Have aggressive spam-checking systems (far more aggressive than you could fairly use in a purely automated system). The algo brings a page to the human's attention, and the human decides based on experience and a set, public, consistent policy.

We all do this to a small degree when we use an automated DNS check or look at a cache, but I'm suggesting going further.
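For what it's worth, here is a bare-bones sketch of that workflow - the signal names, weights, and threshold are all hypothetical, just to show the shape of "algo flags, human decides":

    # An aggressive automated scorer flags pages; anything over the threshold goes
    # into a queue for a human reviewer to make the final call.
    from queue import PriorityQueue

    REVIEW_THRESHOLD = 0.6
    review_queue = PriorityQueue()

    def spam_score(page):
        # Far more aggressive than anything you would act on automatically.
        signals = {
            "hidden_text":       0.5 if page.get("hidden_text") else 0.0,
            "duplicate_content": 0.4 * page.get("duplicate_ratio", 0.0),
            "link_velocity":     0.3 if page.get("new_links_per_day", 0) > 500 else 0.0,
        }
        return min(1.0, sum(signals.values()))

    def maybe_flag_for_human(page):
        score = spam_score(page)
        if score >= REVIEW_THRESHOLD:
            # Highest scores come off the queue first; a human makes the decision.
            review_queue.put((-score, page["url"]))

    maybe_flag_for_human({"url": "example.com/spammy", "hidden_text": True, "duplicate_ratio": 0.8})
    print(review_queue.get())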

The one problem I have with spam reporting is that it's not even-handed - the site targeted gets hit and everyone else goes free - kind of like a cop pulling over the last car in a group of speeders. Sure, he was speeding, but why ticket one guy and let the rest go?

It should be uniform, fair, and even; otherwise it's actually encouraging abuse, IMO. Asking someone to voluntarily stop spamming in a SERP where only spammers are showing up is like asking someone to show up at a gunfight unarmed. I'm not even sure you could *call* them a spammer under those circumstances. They are just playing by the ruleset that is actually being enforced.

I'm not saying you should encourage them to buy the guns; I'm saying you stop the gunfight in the first place - no favorites, no exceptions for your friends, no choosing sites for review just because they are competing with you or someone you know. Same rules for everyone, across the board.

Then let the best site win.

Personal opinion,

Ian

Good thread - search engines giving hand jobs ;)

>>> searcher how a site got to the top spot as long as it gives me what i want

That, to me, is the crux of the issue - as long as the search results are providing what the user wants - who f@##ing cares how the site got there, if it is (or isn't) "spam" ;) and the like?

If "joe user" searches -> clicks -> gets what they want, their perception of the search engine goes up - eg, they think it's "good".

Now, if "joe user" searches -> clicks -> and gets some off topic shite, then that's another story / bad for search engine, bad for publisher (didn't get that conversion) and bad for "joe" as he has to go someplace else & search again, revise the query, etc.

Hand editing simply doesn't scale like bots, crawlers, spiders & indexers. Sure, the 'button pushers' have more & more tools...but algos are progressing along the same lines. I just don't see a regression happening where every major player starts enabling hordes of folks to police the results just to make sure the right 'quality' is there.

If you get watchers to watch the search results, who watches the watchers? How do you know they're not fluffing their own spam, and letting the rest slide...? One step down a slippery slope might well lead to the bottom, ya?

Hand Jobs

I can't quite see hand checks working either, as a few others have said. It's just not viable - yet all the SEs hand check to a certain extent, and it would certainly do Google, for example, no harm to step this up in some areas.

We've heard about block-level analysis and all manner of other clever things they are supposed to be doing, but even with advanced algos and ways of looking at pages, you're right, it's not enough.

My view is that as it stands, search is fundamentally flawed, and if something radical isn't done rather soon then the money cats will become even more of a killing field.

Is that necessarily a bad thing though? I've long thought that it's not really very important to me as a searcher how a site got to the top spot as long as it gives me what I want. Who wants to push even a small button to rank for a term that is irrelevant to your $$$-making intentions?

Who cares how they do it

The high-level point here is that it makes sense to have some form of hand check.

The details I don't know - I could probably come up with 50 ways; which one they choose doesn't matter.

Quality hand checking guarantees keeping the good stuff in.

DougS

>they'd have their own

>they'd have their own

That entirely goes against their approach though. I can't see them wanting a public database. If you are talking about paying for inclusion, it is going to be a public database, because people need to see the value in it. And if it is not a public database, what are people paying for? To manipulate the search database's relevancy? I do not think Google wants to allow that, given all the crap they have spoken against Yahoo!'s paid inclusion.

Google and Y??????

Seobook

With all due respect, they wouldn't - they'd have their own.

Forget the specifics and just think of the high-level concepts.

DougS

>Would that not cripple some

>Would that not cripple some expert sites that just don't have $300 to pay for a dir listing?

I have seen some less-than-stellar sites in the Y! directory (and some were listed free). Plus, why would Google want their business model to rely on Yahoo! so much?

Y! Directory

Not really - if you can get links from sites with listings in Y! (free or paid) you will still benefit.

Gerbot

Hey Gerbot, welcome to TW mate, do introduce yourself!

Quote:
My inside info from Y! is they will be leaning on their directory this year for that exact reason.

Would that not cripple some expert sites that just don't have $300 to pay for a dir listing?

btw, TW got one for free the other day heh...

giving expert sites outbounds a bonus..

.. chances are that is what they are doing with "flavours"

select the best

How about having an elite directory? This directory is built by hand checks, and maybe people could even pay to be in it.

Use this directory as a sifting process... i.e. sites within this directory linking to other sites within this directory are deemed more important by the algo.

So the sites that are listed first in organic search are the ones that are in the dir and linked to by others in the dir. The second lot of sites are just listed in the directory. The third lot of sites are the rest.
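A toy sketch of that three-bucket sift, with invented sites and links just to show the ordering:

    # Tier 1: in the elite directory AND linked to by other directory members.
    # Tier 2: in the directory but not linked by members. Tier 3: everything else.
    elite_directory = {"ford.com", "gm.com", "toyota.com"}
    outlinks = {  # source site -> set of sites it links to
        "ford.com": {"gm.com", "toyota.com"},
        "gm.com":   {"ford.com"},
        "randomcars.example": set(),
    }

    def tier(site):
        if site in elite_directory:
            linked_by_member = any(src in elite_directory and site in targets
                                   for src, targets in outlinks.items())
            return 1 if linked_by_member else 2
        return 3

    results = ["randomcars.example", "toyota.com", "ford.com"]
    print(sorted(results, key=tier))  # directory-backed sites float to the top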

Anyone know anybody who has a directory that is hand-edited and you pay to get into????????

DougS

hand checking will never work

Hand checking bad sites is a losing battle - Google will struggle to keep up with me, not to mention the 1,000s of other button pushers.

What might work is checking expert sites (Hilltop) and giving their outbounds a bonus credit or two.

My inside info from Y! is they will be leaning on their directory this year for that exact reason.

API Limit

Chris

You are limited to 1,000 queries per day. Fair enough, you could have multiple keys. But a scraper running through proxies has no limit :-)
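For context, a tiny sketch of what that per-key cap means on the client side - the class and the search call are stand-ins, not the real (SOAP) API client:

    # A wrapper that refuses to fire once a key's daily allowance is spent.
    import datetime

    DAILY_LIMIT = 1000

    class QuotaLimitedClient:
        def __init__(self, api_key):
            self.api_key = api_key
            self.day = datetime.date.today()
            self.used = 0

        def search(self, query):
            today = datetime.date.today()
            if today != self.day:  # new day, counter resets
                self.day, self.used = today, 0
            if self.used >= DAILY_LIMIT:
                raise RuntimeError("daily query quota exhausted for this key")
            self.used += 1
            return f"results for {query!r} (query {self.used} of {DAILY_LIMIT})"

    client = QuotaLimitedClient("my-api-key")
    print(client.search("cars"))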

All it would take...

Affiliates vs Search Engines

What would it take to start a war the engines couldn't win? Hmmm... what ingredients would be needed to bake that cake?

The SEs would need to contribute:

  • Piss everyone off with AdSense shite
  • Sandbox new sites for months with no explanation
  • Experience an enormous boom in blogs and have millions of people all of a sudden interested in search and why they are "sandboxed"

Site builders could add:

  • Serious resentment over some of the above
  • More and more sophisticated tools made available for dodgy, index-crippling spamming
  • Those tools and instructions on how to use them made available to bloggers

You'd only have to have maybe 100 or so good spammers make a concerted attack on the SEs and they'd be in shit so deep it would make Wall Street cry like a baby.

The war should end. In fact, it never should have started.

What does the scraper do that API can't?

The G API has been around for ages; what advantage does the scraper code give?

It seems the clever auto-generation stuff is the link building to get the bloody thing to rank, and automating comments and reviews so well that the links don't get pulled. Anyone with a day's worth of copying and pasting free scripts can put up a scraper, datafeed, RSS, or Google|Amazon API site (or 3,000 of them) very quickly, but if only 0.001% of pages get indexed, or the pages are so buried nobody visits, what's the point?

I think that is what is missing from the button-pushing thread - lots of people are going to go off and think they can be millionaires right away by cranking out pages, but if it were that easy I wouldn't be sat here at my day-job desk writing this ;O)

On the other hand, I am glad there is more to SEO than copywriting - gives me something to do!

If you have ever worked on th

If you have ever worked on the business side of search and done the calculations you would know that hand-checking the quality of the index on a broad scale is not going to happen. Especially not if you want to survive on the stock market :)

I don't think the problems are much bigger, from a technology and quality point of view, than they have always been. There are just more people looking into them and reporting them.

Put them all in the sandbox...

...and don't let them out till a human has checked them!

Would not solve it all, but would solve a chunk.

Maybe that's what the sandbox is all about?

Auto Generation

Quote:
the fact that people can churn out a 10,000-page website in a couple of hours.

Try a couple of minutes :)

There are systems being built as we speak that trackback across networks and auto-gen comments and links - the auto-gen field is getting much more sophisticated - and with Brandt's now-infamous scraper script out there for download, there's no shortage of data to fill those pages. And, as I commented last night somewhere else, with the boom in blogs anyone can now throw up a website, so SEARCH is becoming very interesting to a whole lot more people.

I'd say the engines are in deep shit if they don't get to grips with this right now.

Brand-building SEO - is it the future?

So should SEOs be spending their time doing SEO, PPC, banner buying, PR, etc., etc., so they have an argument to be in that top 100?

DougS

manual removal

The whole point of button pushing, as it was coined, is the fact that people can churn out a 10,000-page website in a couple of hours. So that's a couple of sites per day. If I put up 100 sites, you have to find them all and remove them. Now there are 100 people putting up 100 sites.

Big problem!

Manual quality checking would flip that problem.

manual removal doesn't work

You would be doing it forever, removing more rubbish every day.

The players are the players and they should be there. Maybe not number one, but in the top 100.

DougS

DMOZ? Who said that?

Quote:
Cleaning out the spam/scrape/misleading crap is better than saying "this Fortune 100 company SHOULD be number one even though their site is crap" IMO.

On a large enough scale, it comes to the same thing in the end, no? The WW thread was just gagging for someone to say 'Humans do it better' - and here it is.

Manual removal is better than manual inclusion

Why should a big pharma company be the number one result for their product? Perhaps it would be in the users' best interest if the number one result were "don't use product X because it has these harmful side effects"? ;O) Also, who decides the number one spot for "cola" or "transatlantic airline", etc.? ;)

I think manual removal is more efficient and fairer than a manual boost/inclusion of "big names".

Cleaning out the spam/scrape/misleading crap is better than saying "this Fortune 100 company SHOULD be number one even though their site is crap" IMO.

No need to check the whole web...

Just the major cats - and no need to reinvent the ODP, just tick boxes.

If Joe Schmo punter searches for something and the expected result is not there, they may start to think "Google is cack - they don't even list 'insert company'".

The public are a fickle bunch, and G is now accountable to its shareholders. When/if something better comes along, they will shift.

Fully automated algos can be beaten, as an algo is simply a program. The 50 or so PhDs employed at Google are no better than some of the people who make it their job to reverse-engineer the algo.

You cannot beat the human aspect - we are quite good - so a quick tick box on sites to give them a pass. It would not take long to weed out the cack: tick, tick, tick... category done.

Hired Spam Reporters

Nice post Doug :)

It looks as if Google started hiring "quality assurance" teams back in late November...

Do they have the resources ($$$'s) to hire enough people to police their index to any effective degree?

G Hate Hand Checking

If this thread http://www.webmasterworld.com/forum78/7634.htm is anything to go by, a LOT of people are soon moving towards becoming prolific 'Button Pushers' (love that term!)

SERPs will be increasingly flooded with more and more auto-gen content. A BP takes content, messes it up, and spits it out again to make "unique" content. The technology of the 'Button Pushers' (BPs) is now a lot more advanced than before. I just don't see how Google stands any chance of combating the really smart BPs adopting this technique. Therefore, I can envisage them having to do more manual hand checking. Google love to be able to automate everything; they absolutely HATE having to do things manually. But I can't see them having much choice in a lot of arenas...

Full circle?

So the move back to quality may well prevail (or rather have its place). Instead of producing shi##, go for real quality if you want the top spot in an uber-competitive field. Leave the cack for bottom-feeding terms.

Interesting :)

Currently it's cleaning up the mess

As I pointed out a few days ago, G is doing hand bans/devaluations all the time, and they also go through the competitive areas like pills, gambling, and finance. There's not much else they can do unless they completely reinvent today's link-based algos, as they're far too easy to cheat with all these people asking for their readers' thoughts.

On the other hand, it would also be beneficial for some security-related problems if the SEs invested more in "manual algo tweaking".