Quality Indicators

As you may or may not know, I'm a link building guy all the way. Lately, however, I've been pondering (and tinkering with) on-site SEO, because it still matters.

Of course, it's changed a lot--now that search engines can sort out 1,000-word meta tags and white-on-white text, how else can they determine a page's value (within the site itself, linking algos aside)?

This brings me to quality indicators.

By 'quality indicator' I mean the existence of something on a page that tells a search engine "I'm better than the average web page!"

The theory being that, for instance, a standards-compliant web site that follows every possible accessibility standard *on average* has higher quality content than one that doesn't. This is certainly debatable, but for the purposes of my theory I'm going to assume this is true. And note that I'm not saying every standards-compliant site is better than every non-standards-compliant site; all I am saying is that a search engine can get some useful data and correlations by tagging each for what it is.

I've tried to brainstorm what I think might serve as quality indicators to an SE (a rough markup sketch of several of these follows the list):

  • being hosted on a dedicated IP
  • outbound links (these might be the biggest IMHO -- not only to put your site in its topical neighborhood, but also just a plain old GOOD neighborhood)
  • doctype and language metadata in your header
  • valid code
  • invalid code but linking to the W3C validator ("we tried!")
  • existence of a print stylesheet
  • a file named privacy.*
  • the existence of access keys (accessibility best practice)
  • a 'skip navigation' link (accessibility best practice)
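
To make a few of these concrete, here's a rough sketch of what several of the markup-level indicators might look like in a page's source (the file names and the accesskey choice are purely illustrative, not a recommendation):

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
    <head>
      <title>Example page</title>
      <!-- doctype and language metadata declared above -->
      <link rel="stylesheet" type="text/css" media="screen" href="screen.css" />
      <!-- a print stylesheet -->
      <link rel="stylesheet" type="text/css" media="print" href="print.css" />
    </head>
    <body>
      <!-- 'skip navigation' link, with an access key -->
      <a href="#content" accesskey="s">Skip navigation</a>
      <div id="content">
        <!-- main content -->
      </div>
      <!-- a privacy.* file linked from the footer -->
      <a href="/privacy.html">Privacy policy</a>
    </body>
    </html>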

My contention is that sites with the above features are, on average, better than sites without them (with a wide, wide deviation -- but that doesn't mean the data isn't useful).

So how bout it guys? Am I speaking out of my butt? Or is this possible/probable?

What else could serve as quality indicators to an SE?

Comments

Spiders and Ranking

>> But the question here is whether that helps ranking. I certainly don't make that claim to my clients.

No, W3C-validated pages do not help ranking, whether that's validated HTML 4.01 Transitional table-based code or validated XHTML 1.0 Strict CSS-based code. I don't claim they will, either. The benefits of validated pages and web standards are for a different discussion.

Indexing and caching on search engines do seem to benefit from "quality indicators". I don't have any research studies to reference; it's empirical analysis.

>> ...if a spider can get through a document, it can index it.

That's the rub. Bloated and/or faulty code impedes spider progress.

Positive side effects

I look at code cleaning as one of those things that's good to do for other reasons, and for SEO it won't hurt.

One side effect of lean code that likely helps SEO is that many of the things people do to streamline their code also nudge them toward more thoughtful semantic markup (see the sketch below).
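
For instance, a made-up before/after snippet, just to illustrate the nudge: stripping presentational markup both trims bytes and turns a styled blob of text into something a spider can recognize as a heading.

    <!-- before: presentational, heavier -->
    <font size="5" color="#333333"><b>Widget Maintenance Tips</b></font>

    <!-- after: leaner, and now semantically a heading -->
    <h2>Widget Maintenance Tips</h2>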

Another positive side effect might arise over time because users respond better to faster pages. That improves the chances of garnering organic links now and then.

No question

Sure, it makes the SE's job easier, and on a larger scale it can make a big difference for them. Does that mean that pages that contribute to that process get a boost?

I'm all for cleaning up pages. I've got a whole page devoted to it as a service offered on my site, and I generally bring page size down by 40-50%. But the question here is whether that helps ranking. I certainly don't make that claim to my clients.

qwerty, think bigger

>> You can cut a millisecond or two off of the time it takes a spider to crawl your page by cutting out half of the content. I kind of doubt that helps.

Y claim to have 20 billion objects in their index. If every webmaster cleaned up their code, switched to CSS, scrubbed empty tags, whatever, for an average saving of just 1 ms per object, that saves something like 230 machine-days of processing time per index cycle.
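
For what it's worth, the back-of-the-envelope arithmetic holds up, taking the 20 billion and 1 ms figures at face value:

    20,000,000,000 objects x 1 ms   = 20,000,000 seconds
    20,000,000 s / 86,400 s per day ≈ 231 machine-days per index cycle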

That would translate into fewer boxes needed and a fresher index. That's the incentive for SEs to reward good code.

Is this just conjecture?

Quote:
However, the "quality" code of the pages does render those pages robot friendly. They load faster. They are indexed faster. They are cached faster.

As far as I can tell, if a spider can get through a document, it can index it. Does getting through it faster help? I don't see any reason to believe it helps with ranking, or that it has any effect on the crawling schedule. Spider-friendly to me means a spider can get through it: it's not an incredibly large file and it doesn't contain any spider traps. Apart from that, we're talking about a difference of milliseconds in just about every case. Why would the algorithm take that into account? It's designed to find relevance, not pages that are easier to get through than others.

You can cut a millisecond or two off of the time it takes a spider to crawl your page by cutting out half of the content. I kind of doubt that helps.

Sorry, but I'd need to see proof of something like this. It just doesn't make sense to me.

Robot Usability Indicators

W3C guidelines and web standards give pages "quality" code. I agree that pages which meet the Section 508 guidelines and web standards, hit several of Andy's indicators, and validate do not, in themselves, represent quality to search engines.

However, the "quality" code of the pages does render those pages robot friendly. They load faster. They are indexed faster. They are cached faster.

I tend to separate domain "quality" indicators, page "quality" indicators and site/page robot usability indicators.

A domain indicator would be domain registration. A site usability indicator would be information architecture (IA). A page quality indicator would be outbound/inbound links.

broken links

Andy, I don't think search engines could afford to look at most of that stuff. One thing that I have seen several times is "resource pages" dropping out of SERPs when they have a couple broken links, and bouncing right back when those links are updated or removed.

Cause and effect? I dunno, it would take testing and that means work... work that doesn't need to be done because everyone should just check their outbound links occasionally.

what is quality?

My definition of a quality site is certainly different from a load of other people's, and whatever definition of quality we agree on, it may not always mean the best result for a specific search.

If you're searching for information on 'Buffy' it could absolutely be the case that the most relevant site is a fan site developed by a 12-year-old in their bedroom - they probably won't have any of your factors (except maybe the outbound links).

SEs don't care about quality, as we define it

Yep Chris, that's how I see it - the engines don't care about quality as you and I see it (or perhaps they just don't define it the same way?).
Pity.
I wonder what time will bring?

Hmm..

For an engine to do some of these in the algo would require way more computing power than the benefit it would return; I can't see it happening just yet. Having said that, unskilled labor is so cheap they could do it with manual hand-checks. There is some right old crap high up in the SERPs that doesn't smell of quality, to the point that I don't believe SEs care about quality in the way you or I would.

I'm with you Andy

I'm with you Andy. My feeling is that on-page quality factors are absolutely being used in conjunction with link data. The reason is simple - you can sort the wheat from the chaff. If Google wants higher quality results in its top SERPs, there's no better way than to find methods of measuring CONTENT quality, rather than simply link quantity. Link quality is still highly important and being used to a great degree, but I can't imagine why this idea of filtering sites with higher "average" quality wouldn't pop into the heads of the engineers.

As for more factors:

- Reading Level (a rough scoring sketch follows this list)
- On-Topic Analysis
- Sentence and Paragraph Quality
- Spelling & Grammar
- Quality Internal Link Architecture and Site Structure
- Usability

All of these indicate higher quality sites on average, too. So why not use them as well?
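
Just to illustrate how cheaply something like 'Reading Level' could be scored, here's the standard Flesch Reading Ease formula (an example of the kind of metric an engine *could* compute, not anything they're confirmed to use):

    Reading Ease = 206.835 - 1.015 x (total words / total sentences)
                           - 84.6  x (total syllables / total words)

A crawler already tokenizes pages into words and sentences, so a rough score along these lines would be cheap to compute at index time.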

.

Quality is such a subjective value though ...

depends (no, not those)

I agree it would be nice if the industry had structure like that (quality matters! What a concept!) but so far I have to agree with others here.

But it really depends on the engine, no?

1. poor quality may get flagged more easily on one engine than another.

2. on-page factors may influence one engine more than another

3. one engine may deploy semantic sets more pervasively than another (in anchor text, across sites in the context of backlinks, etc.), so that on-page content factors may be more important than traditional on-page factors. What exactly are you measuring when you see meta tags have an impact? Their keyword density, etc., or their semantic relevance to the SERP competition? I vote the latter when talking about Google.

As Nick reflected, link building may buy you credits across the board and thus be a better investment.

Google Patent

The Google patent is probably the best bet for finding quality indicators that may be in use.

Yep!

I like the concept, but my research shows they *don't* do this.
My only quibble with your list would be the accesskeys - current feeling is that they cause more problems for those who would theoretically use them than they solve...

on-page op has been replaced

on-page op has been replaced by link bait tactics, IMO

I'm with Natasha

All of the things you're listing are signs of a good page -- to me. That doesn't mean a search engine cares, and I've certainly never seen any evidence that they do.

On-page optimization is, for the most part, about targeted content (visible to users and spiders) and making sure it's accessible to spiders.

The fact that I try to use valid code, remove formatting and put it into CSS, get rid of nested tables, use proper document structure, etc. means that I want to make the pages cross-browser compliant, fast to load and easy for clients to update in the future. I don't do it because I think it's what the SEs want.

"My website brings all the clicks to the Yard"

... sung to Kelis' "Milkshake"

Thanks for posting this Andy. I think this will spark a great discussion.

Interesting theory... But... "My Mom & Pop Site ranks higher than Yours"...

With the exception of:

outbound links (these might be the biggest IMHO -- not only to put your site in its topical neighborhood, but also just a plain old GOOD neighborhood)

Most Mom and Pop sites (which still rank higher than most corporate sites) don't employ any of those tactics.

They host their site on the cheapest host. Have no clue what a doctype is... Stylesheet: They don't need no stinking stylesheet.. and valid code!!! Please... Yet they still rank high...

So that still leaves us where...

Natasha "That Girl From Marketing" Robinson