Google Prefers Invalid HTML?


Site Reference performed a study recently to try and confirm whether valid HTML can actually help your rankings. The result was a bit surprising.

Not only does it appear that Google does not give preference to valid HTML, it seems as if they actually prefer invalid HTML.

Comments

Those of us not preoccupied with perfection

have known this for a long time...despite all the friendly urging about compliance. ;-)

And the only reason we're mindful of it on new sites we develop is the possibility that at some point, it might actually matter. ;-)

ugh

probably the most annoying link bait I've seen in a long time ;-)

mad respect though, that page will be PR8 soon after getting outraged links from every design blog in the world.

Great read

that was a great read. I appreciate the author taking the time to both do the study - and then publish it.

Since the sites are so new and with so few links, my suspicion is that there's some other unknown factor that's causing the non-compliant sites to rank.

OTOH, there was a thread a while back on WMW talking about how Google could be looking for signs of SEO. The premise that the compliant sites are being penalized is possible. The primary characteristics about the sites were:
- they're new
- they're compliant
If that doesn't reek like SEO I don't know what does. And if Google's actively looking for stuff, that'd make sense.

The question is - are they smart enough to figure this stuff out? Some would claim yes. I'm not sure, given that doorway pages, blog spamming, and hidden text are all still valid SEO tactics.

new and compliant = spam?

new and compliant = spam? jeez - that'd be the worst baby with the bathwater debacle yet. Maybe the site builders are just geeks ;)

Good thing I have no idea how to write compliant code in any case :)

Sorry wheel, but that's bullshit

Compliance is not an indicator of SEO.

Any developer will tell you that we keep our sites valid so our scripts operate properly. There is nothing worse than a broken document object model and quirks mode html. SEO arguments for or against valid code have always been vague and lacking in substance.
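For anyone who hasn't been bitten by that: here's a minimal sketch (my own toy snippet, Python standard library only, nothing from the article) of why invalid markup is a script-killer. The open/close tag events below don't balance, so the DOM your script eventually sees depends entirely on whichever error-recovery rules the browser's quirks-mode parser happens to apply.

# Minimal illustration (toy example, not from the thread) of why invalid
# markup bites scripts: the tag events below don't balance, so the final
# tree depends on the parser's error-recovery rules, not on the author.
from html.parser import HTMLParser

class TagLogger(HTMLParser):
    """Print start/end tag events with indentation to expose the imbalance."""
    def __init__(self):
        super().__init__()
        self.depth = 0

    def handle_starttag(self, tag, attrs):
        print("  " * self.depth + f"<{tag}>")
        self.depth += 1

    def handle_endtag(self, tag):
        self.depth -= 1
        print("  " * self.depth + f"</{tag}>")

# Unclosed <li> and <p> tags -- common sloppiness, and IE renders it anyway.
TagLogger().feed("<div id='nav'><ul><li>Home<li>About</ul><p>Intro</div>")
# The indentation drifts because two end tags never arrive; where exactly
# <p> ends up, and whether the <li> elements get closed, is decided by the
# parser's recovery rules rather than by whoever wrote the page.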

way off

I know that's way off. I've conclusively proven that good rankings are a factor of the title of your favorite Bugs Bunny cartoon converted into binary, multiplied by the number of times you scratched yourself while coding the page.

best argument against validation

law of diminishing returns
I like valid code, but when it takes me 10 hours to hunt down the fix for that last little bug solely so I can claim validation (assuming I've tested on multiple browsers/ platforms/ etc.), I am going to give up unless I'm really feeling like a masochistic perfectionist. Even validating for a % of users (pesky mac heads) sometimes decreases the likelihood that I'll spend the time.

At some point, getting stuff to "just work" vs. "compliant" just isn't worth the effort/ resources/ money.

>linkbait
Damn Andy...why didn't we think of that? ;)

>conclusively proven
I concur. You forgot to mention, however, that the volume of caffeinated beverage you consume during construction may also play a role.

>prefer invalid HTML (Damn,

>prefer invalid HTML

(Damn, there goes MY neighborhood, now everyone will be hanging bad code.) No, listen everyone, this is just linkbait. Pay no attention to this. Go back and read Jill or some other code-hugger.

new and compliant = spam?

Yes, I've been considering the matter for a while now. Some people call it the Sandbox. I call it the CCP. Clean Code Penalty. :P Similar to the OOP, only more insidious, trapping former black hats masquerading as white hats now, thinking that they'll never be caught out on their (seemingly) perfect web sites.

>law of diminishing

>law of diminishing returns

Bingo. That's the only really significant factor. I have nothing against valid code but it's like trying to code for every oddball browser, somewhere along the line you have to quit stopping to pick up every stray kitten in the road.

Oh come on, its not that

Oh come on, it's not that hard if you have the proper tools. However, if you're not doing CSS/DHTML/DOM stuff I don't think it matters either.

>not that hard

>not that hard

No, but there are some otherwise worthwhile CMS apps that commit minor code faux pas.

oh yes.. good point

oh yes.. good point

Or you could just be like my

Or you could just be like my team and not give a hoot. :o

We do make sure and get it right for client sites however. It's important for client sites. Though I don't really remember why right off hand.

I do know that on some of our sites, if we don't close all the tags, the pages load faster. 'course you gotta know which ones not to close. :P

Linkbait? What Linkbait?

Ok, so the title might scream linkbait, but it is actually an issue that has come up a lot on our forums - which is why I did the test. I had been one of the people who was claiming valid HTML would benefit your site in an SEO way - guess I was wrong.

The most interesting part about the study? The fact that MSN did give a boost to the properly coded websites.

Browsers Have Sod All To Do With Compliance

> assuming I've tested on multiple browsers/ platforms

What, all one of them?

I believe that the effect of valid code on SEO is minimal at this point in time, regardless of whether it's positive or negative. But there are so many other good reasons for doing it; I mean, SEO isn't really the reason people do it now, is it?

>SEO isn't really the reason

Not for valid code, no. In the not-too-distant, dark past there was a minor exploit that successfully used invalid code. I doubt it works today.

finally

Finally somebody on TW had the nerve to start discussing what really matters in SEO these days. And an excellent discussion so far! Plenty that still needs clarifying, though.

Even with the above 16 replies I see half the camp still feels compliance matters, while half scoff at it. Excellent! Let's keep going! Anybody have more details to offer? What *exactly* is it that matters... the one-sided BR tags or the em vs strong, or is it just avoidance of invalid CSS2 stuff? Maybe it's the presence of the link-back to the validator? Anybody ever consider THAT??

Just to show how important threads like this are, I have created a set of 30 new websites (online as of today) that can be used by anyone here for testing these important theories. Half use soft, rounded typefaces including VAG Rounded (AKA Rundschrift), Helvetica Rounded, Arial Rounded, Bryant, and FF Cocon. Layout is compliant CSS (no tables). They use oranges, light blues, and lime-green. All corners are rounded. The other half use browser default fonts (usually serif) and primary red/blue/white with gradient backgrounds, with square corners. Tons of nested tables. All other on-page factors are identical. Irrelevant Latin/Greek content across the board.

I've already started running WebPosition to track the rankings. This should be great. I am so stoked that ThreadWatchers have finally gotten to what really matters with SEO. By Friday I'll wrap up all the analytics and post it here, so we can discuss the results here as a group.

>or is it just avoidance of

>or is it just avoidance of invalid CSS2 stuff?

Hell no, in the exploit mentioned above the code had a hole in a fundamental html tag big enough to drive a truck through. Some of us couldn't believe it was even opening in a browser but faithful ol' IE would still show the page, soooo....

>half the camp still feels compliance matters

Get rid of them and maybe we'll talk, hhh.

I have seen pages that were

I have seen pages that were just page titles rank. I have seen sites that were moved still rank when they were no longer hosted.

I am guessing the page title one may have been valid, but a page is not a page without tables.

I tried to go tableless and failed.

Validation is not for me.

If I were a sensationalist, I might even go so far as to say that "Web Validation is for Suckers"?

I think validation is just

I think validation is just about making your content more accessible to more people and more information systems, but the person who is well rounded sits with a lump of coal.

I also think validation is about crafting a clean coded excuse to charge over 5 times as much for site design services.

If you are better at getting information found via another discipline then why should you sell yourself short, making your site validate when the standard is just going to change again. It's like making the bed in the morning. No purpose.

so validation is a USP?

Unique selling point for dev'rs?

kinda like being "whitehat", or certified?

Duplicate Content

Um...the two sets of sites were IDENTICAL in text

http://hontihes.com/
http://gohthone.com/

It's merely duplicate content penalty. One site is deemed original and one isn't; that's probably the biggest factor.

>> Duplicate Content

>>It's merely duplicate content penalty. One site is deemed original and one isn't; that's probably the biggest factor.

There were four sites, only one of them didn't make the index. I agree that there probably is a duplicate content penalty going on, but that doesn't explain why Google penalized the valid HTML sites over the invalid.

The article addresses this point (granted, the article is huge, so I understand that most probably won't read the full thing)

Furthermore, if the sites had different content, then the argument could have been made that one site was unduly affected by the keyword density - thus the need to make them identical.

Why do PDAs Require Valid Code?

Ever made sites for cell phones? One line of screwed up XML and they choke; usually refusing to load the page. Know why?

Quirks mode takes FARRR more CPU to process (and far larger codesets) than compliant XML. Plain tabled HTML takes FAR MORE processing power than XHTML+CSS when semantically done.

My non-compliant sites (xmule.ws, for instance) are visited almost exclusively by Googlebot/2.1. My compliant sites (incendiary.ws (except for YPN code)) are universally crawled by Mozilla/5.0 (Googlebot/2.1). I have watched this switch-over occur on multiple clients' sites after I have standardized their code (takes ~3 months). Google, afaik, has no conceptualization at all of advanced methods of standards compliance (even less than Internet Explorer 6) and thus I do not believe it *values* it in the rankings, but consider why they would like standardized pages:

The bot sees the doctype
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

as the very first line. The bot then realizes, "A ha! This is pure XML, no quirks mode needed" and it then informs Master "This site will require significantly less resources (CPU time, memory, code complexity) to spider, tell an extra 50% of my siblings to come on over."

Hopefully no one will scream too much for giving away a secret that is so obvious that no one grasps it.
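To make the theory concrete, here's a hypothetical sketch (Python stdlib; the function and the sniffing heuristic are made up for illustration, and nothing here is documented Google behaviour) of a crawler that peeks at the doctype and only attempts the cheap, strict XML parse when the page claims to be XHTML Strict, falling back to tolerant tag-soup handling otherwise:

# Hypothetical sketch of the "doctype tells the bot it can skip quirks mode"
# idea above -- illustration only, not anything Google has documented.
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

XHTML_STRICT_HINT = "DTD XHTML 1.0 Strict"

def fetch_and_parse(markup: str):
    head = markup.lstrip()[:300]              # peek at the doctype region
    if XHTML_STRICT_HINT in head:
        try:
            # Strict XML parse: fails fast, no error-recovery machinery needed.
            return "xml", ET.fromstring(markup)
        except ET.ParseError:
            pass                              # the page lied about being well-formed
    # Tolerant fallback, standing in for a quirks-mode/tag-soup parser.
    soup = HTMLParser()
    soup.feed(markup)
    return "tag-soup", soup

page = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
        '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
        '<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title>'
        '</head><body><p>hello</p></body></html>')
print(fetch_and_parse(page)[0])               # -> "xml"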

I actually never read the

I actually never read the article....I was just whinging about standards / validation in the comments. hehehe

Code that validates can be useless eh.

Yep

Good eye hopeseekr. :-)

the missing bit

so the missing bit is.... if your site reports strict and that brings the XML spiders, do they actually care if it is compliant? C'mon, enquiring minds want to know.

rcjordan, you're kidding I hope!

Quote:
Go back and read Jill or some other code-hugger.

Me? I've got like 10 posts on this very forum saying that you don't need valid code and about 100 on my forum.

I'm just gonna assume you meant some other Jill.

[Just stumbled upon one of the threads here where I mentioned it.]

wait does this mean I can or

wait does this mean I can or can't get a hug from Jill ?
I am so damned confused!

you were a convenient

you were a convenient [insert-white-stuff-here]-hugger, jill. i'm sorry. hhh.

So, you take four pages out

So, you take four pages out of eight billion and ...

...?

I know step three, I saw that cartoon as well.

linkbait?

Looking at the linked page, that's an awful lot of work for the typical link bait article.

As far as html compliance goes, Google itself is a disaster in that area. Even after hiring those Netscape engineers, they are HTML-challenged.

Do a view source on a groups.google.com result page and look at the div tags following the closing body and html tags. WTF? Even FrontPage takes care of that one for its users.

Try doing a select on some part of a cached page in the serps on a single-CPU machine using IE6. Guaranteed to peg the CPU for so long that Task Manager has to be used to close down iexplore.exe, meaning that you lose all open windows. Maybe I should know better than to select text in that situation, but it's reflex for me while reading a long article. Works everywhere else, including the original page. Google just F's it up because of the way they code the cache pages.
Now, I know *two* methods to avoid mangling the html with two heads and two bodies, but since when have they ever listened to anyone else.

Do as They Say Not as They Do

Plumsauce, no offense, but when has Google ever done what they preach? Remember when they got caught stuffing keywords in cloaked pages?

---

Quote:
you were a convenient [insert-white-stuff-here]-hugger, jill. i'm sorry. hhh.

Would be nice if you actually read what I write RCJ and took it at face value instead of making assumptions about me based on what you may have heard elsewhere. Hmphh

More than Just Google

Now, we know that Google is not compliant. Indeed, it doesn't even know how to handle *truly* 2003+ compliant pages. But what about the others? Indeed, MSN and Yahoo both do excellent jobs at handling XHTML applications (in the truest sense of the word) and I have personally seen the entire MSN site output via *client*side* [2001 W3C technology], which is really the wave of the future. Google would just return "Unknown filetype" and treat it as a URL listing.

However, check out how many XHTML 1.0-strict errors there are on the three major engines:
www.google.com - Failed validation, 145 errors
search.yahoo.com - Failed validation, 112 errors
search.msn.com - This Page Is Valid XHTML 1.0 Strict!

I have no doubt, because I have actually seen their code, that MSN is using [2001 W3C technology] internally and using something akin to PHP+DOMXML to translate it into perfect XHTML 1.0 Strict.

So for all of you people who piss and moan that XHTML 1.1 is so unfathomably hard to come by, esp. for CMSes, I say: consult with me and in just a few hours your world will be illuminated :-) (it will also open up new business models as well :) It's time to abstract up and away from procedural HTML and embrace existing OOP web standards, combined with PHP and XML.

Paradoxically, Google is where Internet Explorer 5 was in terms of standards knowledge+compliance, Yahoo is almost dead-even with Internet Explorer 6 (with 1 additional feature) and MSN is basically just a little bit behind Firefox 2.0.
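If anyone wants to spot-check those error counts themselves, a rough sketch follows. It assumes the W3C markup checker's web service accepts a doc URL and can return JSON (treat the endpoint and its parameters as assumptions on my part), and the numbers will obviously drift over time as the engines change their markup:

# Rough sketch for reproducing per-page error counts; the endpoint and its
# doc/out=json parameters are assumptions and may be rate-limited or change.
import json
import urllib.parse
import urllib.request

def count_validation_errors(page_url: str) -> int:
    query = urllib.parse.urlencode({"doc": page_url, "out": "json"})
    req = urllib.request.Request(
        f"https://validator.w3.org/nu/?{query}",
        headers={"User-Agent": "validation-count-sketch/0.1"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        messages = json.load(resp).get("messages", [])
    return sum(1 for m in messages if m.get("type") == "error")

# The hostnames from the thread; some of these redirect these days.
for url in ("http://www.google.com/",
            "http://search.yahoo.com/",
            "http://search.msn.com/"):
    print(url, count_validation_errors(url))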

Google's home page doesn't validate and that's mostly by design

From Philipp Lenssen's interview with Matt Cutts:

Quote:
Q: "In more general terms, what do you think is the relationship between Google and the W3C? Do you think it would be important for Google to e.g. be concerned about valid HTML?

A: I like the W3C a lot; if they didn't exist, someone would have to invent them. :) People sometimes ask whether Google should boost (or penalize) for valid (or invalid) HTML. There are plenty of clean, perfectly validating sites, but also lots of good information on sloppy, hand-coded pages that don't validate. Google's home page doesn't validate and that's mostly by design to save precious bytes. Will the world end because Google doesn't put quotes around color attributes? No, and it makes the page load faster. :) Eric Brewer wrote a page while at Inktomi that claimed 40% of HTML pages had syntax errors. We can't throw out 40% of the web on the principle that sites should validate; we have to take the web as it is and try to make it useful to searchers, so Google's index parsing is pretty forgiving."

http://blog.outer-court.com/archive/2005-11-17-n52.html

heh heh

A well-known (and shyster) marketing company in the UK that's recently moved into SEO is going around telling potential clients they have been personally told by Google that all sites with non-valid code will be dropped from the Google index in the next 12 months.

Coincidentally, they've also just moved into web-dev.

They were saying that about November so everyone better get validating like mad.

I'll be risking it I think.....

" an awful lot of work for the typical link bait"

As plumsauce says, it's a lot of work for it to be genuine link bait.

Before I allowed this one through for publishing at TW, I did check the article and on to the Google references.

My view was that it would stimulate a bit of lively debate here, and I think I was correct!

Why would it matter?

Valid html has nothing to do with how useful a page is, so I can't imagine that it enters into the rankings one way or the other. I have heard SEOs say to use CSS, better code, etc. I think that is just because they have run out of on-page optimization techniques to talk about. No SEO wants to say sorry, there is nothing I can do to help your page. Can't make money like that!

I've seen data that suggests

I've seen data that suggests that Google prefers lighter pages, although the effect is very minor. I can't see a compelling argument for making validation (or not) a ranking element.

There are plenty of 100% compliant sites that aren't worth the pixels used to render them, and plenty of HTML train wreck sites that have useful, unique information. Trying to sort that out would be a mess.

Interesting study, but I think there's a hole in his methodology; Google randomises results that are close in overall rank score....
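A quick back-of-the-envelope illustration of that hole (my own toy numbers, not anything from the study): if each valid/invalid pair scores nearly the same and near-ties are broken by a little noise, then "invalid beat valid" in both test pairs is exactly what you'd expect about a quarter of the time by pure chance.

# Toy simulation: two near-tied valid/invalid pairs, ranking noise breaks
# the tie. How often does the invalid page win in BOTH pairs by luck alone?
import random

random.seed(42)
trials = 100_000
both_invalid_win = 0
for _ in range(trials):
    wins = 0
    for _pair in range(2):                       # the study's two valid/invalid pairs
        valid_score = 1.00 + random.gauss(0, 0.05)
        invalid_score = 1.00 + random.gauss(0, 0.05)
        if invalid_score > valid_score:
            wins += 1
    if wins == 2:
        both_invalid_win += 1
print(both_invalid_win / trials)                 # ~0.25, i.e. no evidence either way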

Absolutely Absurd

Quote:
I am not ready to admit that Google actually gives preference to invalid HTML, but the results seem to want to point us in this direction. The idea that Google actively rewards websites that put errors into their code simply does not make sense.

No, it doesn't, and going through the steps you did and writing about it doesn't make sense either. Is business that bad that you have to sit there and think of things like this to generate interest in your slog or whatever it is that you do?

This is absolutely the most absurd claim I've seen in the 10+ years I've worked in this industry. I can just see it now, all those people who have been writing valid html/xhtml/css all these years are now going to see your research and start making their code invalid? What are you tokin' on over there? I want some!

What is this industry coming to? The research and the way it was performed have way too many flaws to determine anything at all. Linkbait to the nth degree!

Just think of how much damage you are doing to your credibility with this type of misinformation.

P.S. What happened to the Signal round' here? ;)

Stands to reason that you

Stands to reason that you wouldn't want malformed code or mark-up crashing your parsers; ergo, build a parser that can tackle most abnormalities.

Part of the valid mark-up methodology is to provide meaningful document structure, such as headings. If Hilltop is still at work (or components of it), then valid mark-up is paying dividends.

I think validation is just about making your content more accessible to more people and more information systems...

... and that really should mark the end of the discussion.

As for the time argument: Well, valid mark-up really doesn't add that much time to a project at all, particularly with the tools that are available now. What does add time is the hacking involved to get sommat to render correctly across browsers... that IS a ballache.

But there ARE time-savings down-the-line when you need to port that content elsewhere or into another format.

Reek like SEO?

Quote:
The premise that the compliant sites are being penalized is possible. The primary characteristics about the sites were: they're new, they're compliant.

News Flash: There are very few SEOs who can even write HTML 4.01 Transitional nor do they even care about it. There is no premise and there is no possibility and/or probability.

Quote:
If that doesn't reek like SEO I don't know what does. And if Google's actively looking for stuff, that'd make sense.

Whew, where is this stuff coming from? See what this type of misinformation leads to? These are probably some of the same people who use a revisit-after tag in their campaigns. Stop already!

Linkbait

>I can just see it now, all those people who have been writing valid html/xhtml/css all these years are now going to see your research and start making their code invalid?

No, but they'll probably get pissed off and link to the article or this thread. Not a new debate...just a new spin.

Re: Absolutely Absurd

Wow Pageone - obviously you are taking issue with even the idea of doing the study.

Instead of just sitting back and hurling monkeysling, why don't you point out the flaws in the methodology. That is what I asked for in the article - I recognize that there are inherent flaws. But while there are flaws, there are also the results: in two cases (admittedly probably too few to draw overreaching conclusions), Google took the invalid HTML over the valid HTML. As Tall Troll suggested (and as I suggested in the article), this is probably due to randomization. But heck, at least there was an opinion thrown out there.

I'd be interested to hear your thoughts behind the vitriol that was your post.

My Thoughts?

To tell you the truth, it really isn't worthy of my time nor anyone else's who "understands" what validation is all about.

Bitterly abusive? Come on now, I'm sure deep down, and I mean really deep down inside, the thought may have crossed your mind that what you were doing was Absolutely Absurd? ;)

If there is one thing that I don't take kindly to, it's misinformation such as this. You've basically assumed that Google may reward a site with invalid HTML over one that is valid, with all things being equal. That kind of goes against every published guideline out there from the authoritative resources on the subject.

What makes you think that four websites that have absolutely zero value in the scheme of things would be even remotely close to an acceptable testing ground? What?

Again, this type of misinformation is what puts all the newbies at risk because they may actually follow your advice. Can you believe that? So, instead of trying to develop linkbait from an Absolutely Absurd perspective, why not do it from the other angle?

I'm going to refute your entire test and let you know that writing valid code has its rewards from an indexing perspective. Writing invalid code has its pitfalls. The major search engines have taken most of the errors into consideration and they are very lenient in their indexing. But you are leaving so much open for interpretation by the spiders by not writing valid code; that in itself refutes everything you've wasted your time on in this Absolutely Absurd claim.

Do you think I've gotten my point across?

Quote:
As Tall Troll suggested (and as I suggested in the article), this is probably due to randomization.

I don't think that TallTroll is going to come to your rescue in this instance. :)

Just another

overzealous spam prevention method. If you can afford code that is valid, you are a suspect.

Nothing new here.

The Flaws in the Methodology

Quote:
Instead of just sitting back and hurling monkeysling, why don't you point out the flaws in the methodology.

The fact that the home page of your site has 67 errors is, in itself, more than enough of a flaw for me.

Good Code Bad Code

To say that bad code would be a good thing for SEO is just as dumb as saying good code is good for SEO. The code is irrelevant as long as they can read what they need to read.

To say that bad code would

Quote:
To say that bad code would be a good thing for SEO is just as dumb as saying good code is good for SEO.

Jill, I'm really surprised to hear you say that and I'm going to disagree with you.

Quote:
The code is irrelevant as long as they can read what they need to read.

The code is absolutely 100% relevant to indexing. Did you fall prey to this Absolutely Absurd research too? ;)

What Were the Conclusions?

I tried to make the conclusions fairly clear - mainly that it appeared from this study that valid code did not affect the ranking. I also made it clear that drawing the conclusion that invalid code would help a website would be premature based on the scope of the experiment.

What makes you think that four websites that have absolutely zero value in the scheme of things would be even remotely close to an acceptable testing ground? What?

The practice of setting up new sites, optimizing them for keywords that currently have no results, is a common SEO practice to gain some insight into how Google works. Now maybe the experiment had its flaws (I personally think it did), but you haven't really taken the time to actually point to specific problems - rather you just decided to criticize my site. That's fine - it's not above reproach and neither am I, but it's an interesting way to refute a study since my website had nothing to do with the study.

BTW, I would be genuinely and highly interested in any published studies showing how valid code helps your ranking. I'm not being sarcastic here - the only reason I ran this experiment was because I could not find any published studies on the issue, so if you have any, please let me know.

I tend to agree with Jill...

The conclusion I personally have reached from this is that Google just doesn't care about the code. If they did, then I think we would see it displayed a little on their website (still looking for that Google page that validates).

It's interesting to muse about 'over-optimization', but in the end, would Google really think that valid code is some dark scheme by a black hat SEO?

Here's a start for you Mark...

http://www.w3.org/QA/2002/07/WebAgency-Requirements

After that, you may want to proceed here...

http://www.w3.org/MarkUp/Guide/

Once you're finished with the above, you can then advance to the next level and go here...

http://www.w3.org/MarkUp/Guide/Advanced.html

Then, once you've carefully read the above, come back and let us know what you've found out. I'll be waiting patiently. :)

WOW over 75% of the above

WOW over 75% of the above comments could be reasonably classified as misinformation, rumor-mongering, tinfoilhatness or just plain stupidity.

Meanwhile, the TW readership demographic slowly inches towards merging with that of you know what... :-)

lol!

Quote:
I tend to agree with Jill...

First TallTroll and now Jill. Based on Jill's comment above, you may have some support from her. ;)

Reality Check...

Quote:
It's interesting to muse about 'over-optimization', but in the end, would Google really think that valid code is some dark scheme by a black hat SEO?

Would you please stop already, this is just too much!

Hey W3C, it is now being inferred that you may be promoting Black Hat SEO!

Keep it up Mark and you just may convince a few TW readers to go out and break the web some more. Just what we need, more broken websites. While you're at it, you might as well drop in some revisit-after tags, some HTML Comments and a few other things that Google might reward you for.

No Value Whatsoever

Quote:
The practice of setting up new sites, optimizing them for keywords that currently have no results, is a common SEO practice to gain some insight into how Google works.

lololol! That was a hearty laugh, you know, like Santa. Do you still believe in him?

SEO Contests

A little OT, but since you brought it up...

Quote:
The practice of setting up new sites, optimizing them for keywords that currently have no results, is a common SEO practice to gain some insight into how Google works.

You know what SEO Contests are good for?

  1. Detecting link networks.
  2. Detecting the "mostly" amateur SEO base.
  3. Games for Google Employees and Search Quality Engineers.

I've followed a few of the contests over the years. Bottom line, there is nothing of value in the results of the contests. Why? Because all it takes is one search quality engineer who has some free time on their hands to totally wreak havoc on your contests. In the process, link networks are exposed, sites are penalized, filters are implemented, etc. Yes, SEO Contests are great, but not for who you think they are.

From Tim Mayer...

http://searchenginepress.com/blog/

Quote:
Validity / standards will propogate organically; if top 7 results on serp are well-done, standards-compliant, 8th will copy those standards to try to rank, too.

I think I'll go with the above statement from the Director of Product Management at Yahoo! Search.

Quote:
Will valid / better sites rank higher? Yes. Because it improves "signal" of site.

Hmmm, where else have I seen that term Signal. I know it was around here somewhere but for the life of me, I just can't find it. Oh wait, I found it. It's in the Threadwatch Logo. ;)

umm..no

EXACTLY TWO WEEKS AGO I SAT NEXT TO TIM MAYER AS A PANELIST AT SXSW AND HE SAID POINT BLANK VALID CODE IS NOT CURRENTLY CONSIDERED A SIGNAL OF QUALITY AT YAHOO SEARCH.

*caps lock off*

btw... that "transcript" is extremely inaccurate (trust me, I know what I said)

trust me, I know what I

trust me, I know what I said

I went out drinking with Andy, and the first night we did that was the night after he gave his speech.

I think in certain verticals (say those pushing web standards) it may be easier to get citations if your site has valid code, but most of the web is not that way. With most of the web valid code simply is an indication of wasted resources.

Yikes!

Quote:
With most of the web valid code simply is an indication of wasted resources.

Wow! Mark, that is some strong juice you are peddling about. Heck, you've even got the owner of TW coming to your rescue.

Aaron, do you really believe what you just posted? I guess so, or you wouldn't have posted it. Well, I guess I'm no longer needed. It's been a pleasure. I can't convince a crowd that doesn't give validation a second thought.

So, we know one thing for certain. Google cannot surmise that websites that are valid are SEOd. That just wouldn't work. Why? Because 95% of the SEOs out there don't know the first thing about validation or the benefits thereof. That is clearly illustrated in this thread.

Don't mean to step on any toes, but I really think many of you are missing the underlying benefits of validation. That's okay, it's your loss, not mine. ;)

Hmmm...W3...what???

Thank you for your attempt at enlightening me on the ways of the W3C. The links are nice - albeit ones that I have already read and understand - but nice. :)

In case there is any confusion, I am not against web standards. I am actually in favor of them and have recommended to many people that they should move towards web standards. Of course, you wouldn't know this as you don't know me, so the mistake is understandable. This article is not against web standards - it is a look into the effect of standards on SEO. Take it for what you want.

Obviously you are misreading practically every letter I type. Somehow you think that I am promoting the idea that validation is blackhat SEO when I was saying the exact opposite. I'm sorry to say, but most of your arguments are shadowboxing.

I do appreciate the feedback, regardless of the form it came in. I think there have been some great posts in these comments (the Matt Cutts quote, the input from Andy, and yours PageOne) and am just happy to get the feedback.

Transcript / Signals of Quality

Andy,
If you read my synopsis of the panel, you'll see that I note that Mr. Mayer repeatedly indicated that web standards are *not* "signals of quality" for Yahoo!. I agree with you on that point wholeheartedly.

Mr. Mayer (mildly) contradicted himself when he proposed that following web standards would improve the "signal of the site" while at the same time stating that web standards are not "signals of quality". I think using the word "signal" in different ways was confusing.

As for the accuracy of the transcript, all I can say is that I have an mp3 recording of the entire panel. I often paraphrased, but I don't think I put words in anyone's mouth. I have updated the blog entry and posted the mp3.

MP3 en route...

Quote:
btw... that "transcript" is extremely inaccurate.

Ummm, I believe there is an MP3 of that interview that is going to be up for your audio pleasure here shortly.

I went out drinking with

I went out drinking with Andy, and the first night we did that was the night after he gave his speech.

thanks for the backup on my soberness man, cuz otherwise my assertion would be worth nothing! ;-)

As for the accuracy of the

As for the accuracy of the transcript, all I can say is that I have an mp3 recording of the entire panel. I often paraphrased, but I don't think I put words in anyone's mouth. I have updated the blog entry and posted the mp3.

I'm not hatin' on you man, a paraphrase can be useful and you labeled it as such, I'm just saying... a paraphrase isn't a quote, and there are some inaccuracies.

Mr. Mayer (mildly) contradicted himself when he proposed that following web standards would improve the "signal of the site" while at the same time stating that web standards are not "signals of quality". I think using the word "signal" in different ways was confusing.

Yes I agree he seemed to contradict himself at one point, but I think it was very clear from his answers that Yahoo is not giving any sort of 'bonus' to valid sites.

The rush is on...

Quote:
Yahoo is not giving any sort of 'bonus' to valid sites.

They don't have to. By nature of the beast, they do so automatically. Think about it, if a spider stumbles or has to stop and interpret invalid code over valid code and all other things are equal (which isn't going to happen), what should the end result be?

Imagine what would happen if any of the three majors came out and publicly stated that valid code trumps invalid code. lol, I better get prepared for the phone calls and emails for assistance. I knew it would pay off one day. ;)

Think about it, if a spider

Think about it, if a spider stumbles or has to stop and interpret invalid code over valid code

Actually spiders are VERY good at digesting crap code -- since the vast majority of sites on the Net have it -- and since most of the relevant content that spiders NEED to digest is on those sites.

Yes, rarely, something can throw them off, but spiders certainly do not have any problem whatsoever digesting invalid code (and haven't for several years, by and large).

Back to the topic at hand...

Quote:
Actually spiders are VERY good at digesting crap code -- since the vast majority of sites on the Net have it -- and since most of the relevant content that spiders NEED to digest is on those sites.

Oh, I know, I look at it day in and day out. Not my own, but that of my peers and I cringe every time I see it.

Just because they digest crap and are able to interpret it, or at least we think they are, does that mean we should continue to promote it, as is being done with this Absolutely Absurd article being discussed?

Shall we continue to give Mark more fuel for his totally off the wall research?

Spider Food

(This might be old news but) Mike (from Newsvine) has some excellent empirical tests of what bad HTML can do to your rankings.

http://www.mikeindustries.com/blog/archive/2006/01/the-roundabout-seo-test

> Shall we continue to give

> Shall we continue to give Mark more fuel for his totally off the wall research?

Haha, touche.

I see the MP3 is up

over / under on getting a C&D from SXSW? ;-)

Gained Respect

Quote:
I do appreciate the feedback, regardless of the form it came in. I think there have been some great posts in these comments (the Matt Cutts quote, the input from Andy, and yours PageOne) and am just happy to get the feedback.

Mark, you just gained some of the respect back that I lost for you after reading that article. I appreciate you taking this like a man.

The problem is, your research is not conclusive enough for you to make the statements that you did. You opened yourself up for this type of feedback and we're talking about a subject that I AM EXTREMELY PASSIONATE ABOUT!

There are way too many inconsistencies in the article along with much to be misinterpreted. I might suggest sticking with topics that you are totally sure about. When it comes to writing valid code, the benefits are far greater than those being discussed in this topic. ;)

P.S. Much of what we are discussing has to do with HTML. For those of you working in an XHTML environment, the playing field is completely different. You cannot afford to write invalid code because your applications may not work.

>> why don't you point out

>> why don't you point out the flaws in the methodology

Ever hear the expression "one swallow doesn't make a summer" ?

I'm sad to say so, but you can't use your four page experiment to conclude anything at all about any other pages than those same four. Nothing. It's a good idea, a very nice experiment, and you obviously put some thought into it, but four pages are much, much less than a drop in the sea. Especially when they're ranking for partially made-up words, they're clearly not real pages at all, and they're totally unnatural in terms of all other factors, with all the controls you've put on them.

So, you can say with some degree of certainty that in this particular case it appears like Google might have favoured this-or-that-page out of your four artificially crafted pages.

You can say nothing about any other pages based on that experiment.

suggestion for method

Here's how to do it:

(1)
You know that Google runs around 200M queries each day, so get a list of those and select a representative sample - perhaps 100K-500K queries or so. You should have common as well as uncommon queries, commercial as well as non-commercial etc. It's all in the word "representative"

(2)
Then write a list of all the factors that you suspect Google might be using for ranking: HTML, linking, age of domain, etc. etc.

(3)
Then run the 100K-500K queries on a representative sample of datacentres. Do so on a representative schedule, e.g. not every query on a Sunday. Pay attention to the geographical distribution, as that should correspond to that of the queries, unless you restrict queries to be from a specific area (in which case your results would also only be valid for that area)

(4)
For each of the pages in the top 5 to top ten, validate the HTML

(5)
Then, for each of those 500K to 1M pages write down all the factors that you suspect Google might be using for ranking: HTML, linking, age of domain, etc. etc. -as they apply to the page in question. Yes, you will get a pretty big list.

(6)
But then you need to check if you really measure what you think you do. Here, you need to take every single item from the list you made in (5) and run a regression or similar to determine to what degree HTML validation corresponds with (or depends on) that other item.

(7)
As if that's not enough you need to look at combinations as well: linking+age, age+keyword density, etc. etc. and combinations of combinations.

(8)
Whenever some item correlates with another (e.g. if HTML validation depends on age of domain, or whatever) you need to isolate which of the two is most important, and drop the other one. And then you need to start over from step 6 until you end up with only factors that are significant and independent. HTML validation may or may not be one of those.

(9)
If, and only if, HTML validation has been determined to be an independent and important factor, this step is simple - just do a simple count: how many pages validate, and how many don't, in percent of the total number tested. If either side is larger than the other, run a statistical test to see if the difference is significant or if it's just random.

(9a) Added:
You should of course also run the total regression (ranking = x1 + x2 + ... + xn) in order to determine the relative importance of HTML-validation relative to all the other factors that you end up with. This is because it's nice to be able to say, not only if HTML validation means something, but also if it means more than something else.

(10)
I imagine that if you have sufficient data processing capacity, a few weeks or months later you might find out that pages more than five years old tend not to validate, whereas more recent pages have a tendency to validate more. I'm not sure you'll be able to determine if HTML validation has influence on ranking at all, in which case you will have to assume that it has not.

Of course there's some amount of humor in the list above, but seriously, those are the exact steps you would need to take in order to scientifically examine whether valid HTML is preferred in ranking or not.
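For what it's worth, here is a toy sketch of steps (6) and (9) on entirely made-up stand-in data (the factor names and distributions are invented for illustration); the point is only the shape of the collinearity check and the significance test, not any real result:

# Toy sketch of steps (6) and (9) above, on made-up stand-in data.
# Nothing here is measured; it only shows the shape of the two tests.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 5000
age = rng.gamma(shape=2.0, scale=3.0, size=n)        # domain age in years
validates = (rng.random(n) < 0.35).astype(int)       # 1 = page validates
rank = rng.integers(1, 11, size=n)                   # SERP position 1..10

# Step (6)/(8): does "validates" merely proxy for another factor, e.g. age?
r, p_corr = stats.pointbiserialr(validates, age)
print(f"validates vs. age: r={r:.3f}, p={p_corr:.3f}")   # if strongly tied, drop one

# Step (9): are validating pages over-represented in the top half of results?
top5 = rank <= 5
table = np.array([
    [int(validates[top5].sum()),  int((1 - validates[top5]).sum())],
    [int(validates[~top5].sum()), int((1 - validates[~top5]).sum())],
])
chi2, p_chi, dof, _ = stats.chi2_contingency(table)
print(f"top-5 vs. validation: chi2={chi2:.2f}, p={p_chi:.3f}")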

This isn't about the benefits of valid code

There are way too many inconsistencies in the article along with much to be misinterpreted. I might suggest sticking with topics that you are totally sure about. When it comes to writing valid code, the benefits are far greater than those being discussed in this topic. ;)

This isn't about the benefits of valid code - I understand those and am a proponent of valid code (regardless of how my current site validates). I think, and have written as much, that if a website owner has the capacity to do so, then they should, by all means, make sure that their site validates. To interpret this little study as an attack on valid code would be a misinterpretation of the study.

If anything, it is more of a critique of Google for apparently not giving a hoot about standards or any weight to websites that actually take the time or effort to bring their sites into compliance. This was part of the reason for the section called "A Parting Shot at Google - and Compliments to MSN Search"

Arrrggghhh!

Quote:
If anything, it is more of a critique of Google for apparently not giving a hoot about standards or any weight to websites that actually take the time or effort to bring their sites into compliance.

But wait, that is not what your research presents to us. Nor is it in any way, shape or form an indication of how Google handles valid and/or invalid code. Did you read the research from Mike yet?

Quote:
This isn't about the benefits of valid code.

No, it originally wasn't. It was about the so-called benefits of writing invalid code, remember? :)

>> Ever hear the expression "one swallow doesn't make a summer"

The biggest problem I personally have with this study is the sample size and the fact that it is being done in an isolated environment. However, I did not see any other choice. There are so many different things that can affect a site's ranking. I suspected that if valid code was one of them, it would be a very small factor, so its effect would be nearly impossible to detect in 'live' results.

Regarding your suggested way of doing this - that has even more flaws in it IMO. Probably the biggest factor in ranking that we know of is the type of inbound links a site has, and the values of those links. Even if I did have access to some of the information you suggest, determining the quality of the inbound links, which can vary from page to page (even if they have the same displayed PageRank), would be impossible.

Really, the only true way to determine this is if we were receiving paychecks as employees of Google. But then again, that applies to most of what we speculate about - and we still speculate anyway, right?

IBLs

Quote:
Probably the biggest factor in ranking that we know of is the type of inbound links a site has, and the values of those links.

I could probably sit here and dispute that one too as I have solid evidence of the exact opposite. It is just one of many factors. Add in the validation factor and everything else we as professionals do and there are just too many variables to come close to anything conclusive in this type of research. Speculating is one thing, blatant misinformation is another. Heck, you even got Jill to agree with you. What does that say? She's out there right now breaking all her code so that her clients will rank higher. Jill, I'm just kidding you know. I figured I'd jump on the bandwagon. ;)

Responding to "Arrrgggghhhh!!"

But wait, that is not what your research presents to us. Nor is it in any way, shape or form an indication of Google's handling of valid and invalid code. Did you read the research from Mike yet?

Noooo...did you read the article? A few excerpts:

"I am not ready to admit that Google actually gives preference to invalid HTML, but the results seem to want to point us in this direction. The idea that Google actively rewards websites that put errors into their code simply does not make sense."

and...

"Although Google does not seem to reward site owners for putting together a site with valid HTML - a goal of many well respected webmasters - MSN seems to be flawless."

and a few other places that talk about how Google treats the code. I never said outright that it was beneficial to write bad code - what I did say is that it appeared from this study that Google preferred the bad code. However, I also backed off of this as a solid conclusion several times.

BTW, I did read Mike's study - I've seen that before. I think there are some big differences between what Mike did and what I did that lead to the different conclusions. The changes he made to the sites to get Google to cough and wheeze were significant - to the point of making the page unrecognizable in the browsers. I was trying to emulate what most website owners would do - create a page that looks ok in IE, or close to it, with a handful of errors, and see if they could improve their situation from an SEO standpoint by validating their code.

Mike's study is definitely useful...thanks for reminding me of it.

No flaws in my method. It's

No flaws in my method. It's the only right way to do it. It's just not possible (for someone outside Google), that's all.

Btw. personally I don't think it would be wise for Google to discriminate on valid code -- neither positive nor negative -- but I can see many reasons why people who write valid code on average would see their sites ranking higher.

That's not due to Google, that's due to the "semantic web mindset" (for lack of better words) you get as a web developer, eventually. But nevermind, it's all here:

Sept 19, 2004: http://www.webmasterworld.com/forum21/8701.htm - post #5

I did, I did!

Mark, I did read and reread, and reread again to see if I could come to other conclusions. I couldn't. This one statement in itself leads me to think otherwise...

Quote:
I am not ready to admit that Google actually gives preference to invalid HTML, but the results seem to want to point us in this direction.

Point who in that direction? Not me. Not most who are participating in this enlightening topic. What results would seem to want to point "you" in that direction? You're already up to your neck in this mess, you might as well dive in all the way. ;)

Again...I'm agreeing

I could probably sit here and dispute that one too as I have solid evidence of the exact opposite. It is just one of many factors.

I wouldn't disagree with what you said here (again). Again, from the article:

"As we all know, Google does not rely on any one factor to rank a website. As a result, a website could be horribly optimized in one aspect of their website, but still reach the top of the rankings because they are well optimized elsewhere."

In my experience, and the experience of many others that I have read, quality links can be considered to be the most important factor. Do not read this as meaning that you can ignore everything else, or be bad in every other factor.

I am also open to being wrong on this issue - I haven't done any of my comprehensive and wizardly research ( -- filled with sarcasm for those who may not get that )

Touche...

Quote:
I am also open to being wrong on this issue - I haven't done any of my comprehensive and wizardly research.

lol, you're just too funny!

Point who in that direction?

Point who in that direction? Not me? Not most who are participating in this enlightening topic. What results would seem to want to point "you" in that direction? You're already up to your neck in this mess, you might as well dive in all the way. ;)

Enlightening? You think this topic is enlightening? I'm blushing! ;)

Seriously though, I was referring to whoever looked at the entire research. Admittedly, I could/should have stated this point more clearly: at face value the results seemed to point us in the direction that Google preferred invalid HTML, but in all reality (as I state elsewhere) I am not ready to accept this as a conclusion, and such a notion is quite contrary to everything else that we know.

And this is why I never fully made the jump to support this as a conclusion throughout the article. This is also why I ended the article with:

"The results of the study say one thing, but common sense would say another. Is it possible that Google is somehow biased towards sites that have erros in their HTML? It does not have to be a philosphical bias - could there be a technical bias?

Or, was there a problem with the study itself? Were there too few examples to draw any conclusions at all? "

Jill, I'm really surprised

Quote:
Jill, I'm really surprised to hear you say that and I'm going to disagree with you.
The code is absolutely 100% relevant to indexing. Did you fall prey to this Absolutely Absurd research too? ;)

Geez! Like I already told rcjordan earlier in this thread, I've been saying as much for years. Code that validates is not an SEO technique. It's lovely to have, and I think people who aspire to it are doing a good thing. They certainly have more patience than I have.

But if we're talking SEO, then it's not a factor. Nothing new there out of my mouth that's for sure. Take a look at my site which pretty much proves it!

Jill, it is a factor!

Quote:
But if we're talking SEO, then it's not a factor. Nothing new there out of my mouth that's for sure. Take a look at my site which pretty much proves it!

Jill, in the overall scheme of all this, it is a major factor in SEO. I really wish people would read all the links that are being posted with supporting information before claiming that something isn't so. It's nice, though, to have these debates with my peers; I thoroughly enjoy them, especially when it comes to validation. I've been valid for years and can attest to the benefits, both short and long term. The long-term benefits are beyond what we've discussed here in this topic.
