MSN Search Hack

23 comments

I was given this link about a weakness in the MSN algorithm. It seems there is a major flaw in their code... now who would take advantage of this???

Comments

hmmmm ..

AussieWebmaster, I'm not having a pop here. In fact, I knew about this problem, but I decided not to blog it. Once it goes public, you will have every idiot pushing links into other people's sites to see what happens...

Sometimes it's better not to tell the world about flaws.

DaveN

Very scary, but I don't agree, Dave.
The faster stuff like this becomes public, the quicker the engines will react.
The worst case is when the top SEOs with systems spend 6 months killing sites before everyone else finds out.
Of course, that is only the worst case if you're not one of the few who figure this stuff out first. Then it is a pain when everyone else finds out.

To me this is just another example of the changing face of SEO. There is no longer room for everyone on WMW to find a niche and rank #1. SEO is serious cash and where there is cash people will find different ways to get their hands on it.
I'm pretty sure we can all safely start to talk about the good old days when SEOs would share publicly and terms like 'black hat', 'google bowling' and 'negative seo' did not exist.

Speaking of which, is 'ass hat' a new term? I've read it in three different posts today.
ass hat = SEO engaged in negative optimisation seeking to negatively impact the ranking of other websites rather than increase the rankings of their own site.

LOL

ASS Hat - too freakin' funny. Sadly, also true. :(

I agree that the faster stuff like this is outed, the faster it can be fixed. "Can be fixed" and "will be fixed", however, are two very different things.

Why publicize this? Showing the public this weakness only spreads the problem and creates anarchy. It does not help the industry.

No search algo is perfect; algos are hard and complex to build. There are many more people trying to game them than there are trying to improve them. Just looking at the odds of thousands of SEOs against a few engineers, I know which side I would bet on. Just because a flaw exists does not mean it should be made public.

What Greg said!

"Just because a flaw exists does not mean it should be made public."
Really? Does that same logic apply to all aspects of life?
Toasters, tires, surgical procedures...

My website is my business. I'd like to know if there is a flaw, just as I expect email virus alerts, software vulnerabilities, shoplifting opportunities, etc. to be brought to my attention.

This is an old discussion

It seems this crops up just about every time someone discovers a security flaw, be it in a web browser, a server setup, an online banking environment, virus development, etc.

I agree with GerBot here - the sooner it's out, the sooner it will (hopefully) be fixed. And if not, it'll smear MSN/Live's name even more - which is when the marketing people may really move in because it could seriously hurt their intake. (Follow the money, eh ...)

Plus, what AussieWebmaster pointed to is a post on a third-party site, not here on TW, so it's out in the open anyway. There's no point in pretending that TW can be instrumental in keeping it under wraps forever.

Finally: Can't get more topical than this, can you?

Shame the fix won't work though

It's a known fact that MSN won't follow a lot of 301 redirects at the moment; it just indexes the redirected page as a blank page.

So even if you redirect the pages, MSN won't follow the redirects.

MSN is very prone to negative feedback since they're in Google's shadow. I'm sure they'll find a way to fix it. Of course, when they do, they'll totally destroy their SERPs, lol.

ASP Solution

The author gave an example of PHP code to try and defend against this kind of attack.

Here is something similar in ASP:

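At the top of any page you want to protect, using a method similar to the author's:

<%
' Permanently redirect any request carrying a query string to the same page without it
If Request.ServerVariables("QUERY_STRING") <> "" Then
  Response.Status = "301 Moved Permanently"
  Response.AddHeader "Location", "http://www.YourSite.com" & Request.ServerVariables("URL")
  Response.End
End If
%>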
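For sites that aren't running ASP, the same idea can be sketched in Python as WSGI middleware. This is a minimal sketch, not the author's PHP code: `strip_query_middleware`, `hello_app` and `CANONICAL_HOST` are hypothetical names, and it assumes none of the wrapped pages legitimately depend on query parameters. The caveat raised earlier in the thread still applies: MSN may not follow the 301 at all.

```python
from typing import Callable, Iterable, List, Tuple
from urllib.parse import quote

# Placeholder host taken from the ASP example; an assumption, replace with your own.
CANONICAL_HOST = "http://www.YourSite.com"

StartResponse = Callable[[str, List[Tuple[str, str]]], object]
WSGIApp = Callable[[dict, StartResponse], Iterable[bytes]]


def strip_query_middleware(app: WSGIApp) -> WSGIApp:
    """Hypothetical WSGI wrapper: 301-redirect any request that carries a
    query string to the same path with the query string removed."""

    def wrapped(environ: dict, start_response: StartResponse) -> Iterable[bytes]:
        if environ.get("QUERY_STRING", ""):
            # Rebuild the path the visitor asked for, minus the "?..." part
            path = environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", "")
            location = CANONICAL_HOST + quote(path or "/")
            start_response(
                "301 Moved Permanently",
                [("Location", location), ("Content-Length", "0")],
            )
            return [b""]
        # No query string: serve the page normally
        return app(environ, start_response)

    return wrapped


def hello_app(environ: dict, start_response: StartResponse) -> Iterable[bytes]:
    """Stand-in page used only to demonstrate the wrapper."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


app = strip_query_middleware(hello_app)
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` from the standard library will serve the wrapped app; a request for `/page.asp?ref=x` should come back as a 301 to `/page.asp`.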

keeping it real

While I understand that putting it out there opens it up to more people using it, I also think that as a group we have a responsibility to look out for each other. I was only pointing out something that was already out there; it had been blogged about before I brought it here.

I also passed the information along to the people I know at Microsoft, and they are on it too. In future I will keep it to myself, fix my sites, and ignore the moaning about sites dropping out of the organic results... yeah, right. As a mod I try to help where I can; that is what I thought I was doing here too.

And right you were!

Let those who feel competent to "protect" the Net by preventing information from being spread around do whatever their purported watchdog karma bids them; a public forum certainly isn't the arena to pursue that kind of attitude. If it doesn't get discussed here, it will simply move elsewhere.

As I made clear before, I personally appreciate your effort, all the more because forums like TW are an ideal aggregator of exactly this kind of info, saving us all a lot of time and effort hunting it out ourselves. For this we owe you thanks, not pointless criticism based on an essentially flawed concept of "necessary censorship".

Great post. I suggest MSN check the RFC

Thanks a lot for sharing. I found it affected some of my sites; some of it was done by a competitor, but some was actually done by myself when I miscoded a URL string.

I think it is VERY important to raise these things fast and everywhere, so people will check their stuff. But more than anything: according to the RFC, it is perfectly valid to have a page that does not use the query string and that, when sent one anyway, returns the same result as if none had been given. So I see it as MSN/Live Search's fault for wrongly identifying and penalizing this as duplicate content when it is not the case at all.

Full disclosure is good

But I would urge anyone concerned with this to take a couple minutes to ensure that MS is made well aware of this issue, so that they can fix it sooner rather than later.

http://feedback.live.com/eform.aspx?productkey=wlsearch&mkt=en-us

Aussie, thanks for posting. This happened to one of my sites, and it's only a matter of time before a few of my other sites are hit. I won't really miss the traffic, so if this is what it takes to fix the problem long term, then so be it.

Maybe Bill will tire of this crap that comes with reinventing wobbly wheels and just make Barry an offer he can't refuse and take Teoma off his hands.

MSN

People optimize for MSN?

Sure. MSN sends traffic too, and that traffic may convert well for various niches.

Does MSN support wildcards in robots.txt?

Does anyone here know if MSN supports wildcards in their robots.txt files?

I add the following for Googlebot to prevent it from mistaking tracking links and display parameters for content URLs:

User-agent: Googlebot
Allow: /*?$
Disallow: /*?

The Disallow: /*? line blocks any URL that includes a ? (more precisely, any URL that starts with /, followed by any string, followed by a question mark, followed by any string).

The Allow: /*?$ line allows any URL that ends in a ? (that is, one with no characters after the ?).

These rules work together: a URL ending in a bare ? is allowed, while a URL with parameters after the ? is blocked.
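To make the precedence concrete, here is a small Python sketch of Google-style matching. It assumes the rule Google documents for conflicting lines (the longest matching pattern wins, and Allow wins a tie); `pattern_to_regex`, `is_allowed` and `GOOGLEBOT_RULES` are hypothetical names, and nothing here says how MSN or Yahoo! evaluate the same lines.

```python
import re
from typing import List, Tuple


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a Google-style robots.txt path pattern into a regex:
    '*' matches any run of characters, a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything literally, then re-join the pieces around '*' with '.*'
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules: List[Tuple[str, str]], path: str) -> bool:
    """Return True if `path` may be crawled under `rules`, a list of
    (directive, pattern) pairs such as ("Disallow", "/*?")."""
    best_len = -1
    allowed = True  # no matching rule at all means the path is allowed
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty pattern places no restriction
        if pattern_to_regex(pattern).match(path):  # match() anchors at the start
            is_allow = directive.lower() == "allow"
            # The longest matching pattern wins; on a tie, Allow beats Disallow
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len = len(pattern)
                allowed = is_allow
    return allowed


# The two Googlebot lines from the post above
GOOGLEBOT_RULES = [("Allow", "/*?$"), ("Disallow", "/*?")]
```

With these rules, `/page?` matches both lines, and the longer Allow pattern wins; `/page?id=1` matches only the Disallow line; `/page.html` matches neither and stays crawlable.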

Both the Allow rule and the wildcards (* and $) are extensions to the robots standard by Google. I'm not sure they are supported by other search engines.

Hmmm

Seems like it does.

http://search.msn.com/docs/siteowner.aspx?t=SEARCH_WEBMASTER_REF_RestrictAccessToSite.htm

But they are talking about extensions/file types. Haven't tested if it is more universal than that.

robots.txt

Does the robots.txt file cover the case where spiders follow a link into the site? Will the engine go back and delete the entry from its database? You seem to have lucked out, but is that more because those pages are ranked lower than yours?

AussieWebmaster

Robots.txt should control what pages are indexed.

As long as you do not have any pages whose content is controlled by a dynamic parameter (e.g., "?product_id=1234"), you can limit the spiders to indexing each page only once.

Straight from the Slurp's mouth

Yahoo! Search Crawler (Yahoo! Slurp) - Supporting wildcards in robots.txt
http://www.ysearchblog.com/archives/000372.html

problem

That is the biggest part of it: tracking-code pages can cause canonicalization problems, and on top of that, competitors can use links like that to swamp the index.
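One way to think about the canonical problem is to collapse every URL variant to a single form before comparing pages. Below is a minimal Python sketch, assuming a hypothetical whitelist (`CONTENT_PARAMS`) of the parameters that really change content, such as the `product_id` example mentioned earlier in the thread; everything else, including tracking codes, is dropped. It illustrates URL canonicalization in general, not how MSN or any other engine actually deduplicates.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical whitelist: query parameters that really change page content.
# Everything else (tracking codes, junk a competitor appends) is dropped.
CONTENT_PARAMS = {"product_id"}


def canonical_url(url: str) -> str:
    """Map URL variants that differ only in irrelevant query parameters
    (or in host-name case, or a fragment) onto one canonical form."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key in CONTENT_PARAMS
    )
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path or "/", urlencode(kept), "")
    )
```

Sorting the kept parameters means `?a=1&b=2` and `?b=2&a=1` collapse to the same string, which matters once more than one parameter is whitelisted.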
