MSN Crawling RSS Feeds for Breaking News

8 comments
Source Title:
Picked Fresh Daily
Story Text:

Now this is interesting. When I originally saw Jeremy's post telling MSNBot to "take a chill pill", I figured it was just a glitch. The little devil had been going mad for his RSS feed. Now I see that this is by design, and that MSN are crawling certain feeds in order to index breaking news as quickly as possible.

Kind of neat, but it raises some questions:

  • What are the criteria for inclusion?
  • What kind of boost is this content given (for example, in ambiguous queries where the term/phrase is already populated)?
  • Is TW on the list?

It's that last one that I want to know about :)

Comments

You would know ...

You'd know if TW is on the list because MSN bot would be hitting your feed several times a minute. This happened with one of my blogs that is *not* search or web design oriented at all.

Googlebot too

Googlebot-Mozilla is enormously greedy for feeds too. She requests feeds as often as 2-3 times per second, and never less than once every 15 minutes.

List is Here

SEORoundtable points to MSNBC's list of news sources and to the submit page.

Question

Diane,
what should I be looking for? I've got the same question as Nick. Googlebot is through the roof anyway since I got back into Google News, but I'm seeing MSNBot at 3 on my list, still a lot higher than in previous months. The scary thing with all these bots, of course, is the bandwidth: what they use is running at approx 1GB below my actual traffic this month, so they are using nearly 40% of the bandwidth at the Blog Herald. But I suppose, given how people desperately seek and pay for search engine inclusion, I shouldn't be complaining.
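One way to answer "what should I be looking for" is to tally hits and bytes per crawler straight from the access log. The sketch below is only an illustration: it assumes the server writes Combined Log Format, and the names `bot_bandwidth`, `bot_share`, and `BOT_MARKERS` are made up for this example rather than taken from anyone's setup in this thread.

```python
import re
from collections import defaultdict

# Assumed layout (Combined Log Format):
# host ident user [time] "METHOD path proto" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) "[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical list of crawler substrings, matched case-insensitively.
BOT_MARKERS = ("msnbot", "googlebot")


def bot_bandwidth(lines):
    """Return {label: {"hits": n, "bytes": n}} for each crawler and "other"."""
    totals = defaultdict(lambda: {"hits": 0, "bytes": 0})
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        agent = match.group("agent").lower()
        label = next((bot for bot in BOT_MARKERS if bot in agent), "other")
        size = 0 if match.group("bytes") == "-" else int(match.group("bytes"))
        totals[label]["hits"] += 1
        totals[label]["bytes"] += size
    return dict(totals)


def bot_share(totals):
    """Fraction of all logged bytes that went to the listed crawlers."""
    all_bytes = sum(entry["bytes"] for entry in totals.values())
    bot_bytes = sum(
        entry["bytes"] for name, entry in totals.items() if name != "other"
    )
    return bot_bytes / all_bytes if all_bytes else 0.0
```

Feeding it an open log file, for example `bot_bandwidth(open("access.log"))` with whatever path your server uses, gives per-bot totals, and `bot_share` turns those into the kind of percentage quoted above.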

Are you listed here?

MSN actually provides a list here of its sources. Don't know how accurate it is, though.

yae!

I'm on it. So is Threadwatch I'd note!

Duncan, I know what you

Duncan, I know what you mean. This is what I see in my logs:

GET /feed/rss2/ "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"

To slow it down, add this to your robots.txt (worked for me):

User-agent: msnbot
Crawl-delay: 180

Now I need one for Google.
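If you want to confirm what delay a given robots.txt actually declares for a bot, Python's standard `urllib.robotparser` can read it back. This is a minimal sketch: `crawl_delay_for` is a hypothetical helper name, and the rules are the two lines quoted above. Note that `Crawl-delay` is a non-standard extension, so a crawler is free to ignore it; this only checks what the file says, not what the bot does.

```python
from urllib.robotparser import RobotFileParser

# The rule from the comment above, as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: msnbot
Crawl-delay: 180
"""


def crawl_delay_for(robots_txt, user_agent):
    """Return the Crawl-delay (in seconds) declared for user_agent, or None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


if __name__ == "__main__":
    print(crawl_delay_for(ROBOTS_TXT, "msnbot/1.0"))
    print(crawl_delay_for(ROBOTS_TXT, "Googlebot"))
```

With only the msnbot rule in place, the second lookup finds no entry at all, which is one reason a separate approach is needed for other crawlers.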

Moreover submission for blogs

From the submission page:

"PLEASE DO NOT USE THIS FORM TO SUBMIT PRESS RELEASES OR BLOGS" (just below the content requirements).

I have used that form successfully for another news-related site, but I am trying to get a blog listed (one that is already in Google News, NewsNow, etc.).

Is there another form specifically for blogs?
