Google Sitemaps: Love or Hate?


OK, I'm moving more to the love side on this; so much so that I've started posting the stats from my blog, www.davidnaylor.co.uk.

Anyway, Vanessa Fox from http://sitemaps.blogspot.com/ popped by and asked this:

"Are there particular stats around queries that we don’t have available that you would find more useful? "

OK, over to the TWers: what do we want in Google Sitemaps?

Me: page penalty notifications and an open channel to Google. It's my site; if I've fucked up, let me know, or at least give me a warning so I can fix it (whitehat sites only, of course).

DaveN

Comments

hmmm forgot

Remove site: and link: from the public SEARCH listings. I don't want other webmasters knowing all my ins and outs: how many pages I have indexed and what links I have.

Backlinks: the ability to see all your backlinks, and a way to devalue backlinks that shouldn't be there (scrapers and hijackers).

Site: show PageRank per page, and the IBLs (inbound links) to each page.

DaveN

some ideas

I started using Sitemaps right when it came out. I've since deleted all of the actual sitemaps, as I saw absolutely no benefit to them, and would rather build content than update a useless XML page. If anyone has any data to back up that a sitemap actually helps crawling/indexing, I'm still open. We've run about 18 test sites in a private forum and have seen no effect at all through a lot of permutations of the sitemap (with links/no links/drop sitemap/cascading links/interlinked and so on).

However, I still use the console, as I find valuable information in the statistics: how Google sees the site, anchor words, etc.

So now that they've established a way to prove ownership of a site, why not ask for some more valuable information, like:

1. www vs. non-www. Instead of this constant 301-redirect mess that Google cannot figure out, why not have an option for the webmaster to declare which version is desired: www, non-www, or both? (A sketch of the 301 workaround follows this list.)

2. A negative map. Granted, just putting pages in a sitemap doesn't inspire Google to index them, but what about a sitemap of URLs that you want out of there? You work your a$$ off getting them to index the pages you want, but then they go and screw up and start showing old junk.

3. PR - A site summary with PR on all the pages would be great.
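On point 1: until Google offers such an option, the usual workaround is to pick one hostname and 301-redirect the other to it yourself. A minimal .htaccess sketch, assuming Apache with mod_rewrite and a hypothetical example.com:

    RewriteEngine On
    # Redirect the bare domain to the www hostname with a permanent (301) redirect
    RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
    RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

Google still has to honour the 301, which is exactly the mess being complained about, so a declared preference in the console would be simpler.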

I'm sure we'll see some great ideas here.

410 vs 404

what about a sitemap of URLs that you want out of there

Or simply treating a 404 as a 404 and a 410 as a 410 instead of lumping both together as 4xx and treating them the same.
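For anyone who wants to send an explicit 410 rather than a 404 for pages that are gone for good, a minimal .htaccess sketch (Apache; the paths are hypothetical):

    # mod_alias: everything under /old-section/ is gone permanently (HTTP 410)
    Redirect gone /old-section/

    # mod_rewrite equivalent for a single retired page
    RewriteEngine On
    RewriteRule ^retired-page\.html$ - [G,L]

Whether Google actually treats the two status codes differently is the commenter's point, but at least the signal is there.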

Google Sitemaps

Preferential crawling: if we choose to establish a dialogue, how about more frequent crawls?
Faster reporting: I've found it takes 2-3 weeks for things I've fixed to get purged from the error reports.
Better technical diagnosis: if there is a technical problem that is causing things not to be crawled properly, give us some indication of what it is and a gentle nudge in the right direction to fix it.
Page analysis: make the common words and common links into phrases, not just single words.
Link command: how about "unbreaking" the link: command inside the Sitemaps dashboard?

RE: Johnweb: it's amazing the stuff you don't know is broken. People link to you in all sorts of bizarre ways, and you can use .htaccess to fix 99% of what's wrong. If the sites are big or old, chances are there are some dead internal links somewhere in there; this makes them easy to find and fix. I can't say for a fact it's helped me rank, but it has helped me fix stuff I didn't know was broken. Generating an XML sitemap by hand is a bitch, but if you are using any sort of CMS you should be able to tap into it and autogenerate something.
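A minimal sketch of that kind of autogeneration in Python, assuming a hypothetical CMS that keeps its pages in a SQLite table named pages with url, last_modified and published columns; swap in whatever your CMS actually exposes:

    # generate_sitemap.py - rebuild sitemap.xml from the CMS database
    import sqlite3
    from datetime import date
    from xml.sax.saxutils import escape

    def get_published_urls(db_path="cms.db"):
        """Return (url, last_modified) rows for every published page.
        The database layout here is hypothetical."""
        conn = sqlite3.connect(db_path)
        try:
            return conn.execute(
                "SELECT url, last_modified FROM pages WHERE published = 1"
            ).fetchall()
        finally:
            conn.close()

    def write_sitemap(path="sitemap.xml"):
        entries = []
        for url, lastmod in get_published_urls():
            entries.append(
                "  <url>\n"
                f"    <loc>{escape(url)}</loc>\n"
                f"    <lastmod>{lastmod or date.today().isoformat()}</lastmod>\n"
                "  </url>"
            )
        xml = (
            '<?xml version="1.0" encoding="UTF-8"?>\n'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            + "\n".join(entries)
            + "\n</urlset>\n"
        )
        with open(path, "w", encoding="utf-8") as f:
            f.write(xml)

    if __name__ == "__main__":
        write_sitemap()

Hook something like that into whatever the CMS already does on publish and the file stays current without anyone editing XML by hand.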

Anyone got any

Anyone got any (non-specific) thoughts on whether setting up multiple accounts is really necessary if you have many websites or clients? It's nice to see a lot from one account, but... is that worse than having lots of sites in one AdSense account? (Which is something we've rarely ever done.)

Separate

If the sites are all using the same AdWords/AdSense account, I assume they can already figure out they are related and lump them all together. Other sites I set up as onesies, whether they are my own or a client's. No need to give anybody extra clues for snooping around, and I'm paranoid.

blah

I still use Google Sitemaps, but I don't know if it has really done anything for me.

Graywolf, thanks. That's

Graywolf, thanks. That's about what I had been thinking, and the same goes for domains with the same registration data.

I'm expecting some creative

I'm expecting some creative answers here. I'll phrase it more generally: Forget XML files or even what Sitemaps looks like currently. What info would you want as a webmaster?

If you could design your dream webmaster or site owner console on Google, what would it look like?

More suggestions

Matt Cutts said, "dream webmaster or site owner console"

In addition to my comments above, here's what I'd want if it could have anything (not knowing how this would jive with G's desire to keep some information hidden, for obvious reasons):

1) LINKS. I second the notion that the link: command should be disabled in general search and expanded in the webmaster console. I'd love to see which sites link to which pages. Of course I pore over my logs to find referrals, but if someone has placed me in a bad neighborhood, I'd like the chance to fight that from my end.

2) PR: In addition to knowing what pages have what rank, it would be great to know how they got that way. 10 incoming links, 20 internal links, etc.

3) Pages: which pages does Google know about, which pages are indexed, and which pages are not? In other words, a true site: command, visible only to the verified owner of that site. I doubt they would pass on the reason a page isn't listed, but someone in tune with their own site could use this to build a better site.

4) Page/directory removal. The emergency removal is great for things that need to be out now, but the 180-day thing is a pain; sometimes pages just pop back into supplemental hell and you have to start over. Along with the list of pages above, have a little radio button to include/exclude each one from the index; it's just a 0 or 1 in a database.

5) Site flavor. They already give a list of keywords for the site, but this is not always accurate; perhaps some of it is old data, anchor-text errors, etc. Part of what the webmaster console has added is a trust factor, by communicating with a verified owner of the content. Why not include some editorial options for the site owner, say a description or keyword suggestions for addition or removal? Sites change flavor over time, and waiting for that to update on its own is tedious. G still holds all the trump cards, but something verified is much more accurate than some META tags, which are easily gamed and discounted.

6) Penalties. I'm not naive enough to think G is going to say you're penalized because you got a bunch of spammy links, etc., but if a site is 98% good and a few pages drop out of favor, just knowing which pages are bad would be a great help, to us and to the quality of sites.

7) Reinclusion. I like the reinclusion form, but how about some feedback (save your mail servers some work and stop sending the automated thing)? Just a status would be great: unread/read/denied/accepted/not penalized, something so that we know it went somewhere, or didn't, for that matter.

graywolf: I've actually got the sitemaps generated and updated automatically; it's just moved off the front burner until I see proof that they do something.

Dream console

It would connect AdWords, AdSense, Analytics, Google Base and Sitemaps all together. We know you're smart enough to do it and are probably pulling it together on the backend, so how about putting it together for the site owner? As John said, if you want to show limited data to the regular world for [site:] and [link:], OK, fine, but on the inside give us the real deal. If you "see" something wrong, let us know what it is (we think your site has a lot of duplicate content, or we see some hidden text, duplicate titles, and so on). Don't show a reinclusion request unless it's actually needed; you freak people out.

thread jack

Was that a threadjack, or did it suggest Dave asked this on behalf of Matt? Funny.

Anyway, a common question is why www.uniquename.tld doesn't show for the query "uniquename.tld" or "www.uniquename.tld", yet does appear #1 for "uniquename", does have many pages ranking, and does have referral traffic (no quotes anywhere).

Offering a 180-day drop for re-inclusion is not attractive to someone with many productive pages in the index. Since GSitemaps knows about redirects and already checks/finds robots.txt and the sitemap, it should be able to describe what the "canonical problem" is, or whatever other reason the home page is taking a holiday.

Love or Hate...

They seem great, until you have a sitemap with n pages, the crawl stats show a full green bar, yet some pages are not in the index; that makes it a bit pointless.

As to what I would like to see, simply:

n pages in sitemap.xml
n pages crawled OK
n pages in the index
n pages not in the index (which ones, and why, would be a bonus)

That seems like a fundamental to me.

Google Borg

Dashboard Items:

1. Heatmap overlay integration:
CrazyEgg (http://crazyegg.com/) is about to launch this type of product, but I think Google could build on it. For example, imagine being able to check your AdSense placements against real user actions. This alone would make the tool invaluable.
2. URL removal tool:
The current method for removing a URL needs a lot of work. How about the ability to remove batches of URLs immediately?
3. Datacenter checker:
The ability to check your site across all of Google's data centers in one place. This could be limited to domains you have already verified with Google, so a person with a .com domain can't check or get results that a .de domain would. Also, remove the site: and link: commands from public search and make them private for webmasters on their own domains. Just imagine: this alone could help with duplicate content and spam.
4. Google analytics integration:
Ability to track, manage, and create campaigns.
5. Google sitemaps integration:
Ability to have manual and automatic rebuilds for specific pages or campaigns. More advanced debugging and alerting options. A person would have the option of being alerted for specific items they've chosen. Ability to create sitemaps for mobile search.
6. Webmaster notification:
The ability to be notified if Google has a problem with my site. If I am about to be de-indexed, I want to know what is causing the problem and how to fix it. Re-inclusion requests could also be added here. If your page(s) have fallen into the supplemental index, you could have instant notification of the URL(s).
7. Webmaster alerts:
Google's current alert system needs a lot of work. Beverly Yang and Glen Jeh are on the road to making this better, I think (http://www2006.org/programme/files/pdf/3055.pdf), and this tool could be invaluable for webmasters. If someone is saying something good or bad about my site, I want to know about it.
8. Ability to manage and track Adsense campaigns within the dashboard.
9. Google webmaster API's:
How about rolling out some more APIs for webmasters into the dashboard? The APIs could automatically generate a user-specific key for Google behavior tracking, monitoring, and whatnot. A good example of an API is Google Sitemaps. Being able to use an API that syncs up with my clients' current or fresh inventory would be great.
10. GoogleTrends:
Create new ways for webmasters to do market research and integrate into dashboard.
11. Google search team updates:
Create a central location on the dashboard for new algorithm updates, index shifts, problems, issues, or news. A webmaster could also subscribe to certain Google RSS feeds because they'll most likely be spending much time in their dashboard.
12. Google base:
Create new ways for webmasters to utilize its functions.
13. Froogle:
Create new ways for webmasters to integrate shopping feeds instantly. Allow webmasters to track everything that happens and to manage new campaigns.

Jonathan Nelson

Here is how I currently use

Here is how I currently use Sitemaps: I make a new post to my blog, update my sitemap in WordPress, then quickly log in to Sitemaps and hit "resubmit".

Why am I doing the above strange action?

1.) I am afraid my weenie blog does not have high enough PR and that someone who aggregates my feed might get to it first and take full credit for my content (linking back with the nofollow or not at all).

2.) I am also afraid if I do not immediately let Google know that there is something new to crawl it will take forever.

The above is like a superstition, because I am not fully aware of how Sitemaps works, BUT it is how I would like it to work. Make sense?

When Google stops crawling a blog for days or weeks, I want "resubmit" to notify them, and it should react instantly. After all, it is proven to be my content, with its bar code and history.
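For what it's worth, that post-then-resubmit routine can be scripted rather than done by hand in the console. A minimal Python sketch, assuming the sitemap ping endpoint Google documented at the time (treat the exact URL as an assumption and check the current docs) and a hypothetical sitemap location:

    # ping_google.py - tell Google the sitemap has changed after a new post
    import urllib.parse
    import urllib.request

    SITEMAP_URL = "http://www.example.com/sitemap.xml"  # hypothetical location

    def ping_google(sitemap_url=SITEMAP_URL):
        # Endpoint as documented by Google for Sitemaps pings; verify before relying on it
        ping = "http://www.google.com/ping?sitemap=" + urllib.parse.quote_plus(sitemap_url)
        with urllib.request.urlopen(ping) as resp:
            return resp.status  # 200 means the ping was received

    if __name__ == "__main__":
        print(ping_google())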

Levels of trust: if we follow the rules, we gain deeper access and work together as a team. It would be a challenge, just as it is now.

If we could submit in real time, would there be any need to obsess over links just to get crawled? How about changing how the entire engine works? I have never seen more complaints in Sitemaps and on the web, and they are all the same: new website owners complain that Google has stopped visiting them. Every single time I check, it turns out they have few to no backlinks; I have even linked to my new blogs from my own blog just to get them going. To me, that's the only value of PR and should be its only value (depth of crawl).

Build an engine that focuses more on content and less on backlinks: the quality of the content and the accuracy of links to external resources, NOT the other way around. External links can be gamed; those who share and link out should be rewarded, because that is the act of an honest website.

That is all for now...

Lots of the above, and the

Lots of the above, and the ability to specify country: let me tell you directly which country my site applies to, rather than you deriving it from the TLD and/or IP.
A long time ago I had a site which should have been seen as Australian, but it had a .com TLD and was hosted in Canada. I've learnt better now, but doesn't it make sense to be able to *tell* Google where the site resides?
It's not very spammable, as you could only tell them one country.

The love has been good...

I'm hesitant to post wants, since the recent Sitemaps updates actually gave me some of the very things I'd suggested/requested in the Sitemaps group forum earlier.

As a developer, I've painfully felt the brunt at times of people who are never satisfied with the stuff you've delivered!

From reading some (not all) of the previous suggestions on this thread, I haveta say that a few of you describe requests in a very "marketing-department-like" manner: you don't give a very precise description of what you're wanting, so it's hard to translate into technical requirements. In some cases, what's being requested seems a bit illogical or unreasonable (I guess that's why a few of the suggestions smack of the sort of marketing requests that have occasionally given me pain as a developer/project-manager in the past).

Anyways, I'll throw a few more suggestions into the hat:

  • The stats that are there are interesting. BUT, it would be helpful to either post what timeframe they apply to, or provide explanatory statements about how they're derived. Rolling 90-day average? Last 14 days?
  • The smiley-face ratings irk slightly, because it's not clear to me how they should be interpreted. If I like a feature, but am unhappy with how it's implemented, should I click the unhappy face? I'm not sure you folx are getting the user-feedback through those ratings that you're intending. I'm paranoid about clicking to convey that I'm unhappy about how something's implemented, since if a bunch of people do that you might delete the feature altogether under the mistaken impression that it's not wanted/useful.
  • The Statistics tab is the most interesting part of the whole thing for me. More stats would be even better -- instead of just Top 20 Searches/Clicks how about a lengthier listing?
  • Even better: Google dislikes for folx to use automated methods for checking rankings on various keyword queries. But, this is one of the top most-desired analytics of SEOs and webmasters who optimize. You've even specifically made this against your guidelines. How about setting up an interface to provide ranking reports as a service, though? Allow Sitemaps users to specify up to a couple hundred keywords, and then provide a report on where the site's pages rank. I'm supposing you dislike that automated software because it can degrade performance for real users, and perhaps because you think it's unhealthy for people to obsess on specific rankings on pages. But, this is too useful of information for professionals to ignore, and if you provided it as a service, it could avoid impacting performance.
  • Anyways, I like what Google Sitemaps has done thus far, and it's nice to see a major search engine working with the web development community.

If only there was a way that everything could be completely transparent... but then there wouldn't be as much demand for the niche discipline of SEO, would there?

usability

Move the web crawl details out of a dropdown and into the left-hand menu (presumably indented under web crawl).
If you can't do that, *please* add the JavaScript to perform the action automatically on selection!

I hate select dropdowns for navigation!

First of All

Well, first of all, how about a URL that is easy to remember and type (i.e. sitemaps.google.com)? I actually use the site: operator for real-world searching sometimes, so as an end user of Google's SE I would hate to see it go. I agree with most of what has been touched on... more stats. I like the search terms, the top PR page on the site... but I want more. It is kind of like watching porn through the squiggly lines on cable channel 0 when I was 10. You guys were in tune enough to figure out webmasters would like to see the query data, the keywords that get clicked and the top PR page; now how about you stop teasing? I do like the improvements, and it is cool that Sitemaps is getting some attention.

Another Random Point

Librarians and the like use both the site: and link: commands as part of their research. I was a little surprised when I came across this site recommending looking into the backlinks to get a feel for the quality of a document.

Jason, I also use the site: command for research, as I am sure many do.

I'd love to offer more feedback but I am a little snowed atm.

Silver said: "it's not clear

Silver said: "it's not clear to me how they should be interpreted".

When a feature does not reveal itself fully, it is natural not to trust that feature.

the site: command

True, I use the site: command too (when I'm being a searcher). What if the site: command only worked publicly with a modifier, e.g. "site:fruit.com oranges"?

But I can't think of a use for the link: command that isn't snooping, even if it is legitimate snooping a la the librarian mentioned above.
Get rid of it publicly and make it actually *work* in the console.

What about the console

Why not include the little-known Google URL console in the Sitemaps service?

http://services.google.com:8882/urlconsole/controller

Oh, and maybe pull in some search volume & CPC data from AdWords as well. We can find it out anyway, so why not make it easy?

You know how there is our

You know how there are search history details inside our personalized search history accounts? Maybe have something like that for Google Sitemaps, showing which pages Google sent traffic to and for which queries. Trending data, etc. Basically, sorta build Urchin into Sitemaps, with the option of getting our relevant Google referral data even if we didn't participate in Urchin.

Related topics or queries we should maybe look to build content for (this could be done by comparing our traffic and content against other related sites, and by showing us what people search for directly using a Google search box on our site).

I would also like to be able to sort citations by perceived trust or value (to help show the difference between good links and bad links). Showing the most recent citations would also be interesting, allowing brand managers to see how their new offerings and brand are being perceived by the public.

Real-time PageRank values (or as near to them as you can come).

make it anon!

make it anon!

Lol Jason

I think the point is to gain trust between Google and the site owner, a two-way street where, if the site owner takes a wrong turn, his Google sat nav will put him back on the right track, so to speak.

DaveN

    "Not Found" pages

    Tell us WHERE the 'bot was when it encountered the non-functional link.

    And let us clear that list when we know we've fixed something.

    And what's with Google coop?? Why not let sitemaps have tags like that so we don't need to do both? The tags could help Google do better indexing too.

Communication

How about a contact page where we can correspond with Google via a contact form about different issues (i.e. reinclusion, crawl issues, etc.) and keep track of all correspondence history in one place? Of course, that would mean we would actually get a response when we emailed Google.

I've always assumed that the

I've always assumed that the sitemaps thing was only for webmasters who didn't know how to make a spiderable web site, or were unable to because of a CMS or something.

So, I've never really cared about it, and never used it.

But if I'm to interpret this as Google wanting to turn it into a general webmaster tool, here's my two cents:

(1) Give the tool another name. We know all about sitemaps; we don't need Google for that.

(2) Spidering frequency. Allow me to specify that the bot (any bot) should only visit me once per week or once per day instead of several times per day, and that this particular section (group of pages) only needs spidering once a month or once every half-year.
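Worth noting that the Sitemaps protocol already carries a per-URL hint along these lines: the optional <changefreq> and <priority> elements, which the engines treat as hints rather than commands. A minimal sketch with hypothetical example.com URLs:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>http://www.example.com/news/</loc>
        <changefreq>daily</changefreq>
        <priority>0.8</priority>
      </url>
      <url>
        <loc>http://www.example.com/archive/2003/</loc>
        <changefreq>yearly</changefreq>
        <priority>0.2</priority>
      </url>
    </urlset>

Whether any bot actually backs off in response is another question, which is presumably the point of the request.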

I think the point is to gain

Quote:
I think the point is to gain trust between Google and the site owner, a two-way street where, if the site owner takes a wrong turn, his Google sat nav will put him back on the right track, so to speak.

DaveN

And don't tell me that your sat nav has never sent you across a field, or told you to "make a U-turn when possible" whilst you are on the M25!

I know why G wants it to be used, but the question has to be asked: why would the average webmaster use it?

The average WM doesn't know or care about sitemaps, which only leaves SEO-savvy WMs, i.e. the standard TW reader. Now why on earth would the average TW reader use the damn thing?

Here you are, Google: the one single piece of information you can't buy, collate, scrape or manage to dig out without us directly telling you. Which sites I completely own or control.

Will I use it? Yes, but not in the standard format and not like this!

bingo

Quote:
Will I use it? Yes, but not in the standard format and not like this!

Same as it ever was, eh?

Do we forget that this is aimed at average webmasters? The bread and butter of Google is not the SEO or competitive webmaster. *IF* Google could see 80% of the sites via such a system, would they be able to simply ignore the other 20%? Sure they could.

I have "recovered" more than one site for a client using Google Sitemaps, using it the way Jason suggests, which appears to be the way Google suggests it be used. Same as it ever was.
