Threadwatch to Build Killer Link Analysis Tool, Give it Away Free

UPDATE: Due to comments made by Google and many, many warnings from 3rd parties, our major sponsor has decided to pull out rather than risk repercussions from the Search Engines - We are in talks with other companies, and are working to find a way to build the tool whilst remaining within Google TOS - See GoogleGuy's comments pointing out the TOS for more details.

At present, we are in a state of indecision: Some questions need to be answered:

  1. Can a link analysis tool be built within Google TOS and still be useful?
  2. If we do get to build it, and it's within TOS, will our sponsor, and Threadwatch (and any other sites i build) be free from repercussions?

Back to our regularly scheduled programming...

Threadwatch, thanks to major sponsorship from Unspecified, will be building the link analysis tool to end all link analysis tools.

Here's how it works:

  1. Unspecified pay for the initial development
  2. JasonD and DaveN provide the programming and technical expertise respectively
  3. We ask you what features you want in a dream link analysis tool
  4. We build it, based on your input
  5. We give it away for FREE

Follow the title link for more...

Some further Details

Here are some short answers to questions I think you may want answered. If you have more, please just comment, and I'll answer them.

What kind of Tool?
We have a very simple (in concept) goal in mind: To build the best link analysis tool on the market. Period. It will be a standalone tool that you can download and run in your browser, with no "phone home" shite associated with it. The only thing hosted at Threadwatch will be the download page where you'll be able to upgrade it as improvements are made.

Sponsorship
Unspecified are the main sponsor; the amount will not be disclosed, but it's considerable, and 100% of it will go into building the tool. There are opportunities for 10 smaller sponsors (this is why it's free) and anyone wishing to get involved should PM me.

Jason is donating the project management (as well as collaborating with DaveN on the technical SEO aspects) and DaveN his considerable SEO expertise; these guys will also be named and credited. Both JasonD and DaveN have STELLAR reputations for SEO and having their skills installed in the tool will make it truly awesome.

What's in it for Threadwatch?
Firstly there is the viral nature of building the best link tool on the market and giving it away for free - secondly, additional minor sponsorships will help to monetize TW - we've talked about this subject so many times, but I've not managed to put much together to date and this looks like a win-win-win to me. You win as you get a great free tool, the sponsors win as there are limited slots and this tool will go mentally viral, I win as I get a few $$'s in my pocket and can stop eating boiled beef and cabbage for a week or two (only kidding :-)

This also carries on the initiative we started with the CashKeywords giveaway, while Threadwatch is still looking for viable ways to monetize, with utmost care and attention given to the membership, you can enjoy another cool freebie.

More Questions and Feature Requests
Fire away, I'm here all evening and will happily answer points I have obviously missed, just ask...

Linux

Sounds excellent!

It will run on Linux, won't it? And maybe on Windows for those still using legacy operating systems... I'm not asking for open-source or anything like that, just that it's not limited to MS users.


Threadwatch in Conjunction with Text Link Ads to Build "Killer"

I think Text Link Ads is an incredibly smart company when it comes to marketing. Not just because they are a long time sponsor of this site, but also because they are one of the most visible companies in the...


Linux

We do hope so encyclo, the first priority is Windoze however - certainly the idea of cross compiling for Mac and *nix has been discussed and is amongst the core set of goals though.

It will ultimately depend on budget.


Open source?

Sorry for such a short bit of input.

More to follow once some thought has been given :)


No

No, not open source, just free software. We will have to find the appropriate license for it before it's released.

I'm also thinking that allowing webmasters to offer it for download on their own sites would not be a bad idea at all...


wxWidgets

Linux compatibility would really be fantastic.
wxWidgets is compatible with Mac, Windows, and Linux/Gtk/Motif:
http://www.wxwindows.org/

I know Linux doesn't have the market reach, but we are talking about SEOs so the % will be higher than average. I do not even have a Windows PC.


features I would like

harvesting...

may not be on by default on all queries, but it should be an option people can turn on.

allow people to scrape yahoo backlinks. allow common URLs to be filtered out such that you can scrape a deeper set. keep scraping until you feel you have got most of them.

----

works on multiple engines...

G, Y!, MSN

----

allow me to sort by extension of the site the link is coming from...

I want to be able to quickly check how many .edu or .gov type backlinks exist
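
A rough sketch of what that count could look like, assuming the tool already has a plain list of backlink URLs from whatever source it ends up using. The function name and the example URLs are made up for illustration, and the "extension" here is simply the last label of the host name.

```python
from collections import Counter
from urllib.parse import urlparse


def count_by_extension(backlink_urls):
    """Count backlinks by the last label of the linking host (e.g. 'edu', 'gov')."""
    counts = Counter()
    for url in backlink_urls:
        host = urlparse(url).hostname  # None when the string is not an absolute URL
        if not host:
            continue
        counts[host.rsplit(".", 1)[-1]] += 1
    return counts


# Hypothetical backlink list, for illustration only.
backlinks = [
    "http://www.example.edu/links.html",
    "http://cs.example.edu/~someone/",
    "http://www.example.gov/resources",
    "http://www.example.com/",
]
extension_counts = count_by_extension(backlinks)
```

The same counter could drive the sort order of a report, for example by listing the rarest extensions first.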

----

filtering...

allow me to filter site or sites out of the results before AND after I have done the search query.

also allow me to only find sites with extensions I want.

----

hub finder option...

another option which lets me find pages or sites that link to two or more sites which either exist in the top 10 search results for a keyword, or I can manually enter what sites I want to check co-occurrence from.
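
One way the hub finder could work, sketched under the assumption that the backlinks for each target site have already been collected into a dictionary. `find_hubs` and the sample data are hypothetical names, not part of any existing tool.

```python
from collections import defaultdict


def find_hubs(backlinks_by_target, min_targets=2):
    """Return linking pages that point at min_targets or more of the target sites."""
    targets_by_page = defaultdict(set)
    for target, linking_pages in backlinks_by_target.items():
        for page in linking_pages:
            targets_by_page[page].add(target)
    hubs = {
        page: sorted(targets)
        for page, targets in targets_by_page.items()
        if len(targets) >= min_targets
    }
    # Most-connected hubs first, then alphabetical so the output is stable.
    return sorted(hubs.items(), key=lambda item: (-len(item[1]), item[0]))


# Hypothetical data: target site -> pages known to link to it.
sample = {
    "site-a.example": ["http://hub.example/links", "http://blog.example/post"],
    "site-b.example": ["http://hub.example/links", "http://other.example/"],
    "site-c.example": ["http://hub.example/links", "http://blog.example/post"],
}
hubs = find_hubs(sample)
```

The target list could come either from the top 10 results for a keyword or from sites entered by hand; the function does not care which.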

----

dmoz & yahoo check.

reports if the site is listed in either of those.

----

stats...

ip address, anchor text, all the usual shite too ;)

allow me to quickly know # of unique c class ip addresses. also allow me to see stats counting many links from one site separately or only count all of them as one link.
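
A minimal sketch of those two counts, assuming each backlink has already been resolved to a (linking host, IPv4 address) pair. The function names are invented for the example, and "C class" is taken to mean the first three octets of the address.

```python
import ipaddress


def c_class(ip):
    """Return the first three octets of an IPv4 address, e.g. '192.0.2'."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    return ".".join(str(addr).split(".")[:3])


def link_stats(links):
    """links: iterable of (linking_host, ip) pairs, one pair per backlink found."""
    links = list(links)
    return {
        "total_links": len(links),  # every link counted separately
        "unique_sites": len({host for host, _ in links}),  # one per linking site
        "unique_c_classes": len({c_class(ip) for _, ip in links}),
    }


# Hypothetical input using documentation-only address ranges.
stats = link_stats([
    ("a.example.com", "192.0.2.10"),
    ("a.example.com", "192.0.2.10"),
    ("b.example.com", "192.0.2.77"),
    ("c.example.org", "198.51.100.5"),
])
```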


Features

Hi,

I thought it would be nice if a link tool could show the # of all the "indexed" webpages in each SE... separately from the backlinks from other websites.

Dan


Features

Ok ok, I should have known putting "don't post feature requests" wouldn't be enough - sheesh :-)

Go right ahead, i'll change the original post a bit.

Go Ahead, List your feature requests...


feature

Scrape and return unique tilde results based on the KW's in anchor text.


oops

sorry Nick... figure we will come up with something that is even more cool and the promotion will work much better if we can have input.

I was a bit too enthusiastic and whether or not you wrote it I did not read that you didn't want feedback ;)


Free Link Analysis Tool

NickW is building a free link analysis tool.


We want it, we want it

Check the original bullet points seobook :) we definitely want your input - I was just going to start a new thread is all...


Suggestions

Collaborative options, in that if I have an interest in travel and a mate in telecoms we can share the data easily. Many PCs make light work.

mysql / php as that is more accessible

runnable from the command line or a browser, so I can leave it running for weeks if need be.

Untraceable, rotating referers, UA etc. I don't want to see it posted on WMW "who is this SomeStupidUA Name"

Order backlinks in a variety of ways, e.g. URL length, with/without www, extension type, subdomain

Option to record TBPR for each link

Ability to spider page if possible, for KW density or some other task, maybe even external links from that site, with anchor text.

Ability to go after not just link: type commands but also pages containing the term, with a way of distinguishing/filtering them for reporting.

Ability to multithread so more than one script can run at the same time

URL specific or search term specific, so I can get a feel for a particular market

Filter / note difference in internal and/or external links.

There will be more


I would ask

I would ask that it not scrape Google. Doing so is clearly outside Google's guidelines: http://www.google.com/webmasters/guidelines.html#quality
(the part about software querying Google)

It's also a load on our servers from bots rather than the people for whom our services are intended. In addition to being rude and using our services without permission, it may well open you up to liability. Again, this is just a courtesy heads-up that such software would violate our guidelines. Please consider this a polite request not to produce software that scrapes Google without our permission.


suggestion: use in conjunctio

suggestion: use in conjunction with the API or analyze BL from other more accurate sources.


oops...

Guess I didn't read the original real well either Nick...I also of course meant...to use the api rather than scrape...since scraping is bad.


easy mr W, easy...


"I would ask that it not scrape Google"

Looks like it has to be open source, otherwise everyone involved could be liable for this. Let us write our own way to integrate link maps.

(or maybe google could sponsor it and let us host their link map in a public repository).


To Search Engine Reps

Quote:
Please consider this a polite request not to produce software that scrapes Google without our permission.

GoogleGuy and all other search engine representatives: as I am managing the project, I think it would be sensible to chat about how we can deliver this tool without causing your search engines undue stress, hassle or inconvenience, whilst still having access to the publicly available information your respective engines provide.

The quality of your products is superb and the data you provide is publicly accessible, and I believe that by having honest and open discussions we can find a workable solution that all parties will be happy with. My contact details are pretty public and I look forward to speaking to you all :)


To Get Back On Track.....

What other ideas have you got, guys n gals?

Forget about where the data comes from or how it is gathered. I would love to hear what you want to see in the ultimate link analysis tool.

Please please please get that old grey matter working overtime so we can come up with some innovative ideas :)


Visual Map

I would love to see it plotted out visually, when all is said and done.

If possible, connect the lines of the nodes. So a green line would be a one way link. A red line would be a direct recip link, a pink line would be a triangular link and so on.
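
To make the colour scheme concrete, here is a small sketch that labels each link before it is drawn. It assumes the link graph is available as (source site, target site) pairs; the function name is hypothetical, and "triangular" is taken to mean A links to B, B links to C, and C links back to A.

```python
def classify_links(edges):
    """Label each (source, target) link as reciprocal, triangular or one-way."""
    edges = set(edges)
    outgoing = {}
    for src, dst in edges:
        outgoing.setdefault(src, set()).add(dst)

    kinds = {}
    for src, dst in edges:
        if (dst, src) in edges:
            kinds[(src, dst)] = "reciprocal"  # A -> B and B -> A
        elif any((third, src) in edges for third in outgoing.get(dst, ())):
            kinds[(src, dst)] = "triangular"  # A -> B -> C -> A
        else:
            kinds[(src, dst)] = "one-way"
    return kinds


# Colours as suggested in the post above.
COLOURS = {"one-way": "green", "reciprocal": "red", "triangular": "pink"}

# Hypothetical link graph between sites a..f.
kinds = classify_links([
    ("a", "b"), ("b", "a"),              # direct swap
    ("c", "d"), ("d", "e"), ("e", "c"),  # three-way ring
    ("f", "a"),                          # plain one-way link
])
```

A drawing layer would then look up `COLOURS[kinds[edge]]` for each line it plots.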


OSX, pleeeease...

I would be so VERY VERY happy if a version to run on OSX (even if it's through X11) could be arranged.

I only wish I were a big enough geek to offer to port it myself, but I'm clueless about programming... but surely there's someone out there who'd do it?


It is more likely than not th

It is more likely than not that the application will run under Windows, Linux, as well as OS X (though not 9 or below)

As Nick stated above Windows is #1 concern but hopefully the other target operating systems as well can be met at the same time.


Nice idea about the colours t

Nice idea about the colours to visualise the data RustyBrick


Suggestions

  • Deep linking percentage
  • List all anchor text with counts and % of whole
  • Anchor text to page relationships
  • Surrounding text variations
  • IP
  • One way/ Recip indicator counts and % of whole
  • List of outbound links with anchor text
  • PR of linking page
  • Internal/external link count on page

Let me know if you need clarification on anything.
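
As an illustration of the "anchor text with counts and % of whole" item, a minimal sketch assuming the anchor text of each backlink has already been extracted into a list. The function name is made up, and anchors are compared case-insensitively with whitespace collapsed, which is a choice rather than a requirement.

```python
from collections import Counter


def anchor_text_breakdown(anchor_texts):
    """Return (anchor text, count, percent of all links), most common first."""
    normalised = [" ".join(text.split()).lower() for text in anchor_texts]
    counts = Counter(normalised)
    total = sum(counts.values())
    return [
        (text, n, round(100.0 * n / total, 1))
        for text, n in counts.most_common()
    ]


# Hypothetical anchors pulled from four backlinks.
breakdown = anchor_text_breakdown(
    ["Cheap Widgets", "cheap  widgets", "click here", "Widget Shop"]
)
```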


I'd just like it to be able...

...to count - and show real results - OVER 1000, in some way. Surely there must be tricks for that. Like a multi-pass system, excluding the first set of results.... Am I making sense?

Presentation preferably flexible/customisable. Everyone likes a different setup/output format. Bit like a9.com (easy example)...


Keep those ideas coming. As

Keep those ideas coming. As the McD ads say, "I'm Loving it!"


Big List

JasonD,

I have a huge list PM me

The tool sucks because I do use the API and we all know how much Google gives us with the link command. No offense GG, I know you have your reasons.

Please replace my tool.


concept

Is this a tool that would be hosted exclusively at Threadwatch, or would it likely see deployment across multiple sites?


Thanks GG

Thanks GG - Send regards to Rose. I am hurt.


Standalone

Quote:
Is this a tool that would be hosted exclusively at Threadwatch, or would it likely see deployment across multiple sites?

The idea is to build a standalone tool that will run in your browser Brian, and allow other websites to distribute it on the condition that they do not interfere with the interface and sponsorship stuff of course.


For each site in the top 10 o

For each site in the top 10 of a query I would like to see a breakdown of the anchor text in a pie graph, for each site, then merged overall for the top 10.


Me too - sorta

What mivox said about OS X, except not the X11 part cos I never got that whole bit running (I think.) A real OS X version would be cool.

Also ditto what SEOBook said.


Nice one Nick

RustyBrick said:

Quote:
I would love to see it plotted out visually

I second that. A tool that allows you to quickly visualise inter-relationships and patterns.


How much data...

are the sponsors going to have access to?


I might be wrong, but I do no

I might be wrong, but I do not think the sponsors are gaining access to data. I believe they are just going to get ad space.

Giving a third party access to the linkage data you collect would make the tool worth way less than it otherwise would be.


Nice! This is my wish list:

- Flexible queries: I'd like to have a set of pre-defined models for queries, but also with an option to allow changes in search engines. I mean that we can change the SE we query, the parameter that points the query (ie, in Google, q=), the operator, other custom parameters in the URL, etc, so that we can adapt to new search engines or to the changes of current search engines, or make different queries (e.g., link:http://www versus linkdomain:www)
- What Optilink does (IP, PR, number of backlinks of the returned pages, title of the returned pages)
- To have a comprehensive set of results, an option so that after returning the first 100 results, we can keep widening the result list by filtering the websites present in the first set. E.g., if we make a backlink check at Yahoo and it returns 100 links from 56 different websites, make the query again adding a -site:www.whatever.com for every one of the 56 different websites.
- To return just the first backlink from every unique website.
- In the report: A column for all parameters present in the anchor tag other than the href, so that we can check if the backlinks have target, nofollow, title, etc.
- The anchor text, of course, or the ALT if it's an image.
- In the report: a column with all the anchor text for links that point to other external URLs except for the URL we're checking.
- A custom field where we can place a snippet of code and for every page returned, it checks if the snippet is present and says so in the report. So we can check if the pages where the links are have a certain word, or a certain tag, or a certain link.
- A list of co-occurring links ordered by number of occurrences: links to pages other than the one we are checking that are present in the returned set of pages.
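
The result-widening idea in the list above can be sketched like this. It only builds the next query string; how (and whether) that query may be sent to a given engine is outside the sketch. `widen_query` is a hypothetical helper, and the `-site:` syntax is the one used in the example above, so whether an engine accepts it alongside a backlink operator is an assumption.

```python
from urllib.parse import urlparse


def widen_query(base_query, result_urls, already_excluded=()):
    """Append a -site: term for every host seen so far, so a re-run surfaces new sites."""
    excluded = list(already_excluded)
    for url in result_urls:
        host = urlparse(url).hostname
        if host and host not in excluded:
            excluded.append(host)
    query = " ".join([base_query] + ["-site:" + host for host in excluded])
    return query, excluded


# Hypothetical first page of results for a backlink check.
first_page = [
    "http://www.a.example/page1",
    "http://www.a.example/page2",
    "http://b.example/",
]
next_query, seen_hosts = widen_query("link:http://www.example.com", first_page)
```

Passing `seen_hosts` back in as `already_excluded` on each pass keeps the exclusion list growing; in practice query length limits would cap how far this can go.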

Now, confess: you aren't planning to give this tool away, right? This is just a trick to harvest ideas for your own private backlink-checking tool, true?


Hilltop Type Thingumy

Pick a site, get all the backlinks for that site that also appear in the top 1000 for the same search, then split them down via category of IP.


A "search within these results" option

I'd like to be able to search within the sites returned before the other info is collected. Often very helpful in narrowing the list.


Just to clarify a couple of p

Just to clarify a couple of points.

The tool will be running on YOUR machine, accessible via YOUR web browser. It is not a remotely hosted application and there is no central data repository. The only people who can see your research are those that have physical access to your computer.

The sponsors do not get access to any data at all. Nada, Zilch, not one iota. All they get out of this tool is a warm fuzzy feeling in their belly that they've helped deliver something very good indeed.

Oh, and a prominent position in the application noting them as the sponsor and hopefully your business :)


Hey GG and Tim

you both know how to get in touch with me, we will be staying within the guidelines of course :) would I ever bend the rules ... ever :)

As for data collection I don't think we are going to capture any, unless it gets run on servers, the local distribution model shouldn't anyway... unless we can find a real sneaky way without anyone knowing lol ...(joke)

DaveN


Break down all of the backlin

Break down all of the backlinks by IP and then by C class (or have it as an option) so I can see if a certain site is buying up site wide links with a certain forum or portal.

Once I can see that site X has 3000 links from a certain site, what % have the same anchor text, and what is their proximity within other text, i.e. are they just footer links or are we talking links in articles etc...

Not sure how possible it would be to then go off and query for whois data, to see if we are just talking about large networks owned by one guy/gal.


Why a browser app?

As JasonD has seen, I have been developing my own link analysis tool (using google api before I get a b'tch slapping off GG heh) and quickly found that doing it in the browser is problematic, beyond basic link analysis the queries required are just too long. Unless it will be a java applet (definitely would make it cross platform)?

My only feature request that will be different to others is please do not make it mySQL only if you are going to use a datastore, either make it file based (xml data store would be cool then could extract data) or DB agnostic (web services, an abstraction layer or somesuch) as I really do not want to have to install stuff on a computer that I don't need just for one app!
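
For what a file-based store might look like, here is a minimal sketch. It uses JSON rather than the XML mentioned above, purely because it needs nothing beyond the standard library; the class and method names are hypothetical. Any other backend exposing the same `save` and `load` methods could be swapped in, which is the "DB agnostic" part.

```python
import json
from pathlib import Path


class JsonFileStore:
    """File-based backlink store; nothing to install beyond the tool itself."""

    def __init__(self, path):
        self.path = Path(path)

    def save(self, target, backlinks):
        """Store the de-duplicated, sorted backlinks for one target site."""
        data = self._load_all()
        data[target] = sorted(set(backlinks))
        self.path.write_text(json.dumps(data, indent=2), encoding="utf-8")

    def load(self, target):
        """Return the stored backlinks for a target, or an empty list."""
        return self._load_all().get(target, [])

    def _load_all(self):
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text(encoding="utf-8"))
```

Because the data sits in one readable file, extracting it for use elsewhere is just a matter of opening that file.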

PM me if any of you want to see the aborted browser one.


Hi Chris. Main reason for

Hi Chris.

Main reason for a browser based version 1 is speed and costs of development to make it as multi OS as possible.

If we output HTML it's one area that looks the same (pretty much) across all operating systems, so it means we don't need to code a separate GUI for Mac, Windows, Linux etc users.

:)


multilingual & theming

I would like to know where the links come from (countries...) and from what theme the links are coming (there might be a database with topics - sub topics and keywords attached to these categories to identify a general theming structure...)


Major Sponsor pulls out - Discussions with SE's pending

UPDATE: Due to comments made by Google and many, many warnings from 3rd parties, our major sponsor has decided to pull out rather than risk repercussions from the Search Engines - We are in talks with other companies, and are working to find a way to build the tool whilst remaining within Google TOS - See GoogleGuy's comments pointing out the TOS for more details.

At present, we are in a state of indecision: Some questions need to be answered:

  1. Can a link analysis tool be built within Google TOS and still be useful?
  2. If we do get to build it, and it's within TOS, will our sponsor, and Threadwatch (and any other sites i build) be free from repercussions?

What about the API?

Something like Digitalpoint do - everyone has to sign up for the API to use the tools and they just supply the technology?


Marginalize Google

>Can a link analysis tool be built within Google TOS and still be useful?

Probably. With enough other sources yes.

>If we do get to build it, and it's within TOS, will our sponsor, and Threadwatch (and any other sites i build) be free from repercussions?

No. You never are and never will be.


Talks with Search Engines

We are currently talking to several search engines - the idea being that through discussion, we may move forward...


No and No

It's just a bad idea all round imho.

And while I'm in that groove, the "se repercussions" spin is a little too much, everybody should have the right to protect their own sites/servers.


Agreed

Good point - you know what I'm like, write first think later :)

I have changed the original post, just prior to seeing your comment NFFC, so I'm with you on that.

I understand that the guys are attempting to open up discussions with the SE's and until we hear from them, the project remains on ice

Cheers


I personally feel 90% certain

I personally feel 90% certain the app will happen guys so don't worry about that. As Nick said we are chatting to the Search Engines (not just Google) hoping to work with them rather than do this behind their backs.

Ultimately if we end up being in a position where the app can't go ahead with the search engines being happy then there are alternatives. Watch this space :D


Legal Issues

Barry has a good thread on the legal issues concerned over at SEW - do go see, it's a great post...


SE's

I would find it hard to imagine how many major SE's will be interested in helping such a project along. Google have made a great point of limiting the amount of useful data that SEO's can access over the past year.

And as Xan has made a point of in SEW, SE's can easily see SEO's as interfering with their processes, rather than trying to add anything useful to them.

So I guess I wouldn't expect too much offered from discussions with search engines.

I figure there are compelling reasons why such a wholly comprehensive tool suite has not yet been developed, and people like Barry Schwartz and Shawn Hogan are the people to talk to on what can actually be added in terms of tool features, with regard to their perception of the limitations.


Limitations are Huge

You can't have a win - win situation on this topic. You either go the WPG route and don't care (SEOs win - Google Loses) OR you go the API route and limit the users big time (SEOs lose - Google Wins).


the SE's would have benefitted indirectly surely?

if a really good and free tool were available then not only would a lot of SEO's use it but the SE's would be able to see exactly what we were seeing and know what we were working from?

If I were them I would have kept my mouth shut and let it go ahead... by stopping it you just have a load of good researchers/programmers using their own unique tools which do all of this stuff anyway and a load of the rest of us paying some backstreet programmer to hack something up which does the same. How much better for them to reverse engineer the tool a lot of SEO's use and break it?

I really don't think I'll ever learn to think like an SE rep.


I think you're close to spot

I think you're close to spot on Gurtie but actually believe that the search engines can get more out of it than that. They have the chance of knowing about and liaising with the developers, funders and supporters of what will become the leading SEO tool out there.

Through discussion amazing things can happen. A touch extreme I know but when these guys can chat I am sure that SEOers and SEs can as well!


Hello First time poster

I may sound crazy here, but how can Google/GoogleGuy ask you not to make a tool that uses their public data? They seem to scrape data from the entire internet themselves without asking, and they do this to make a profit (indirectly). Their whole biz model depends on them taking data that you produced and displaying it for the purpose of making cash... think about that. Now they may say, well, put up a robots.txt, but how many average webmasters know to do that? What if I put up on all my sites tomorrow a terms of service saying it was not OK to scrape my data and display it on any site that is commercial in nature, would they stop? Hell no, they wouldn't care one bit and you guys should not either. If they can do it, I don't see why you can't use their public data.
Thanks


A very good point Linkster an

A very good point Linkster and Hi and welcome (I'm sure Nick will pop in and send you over to the welcome thread:) )

I don't agree that the search engines are leeching our resources without us having an opportunity to stop them doing so, but I do feel a watered down version of your point of view is extremely valid.

Search Engines rose to prominence by breaking new ground and entering new territory. This tool will be entering similar areas, growing upon research that the engines have already done. They make that information public already (by displaying the SERPs) so by partnering with TW in the development of this application they will be giving back to the community of webmasters that helped them rise to prominence and build $multi billion companies.


and here i am...

http://www.threadwatch.org/node/814

Welcome, and do say hi in the above thread linkster...


Talks with Yahoo

Well, we've made contact with Yahoo! and talks have gone well, I didn't do much other than say hi, and I've not got all the details from DaveN yet but it sounds promising so far...

No sign of Google yet, but we're hoping they respond still...


Please consider this a polite request not to produce software that scrapes Google without our permission.

I can entirely see the position Google are coming from, but I feel it's rather short-sighted and naive of them[you].

Google likes collecting data, this is obvious. Tools that scrape your data will continue to be written without your permission -- whether this one is or not, I don't know, but should it not be, someone else will take note and do it anyway -- the crux of the matter is how much you know about it. I'd suggest that having tools scraping from one IP (block) or one UA string is far better an option than the alternative: proxied scraping. Inherently slower that may be, but the data would be far harder to track, and what good does that do you? At least when it's "above the water" you can make use of the data you can scrape about the scraper ;)

Using the API may be an alternative, but were it not, Google might be better off asking for restrictions - one query a second, one UA string - than explicitly denying the usage altogether, and driving the ilk of tool underground.
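
The "one query a second" restriction is easy to honour on the client side. A minimal sketch of a throttle, for use against any API or data source that permits automated queries; the class name is hypothetical, and the clock and sleep functions are injectable only so the behaviour can be checked without real waiting.

```python
import time


class Throttle:
    """Ensure at least min_interval seconds pass between successive calls to wait()."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous permitted request

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)  # pause until the interval has elapsed
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` immediately before each request is enough; the first call returns straight away and later calls pause only when they arrive too soon.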

(of course, the slower the tool goes, the less of an argument server load becomes, because I could do it all manually, and that would increase load, simply by virtue of images loading and whatnot)

Plus, faux queries still count as queries, and I'm sure handling more queries a day would not look bad on the investor's board reports :p


Driving Underground

The thread/idea above with comments will likely be done irrespective of whether it is even remotely connected to TW (maybe in private, methinks). I can almost say that it is probably already out there, but now with added TW input.

I can see why people are backing off, it is a dangerous game. SE's, well Google, who rely heavily on linking will not want their privates on show to be reverse engineered using an easily accessible reverse engineering tool. I can see why they don't want a load of muppets releasing a beast upon them.

I am with you gurtie, keeping quiet would have given untold riches, proxies would have been identified, money terms identified and watched etc etc.


Do no evil?

heh - perhaps we should just be grateful that clearly we're more evil minded than GG?


Free Link Analysis Tool - You Decide the Features

Your wish can come true if you participate in the following blog post in Threadwatch about Killer Link Analysis Tool, Give It Away Free.

Several suggestions, aka my wishlist:

Customizable filters for viewing only certain details
Sort data
Abilit...


Napster like sharing

The thing that would make this tool rock even more, is if there was a way to share info programmed in.

Ok many would be scared to on the trust basis :)

But imagine the info a group could put together!


Googleguy did I miss something

So we all have our wish list and the tools will be developed and swapped amongst people. But with facilities to stop you tracking it.
You know this is already going on and will continue more and more.

Nick, Dave and Jason could build a tool that would become the de facto tool, you could have access to see what it does and even point it to servers you own so as to not hit the rest of your network. Plus you could even look at the data in isolation and see what is going on and do your own analysis.

Overture already do this with their partner ppc management tools.

Bravo Nick for being so honourable and mentioning it to the se's, rather than just building it and selling it under the counter.

DougS


If Google has an objection to this kind of thing

Why are Axandra et al still able to sell their link management software?


Googleguy - DaveN

It could have been my fault, and I'm giving Google the benefit of the doubt... as GG put it, "I would ask that it not scrape Google"... that's fine, I have sent an email to the contact I have at the plex and trust his judgement... if we are going to start shooting people, shoot me :) I'm much bigger than GG ;)

DaveN


seo fun

Wouldn't it be funny if the TOS included a clause that prohibits SE employees and agents from downloading, examining, or using it? (or perhaps just certain SE employees and agents).

Wouldn't it be funny if a "submit your site" webpage promised to submit your site for free provided you agreed to contribute a certain small percentage of your background CPU power and local internet access to this new BLA-at-home distributed computing project? Think how attractive it would be to have your site submitted by jason and Dave...


TOS, GoogleGuy and AutoLink

n/a


Just do it

I say any organization or company that scrapes 8 billion pages of copyrighted content, then uses that content as a backbone to sell ad space, doesn't get the right to have a TOS that says it can't be scraped back.

You scrape our back we scrape yours. You make more money at it so shut the hell up and take it like a man :)

I say build the thing. It's a service with ads on it, just like a SERP. If they wanna get legal then maybe we should get a nice fat class action lawsuit to get the royalties for all our scraped sites that have generated the AdWords revenues G makes. Would only be fair, right? Come on, would G be a 200 dollar stock without you, me and the mom and pop stores' content?

I know I have sites in the SERPS that are piggy backed by adwords that cost 15 bucks a click. I'm sure there's enough to go around :)


Yahoo is the solution

Just use Yahoo!'s new API. It will work fine. Google doesn't want to play, so don't play. Google is not needed, nor wanted. :)


re: Yahoo is the solution

> Yahoo is the solution
You're probably right, backlinks are more reliable there too.


It would be nice

Have the tool display a prominent banner showing SEOMike's post (my vote for TW comment of the year) :O)


Yahoo API

I agree with SEOMike's sentiment, but I think the Yahoo API is the way to go.


Love the idea

Just want to throw in my support. I just love the idea and it is an incredibly smart marketing move from TLA.


Well...

Apart from the fact that they're not involved right? heh..

Welcome to threadwatch kservik, do introduce yourself