Search Engine for Handwritten Documents - History Meets Search Tech

Thread Title: Researchers create tool to automatically search handwritten historical documents Thread Url: http://www.umass.edu/umhome/news/articles/7683.php Thread Description:

The Center for Intelligent Information Retrieval at the University of Massachusetts Amherst has created a manuscript retrieval system capable of scanning and understanding handwritten documents.

Imagine the potential of that...

On scanning/searching George Washington's personal diaries:

The scanned pages of Washington’s papers can be searched by typing in a word such as “Washington” or “Virginia,” and the program produces a list of ranked pages showing where they appear.

Manmatha says, “Right now, searching a scanned handwritten document is very hard to do. Scanned historical documents are basically images, or pictures, and currently can only be searched if someone manually transcribes the documents or creates an index of their contents. This is time consuming and expensive to do. Given the cost, most handwritten documents are never transcribed or indexed,” Manmatha says. “But there is an enormous amount of handwritten, historical material.”

According to Toni Rath, “The basic idea is analogous to searching text documents in one language, say French, using queries in another language, say English. This is usually done by learning models from documents written in both languages. By analogy, our system learns from a parallel body of transcribed scanned images. That is, the word images form a ‘visual language’ and the transcriptions are in English.” Once the model is learned it may be used for searching scanned pages for which no transcriptions are available.

story via slashdot

M$ Files Patent for Plain Text Email - Evil Evil Evil!

Thread Title: Method for reading electronic mail in plain text Thread Url: http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PG01&p=1&u=/netahtml/srchnum.html&r=1&f=G&l=50&s1=200402439 Thread Description:

You just have to ask what kind of mad, grade A hallucinogenic drugs these people are on...

Abstract Systems and methods for converting text of an electronic mail message in a non-plain text format to text in a plain text format are disclosed.

link via varchars via Jeremy Z

Steve Rubel: Ebay & Craigslist to Merge in 2005

Thread Title: eBay and Craig's List Will Merge in 2005 to Create a P2P Media Giant Thread Url: http://www.micropersuasion.com/2004/12/ebay_and_craigs.html Thread Description:

Steve says p2p and citizen journalism are where it's at for 2005, and with several new companies like backfence.com waiting to emerge, it may well be the new gold rush.

eBay and Craig's List are already the leaders in facilitating person-to-person commerce. They have also been steadily growing closer together - in August eBay acquired a 25% stake in Craig's List. In 2005 they will take this to the next level when eBay acquires the rest of Craig's List it doesn't own and then enables customers to blog right on their unified site. This will usher in a new era where citizen journalism is directly funded by person-to-person commerce.

I tend to think community as a whole is going to be massive in the next couple of years, mostly because I'd really really like it to be :) but partly because of all the rumbling you hear if you keep your ear close to the ground. He ends with this, and I think it's noteworthy for threadwatchers looking to cash in on community:

We have been trained to categorize Internet companies into little discrete buckets. Yahoo is a portal. Google is a search site. eBay is an auction site. Amazon is an online retailer. That's all well and good, but I bet the brilliant executives who run these innovative firms, however, are taking a much larger view of where the online medium is headed and they're watching blogs create trusted communities that can spur future revenues. You should too.

Google AdSense contributes to highest ever domain name registrations

Thread Title: The David Beckham Effect Spotted in the Wild Thread Url: http://www.jensense.com/archives/2004/12/google_adsense.html Thread Description:

"Interesting how one contextual advertising program can make such an impact. And no, I am not going to admit how many domain names I have registered because I thought it would be perfect for a content site with AdSense :)"

So we have AdSense impacting the domain market; it's not too much of a leap of faith to connect it to the sandbox too, IMHO.

From the increasingly excellent jensense.com

Forbes Ditch IntelliTXT Ads after Editors Complain

Thread Title: Forbes Ditches Embedded Text Ads After Complaints From Editors Thread Url: http://www.adweek.com/aw/iq_interactive/article_display.jsp?vnu_content_id=1000730917 Thread Description:

Forbes have dropped the embedded text ads from Vibrant Media. The IntelliTXT ads work much the same as AdSense, but instead of clearly marked ads, the contextual ad links appear in the body of a web page's main content.

This will be an enormous blow to Vibrant, as Forbes were their largest and highest profile client.

Apparently the Forbes editorial staff have been complaining about the practice of mixing ads into editorial. Can't say I blame them.

details via jensense

An Economist's View on Click Fraud - Reyes talking Bollocks?

Thread Title: An Economists View on Click Fraud Thread Url: http://weblogs.jupiterresearch.com/analysts/scevak/archives/005264.html Thread Description:

Jupiter analyst Niki Scevak gives an economist's view on click fraud in the post threadlinked above.

In light of what Google CFO George Reyes said about click fraud threatening the G biz model, Niki's thoughts on the subject make for a good read:

Firstly, click fraud is a bad thing that should be policed and eliminated by the engines and they have no excuse now that they have $50bn market valuations to hire scores of click fraud cops to eliminate it. But it will have zero impact on Google's revenue, or any other search company, and zero impact on the growth of that revenue.

Here's why. Click fraud is already priced into the cost per click. Marketers bid based upon how well the leads that Google and others send them convert into, in most cases, direct sales. That means that if one person out of every hundred buys, and they make $100 per sale, then they will spend up to $1 per click. Now out of that 100 clicks, the fact that 50 (gross exaggeration used for effect!) of them are click fraud is irrelevant. If Google eliminates click fraud then that means that one person out of fifty will now buy, and so the marketer will be willing to pay up to $2 per click now.

The volume will decrease but the cost per click will rise to balance this.

Emphasis mine.
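
For anyone who wants to check Niki's sums, here is a minimal sketch of the arithmetic in the quote. The function names are my own invention, and it assumes (as the quote does) that a marketer bids up to the break-even point where total click spend equals sales revenue. It is an illustration of the argument, not a model of how any engine actually prices clicks.

```python
def max_bid_per_click(revenue_per_sale, sales, clicks):
    """Break-even cost per click: revenue earned divided by clicks paid for.

    Assumption for illustration: the marketer spends up to, but not more
    than, the revenue those clicks produce.
    """
    return revenue_per_sale * sales / clicks


def compare_fraud_scenarios(revenue_per_sale, sales, total_clicks, fraudulent_clicks):
    """Compare bids and total spend with and without the fraudulent clicks."""
    genuine_clicks = total_clicks - fraudulent_clicks
    bid_with_fraud = max_bid_per_click(revenue_per_sale, sales, total_clicks)
    bid_without_fraud = max_bid_per_click(revenue_per_sale, sales, genuine_clicks)
    return {
        "bid_with_fraud": bid_with_fraud,
        "bid_without_fraud": bid_without_fraud,
        # Total spend is what the engine collects from this marketer.
        "spend_with_fraud": bid_with_fraud * total_clicks,
        "spend_without_fraud": bid_without_fraud * genuine_clicks,
    }


# Figures from the quote: $100 per sale, 1 sale per 100 clicks, 50 clicks fraudulent.
result = compare_fraud_scenarios(100.0, 1, 100, 50)
for name, value in result.items():
    print(f"{name}: ${value:.2f}")
```

With the quote's figures the bid doubles from $1 to $2 once the 50 bad clicks are removed, while the marketer's total spend stays at $100 either way, which is the point being made.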

He goes on to say that Reyes would be better off doing his accounting than spouting off about click fraud (paraphrased heh..).

So, is George Reyes just spouting off about stuff he doesn't understand? Probably not eh? If that's the case, why is he making these statements?

The Video Newsletter - Email Newsletters get an Upgrade

Thread Title: Video Newsletter Gets High Viral Pass-Along & Unusually Strong Clicks Thread Url: http://www.marketingsherpa.com/sample.cfm?contentID=2868 Thread Description:

FC has an article about a firm specializing in video newsletters in conjunction with their clients' traditional email newsletters.

They report a 24-35% CTR from the email that contains the link to the three-minute video. FC adds that, because the "reading population" accounts for less than 10% of us, this is a more than viable option to cover a wider audience and get your message out there.

Drenik hired a local video production crew and had their trained scriptwriters turn the long newsletters into video scripts for an average three-minute video. He tells the scriptwriters which are the hottest stories so they know what to focus on and what to cut.

The final approved script includes camera angles and videographer direction in addition to the words to be read. (See sample below.)

The team selects and hires a local TV anchor or TV reporter to moonlight on the side as the official video newsletter presenter. They try to match personality to the brand personality of the company that will be sending out the newsletter. They also look for stability -- is this an on-air personality who'll be staying in the area for a while so they can be counted on for the long haul?

The final edited video is then transferred into a format which can be streamed from a Web site. Drenik insisted on a format that did not require the use of a player, because he knew it might be a hump some newsletter recipients are unwilling to pass over. Getting the information had to be as easy as turning on your TV, with no possible tech challenges.

It's good stuff, check it out at the threadlink above...

Goodbye Search Engine - Hello Sense Engine

Thread Title: Searching Doesn't Make 'Sense' Thread Url: http://turk.internet.com/haber/yazigoster.php3?yaziid=11472 Thread Description:

The threadlinked article above is essentially, as Techdirt point out, talking about clustering, like Clusty the clown, the baggy-trousered pie thrower of search.

Crystal Semantics has developed the 'Sense Engine' in order to produce relevant search results by utilising the senses of words, rather than statistical algorithms used by other search technology. Because any word in the English language can be part of a search enquiry, each word is analysed to determine its potential to discriminate which context the search should cover. The 'Sense Engine' identifies all the likely search words, advises the user of the different contexts the search should cover, and categorises the results encyclopedically providing users with results relevant to their request.

The 'Sense Engine' is the result of a six-year search linguistics development programme undertaken by Professor David Crystal, a world authority on linguistics, encyclopedia editor and published author for Cambridge University Press and Penguin Books. £4 million has been invested in lexicographical and encyclopedic research, giving the 'Sense Engine' a classification system of around 2,000 categories derived from an encyclopedia component of over five million words.

This all raises the question: will clustering take hold...?

By the way, the internet.com article above disables right click via JavaScript. We have a word for people that behave like that in England, starts with W and ends in ANKERS

Threadwatch Nominated for Best Tech Blog 2004

Thread Title: TW nominated as best tech blog 2004 Thread Url: http://2004weblogawards.com/archives/000071.php Thread Description:

Come on people you know it makes sense!

click that button and make TW a winner :)

Overture, Geico Settle Trademark Dispute

Thread Title: Overture, Geico settle trademark dispute Thread Url: http://news.zdnet.com/2100-9588_22-5473231.html Thread Description:

Overture Services has settled a lawsuit brought by insurance giant Geico, ending a battle in an ongoing war over the commercial use of trademarked terms in Web search results.

The Geico vs Google suit is still ongoing; the judge has denied Google's motion to dismiss it.

Interesting times for the PPC market.

Sitematch: Organic Rankings Suicide?

Thread Title: Msn, Yahoo/inktomi/overture Trusted Feed, And what happens to Organic Crawl data Thread Url: http://www.highrankings.com/forum/index.php?showtopic=10332&hl= Thread Description:

This is an interesting thread, as it shows that even in the minds of some of the more experienced practitioners such as Jill Whalen and ProjectPHP there still exists a degree of uncertainty and cloudiness when it comes to this PFI program. The main question is whether or not you reappear once your budget has expired, based upon your original 'natural' crawl position. Lots of 'possibly's and 'should's from David at Trellian, along with a few helpful suggestions.

Sitematch was launched back in May sometime. At the time I read various threads at WMW from confused webmasters grappling to get to grips with whether it was a good or a bad thing.

Questions like:

If you submitted to Sitematch, what would be the position once your budget was exhausted?

Would Sitematch be the kiss of death for an affiliate content website?

What about a site that had an INK penalty? Would it be considered under this scheme? Would it be included whilst its budget was active?

This WMW thread threw up all sorts of issues.

I haven't really looked at Sitematch for a while, and I don't know if it's changed, improved or even gotten worse. At this moment in time, natural crawls (for me at least) seem to cut the mustard. I don't see a need or requirement for it and I don't entirely trust it either. Can anyone point to a definitive position? Is Sitematch dead in the water, or has it undergone some mysterious, not very well publicised rebirth?

Frame Busting - Breaking out of Malicious Framesets

Thread Title: What's the best frame-breaker? Thread Url: http://www.webmasterworld.com/forum91/2831.htm Thread Description:

Threadwatch member encyclo asks for the best way to break out of a malicious frameset (where another site frames your pages) over at the WMW forums.

A technical discussion of the various JavaScript snippets that can do this follows, and it makes for damn good reading!

MSN Beta Moving to MSN Main Site?

Thread Title: MSN Beta Slowly Propagating to Non Beta MSN Search Thread Url: http://www.webmasterworld.com/forum97/266.htm Thread Description:

Barry Schwartz is reporting on the WMW thread threadlinked above, where members appear to be seeing the MSN beta search moving to the main search.

ADDED: Looks like the real deal from where I sit...

What do the Threadwatchers think?

Meckler on Vertical Markets

Thread Title: Vertical, Vertical, Vertical Thread Url: http://weblogs.jupitermedia.com/meckler/archives/005249.html Thread Description:

Nice post by Jupiter boss Alan Meckler on, yep, you guessed it - vertical markets..

The fact is that this trend has been going on for years. But it is only in the previous few months that the business press is realizing that vertical "is in." Just as Danny Sullivan is predicting that vertical Search Engines might well be the next wave of Search, vertical shopping is already the next wave of etailing. We are even seeing this in the verticality of auctions sites, event ticket sites, travel sites and a host of other fields.

Verticality is what has made our JupiterWeb sites more significant than the tired horizontal tech trade print magazines. Going further, our searchenginewatch.com owns the editorial side of the Search ad industry -- an honor that 5 years ago would have been part of the industry weekly Advertising Age

Interesting if not entirely new stuff...

Top Level Mobile Domains .MP - Now Available!

Thread Title: Dot MP Up and Running Thread Url: http://www.russellbeattie.com/notebook/1008191.html Thread Description:

Domain speculators, get your CCs out and start buying!

What is .mp? What you get How it works

I got nick.mp for $50 and I'm afraid I knee-jerked and bought moblog.mp for $300 (commercial) without having a clue what to do with it... maybe I'll sell it :-)

Casino, betting, pills etc - all gone, but it's VERY early days...

Have fun...

WebmasterWorld Bans the Mention of Search Engine Strategies

Thread Title: windy city december search conference [WMW Subscribers Only] Thread Url: http://www.webmasterworld.com/forum78/7299.htm Thread Description:

Forgive me for posting a link to a "Supporters Only" thread, but I felt it was something worth bringing up.

When I decided to leave WebmasterWorld, I made the decision to go as quietly as possible, but this thread is making that decision a bit hard to stick to.

Sometime yesterday, long-time WmW member (and conference speaker) Chicago posted a thread entitled

SES Chicago - who's going?

(Very similar to this thread: http://forums.searchenginewatch.com/showthread.php?t=2744)

At last year's Chicago SES, Chicago hosted a nice private party for WmW members. His reason for posting the thread in the Supporter's section was to try and get an idea of how many members would be in town, so he could plan another party.

Sometime today, a Supporters forum Admin changed the title of the thread to read

windy city search conference (without sending a sticky to Chicago)

When I first saw it, I thought it was a new thread announcing PubCon 7.333, but it turned out to just be the SES thread with a new title. Apparently, it isn't appropriate to promote Danny's show in that way. Letting that title stand might result in some WmW members deciding to attend.

I can't even begin to explain how disappointed I am. Taking such a stance is probably the most disrespectful, rude and childish thing I've ever seen BT do.

Behavioral Targeting Moving Forward in 2005

Thread Title: New Applications for Behavioral Targeting Thread Url: http://weblogs.jupiterresearch.com/analysts/stein/archives/005233.html Thread Description:

There have been a couple of good posts about behavioral targeting today; I've linked to the one by Jupiter analyst Gary Stein. The other is this ClickZ piece by Dave Morgan.

Gary says:

Behavioral Targeting is in a pretty interesting space right now--it has become a category in its own right. That means:

More than a few vendors have developed sophisticated systems

Publishers of note are integrating the systems

The challenges--such as audience standardization--have been identified and solutions are making their way into the marketplace

Now that the systems are in place, it's time for them to be used. That is, time for advertisers to begin to not only understand how the systems work, but also what they can be used for.

Dave points out that the figure for marketers using behavioral targeting in 2004 is around 16%, which seems high to me. On the question of whether behavioral targeting has arrived yet, he also says:

My answer and those of most people I talk to are the same: No. Today's behavioral targeting applications are still a long way from fulfilling that elusive promise we all signed up for in helping build the marketers' "perfect medium."

and goes on to point out the major disciplines in BT that should see growth in 2005:

Look-alike targeting and life-stage targeting

Go check out his article for the full details...

Bloglines vs NewsGator

Thread Title: NewsGator rakes in VC money, Bloglines stays cool Thread Url: http://www.siliconbeat.com/entries/2004/12/02/newsgator_rakes_in_vc_money_bloglines_stays_cool.html Thread Description:

I practically live on a diet of Bloglines these days and they still haven't accepted any VC funding despite many offers.

NewsGator, on the other hand, a desktop app rather than a web-based one, have just taken their second round of VC $$$s.

The news story linked is a poor excuse for me to open this question up: Have you gone RSS yet?

Personally speaking, it has to be an absolutely stunning site for me to bother with it if it doesn't have a feed these days...

Lycos Shafted by Backbone Providers

Thread Title: Lycos Screensaver Site Blocked by Internet Backbones Thread Url: http://news.netcraft.com/archives/2004/12/02/lycos_screensaver_site_blocked_by_internet_backbones.html Thread Description:

News just in: Lycos have been told (figuratively speaking) to f**k right off by internet backbone providers:

Some major internet backbones are preventing access to the new Lycos "anti-spam" screensaver web site at www.MakeLoveNotSpam.com. This controversial site provides Internet users with the ability to participate in distributed attacks against web sites used by spammers, leaving the spammers with slow connections and high bandwidth costs.

We have recently been talking about Lycos's DDoS attack scheme and how it was later hacked, and I must say that I'm glad to see it failing - what a stupid little stunt...

Viewpoint to Acquire Unicast

Thread Title: Viewpoint to Acquire Unicast Thread Url: http://hive.jup.com/analysts/elliott/archives/005223.html Thread Description:

News just in - Viewpoint is to acquire Unicast - a serious development in the ad biz.

Here's the press release

Marking the first true consolidation in the Rich Media space, Viewpoint Corporation is pleased to announce that it has reached an agreement to acquire Unicast Communications Corporation. The acquisition creates the first company that offers advertisers, agencies and Web publishers every major form of online advertising - from video to Macromedia Flash™, streaming to pre-cached, full-screen to in-page, interactive 3D to high-resolution 2D - all with full creative and campaign management and next-generation tracking and reporting.

Viewpoint currently serve MSN, Yahoo and AOL - there is a PDF to download here