The Center for Intelligent Information Retrieval at the University of Massachusetts Amherst has created a manuscript retrieval system capable of scanning and understanding handwritten documents.
Imagine the potential of that...
On scanning/searching George Washington's personal diaries
The scanned pages of Washington’s papers can be searched by typing in a word such as “Washington” or “Virginia,” and the program produces a list of ranked pages showing where they appear.
“Right now, searching a scanned handwritten document is very hard to do. Scanned historical documents are basically images, or pictures, and currently can only be searched if someone manually transcribes the documents or creates an index of their contents. This is time consuming and expensive to do. Given the cost, most handwritten documents are never transcribed or indexed,” Manmatha says. “But there is an enormous amount of handwritten, historical material.”
According to Toni Rath, “The basic idea is analogous to searching text documents in one language, say French, using queries in another language, say English. This is usually done by learning models from documents written in both languages. By analogy, our system learns from a parallel body of transcribed scanned images. That is, the word images form a ‘visual language’ and the transcriptions are in English.” Once the model is learned it may be used for searching scanned pages for which no transcriptions are available.
Method for reading electronic mail in plain text
You just have to ask what kind of mad, grade A hallucinogenic drugs these people are on...
Systems and methods for converting text of an electronic mail message in a non-plain text format to text in a plain text format are disclosed.
eBay and Craig's List Will Merge in 2005 to Create a P2P Media Giant
Steve says P2P and citizen journalism are where it's at for 2005, and with several new companies such as backfence.com waiting to emerge, it may well be the new gold rush.
eBay and Craig's List are already the leaders in facilitating person-to-person commerce. They have also been steadily growing closer together - in August eBay acquired a 25% stake in Craig's List. In 2005 they will take this to the next level when eBay acquires the rest of Craig's List it doesn't own and then enables customers to blog right on their unified site. This will usher in a new era where citizen journalism is directly funded by person-to-person commerce.
I tend to think community as a whole is going to be massive in the next couple of years, mostly because I'd really, really like it to be :) but partly because of all the rumbling you hear if you keep your ear close to the ground. He ends with this, and I think it's noteworthy for threadwatchers looking to cash in on community:
We have been trained to categorize Internet companies into little discrete buckets. Yahoo is a portal. Google is a search site. eBay is an auction site. Amazon is an online retailer. That's all well and good, but I bet the brilliant executives who run these innovative firms, however, are taking a much larger view of where the online medium is headed and they're watching blogs create trusted communities that can spur future revenues. You should too.
The David Beckham Effect Spotted in the Wild
"Interesting how one contextual advertising program can make such an impact. And no, I am not going to admit how many domain names I have registered because I thought it would be perfect for a content site with AdSense :)"
So we have AdSense impacting the domain market. It's not too much of a leap of faith to connect it to the sandbox too, imho.
Forbes Ditches Embedded Text Ads After Complaints From Editors
Forbes have dropped the embedded text ads from Vibrant Media. The IntelliTXT ads work much the same as AdSense, but instead of clearly marked ads, the contextual ad links appear in the body of a web page's main content.
This will be an enormous blow to Vibrant; Forbes were their largest and highest-profile client.
Apparently the Forbes editorial staff have been complaining about the practice of mixing ads in with editorial. Can't say I blame them.
An Economist's View on Click Fraud
Jupiter analyst Niki Scevak gives an economist's view on click fraud in the post threadlinked above.
In light of what Google CFO George Reyes said about click fraud threatening the G biz model, Niki's thoughts on the subject make for a good read:
Firstly, click fraud is a bad thing that should be policed and eliminated by the engines and they have no excuse now that they have $50bn market valuations to hire scores of click fraud cops to eliminate it. But it will have zero impact on Google's revenue, or any other search company, and zero impact on the growth of that revenue.
Here's why. Click fraud is already priced into the cost per click. Marketers bid based upon how well the leads that Google and others send them convert into, in most cases, direct sales. That means that if one person out of every hundred buys, and they make $100 per sale, then they will spend up to $1 per click. Now out of that 100 clicks, the fact that 50 (gross exaggeration used for effect!) of them are click fraud is irrelevant. If Google eliminates click fraud then that means that one person out of fifty will now buy, and so the marketer will be willing to pay up to $2 per click now.
The volume will decrease but the cost per click will rise to balance this.
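To make the arithmetic in that quote concrete, here is a minimal sketch using only the figures Niki gives (100 clicks, 1 sale, $100 per sale, 50 fraudulent clicks). The function names `break_even_cpc` and `total_spend` are mine, purely for illustration, and the model assumes the marketer bids exactly up to break-even, which is a simplification.

```python
def break_even_cpc(total_clicks, fraud_clicks, sales, revenue_per_sale):
    """Highest cost per click a marketer can pay and still break even.

    Only non-fraudulent clicks are billed once fraud is filtered out,
    so the same sales revenue is spread over fewer clicks.
    """
    billable_clicks = total_clicks - fraud_clicks
    return (sales * revenue_per_sale) / billable_clicks


def total_spend(total_clicks, fraud_clicks, sales, revenue_per_sale):
    """What the marketer pays the engine in total at the break-even bid."""
    billable_clicks = total_clicks - fraud_clicks
    cpc = break_even_cpc(total_clicks, fraud_clicks, sales, revenue_per_sale)
    return billable_clicks * cpc


if __name__ == "__main__":
    # Fraud still in the mix: 100 billed clicks, 1 sale at $100.
    print(break_even_cpc(100, 0, 1, 100))
    # Fraud removed: the same sale now comes from 50 billed clicks.
    print(break_even_cpc(100, 50, 1, 100))
    # Total paid to the engine in each case.
    print(total_spend(100, 0, 1, 100), total_spend(100, 50, 1, 100))
```

Under these assumptions the bid doubles while the click volume halves, so the total paid to the engine comes out the same either way, which is the point of the argument.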
He goes on to say that Reyes would be better off doing his accounting than spouting off about click fraud (paraphrased heh..).
So, is George Reyes just spouting off about stuff he doesn't understand? Probably not eh? If that's the case, why is he making these statements?
Video Newsletter Gets High Viral Pass-Along & Unusually Strong Clicks
FC has an article about a firm specializing in video newsletters that run in conjunction with their clients' traditional email newsletters.
They report a 24-35% CTR from the email that contains the link to the three-minute video. FC also reports that the "reading population" accounts for less than 10% of us, which makes this a more than viable way to reach a wider audience and get your message out there.
Drenik hired a local video production crew and had their trained scriptwriters turn the long newsletters into video scripts for an average three-minute video. He tells the scriptwriters which are the hottest stories so they know what to focus on and what to cut.
The final approved script includes camera angles and videographer direction in addition to the words to be read. (See sample below.)
The team selects and hires a local TV anchor or TV reporter to moonlight on the side as the official video newsletter presenter. They try to match personality to the brand personality of the company that will be sending out the newsletter. They also look for stability -- is this an on-air personality who'll be staying in the area for a while so they can be counted on for the long haul?
Then the final edited video is transferred into a format which can be streamed from a Web site. Drenik insisted on a format that did not require the use of a player, because he knew it might be a hump some newsletter recipients are unwilling to pass over. Getting the information had to be as easy as turning on your TV, with no possible tech challenges.
It's good stuff, check it out at the threadlink above...
The threadlinked article above is essentially, as Techdirt point out, talking about clustering - like Clusty the clown, the baggy-trousered pie thrower of search.
Crystal Semantics has developed the 'Sense Engine' in order to produce relevant search results by utilising the senses of words, rather than statistical algorithms used by other search technology. Because any word in the English language can be part of a search enquiry, each word is analysed to determine its potential to discriminate which context the search should cover. The 'Sense Engine' identifies all the likely search words, advises the user of the different contexts the search should cover, and categorises the results encyclopedically providing users with results relevant to their request.
The 'Sense Engine' is the result of a six-year search linguistics development programme undertaken by Professor David Crystal, a world authority on linguistics, encyclopedia editor and published author for Cambridge University Press and Penguin Books. £4 million has been invested in lexicographical and encyclopedic research, giving the 'Sense Engine' a classification system of around 2,000 categories derived from an encyclopedia component of over five million words.
This all raises the question: will clustering take hold...?
MSN, Yahoo/Inktomi/Overture Trusted Feed, and What Happens to Organic Crawl Data
This is an interesting thread, as it shows that even in the minds of some of the more experienced practitioners such as Jill Whalen and ProjectPHP there still exists a degree of uncertainty and cloudiness when it comes to this PFI program. The main question is whether or not you reappear once your budget has expired, based upon your original 'natural' crawl position. Lots of 'possiblys' and 'shoulds' from David at Trellian, along with a few helpful suggestions.
Sitematch was launched back in May sometime. At the time I read various threads at WMW from confused webmasters struggling to work out whether it was a good or a bad thing.
If you submitted to Sitematch, what would your position be once your budget was exhausted?
Would Sitematch be the kiss of death for an affiliate content website?
What about a site that had an INK penalty? Would it be considered under this scheme, and would it be included whilst its budget was active? This WMW thread threw up all sorts of issues.
I haven't really looked at Sitematch for a while, and I don't know if it's changed, improved or even gotten worse. At this moment in time, natural crawls (for me at least) seem to cut the mustard. I don't see a need or requirement for it and I don't entirely trust it either. Can anyone point to a definitive position? Is Sitematch dead in the water, or has it undergone some mysterious, not very well publicised rebirth?
Nice post by Jupiter boss Alan Meckler on, yep, you guessed it - vertical markets..
The fact is that this trend has been going on for years. But it is only in the previous few months that the business press is realizing that vertical "is in." Just as Danny Sullivan is predicting that vertical Search Engines might well be the next wave of Search, vertical shopping is already the next wave of etailing. We are even seeing this in the verticality of auction sites, event ticket sites, travel sites and a host of other fields.
Verticality is what has made our JupiterWeb sites more significant than the tired horizontal tech trade print magazines. Going further, our searchenginewatch.com owns the editorial side of the Search ad industry -- an honor that 5 years ago would have been part of the industry weekly Advertising Age.
windy city december search conference [WMW Subscribers Only]
Forgive me for posting a link to a "Supporters Only" thread, but I felt it was something worth bringing up.
When I decided to leave WebmasterWorld, I made the decision to go as quietly as possible, but this thread is making that decision a bit hard to stick to.
Sometime yesterday, long-time WmW member (and conference speaker) Chicago posted a thread entitled
SES Chicago - who's going?
(Very similar to this thread: http://forums.searchenginewatch.com/showthread.php?t=2744)
At last year's Chicago SES, Chicago hosted a nice private party for WmW members. His reason for posting the thread in the Supporter's section was to try and get an idea of how many members would be in town, so he could plan another party.
Sometime today, a Supporters forum Admin changed the title of the thread to read
windy city search conference (without sending a sticky to Chicago)
When I first saw it, I thought it was a new thread announcing PubCon 7.333, but it turned out to just be the SES thread with a new title. Apparently, it isn't appropriate to promote Danny's show in that way. Letting that title stand might result in some WmW members deciding to attend.
I can't even begin to explain how disappointed I am. Taking such a stance is probably the most disrespectful, rude and childish thing I've ever seen BT do.
New Applications for Behavioral Targeting
There have been a couple of good posts about behavioral targeting today. I've linked to the one by Jupiter analyst Gary Stein. The other is this ClickZ piece by Dave Morgan.
Behavioral Targeting is in a pretty interesting space right now--it has become a category in its own right. That means:
More than a few vendors have developed sophisticated systems
Publishers of note are integrating the systems
The challenges--such as audience standardization--have been identified and solutions are making their way into the marketplace
Now that the systems are in place, it's time for them to be used. That is, time for advertisers to begin to not only understand how the systems work, but also what they can be used for.
Dave points out that the figure for marketers using behavioral targeting in 2004 is around 16%, which seems high to me. On the question of whether behavioral targeting has arrived yet, he says:
My answer and those of most people I talk to are the same: No. Today's behavioral targeting applications are still a long way from fulfilling that elusive promise we all signed up for in helping build the marketers' "perfect medium."
and goes on to point out the major disciplines in BT that should see growth in 2005:
Lycos Screensaver Site Blocked by Internet Backbones
News just in: Lycos have been told (figuratively speaking) to f**k right off by internet backbone providers:
Some major internet backbones are preventing access to the new Lycos "anti-spam" screensaver web site at www.MakeLoveNotSpam.com. This controversial site provides Internet users with the ability to participate in distributed attacks against web sites used by spammers, leaving the spammers with slow connections and high bandwidth costs.
We have recently been talking about Lycos's DDoS attack scheme, and how it was later hacked, and I must say that I'm glad to see it failing - what a stupid little stunt...
News just in - Viewpoint is to acquire Unicast - a serious development in the ad biz.
Here's the press release:
Marking the first true consolidation in the Rich Media space, Viewpoint Corporation is pleased to announce that it has reached an agreement to acquire Unicast Communications Corporation. The acquisition creates the first company that offers advertisers, agencies and Web publishers every major form of online advertising - from video to Macromedia Flash™, streaming to pre-cached, full-screen to in-page, interactive 3D to high-resolution 2D - all with full creative and campaign management and next-generation tracking and reporting.
Viewpoint currently serve MSN, Yahoo and AOL - there is a PDF to download here