Open Synapse

Hi, I'm Abhay Kumar. I like to post interesting things I come across here. You can also see the result of my mustache pact with my coworkers, check out what I'm up to, or see the few photos I take. I also like to receive email and instant messages (AIM / Jabber). Some links of interest: Calais Text Tagger, Powerset

Feb 03

Entity Extraction via the Calais Web Service

A few weeks ago, Barney mentioned that Clearforest, now owned by Reuters, had opened up an API to their Calais Web Service. After a few weeks of working through bugs on their end, I was able to publish a Ruby wrapper for their POST-based API. That was the easy part; my eventual goal was to build something functionally useful on top of the library. Last night, I published the Autotagger, a really simple Merb application that uses my Calais library. Check it out. I've gotten some good results by pasting in text from Wikipedia or from the New York Times.
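The wrapper itself isn't reproduced here, but the core of wrapping a POST-based API in Ruby is small. Below is a minimal sketch using only the standard library's Net::HTTP. The endpoint URL and the `api_key` and `content` field names are hypothetical placeholders, not the actual Calais parameters, and `build_tagging_request` / `tag_text` are illustrative names rather than my library's real API.

```ruby
require "net/http"
require "uri"

# Hypothetical endpoint: the real Calais service defined its own URL and
# form fields, which are not reproduced here.
CALAIS_ENDPOINT = URI("https://calais.example.com/enlighten")

# Build (but do not send) a form-encoded POST request carrying the text to tag.
# The field names "api_key" and "content" are assumptions for illustration.
def build_tagging_request(text, api_key, endpoint: CALAIS_ENDPOINT)
  request = Net::HTTP::Post.new(endpoint)
  # set_form_data URL-encodes the hash into the body and sets the
  # Content-Type header to application/x-www-form-urlencoded.
  request.set_form_data("api_key" => api_key, "content" => text)
  request
end

# Send the request over the network and return the raw response body,
# which a wrapper would then parse into entities.
def tag_text(text, api_key, endpoint: CALAIS_ENDPOINT)
  request = build_tagging_request(text, api_key, endpoint: endpoint)
  Net::HTTP.start(endpoint.host, endpoint.port,
                  use_ssl: endpoint.scheme == "https") do |http|
    http.request(request).body
  end
end
```

Separating request construction from sending keeps the encoding logic testable without hitting the service.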

Why is entity extraction useful? One of the hardest problems for anyone working with the Semantic Web is that not all data on the web is tagged. It's difficult to generate and leverage relationships between types of data when there's no way to make that content portable. The Calais Web Service tries to attack this problem by pulling names and terms (people, companies, places, and so on) out of a body of text. It can also identify relationships between those terms. (I haven't yet exposed relationships in the Autotagger, but the data is available in the Ruby library.)
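To make the idea concrete, here is a deliberately naive sketch of entity extraction and relationship-finding. It looks up terms from a small hand-written dictionary (a gazetteer) and treats two entities as related when they appear in the same sentence. This is a toy under stated assumptions, not how Calais works internally; the gazetteer contents and the `extract_entities` / `cooccurrences` names are hypothetical.

```ruby
require "set"

# Hypothetical gazetteer: known terms mapped to an entity type.
GAZETTEER = {
  "Reuters" => :company,
  "New York Times" => :company,
  "Barack Obama" => :person,
  "Paris" => :city,
  "Wikipedia" => :website
}.freeze

# Return [term, type] pairs for every gazetteer term found in the text,
# ordered by where each term first appears.
def extract_entities(text, gazetteer = GAZETTEER)
  found = gazetteer.map do |term, type|
    # \b anchors avoid matching a term inside a longer word.
    index = text.index(/\b#{Regexp.escape(term)}\b/)
    [index, term, type] if index
  end.compact
  found.sort_by(&:first).map { |_, term, type| [term, type] }
end

# Naive "relationships": link every pair of entities that share a sentence.
def cooccurrences(text, gazetteer = GAZETTEER)
  pairs = Set.new
  text.split(/(?<=[.!?])\s+/).each do |sentence|
    names = extract_entities(sentence, gazetteer).map(&:first)
    names.combination(2) { |a, b| pairs << [a, b].sort }
  end
  pairs.to_a.sort
end
```

A fixed word list misses anything it doesn't already know, which is exactly why a service that does real linguistic analysis over arbitrary text is attractive.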

