Posted to user@nutch.apache.org by Jorge Luis Betancourt Gonzalez <jl...@uci.cu> on 2012/09/17 16:52:44 UTC

Heuristic methods for image annotation

Hi all:

I'm working on an image search engine built on Nutch and Solr. With Nutch and Tika I can extract some metadata from the images; so far, so good. Now I'm trying to improve the accuracy of the results by using the text surrounding each image.

I know that several papers have been published on this subject, using various techniques and algorithms. Basically, I'm trying to use heuristic methods that don't require a lot of processing. In https://webarchive.jira.com/wiki/display/SOC06/Image+annotation+with+surrounding+text I've found a few such heuristics, which I'm implementing in a custom Nutch plugin:

the text of the <tr> node above or below the image's row, and the text of the <tr> node in which the image appears,
the text of the paragraph in which the image appears,
the textual content of the headings preceding the image.
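
To make the three heuristics above concrete, here is a minimal sketch of how they could be collected. It is standalone Python, not the Nutch plugin itself (which would be written in Java), and the names `Node`, `TreeBuilder`, `annotate_images` and `SAMPLE` are made up for illustration. It assumes reasonably well-nested HTML and, as a simplification, keeps only the closest heading before each image:

```python
from html.parser import HTMLParser

VOID = {"img", "br", "hr", "meta", "link", "input"}  # tags without a closing tag
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class Node:
    """Minimal DOM node (hypothetical helper, not a Nutch or Tika class)."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag, self.attrs, self.parent = tag, dict(attrs or []), parent
        self.children = []  # child Nodes and plain text chunks, in document order

    def text(self):
        # Concatenate all descendant text and collapse whitespace.
        parts = (c if isinstance(c, str) else c.text() for c in self.children)
        return " ".join(" ".join(parts).split())

    def ancestor(self, tag):
        node = self.parent
        while node is not None and node.tag != tag:
            node = node.parent
        return node


class TreeBuilder(HTMLParser):
    """Builds a Node tree; assumes tags are properly nested."""

    def __init__(self):
        super().__init__()
        self.root = self.current = Node("#root")

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.children.append(data)


def row_context(img):
    """Text of the image's <tr> plus the <tr> directly above and below it."""
    row = img.ancestor("tr")
    if row is None:
        return []
    rows = [c for c in row.parent.children if isinstance(c, Node) and c.tag == "tr"]
    i = rows.index(row)
    texts = (r.text() for r in rows[max(i - 1, 0): i + 2])
    return [t for t in texts if t]


def annotate_images(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    results, last_heading = [], ""

    def walk(node):
        nonlocal last_heading
        for child in node.children:
            if isinstance(child, str):
                continue
            if child.tag in HEADINGS:
                last_heading = child.text()  # assumption: keep only the closest heading
            elif child.tag == "img":
                paragraph = child.ancestor("p")
                results.append({
                    "src": child.attrs.get("src", ""),
                    "heading": last_heading,
                    "paragraph": paragraph.text() if paragraph is not None else "",
                    "rows": row_context(child),
                })
            walk(child)

    walk(builder.root)
    return results


SAMPLE = """
<h2>Havana skyline</h2>
<p>The Capitolio seen at dusk <img src="capitolio.jpg"> from the Malecon.</p>
<table>
  <tr><td>Old town</td></tr>
  <tr><td><img src="plaza.jpg"></td></tr>
  <tr><td>Plaza Vieja in the morning</td></tr>
</table>
"""

if __name__ == "__main__":
    for annotation in annotate_images(SAMPLE):
        print(annotation)
```

Each image ends up with one small record of candidate text, which could then be indexed as extra fields next to the image metadata. Real pages often leave tags such as <p> unclosed, so a tolerant parser would be needed in practice.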

But I think this is not enough. Can anyone offer some advice or suggest other heuristic methods for this task?

Thanks in advance,

Greetings!

PS: Sorry for my English; it's not my native language :-S

10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS INFORMATICAS...
CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION

http://www.uci.cu
http://www.facebook.com/universidad.uci
http://www.flickr.com/photos/universidad_uci