Posted to dev@nutch.apache.org by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2011/08/09 17:54:28 UTC

[jira] [Commented] (NUTCH-296) Image Search

    [ https://issues.apache.org/jira/browse/NUTCH-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081714#comment-13081714 ] 

Lewis John McGibbney commented on NUTCH-296:
--------------------------------------------

The parsing and extraction of metadata from images is handled by Apache Tika. If we were still working with a web app, it would have been possible to have a plugin that combined metadata extraction with indexable thumbnail image snippets, which would be available when searching. However, this is no longer the case, as search and indexing have been shifted to Solr.

What is the status of this issue? Personally, I am tempted to suggest we close it. My reasoning is that it has not been given any attention in years, it reflects a requirement from an old generation of Nutch functionality, all image-related processing is covered by parse-tika, and finally there are far more important issues to be dealt with.

One last thing: there has been no code contribution from the 2008 GSoC, so I'm guessing it was never pursued.

  

> Image Search
> ------------
>
>                 Key: NUTCH-296
>                 URL: https://issues.apache.org/jira/browse/NUTCH-296
>             Project: Nutch
>          Issue Type: New Feature
>            Reporter: Thomas Delnoij
>            Priority: Minor
>
> Per the discussion in the Nutch-User mailing list, there is a wish for an "Image Search" add-on component that will index images.
> Must have:
> - retrieve outlinks to image files from fetched pages
> - generate thumbnails from images
> - thumbnails are stored in the segments as an ImageWritable that contains the compressed binary data and some metadata
> Should have:
> - implemented as a Hadoop MapReduce job
> - should be separate from the main Nutch codeline, as it breaks the general Nutch logic of one url == one index document.
> Could have:
> - store the original image in the segments
> Would like to have:
> - search interface for image index
> - parameterizable thumbnail generation (width, height, quality)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira