Posted to user@nutch.apache.org by Raghavendra Prabhu <rr...@gmail.com> on 2006/02/13 11:43:45 UTC

file parser

Hi

How do we go about adding more file types and parsers to Nutch?

How should we develop a new file parser so that we can contribute it to
Nutch?

What about parsing image files as well and retrieving data from them?

Rgds
Prabhu

Re: file parser

Posted by Stefan Groschupf <sg...@media-style.com>.
You can easily add new file formats by writing new content-type
parser plugins.
Just browse the code of one of the existing parsers, such as the PDF
parser or the new SWF parser, to get an idea of what you need to do.
In the end, you only need to write a parser for the content that
returns some values ... and write a plugin.xml :)
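
To make "write a parser for the content and return some values" more
concrete, here is a minimal, self-contained sketch of the idea. It does
not use the real Nutch API: ContentParser, ParsedDocument and
PlainTextParser are hypothetical stand-ins invented for illustration,
and the metadata keys are assumptions. The actual interface to implement
and the plugin.xml layout should be copied from an existing plugin.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class ParserSketch {

    /** Hypothetical container for the values a parser hands back. */
    static final class ParsedDocument {
        final String title;
        final String text;
        final Map<String, String> metadata;

        ParsedDocument(String title, String text, Map<String, String> metadata) {
            this.title = title;
            this.text = text;
            this.metadata = metadata;
        }
    }

    /** Hypothetical stand-in for a content-type parser extension. */
    interface ContentParser {
        /** The MIME type this parser is responsible for. */
        String contentType();

        /** Turns the raw fetched bytes into extracted text and metadata. */
        ParsedDocument parse(String url, byte[] content);
    }

    /** Example parser for plain text (assumes UTF-8 input). */
    static final class PlainTextParser implements ContentParser {
        @Override
        public String contentType() {
            return "text/plain";
        }

        @Override
        public ParsedDocument parse(String url, byte[] content) {
            String text = new String(content, StandardCharsets.UTF_8).trim();

            // Use the first line of the document as its title.
            int newline = text.indexOf('\n');
            String title = (newline < 0 ? text : text.substring(0, newline)).trim();

            // Assumed metadata keys, chosen only for this sketch.
            Map<String, String> metadata = new LinkedHashMap<>();
            metadata.put("url", url);
            metadata.put("Content-Type", contentType());
            metadata.put("Content-Length", String.valueOf(content.length));

            return new ParsedDocument(title, text, metadata);
        }
    }

    public static void main(String[] args) {
        byte[] raw = "Release notes\nSecond line of text".getBytes(StandardCharsets.UTF_8);
        ParsedDocument doc = new PlainTextParser().parse("http://example.com/notes.txt", raw);
        System.out.println("title: " + doc.title);
        System.out.println("metadata: " + doc.metadata);
    }
}
```

In a real plugin, the class would implement the parser interface that
Nutch defines, and plugin.xml would declare which content type it
handles. A parser for images would have the same shape, except that the
"values" would come from the image's metadata rather than its text.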
Good luck.
Stefan
On 13.02.2006 at 11:43, Raghavendra Prabhu wrote:

> Hi
>
> How do we go about adding more file types and parsers to Nutch?
>
> How should we develop a new file parser so that we can contribute it to
> Nutch?
>
> What about parsing image files as well and retrieving data from them?
>
> Rgds
> Prabhu

---------------------------------------------
George Orwell was an Optimist
blog: http://www.find23.org
company: http://www.media-style.com