Posted to user@nutch.apache.org by Bipin Parmar <bi...@yahoo.com> on 2006/08/09 21:47:23 UTC

HTMLParseFilter is not called by ParseSegment (nutch parse command)

Hi,

I have written a plugin implementing the
org.apache.nutch.parse.HtmlParseFilter extension
point. When I execute "fetch", it is called as
expected.
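For reference, a Nutch plugin binds an HtmlParseFilter implementation to the extension point in its plugin.xml descriptor. This is a minimal sketch: the plugin id `my-html-filter`, the jar name and the class `org.example.MyHtmlParseFilter` are hypothetical placeholders; only the `point` value comes from the message above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical descriptor: id, names, jar and class are placeholders -->
<plugin id="my-html-filter"
        name="My HTML Parse Filter"
        version="1.0.0"
        provider-name="example.org">
  <runtime>
    <library name="my-html-filter.jar">
      <export name="*"/>
    </library>
  </runtime>
  <requires>
    <!-- Makes the core extension points, including HtmlParseFilter, visible -->
    <import plugin="nutch-extensionpoints"/>
  </requires>
  <!-- Registers the (hypothetical) class against the HtmlParseFilter point -->
  <extension id="org.example.htmlfilter"
             name="My HTML Parse Filter"
             point="org.apache.nutch.parse.HtmlParseFilter">
    <implementation id="MyHtmlParseFilter"
                    class="org.example.MyHtmlParseFilter"/>
  </extension>
</plugin>
```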

When I execute "fetch -noParsing", it does not get
called. I think this is how it is supposed to work.

However, when I execute "parse", I expected my
HtmlParseFilter plugin to be called, and it is not,
even though the segment itself is parsed
successfully.

Shouldn't "parse" call plugins that implement
HtmlParseFilter?

I use the same nutch-default.xml for both the fetch
and parse commands. I also tried adding my plugin to
the "text/html" content type in parse-plugins.xml,
but it did not help.
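For comparison, parse-plugins.xml maps content types to Parser plugins; HtmlParseFilter plugins are activated by including their ids in the `plugin.includes` property, which is a regular expression over plugin directory names. A minimal sketch of a nutch-site.xml override, assuming a hypothetical plugin id `my-html-filter`; the other ids in the value are illustrative, not the shipped default.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Overrides plugin.includes from nutch-default.xml.
       Append the (hypothetical) plugin id to your existing value;
       the other ids shown here are only illustrative. -->
  <property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)|my-html-filter</value>
  </property>
</configuration>
```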

Please help!

Thank you,

Bipin
I am using the Nutch nightly build dated 08/07/2006.

Re: HTMLParseFilter is not called by ParseSegment (nutch parse command)

Posted by Bipin Parmar <bi...@yahoo.com>.
Hi,

Please ignore my earlier question about the parse
command and my HtmlParseFilter plugin. It was my
mistake: plugins implementing HtmlParseFilter are
called during parse.

Thank you,

Bipin
