Posted to user@nutch.apache.org by Zennet Colburn <ze...@gmail.com> on 2005/03/29 08:15:12 UTC

plugins installation

Hello Nutch Developers,

How are plugins installed into Nutch?

The URLFilter plugin is already working from previous development but
my changes to the code don't take effect.

Here are the steps I've taken:
1. Modified the existing implementation of URLFilter interface 
2. Built the project with ant
3. Copied build/plugins/* to $NUTCH_HOME/plugins
4. Ran the generate-fetch-index cycle

I modified filter() to write some debug statements to a file and
return null for every url (for debugging purposes). I know my code was
not executed because no urls should have been indexed, yet some were,
and there were no debug statements in the file. I suspect that step 3
is what I am doing incorrectly or there is some other file I need to
modify.

I appreciate any help.

Thanks,
zennet
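
For readers following along, the debugging approach described above can be sketched roughly as follows. This is only an illustration: the nested URLFilter interface is a hypothetical stand-in with an assumed shape (return the URL to keep it, null to reject it), and the class names and the debug file path are made up for the example, not taken from the poster's code.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;

public class DebugUrlFilterSketch {

    /** Hypothetical stand-in for the URLFilter interface (assumed shape). */
    interface URLFilter {
        /** Returns the URL to keep it, or null to reject it. */
        String filter(String url);
    }

    /** Rejects every URL and appends one trace line per call to a debug file. */
    static class RejectAllFilter implements URLFilter {
        private final String debugPath; // hypothetical debug file location

        RejectAllFilter(String debugPath) {
            this.debugPath = debugPath;
        }

        @Override
        public String filter(String url) {
            // Append mode, so every call leaves evidence that it ran.
            try (PrintWriter out = new PrintWriter(new FileWriter(debugPath, true))) {
                out.println("filter() called for: " + url);
            } catch (IOException e) {
                System.err.println("could not write debug file: " + e.getMessage());
            }
            return null; // null means "drop this URL"
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("urlfilter-debug", ".log");
        URLFilter filter = new RejectAllFilter(tmp.getPath());
        System.out.println("result: " + filter.filter("http://example.com/"));
        System.out.println("debug file: " + Files.readAllLines(tmp.toPath()));
    }
}
```

If a filter like this were really being loaded, the debug file would gain a line for every URL and nothing would be indexed. An empty debug file together with indexed URLs therefore points at the plugin not being loaded at all.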

Re: plugins installation

Posted by Zennet Colburn <ze...@gmail.com>.
Thanks Chris, 

I followed your suggestion and looked at the crawl log and
$NUTCH_HOME/conf/nutch-default.xml.
I found that the myProject plugin was not being included. 

fetch.log:
050329 025333 loading file:/usr/local/nutch-0.6/conf/nutch-default.xml
050329 025333 loading file:/usr/local/nutch-0.6/conf/nutch-site.xml
050329 025333 No NutchFileSystem indicated, so defaulting to local fs.
050329 025334 Plugins: looking in: /usr/local/nutch-0.6/build/plugins
050329 025334 parsing: /usr/local/nutch-0.6/build/plugins/parse-html/plugin.xml
050329 025334 parsing: /usr/local/nutch-0.6/build/plugins/query-site/plugin.xml
050329 025334 parsing: /usr/local/nutch-0.6/build/plugins/parse-text/plugin.xml
050329 025334 not including: /usr/local/nutch-0.6/build/plugins/myProject
050329 025334 not including: /usr/local/nutch-0.6/build/plugins/parse-msword
050329 025334 not including: /usr/local/nutch-0.6/build/plugins/ontology
050329 025334 not including: /usr/local/nutch-0.6/build/plugins/parse-mp3
050329 025334 parsing: /usr/local/nutch-0.6/build/plugins/query-url/plugin.xml
050329 025334 not including: /usr/local/nutch-0.6/build/plugins/protocol-ftp
050329 025334 not including: /usr/local/nutch-0.6/build/plugins/clustering-carrot2
050329 025334 not including: /usr/local/nutch-0.6/build/plugins/parse-pdf
050329 025334 not including: /usr/local/nutch-0.6/build/plugins/language-identifier

So I added myProject to plugin.includes in conf/nutch-site.xml, and
then it worked.

<property>
  <name>plugin.includes</name>
  <value>myProject|protocol-http|parse-(text|html)|index-basic|query-(basic|site|url)</value>
</property>

Thanks for your help. 
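
As a rough illustration of why the log reports some plugins as "not including", the plugin.includes value can be treated as a regular expression tested against each plugin directory name. The sketch below assumes that the whole name must match the expression; that is an assumption about the loader's behaviour, not something confirmed in this thread, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class PluginIncludesSketch {

    // Value copied from the plugin.includes property in nutch-site.xml.
    static final String INCLUDES =
        "myProject|protocol-http|parse-(text|html)|index-basic|query-(basic|site|url)";

    /** Assumption: a plugin is loaded only if its whole id matches the pattern. */
    static boolean isIncluded(String pluginId) {
        return Pattern.compile(INCLUDES).matcher(pluginId).matches();
    }

    public static void main(String[] args) {
        String[] ids = {"myProject", "parse-html", "query-url", "parse-pdf", "protocol-ftp"};
        for (String id : ids) {
            // Mirror the wording of the log lines for easy comparison.
            System.out.println(id + " -> " + (isIncluded(id) ? "parsing" : "not including"));
        }
    }
}
```

Under that assumption, myProject, parse-html and query-url are accepted, while parse-pdf and protocol-ftp are skipped, which is consistent with the log output above.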


RE: plugins installation

Posted by Chris Mattmann <ch...@jpl.nasa.gov>.
Hi Zennet,

> 
> The URLFilter plugin is already working from previous development but
> my changes to the code don't take effect.
> 
> Here are the steps I've taken:
> 1. Modified the existing implementation of URLFilter interface

Okay.

> 2. Built the project with ant

Good.

> 3. Copied build/plugins/* to $NUTCH_HOME/plugins

You don't need to do this if you're running the crawl tool. By default,
the crawl tool loads plugins from $NUTCH_HOME/build/plugins.

> 4. Ran the generate-fetch-index cycle

Okay
> 
> I modified filter() to write some debug statements to a file and
> return null for every url (for debugging purposes). I know my code was
> not executed because no urls should have been indexed, yet some were,
> and there were no debug statements in the file. I suspect that step 3
> is what I am doing incorrectly or there is some other file I need to
> modify.

Did you enable the plugin in the nutch-default.xml file within the conf
directory? Make sure that it is listed in the plugin.includes property
there. Can you post a text capture of your crawl log?

Thanks,
 Chris


> 
> I appreciate any help.
> 
> Thanks,
> zennet