Posted to user@nutch.apache.org by Tim Fletcher <zi...@gmail.com> on 2011/10/21 14:56:34 UTC

Setting up a development environment for writing a custom Indexer

Hi All,

First post on the list, and a relative Nutch newbie, so please be gentle.

I am writing a program that needs to analyse the content of web pages
retrieved by Nutch. For this I was thinking of writing a custom Indexer.

Below is the sort of setup I would like to get to, as it would allow me to
place breakpoints in my IndexFilter plugin. However, I can't figure out
whether I can write and run my plugin code without compiling it into the
plugin directory and including the XML.

How do plugin writers set up a dev environment? I have found how to create
plugins on the wiki, but it doesn't really give any details on how to go
about setting up the dev environment.

Many thanks,
Tim

public static void main(String[] args) {
    NutchJob job = new NutchJob(new Configuration());
    //job.set("plugin.folders", "plugins-1.4-snapshot");
    //System.out.println(job.get("plugin.includes"));
    //job.set("plugin.includes", "bla");
    IndexerMapReduce imr = new IndexerMapReduce();
    imr.configure(job);
    Path crawldb = new Path("output/crawldb");
    Path linkdb = new Path("output/linkdb");
    Path segment = new Path("output/segments/20111020102744");
    ArrayList<Path> segments = new ArrayList<Path>();
    segments.add(segment);
    imr.initMRJob(crawldb, linkdb, segments, job);
}

Re: Setting up a development environment for writing a custom Indexer

Posted by hui wangh <st...@gmail.com>.
http://wiki.apache.org/nutch/RunNutchInEclipse
Could this help you?
