Posted to dev@nutch.apache.org by UDd <de...@gmail.com> on 2010/02/15 19:40:44 UTC

Trying to Add an new NutchDoc from plugin

Hi there,
I'm new to the forum and to Nutch as well.
I wrote a Nutch plugin that implements IndexingFilter.
Now I want to add a new document to the index from the plugin (splitting the
current doc).
I tried testing it with something like this:

NutchIndexWriter[] writers =
    NutchIndexWriterFactory.getNutchIndexWriters(getConf());
writers[0].write(doc);

Here doc is the document passed into the filter method, not a new one I
created (just for testing).

And I get the error "it doesn't make sense to have a field that is neither
indexed nor stored".

Any suggestions?
-- 
View this message in context: http://old.nabble.com/Trying-to-Add-an-new-NutchDoc-from-plugin-tp27598076p27598076.html
Sent from the Nutch - Dev mailing list archive at Nabble.com.


Re: Trying to Add an new NutchDoc from plugin

Posted by UDd <de...@gmail.com>.
Thanks for the quick response.
I wrote a very simple plugin that tries to write the same "doc" a second time
and, if there is an error,
puts the error message in a custom field of the original doc:

  public NutchDocument filter(NutchDocument doc, Parse parse, Text url,
      CrawlDatum datum, Inlinks inlinks) throws IndexingException {

    // Text.getBytes() returns the backing array, which may be longer than
    // the valid data, so log the URL via toString() instead.
    LOGGER.debug("Found Url: " + url.toString());
    NutchIndexWriter[] writers =
        NutchIndexWriterFactory.getNutchIndexWriters(getConf());
    //doc.add("js", String.valueOf(writers.length));
    try {
      // Test only: write the incoming doc through the first index writer.
      writers[0].write(doc);
    } catch (Exception e) {
      // Record the failure in the custom "js" field so it is visible in the index.
      LOGGER.debug("Error adding Doc " + e.getMessage());
      doc.add("js", e.getMessage());
    }
    doc.add("js", "AfterTest");
    return doc;
  }

After the Nutch run I look at the index with lukeall-1.0.0.
I have attached the compiled plugin jar in case you can try to debug it, or
if you can tell me how to debug it, that would be great (I have Nutch working
from Eclipse).




http://old.nabble.com/file/p27598879/myplugins.rar myplugins.rar 


Re: Trying to Add an new NutchDoc from plugin

Posted by Sahil Shah <sa...@gmail.com>.
Maybe I can try. Debugging an indexing plugin is kind of tricky.
Can you attach the required files and folders and tell me exactly what procedure
to follow?
Also, let me know whether any settings need to be modified.


