Posted to user@nutch.apache.org by Saurabh Suman <sa...@rediff.com> on 2009/07/24 16:21:33 UTC

IO exception while adding field in Parsedata parsemeta.

Hi,
I am using Nutch-1.0 and want to add a field to the parseMeta of ParseData.
In org.apache.nutch.parse.html.HtmlParser, the original code already sets
two fields:
			metadata.set(Metadata.ORIGINAL_CHAR_ENCODING, encoding);
			metadata.set(Metadata.CHAR_ENCODING_FOR_CONVERSION, encoding);
I added a third field:
			metadata.set(Metadata.AGE, "23");

In org.apache.nutch.indexer.IndexerMapReduce, the method

    public void reduce(Text key, Iterator<NutchWritable> values,
                       OutputCollector<Text, NutchDocument> output,
                       Reporter reporter) throws IOException

adds two fields to the NutchDocument:

    NutchDocument doc = new NutchDocument();
    final Metadata metadata = parseData.getContentMeta();
 
    // add segment, used to map from merged index back to segment files
    doc.add("segment", metadata.get(Nutch.SEGMENT_NAME_KEY));

    // add digest, used by dedup
    doc.add("digest", metadata.get(Nutch.SIGNATURE_KEY));
    

I then added the third field that I set in HtmlParser, like this:
    doc.add("age", parseData.getParseMeta().get("age"));

After this change, I get the following exception during indexing:

LinkDb: adding segment: file:/home/ithurs/nutch-1.0/crawl/segments/20090724193527
LinkDb: done
Indexer: starting
Exception in thread "main" java.io.IOException: Job failed!
	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
	at org.apache.nutch.indexer.Indexer.index(Indexer.java:72)
	at org.apache.nutch.crawl.Crawl.main(Crawl.java:152)


Please tell me:
(i) How can I get rid of this exception?
(ii) How can I add a new field to the parseMeta of ParseData?
-- 
View this message in context: http://www.nabble.com/IO-exception-while-adding-field-in-Parsedata-parsemeta.-tp24645429p24645429.html
Sent from the Nutch - User mailing list archive at Nabble.com.


Re: IO exception while adding field in Parsedata parsemeta.

Posted by Doğacan Güney <do...@gmail.com>.

You are probably adding your field to parseMeta, so trying to read it
from contentMeta fails.

Call parseData.getParseMeta() in the indexer and it may work.
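
To illustrate the point, here is a minimal, self-contained sketch that does
not use Nutch at all. ParseDataSketch, AgeFieldSketch and the AGE key are
hypothetical stand-ins for ParseData, the parser/indexer code and the custom
Metadata.AGE constant; the two HashMaps only model the idea that contentMeta
and parseMeta are separate stores. It shows a value written to parseMeta
being invisible through contentMeta, and the indexer side reading with the
same key the parser wrote and skipping the field when the lookup returns
null. Whether a null value is what actually breaks the job in your case is
an assumption I have not verified.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for ParseData: two independent metadata stores.
class ParseDataSketch {
    final Map<String, String> contentMeta = new HashMap<>();
    final Map<String, String> parseMeta = new HashMap<>();
}

class AgeFieldSketch {
    // Hypothetical key; it plays the role of the custom Metadata.AGE constant.
    // Writer and reader share it, so the two sides cannot drift apart.
    static final String AGE = "age";

    // Parser side: the value goes into parseMeta, not contentMeta.
    static void parse(ParseDataSketch data) {
        data.parseMeta.put(AGE, "23");
    }

    // Indexer side: read from parseMeta with the same key and
    // add the field only if a value was actually found.
    static Map<String, String> index(ParseDataSketch data) {
        Map<String, String> doc = new HashMap<>();
        String age = data.parseMeta.get(AGE);
        if (age != null) {
            doc.put("age", age);
        }
        return doc;
    }

    public static void main(String[] args) {
        ParseDataSketch data = new ParseDataSketch();
        parse(data);
        System.out.println(data.contentMeta.get(AGE)); // prints "null"
        System.out.println(index(data));               // prints "{age=23}"
    }
}
```

In the real code, it is worth checking that the string behind Metadata.AGE
is exactly the key passed to getParseMeta().get(...) in the indexer, for
example by using the constant in both places.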


-- 
Doğacan Güney