Posted to dev@nutch.apache.org by MilleBii <mi...@gmail.com> on 2009/12/10 23:06:37 UTC

Filtering ParseSegment

I'm thinking of developing a special ParseSegment that filters content out
in the following way:

My scoring-plugin determines which page content to keep or drop.

So I intend to store a keep/drop flag as metadata in the scoring plugin
 with parse.getData().getContentMeta().set("KEY_KEEP", "true" or "false");

and in ParseSegment.map, instead of line 119:

      output.collect(url, new ParseImpl(new ParseText(parse.getText()),
                                        parse.getData(),
                                        parse.isCanonical()));

I plan to pass ParseText(null) conditionally, when
parse.getData().getContentMeta().get("KEY_KEEP") is false.
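
To make the keep/drop decision concrete, here is a minimal stand-alone sketch
of the check. It does not use the Nutch classes: a plain Map stands in for the
content metadata, and KeepFilterSketch and textToStore are hypothetical names
used only for illustration. It assumes the flag is stored as a string, treats
a missing flag as "keep", and returns an empty string rather than null, on the
assumption that a null text might not survive serialization.

```java
import java.util.HashMap;
import java.util.Map;

public class KeepFilterSketch {
    // Key name taken from the message above; it is not a Nutch constant.
    static final String KEY_KEEP = "KEY_KEEP";

    // Returns the text that would be wrapped in ParseText for this page.
    // contentMeta is a stand-in for the parse's content metadata.
    static String textToStore(Map<String, String> contentMeta, String parsedText) {
        // The flag is assumed to be stored as a string, so compare against
        // "false". A missing flag is treated as "keep".
        boolean drop = "false".equals(contentMeta.get(KEY_KEEP));
        // Assumption: use an empty string instead of null for dropped pages.
        return drop ? "" : parsedText;
    }

    public static void main(String[] args) {
        Map<String, String> meta = new HashMap<>();
        meta.put(KEY_KEEP, "false");
        System.out.println("[" + textToStore(meta, "page body") + "]");
        meta.put(KEY_KEEP, "true");
        System.out.println("[" + textToStore(meta, "page body") + "]");
    }
}
```

In ParseSegment.map the same test would decide which text is handed to
ParseText before output.collect is called.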

Before I start implementing and testing this, I'd like to check whether I'm
missing something and whether I understand the mechanics correctly.


-- 
-MilleBii-