Posted to dev@nutch.apache.org by MilleBii <mi...@gmail.com> on 2009/12/10 23:06:37 UTC
Filtering ParseSegment
I'm thinking of developing a special ParseSegment that will filter content out
in the following way:
My scoring-plugin determines which page content to keep or drop.
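So I intend to store a keep/drop flag in the content metadata from the
scoring-plugin, with
parse.getData().getContentMeta().set("KEY_KEEP", "true" or "false");
(the metadata values are strings, so the flag has to be stored as text)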
and in ParseSegment.map
instead of line 119:
    output.collect(url, new ParseImpl(new ParseText(parse.getText()),
                                      parse.getData(),
                                      parse.isCanonical()));
I plan to make the ParseText conditional, passing ParseText(null) when
parse.getData().getContentMeta().get("KEY_KEEP") is false
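
The conditional described above could be sketched roughly as follows. This is a stand-alone illustration, not Nutch code: a plain Map<String, String> stands in for the content metadata, and the class and method names (KeepFilterSketch, textToStore) are hypothetical. It assumes the flag is stored as the string "true" or "false", that a missing flag means "keep", and it substitutes an empty string for the null in the plan, on the assumption (worth verifying) that a null text may not serialize cleanly.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-alone sketch: a Map stands in for the parse's content metadata.
class KeepFilterSketch {
    // Key name as written in the post; treated here as a literal string.
    static final String KEY_KEEP = "KEY_KEEP";

    // Returns the text that would be wrapped in ParseText.
    // Assumption: a missing flag means "keep the content".
    static String textToStore(Map<String, String> contentMeta, String parsedText) {
        String flag = contentMeta.get(KEY_KEEP);
        // Compare by value, not with ==, because the flag is a String.
        boolean drop = flag != null && !Boolean.parseBoolean(flag);
        // Assumption: empty text instead of null for dropped pages.
        return drop ? "" : parsedText;
    }

    public static void main(String[] args) {
        Map<String, String> meta = new HashMap<>();
        meta.put(KEY_KEEP, "false");
        System.out.println("[" + textToStore(meta, "page body") + "]");
        meta.put(KEY_KEEP, "true");
        System.out.println("[" + textToStore(meta, "page body") + "]");
    }
}
```

In ParseSegment.map the result of such a helper would take the place of parse.getText() in the ParseText constructor, leaving parse.getData() and parse.isCanonical() unchanged.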
Before I start doing/testing/verifying, I'd like to check whether I'm missing
something and whether I understand the mechanics correctly.
--
-MilleBii-