Posted to dev@nutch.apache.org by Scott Gonyea <sc...@aitrus.org> on 2010/07/15 03:55:19 UTC
Re: [jira] Updated: (NUTCH-855) ScoringFilter and IndexingFilter: To
allow for the propagation of URL Metatags and their subsequent indexing.
Sorry about the spam, everyone. I hope my patch didn't suck too much :).
On Wed, Jul 14, 2010 at 6:53 PM, Scott Gonyea (JIRA) <ji...@apache.org> wrote:
>
> [
> https://issues.apache.org/jira/browse/NUTCH-855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
>
> Scott Gonyea updated NUTCH-855:
> -------------------------------
>
> Attachment: nutch-855.txt
>
> > ScoringFilter and IndexingFilter: To allow for the propagation of URL
> Metatags and their subsequent indexing.
> >
> -------------------------------------------------------------------------------------------------------------
> >
> > Key: NUTCH-855
> > URL: https://issues.apache.org/jira/browse/NUTCH-855
> > Project: Nutch
> > Issue Type: New Feature
> > Components: generator, indexer
> > Affects Versions: 1.1
> > Reporter: Scott Gonyea
> > Fix For: 1.2
> >
> > Attachments: nutch-855.txt
> >
> > Original Estimate: 168h
> > Remaining Estimate: 168h
> >
> > This plugin is designed to enhance the NUTCH-655 patch by doing two
> things:
> > 1. Meta Tags that are supplied with your Crawl URLs, during injection,
> will be propagated throughout the outlinks of those Crawl URLs.
> > 2. When you index your URLs, the meta tags that you specified with your
> URLs will be indexed alongside those URLs--and can be directly queried,
> assuming you have done everything else correctly.
> > The flat-file of URLs you are injecting should, per NUTCH-655, be
> tab-delimited in the form of:
> > [www.url.com]\t[key1]=[value1]\t[key2]=[value2]...[keyN]=[valueN]
> > or:
> > http://slashdot.org/ corp_owner=Geeknet will_it_blend=indubitably
> > http://engadget.com/ corp_owner=Weblogs genre=geeksquad_thriller
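To make the tab-delimited layout above concrete, here is a minimal Java sketch that splits one seed line into its URL and key=value pairs. It is only an illustration of the file format; the class and method names (SeedLineSketch, parseUrl, parseMeta) are hypothetical and are not taken from Nutch, NUTCH-655, or the attached patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Nutch or the patch.
class SeedLineSketch {

    // Returns the URL, i.e. the first tab-separated field of the line.
    static String parseUrl(String line) {
        return line.split("\t")[0].trim();
    }

    // Collects every later field of the form key=value into an ordered map.
    static Map<String, String> parseMeta(String line) {
        String[] fields = line.split("\t");
        Map<String, String> meta = new LinkedHashMap<>();
        for (int i = 1; i < fields.length; i++) {
            int eq = fields[i].indexOf('=');
            if (eq <= 0) {
                continue; // skip fields that have no '=' or an empty key
            }
            String key = fields[i].substring(0, eq).trim();
            String value = fields[i].substring(eq + 1).trim();
            meta.put(key, value);
        }
        return meta;
    }

    public static void main(String[] args) {
        // Fields are separated by real tab characters, as the format requires.
        String line = "http://slashdot.org/\tcorp_owner=Geeknet\twill_it_blend=indubitably";
        System.out.println(parseUrl(line) + " -> " + parseMeta(line));
    }
}
```

Note that the separators must be actual tab characters; a line that uses spaces between the URL and its tags would be read here as a single field with no metadata.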
> > To activate this plugin, you must modify two properties in your
> nutch-site.xml:
> > 1. plugin.includes
> > from: index-(basic|anchor)
> > to: index-(basic|anchor|urlmeta)
> > 2. urlmeta.tags
> > Insert a comma-delimited list of metatags. Using the above example:
> > <value>corp_owner, will_it_blend, genre</value>
> > Note that you do not need to include the tag with every URL. However,
> you must specify each tag if you want it to be propagated and later indexed.
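The rule in that last note can be sketched in Java as follows: the urlmeta.tags value is read as a comma-delimited list, and only metadata keys that appear in the list are kept, while a URL that lacks some of the listed tags is still fine. This is a rough sketch under those assumptions, not the plugin's actual code; UrlMetaTagsSketch, parseTags and selectListed are made-up names.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of the urlmeta.tags rule; not the plugin's code.
class UrlMetaTagsSketch {

    // Parses a value such as "corp_owner, will_it_blend, genre",
    // trimming whitespace around each tag and dropping empty entries.
    static Set<String> parseTags(String propertyValue) {
        Set<String> tags = new LinkedHashSet<>();
        for (String raw : propertyValue.split(",")) {
            String tag = raw.trim();
            if (!tag.isEmpty()) {
                tags.add(tag);
            }
        }
        return tags;
    }

    // Keeps only the metadata whose key is a listed tag. Listed tags that
    // a URL does not carry are simply absent from the result.
    static Map<String, String> selectListed(Map<String, String> meta, Set<String> tags) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (String tag : tags) {
            String value = meta.get(tag);
            if (value != null) {
                kept.put(tag, value);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> tags = parseTags("corp_owner, will_it_blend, genre");

        Map<String, String> meta = new LinkedHashMap<>();
        meta.put("corp_owner", "Weblogs");
        meta.put("genre", "geeksquad_thriller");
        meta.put("unlisted_key", "ignored"); // not in urlmeta.tags, so dropped

        System.out.println(selectListed(meta, tags));
    }
}
```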
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>