Posted to dev@nutch.apache.org by Scott Gonyea <sc...@aitrus.org> on 2010/07/15 03:55:19 UTC

Re: [jira] Updated: (NUTCH-855) ScoringFilter and IndexingFilter: To allow for the propagation of URL Metatags and their subsequent indexing.

Sorry about the spam, everyone.  I hope my patch didn't suck too much :).

On Wed, Jul 14, 2010 at 6:53 PM, Scott Gonyea (JIRA) <ji...@apache.org> wrote:

>
>     [
> https://issues.apache.org/jira/browse/NUTCH-855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
>
> Scott Gonyea updated NUTCH-855:
> -------------------------------
>
>     Attachment: nutch-855.txt
>
> > ScoringFilter and IndexingFilter: To allow for the propagation of URL
> Metatags and their subsequent indexing.
> >
> -------------------------------------------------------------------------------------------------------------
> >
> >                 Key: NUTCH-855
> >                 URL: https://issues.apache.org/jira/browse/NUTCH-855
> >             Project: Nutch
> >          Issue Type: New Feature
> >          Components: generator, indexer
> >    Affects Versions: 1.1
> >            Reporter: Scott Gonyea
> >             Fix For: 1.2
> >
> >         Attachments: nutch-855.txt
> >
> >   Original Estimate: 168h
> >  Remaining Estimate: 168h
> >
> > This plugin is designed to enhance the NUTCH-655 patch, by doing two
> things:
> > 1. Meta Tags that are supplied with your Crawl URLs, during injection,
> will be propagated throughout the outlinks of those Crawl URLs.
> > 2. When you index your URLs, the meta tags that you specified with your
> URLs will be indexed alongside those URLs--and can be directly queried,
> assuming you have done everything else correctly.
> > The flat-file of URLs you are injecting should, per NUTCH-655, be
> tab-delimited in the form of:
> > [www.url.com]\t[key1]=[value1]\t[key2]=[value2]...[keyN]=[valueN]
> > or:
> > http://slashdot.org/  corp_owner=Geeknet      will_it_blend=indubitably
> > http://engadget.com/  corp_owner=Weblogs      genre=geeksquad_thriller
> > To activate this plugin, you must modify two properties in your
> nutch-site.xml:
> > 1. plugin.includes
> >    from: index-(basic|anchor)
> >    to:   index-(basic|anchor|urlmeta)
> > 2. urlmeta.tags
> >    Insert a comma-delimited list of metatags. Using the above example:
> >    <value>corp_owner, will_it_blend, genre</value>
> >    Note that you do not need to include the tag with every URL. However,
> you must specify each tag if you want it to be propagated and later indexed.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>
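
For anyone trying to picture the seed-file format and the urlmeta.tags list described in the quoted issue, here is a rough sketch in Python. It is not the plugin's code (the plugin itself is Java and lives in the attached patch). The function names parse_seed_line, parse_tag_list and select_configured are made up for illustration. Skipping fields that lack an "=" is my own assumption, not documented behaviour.

```python
def parse_seed_line(line):
    """Split one tab-delimited seed line into (url, metadata dict).

    Hypothetical helper: it mirrors the format described in the issue
    (URL, then tab-separated key=value pairs), not Nutch's injector.
    """
    fields = [f.strip() for f in line.rstrip("\n").split("\t")]
    fields = [f for f in fields if f]  # tolerate stray empty fields
    if not fields:
        raise ValueError("empty seed line")
    url, meta = fields[0], {}
    for field in fields[1:]:
        key, sep, value = field.partition("=")
        if not sep:
            continue  # assumption: ignore fields without '='
        meta[key.strip()] = value.strip()
    return url, meta


def parse_tag_list(value):
    """Parse a comma-delimited urlmeta.tags value into a list of tag names."""
    return [tag.strip() for tag in value.split(",") if tag.strip()]


def select_configured(meta, tags):
    """Keep only the metadata whose keys are named in the configured tag list."""
    return {key: val for key, val in meta.items() if key in tags}


if __name__ == "__main__":
    url, meta = parse_seed_line(
        "http://slashdot.org/\tcorp_owner=Geeknet\twill_it_blend=indubitably"
    )
    tags = parse_tag_list("corp_owner, will_it_blend, genre")
    print(url, select_configured(meta, tags))
```

The last helper reflects the note in the issue: a tag does not have to appear on every URL, but only tags named in urlmeta.tags are carried forward and indexed.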