Posted to dev@nutch.apache.org by "Markus Jelsma (Commented) (JIRA)" <ji...@apache.org> on 2012/02/06 15:19:59 UTC

[jira] [Commented] (NUTCH-1264) Configurable indexing plugin (index-extra)

    [ https://issues.apache.org/jira/browse/NUTCH-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201303#comment-13201303 ] 

Markus Jelsma commented on NUTCH-1264:
--------------------------------------

+1

I didn't manage to test it last week, but it works like a charm now! I'll upload a version of the headings plugin that does no indexing of its own and works with this plugin.
                
> Configurable indexing plugin (index-extra) 
> -------------------------------------------
>
>                 Key: NUTCH-1264
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1264
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>    Affects Versions: 1.5
>            Reporter: Julien Nioche
>         Attachments: NUTCH-1264-trunk.patch
>
>
> Several plugins, already distributed or proposed, do very similar things:
> - parse-meta [NUTCH-809] to generate metadata fields in parse-metadata and index them
> - headings [NUTCH-1005] to generate headings fields in parse-metadata and index them
> - index-extra [NUTCH-422] to index configurable fields 
> - urlmeta [NUTCH-855] to propagate metadata from the seeds to the outlinks and index them
> - index-static [NUTCH-940] to generate configurable static fields 
> All of these plugins extract information from various sources and generate fields from it, so they are largely redundant. Instead, this issue proposes a single plugin that generates configurable fields from:
> - static values
> - parse metadata
> - content metadata
> - crawldb metadata
> The other plugins can then focus on parsing and extracting the values to index. This will make adding new fields simpler, because it relies on one stable common plugin instead of duplicating code across various plugins.
> This plugin will replace index-static [NUTCH-940] and index-extra [NUTCH-422] and will serve as a basis for further improvements.
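
To make the proposal above concrete, here is a minimal sketch of how a single configurable step could map the four sources (static values, parse metadata, content metadata, crawldb metadata) to index fields. It is an illustration only: the class, the `Source` enum, the `Rule` type, and `buildFields` are hypothetical names, plain Java maps stand in for the Nutch metadata objects, and none of this is taken from NUTCH-1264-trunk.patch.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: these names are assumptions, not the API of the actual patch.
class ConfigurableFieldSketch {

    /** The four sources named in the issue description. */
    enum Source { STATIC, PARSE_META, CONTENT_META, CRAWLDB_META }

    /** One configured rule: take `key` from `source` and index it as `field`. */
    static final class Rule {
        final Source source;
        final String key;
        final String field;

        Rule(Source source, String key, String field) {
            this.source = source;
            this.key = key;
            this.field = field;
        }
    }

    /** Applies every rule in order; keys missing from a source produce no field. */
    static Map<String, String> buildFields(List<Rule> rules,
                                           Map<String, String> parseMeta,
                                           Map<String, String> contentMeta,
                                           Map<String, String> crawlDbMeta) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (Rule rule : rules) {
            String value;
            switch (rule.source) {
                case STATIC:
                    value = rule.key; // for static rules the key holds the literal value
                    break;
                case PARSE_META:
                    value = parseMeta.get(rule.key);
                    break;
                case CONTENT_META:
                    value = contentMeta.get(rule.key);
                    break;
                default: // CRAWLDB_META
                    value = crawlDbMeta.get(rule.key);
                    break;
            }
            if (value != null) {
                fields.put(rule.field, value);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> parseMeta = new HashMap<>();
        parseMeta.put("h1", "Welcome"); // e.g. a value a headings parser could have stored
        List<Rule> rules = Arrays.asList(
                new Rule(Source.STATIC, "nutch", "crawler"),
                new Rule(Source.PARSE_META, "h1", "heading"));
        Map<String, String> empty = new HashMap<>();
        System.out.println(buildFields(rules, parseMeta, empty, empty));
    }
}
```

In this arrangement a parsing plugin only has to put its values into the metadata; which of them become index fields is decided by the rule list, which a real plugin would presumably read from the Nutch configuration.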
