Posted to dev@nutch.apache.org by "Julien Nioche (Updated) (JIRA)" <ji...@apache.org> on 2012/02/01 14:51:02 UTC

[jira] [Updated] (NUTCH-1264) Configurable indexing plugin (index-extra)

     [ https://issues.apache.org/jira/browse/NUTCH-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julien Nioche updated NUTCH-1264:
---------------------------------

    Attachment: NUTCH-1264-trunk.patch
    
> Configurable indexing plugin (index-extra) 
> -------------------------------------------
>
>                 Key: NUTCH-1264
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1264
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>    Affects Versions: 1.5
>            Reporter: Julien Nioche
>         Attachments: NUTCH-1264-trunk.patch
>
>
> We currently have several plugins, already distributed or proposed, that do very similar things:
> - parse-meta [NUTCH-809] to generate metadata fields in parse-metadata and index them
> - headings [NUTCH-1005] to generate headings fields in parse-metadata and index them
> - index-extra [NUTCH-422] to index configurable fields 
> - urlmeta [NUTCH-855] to propagate metadata from the seeds to the outlinks and index them
> - index-static [NUTCH-940] to generate configurable static fields 
> All these plugins extract information from various sources and generate fields from it, so they are largely redundant. This issue proposes instead a single plugin that generates configurable fields from:
> - static values
> - parse metadata
> - content metadata
> - crawldb metadata
> and lets the other plugins focus on parsing and extracting the values to index. This will make adding new fields simpler, because it relies on one stable common plugin instead of duplicating the code across several plugins.
> This plugin will replace index-static [NUTCH-940] and index-extra [NUTCH-422] and will serve as a basis for further improvements.
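
The idea in the description above can be sketched as a small rule table that maps each index field to one of the four sources. This is only an illustration under stated assumptions: it is not the code in NUTCH-1264-trunk.patch and it does not use the Nutch API. The names ConfigurableFieldSketch, Source, FieldRule and buildFields are hypothetical, and plain maps stand in for parse, content and crawldb metadata.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from Nutch or from the attached patch.
class ConfigurableFieldSketch {

    // The four value sources listed in the issue description.
    enum Source { STATIC, PARSE_METADATA, CONTENT_METADATA, CRAWLDB_METADATA }

    // One configured rule: fill index field `field` from `source`.
    // For STATIC, `keyOrValue` is the literal value; otherwise it is the metadata key to read.
    static final class FieldRule {
        final String field;
        final Source source;
        final String keyOrValue;

        FieldRule(String field, Source source, String keyOrValue) {
            this.field = field;
            this.source = source;
            this.keyOrValue = keyOrValue;
        }
    }

    // Applies every rule in order and returns the generated fields.
    // Plain maps stand in for the three metadata stores (an assumption of this sketch).
    static Map<String, String> buildFields(List<FieldRule> rules,
                                           Map<String, String> parseMeta,
                                           Map<String, String> contentMeta,
                                           Map<String, String> crawlDbMeta) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (FieldRule rule : rules) {
            String value;
            switch (rule.source) {
                case STATIC:
                    value = rule.keyOrValue;
                    break;
                case PARSE_METADATA:
                    value = parseMeta.get(rule.keyOrValue);
                    break;
                case CONTENT_METADATA:
                    value = contentMeta.get(rule.keyOrValue);
                    break;
                case CRAWLDB_METADATA:
                    value = crawlDbMeta.get(rule.keyOrValue);
                    break;
                default:
                    value = null;
            }
            // Skip fields whose source has no value for this document.
            if (value != null) {
                fields.put(rule.field, value);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> parseMeta = new HashMap<>();
        parseMeta.put("h1", "Hello");
        Map<String, String> contentMeta = new HashMap<>();
        Map<String, String> crawlDbMeta = new HashMap<>();
        crawlDbMeta.put("seed", "news");

        List<FieldRule> rules = Arrays.asList(
            new FieldRule("site", Source.STATIC, "example"),
            new FieldRule("heading", Source.PARSE_METADATA, "h1"),
            new FieldRule("category", Source.CRAWLDB_METADATA, "seed"));

        System.out.println(buildFields(rules, parseMeta, contentMeta, crawlDbMeta));
    }
}
```

In this sketch, adding a new field means adding a rule rather than writing another plugin, while the extraction plugins only need to put their values into the relevant metadata.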
