Posted to dev@nutch.apache.org by "Doğacan Güney (JIRA)" <ji...@apache.org> on 2009/01/21 16:03:59 UTC

[jira] Commented: (NUTCH-655) Injecting Crawl metadata

    [ https://issues.apache.org/jira/browse/NUTCH-655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665848#action_12665848 ] 

Doğacan Güney commented on NUTCH-655:
-------------------------------------

Is everyone OK with retargeting this issue to the 1.1 release?

> Injecting Crawl metadata
> ------------------------
>
>                 Key: NUTCH-655
>                 URL: https://issues.apache.org/jira/browse/NUTCH-655
>             Project: Nutch
>          Issue Type: Improvement
>          Components: injector
>            Reporter: julien nioche
>            Priority: Minor
>         Attachments: Injector.patch
>
>
> The attached patch allows metadata to be injected into the crawlDB. The input file must contain tab-separated fields, with the URL in the first column. Metadata names and values are separated by '='. An input line might look like this:
> http://www.myurl.com  \t  categ=value1 \t categ2=value2
> This functionality can be useful for storing external knowledge and indexing it with a custom plugin.
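
For illustration, the input format described in the issue could be parsed roughly as in the sketch below. This is not the code from Injector.patch. The class and method names (SeedLineParser, SeedEntry, parse) are hypothetical, and trimming whitespace around fields and skipping fields that have no name before '=' are assumptions, not behaviour confirmed by the issue.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Nutch: parses one seed line of the form
// URL <tab> name=value <tab> name=value ...
final class SeedLineParser {

    /** Holds the URL and the metadata parsed from one seed line. */
    static final class SeedEntry {
        final String url;
        final Map<String, String> metadata;

        SeedEntry(String url, Map<String, String> metadata) {
            this.url = url;
            this.metadata = metadata;
        }
    }

    static SeedEntry parse(String line) {
        // Fields are separated by tabs; the URL is in the first column.
        String[] fields = line.split("\t");
        String url = fields[0].trim();

        // LinkedHashMap keeps the metadata in the order it appears on the line.
        Map<String, String> metadata = new LinkedHashMap<>();
        for (int i = 1; i < fields.length; i++) {
            String field = fields[i].trim();
            int eq = field.indexOf('=');
            if (eq <= 0) {
                // Assumption: ignore fields with no '=' or with an empty name.
                continue;
            }
            String name = field.substring(0, eq).trim();
            String value = field.substring(eq + 1).trim();
            metadata.put(name, value);
        }
        return new SeedEntry(url, metadata);
    }
}
```

Calling SeedLineParser.parse on the example line above (with real tab characters in place of \t) would yield the URL http://www.myurl.com and the two metadata pairs categ=value1 and categ2=value2.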

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.