Posted to dev@nutch.apache.org by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2014/01/17 13:02:23 UTC

[jira] [Comment Edited] (NUTCH-1478) Parse-metatags and index-metadata plugin for Nutch 2.x series

    [ https://issues.apache.org/jira/browse/NUTCH-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874705#comment-13874705 ] 

Lewis John McGibbney edited comment on NUTCH-1478 at 1/17/14 12:02 PM:
-----------------------------------------------------------------------

The previous patch did not compile.
This patch adds the index-metadata plugin as per the original patch and corrects the formatting. In addition to the existing patch, I've added a small improvement that checks that the metatags string array has more than one value before adding \t.
If you apply the patch you will see the test failing for TestMetatagsParser... this needs to be fixed, but I won't be able to do it right now.
[~kiranch] do you fancy having a look at this if you get time?
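
The tab check described above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the class and method names (MetatagJoinSketch, joinValues, splitValues) are hypothetical, and it assumes that multiple values of one metatag are stored as a single tab-separated string.

```java
import java.util.Arrays;

// Hypothetical sketch; none of these names come from the NUTCH-1478 patch.
class MetatagJoinSketch {

    // Joins the values of one metatag. A tab is only inserted when there is
    // more than one value, so a single value is stored unchanged.
    static String joinValues(String[] values) {
        if (values == null || values.length == 0) {
            return "";
        }
        if (values.length == 1) {
            return values[0];
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append('\t'); // separator goes between values only
            }
            sb.append(values[i]);
        }
        return sb.toString();
    }

    // Reverses joinValues so each value can be indexed as a separate field
    // value (assumed behaviour on the indexing side).
    static String[] splitValues(String joined) {
        return joined.isEmpty() ? new String[0] : joined.split("\t");
    }

    public static void main(String[] args) {
        System.out.println(joinValues(new String[] {"nutch"}));
        System.out.println(Arrays.toString(
            splitValues(joinValues(new String[] {"nutch", "solr"}))));
    }
}
```

With this shape, a page carrying one keyword yields a string without any tab, while a page with several keywords yields a string that can be split back into its individual values.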



> Parse-metatags and index-metadata plugin for Nutch 2.x series 
> --------------------------------------------------------------
>
>                 Key: NUTCH-1478
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1478
>             Project: Nutch
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 2.1
>            Reporter: kiran
>             Fix For: 2.3
>
>         Attachments: NUTCH-1478-parse-v2.patch, NUTCH-1478v3.patch, Nutch1478.patch, Nutch1478.zip, metadata_parseChecker_sites.png
>
>
> I have ported the parse-metatags and index-metadata plugins to the Nutch 2.x series. They handle multiple values of the same tag and index them in Solr, as in my earlier patch (https://issues.apache.org/jira/browse/NUTCH-1467).
> Usage is the same as described here (http://wiki.apache.org/nutch/IndexMetatags), with one change: there is no need to put the 'metatag' keyword before metatag names. For example, my configuration looks like this (https://github.com/salvager/NutchDev/blob/master/runtime/local/conf/nutch-site.xml).
> This is only the first version and does not include the JUnit test. I will upload a new version soon.
> The plugins parse the tags and index them in Solr. Make sure that the fields listed under 'index.parse.md' in nutch-site.xml are also defined in schema.xml in Solr.
> Please let me know if you have any suggestions.
> This is supported by DLA (Digital Library and Archives) of Virginia Tech.


