Posted to dev@nutch.apache.org by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2016/06/30 06:46:10 UTC
[jira] [Assigned] (NUTCH-1553) Property 'indexer.delete.robots.noindex' not working when using parser-html.
[ https://issues.apache.org/jira/browse/NUTCH-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sebastian Nagel reassigned NUTCH-1553:
--------------------------------------
Assignee: Sebastian Nagel
> Property 'indexer.delete.robots.noindex' not working when using parser-html.
> ----------------------------------------------------------------------------
>
> Key: NUTCH-1553
> URL: https://issues.apache.org/jira/browse/NUTCH-1553
> Project: Nutch
> Issue Type: Bug
> Components: indexer, parser
> Affects Versions: 1.6
> Reporter: Alfonso Presa
> Assignee: Sebastian Nagel
> Priority: Minor
> Fix For: 1.13
>
> Attachments: NUTCH-1553-trunk-1.patch
>
>
> Maybe I'm doing something wrong, but it seems to me that the NUTCH-1434 patch only works with the Tika parser. With parse-html, the "robots" meta tag is only populated if the parse-metatags plugin is enabled, and then it is stored under the prefix "metatag.". So parseData.getMeta("robots") returns nothing when Tika is not used.
> I guess the simplest solution would be to provide a fallback: if parseData.getMeta("robots") is null, use parseData.getMeta("metatag.robots") instead.
> Also, the dependency of this property on the parse-metatags plugin when using parse-html would be worth documenting somewhere (nutch-default.xml?).
> Thanks!
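
For illustration only, the fallback proposed in the description could be sketched roughly as follows. This is not the attached patch and not Nutch's actual indexer code: the class RobotsMetaFallback and its methods are hypothetical, and a plain Map stands in for the parse metadata that parseData.getMeta(...) would read from. Only the two keys, "robots" and "metatag.robots", come from the issue text.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of the proposed lookup; a Map stands in for Nutch's parse metadata. */
public class RobotsMetaFallback {

  /** Returns the robots directive, preferring "robots" and falling back to "metatag.robots". */
  public static String robotsDirective(Map<String, String> parseMeta) {
    String robots = parseMeta.get("robots");
    if (robots == null) {
      // Assumed fallback key: the prefix the issue says parse-metatags uses.
      robots = parseMeta.get("metatag.robots");
    }
    return robots;
  }

  /** True if a directive was found under either key and it contains "noindex". */
  public static boolean isNoIndex(Map<String, String> parseMeta) {
    String robots = robotsDirective(parseMeta);
    return robots != null && robots.toLowerCase(Locale.ROOT).contains("noindex");
  }

  public static void main(String[] args) {
    Map<String, String> meta = new HashMap<>();
    // Simulates a page parsed with parse-html plus parse-metatags: only the prefixed key is set.
    meta.put("metatag.robots", "noindex,follow");
    System.out.println(isNoIndex(meta));
  }
}
```

Checking the unprefixed key first keeps the behaviour unchanged for setups where "robots" is already populated, which the description says is the case with Tika; the prefixed key is consulted only when the first lookup finds nothing.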
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)