Posted to dev@nutch.apache.org by Apache Wiki <wi...@apache.org> on 2014/07/30 11:00:46 UTC
[Nutch Wiki] Update of "IndexMetatags" by JulienNioche
Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The "IndexMetatags" page has been changed by JulienNioche:
https://wiki.apache.org/nutch/IndexMetatags?action=diff&rev1=4&rev2=5
Comment:
https://issues.apache.org/jira/browse/NUTCH-1561
<value>protocol-http|urlfilter-regex|parse-(html|tika|metatags)|index-(basic|anchor|metadata)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
}}}
- 1. In the file `conf/nutch-site.xml`, specify which metatags should be indexed. Either specify specific metatags you want to index, or you can index all metatags. To index all, provide a '*' for the value of the property "metatags.names", otherwise provide the list of names separated by ';'. For example, to only index the metatag 'role', add the following configuration to conf/nutch-site.xml:
+ 1. In the file `conf/nutch-site.xml`, specify which metatags should be indexed. Either specify specific metatags you want to index, or you can index all metatags. To index all, provide a '*' for the value of the property "metatags.names", otherwise provide the list of names separated by ','. For example, to only index the metatag 'role', add the following configuration to conf/nutch-site.xml:
{{{
<!-- Used only if plugin parse-metatags is enabled. -->
<property>
<name>metatags.names</name>
- <value>description;keywords</value>
+ <value>description,keywords</value>
- <description> Names of the metatags to extract, separated by;.
+ <description> Names of the metatags to extract, separated by ','.
Use '*' to extract all metatags. Prefixes the names with 'metatag.'
in the parse-metadata. For instance to index description and keywords,
you need to activate the plugin index-metadata and set the value of the
- parameter 'index.parse.md' to 'metatag.description;metatag.keywords'.
+ parameter 'index.parse.md' to 'metatag.description,metatag.keywords'.
</description>
</property>
}}}
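For reference, the 'index.parse.md' setting mentioned in the description above could look roughly like the following in conf/nutch-site.xml. This is only a sketch: the property name and value come from the description text, the wording inside <description> is illustrative and not taken from the wiki page, and it assumes the index-metadata plugin is enabled in the plugin list as shown earlier.
{{{
<!-- Sketch only: has an effect only if plugin index-metadata is enabled. -->
<property>
<name>index.parse.md</name>
<value>metatag.description,metatag.keywords</value>
<description> Illustrative wording: comma-separated list of parse-metadata
keys to index. The 'metatag.' prefix matches what parse-metatags adds
to each extracted name.
</description>
</property>
}}}
The values use the same ',' separator as 'metatags.names', so the two properties should be kept in step when metatags are added or removed.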