Posted to dev@nutch.apache.org by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2011/07/07 13:44:16 UTC
[jira] [Commented] (NUTCH-783) IndexerChecker Utilty
[ https://issues.apache.org/jira/browse/NUTCH-783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061222#comment-13061222 ]
Markus Jelsma commented on NUTCH-783:
-------------------------------------
Hey, this code is not compatible with the Nutch API in 1.4:
{code}
+ List<String> values = doc.getFieldValues(fname);
+ if (values != null) {
+   for (String value : values) {
+     int minText = Math.min(100, value.length());
+     System.out.println(fname + " :\t" + value.substring(0, minText));
+   }
+ }
{code}
I changed it to:
{code}
// Arrays.asList never returns null, so check the field value itself
Object fieldValue = doc.getFieldValue(fname);
if (fieldValue != null) {
  List<Object> values = Arrays.asList(fieldValue);
  for (Object value : values) {
    String str = value.toString();
    int minText = Math.min(100, str.length());
    System.out.println(fname + " :\t" + str.substring(0, minText));
  }
}
{code}
It works now. I think it would be nice to have in both 1.4 and 2.0.
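
The truncation and printing logic can also be tried outside Nutch. The following is only a minimal sketch under stated assumptions: a plain Map stands in for the indexed document, and the names FieldPreview, preview and formatField are hypothetical and not part of the patch or of the Nutch API. It skips missing fields and null values and cuts each value to its first 100 characters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone helper; not part of NUTCH-783.patch or the Nutch API.
public class FieldPreview {
    static final int MAX_CHARS = 100;

    // Returns at most maxChars characters of the value's string form ("" for null).
    static String preview(Object value, int maxChars) {
        if (value == null) {
            return "";
        }
        String str = value.toString();
        return str.substring(0, Math.min(maxChars, str.length()));
    }

    // Builds one "name :<TAB>value" line per non-null value; a missing field gives no lines.
    static List<String> formatField(String fname, List<Object> values) {
        List<String> lines = new ArrayList<>();
        if (values == null) {
            return lines;
        }
        for (Object value : values) {
            if (value != null) {
                lines.add(fname + " :\t" + preview(value, MAX_CHARS));
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Assumption: a Map from field name to values stands in for the document.
        Map<String, List<Object>> doc = new LinkedHashMap<>();
        doc.put("title", Arrays.<Object>asList("Example page"));
        doc.put("boost", Arrays.<Object>asList(1.0f));

        for (Map.Entry<String, List<Object>> field : doc.entrySet()) {
            for (String line : formatField(field.getKey(), field.getValue())) {
                System.out.println(line);
            }
        }
    }
}
```

Working on Object rather than String keeps the helper usable for non-string values such as a boost or a date, which is the same reason for the toString() call in the changed code above.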
> IndexerChecker Utilty
> ---------------------
>
> Key: NUTCH-783
> URL: https://issues.apache.org/jira/browse/NUTCH-783
> Project: Nutch
> Issue Type: New Feature
> Components: indexer
> Reporter: Julien Nioche
> Assignee: Julien Nioche
> Fix For: 2.0
>
> Attachments: NUTCH-783.patch
>
>
> This patch contains a new utility for checking the configuration of the indexing filters. The IndexerChecker reads and parses a URL, runs the indexers on it, and displays the fields obtained along with the first 100 characters of each value.
> Example usage: ./nutch org.apache.nutch.indexer.IndexerChecker http://www.lemonde.fr/