Posted to dev@nutch.apache.org by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2011/07/07 13:44:16 UTC

[jira] [Commented] (NUTCH-783) IndexerChecker Utilty

    [ https://issues.apache.org/jira/browse/NUTCH-783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061222#comment-13061222 ] 

Markus Jelsma commented on NUTCH-783:
-------------------------------------

Hey, this code is not compatible with the Nutch API in 1.4:

{code}
+      List<String> values = doc.getFieldValues(fname);
+      if (values != null) {
+        for (String value : values){
+          int minText = Math.min(100, value.length());
+          System.out.println(fname + " :\t" + value.substring(0, minText));
+        }
+      }
{code}

I changed it to:

{code}
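      // Arrays.asList() never returns null, so check the field value itself
      Object fieldValue = doc.getFieldValue(fname);
      if (fieldValue != null) {
        List<Object> values = Arrays.asList(fieldValue);
        for (Object value : values) {
          // values are Objects now, so convert before truncating
          String str = value.toString();
          int minText = Math.min(100, str.length());
          System.out.println(fname + " :\t" + str.substring(0, minText));
        }
      }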
{code}


It works now. I think it would be nice to have this in both 1.4 and 2.0.
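
For illustration only, the truncation logic above can be tried outside Nutch with a minimal sketch. The FieldPreview class, its preview and describe methods, and the Map standing in for the document are hypothetical names, not part of the Nutch API; the sketch only assumes that field values arrive as plain Objects and that a field may have no values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the checker's output loop.
// The Map<String, List<Object>> is NOT the Nutch document class; it only
// mimics a document whose fields hold Object values.
class FieldPreview {

  static final int MAX_CHARS = 100;

  // Return the string form of a value, cut to at most max characters.
  static String preview(Object value, int max) {
    String str = String.valueOf(value); // null-safe: a null value becomes "null"
    return str.substring(0, Math.min(max, str.length()));
  }

  // Build one "name :\tvalue" line per field value, skipping fields without values.
  static List<String> describe(Map<String, List<Object>> doc) {
    List<String> lines = new ArrayList<String>();
    for (Map.Entry<String, List<Object>> field : doc.entrySet()) {
      List<Object> values = field.getValue();
      if (values == null) {
        continue; // nothing to show for this field
      }
      for (Object value : values) {
        lines.add(field.getKey() + " :\t" + preview(value, MAX_CHARS));
      }
    }
    return lines;
  }

  public static void main(String[] args) {
    Map<String, List<Object>> doc = new LinkedHashMap<String, List<Object>>();
    doc.put("title", Arrays.<Object>asList("Le Monde"));
    doc.put("boost", Arrays.<Object>asList(1.0f));
    doc.put("empty", null);
    for (String line : describe(doc)) {
      System.out.println(line);
    }
  }
}
```

Using String.valueOf instead of value.toString() is a choice made for this sketch, so that a null value is printed as "null" rather than causing an exception.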

> IndexerChecker Utilty
> ---------------------
>
>                 Key: NUTCH-783
>                 URL: https://issues.apache.org/jira/browse/NUTCH-783
>             Project: Nutch
>          Issue Type: New Feature
>          Components: indexer
>            Reporter: Julien Nioche
>            Assignee: Julien Nioche
>             Fix For: 2.0
>
>         Attachments: NUTCH-783.patch
>
>
> This patch contains a new utility for checking the configuration of the indexing filters. The IndexerChecker reads and parses a URL and runs the indexers on it, then displays the fields obtained and the first
>  100 characters of their values.
> Example usage: ./nutch org.apache.nutch.indexer.IndexerChecker http://www.lemonde.fr/

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira