Posted to dev@nutch.apache.org by "Nguyen Manh Tien (JIRA)" <ji...@apache.org> on 2013/11/24 09:42:35 UTC

[jira] [Created] (NUTCH-1673) Title isn't reset in MoreIndexingFilter

Nguyen Manh Tien created NUTCH-1673:
---------------------------------------

             Summary: Title isn't reset in MoreIndexingFilter
                 Key: NUTCH-1673
                 URL: https://issues.apache.org/jira/browse/NUTCH-1673
             Project: Nutch
          Issue Type: Bug
          Components: indexer
    Affects Versions: 2.2.1
            Reporter: Nguyen Manh Tien


In the resetTitle function, the new title is added to doc without removing the existing one. We need to remove the old title before adding the new one. Currently this results in an error when indexing to Solr if the title field is not a multivalued field.

private NutchDocument resetTitle(NutchDocument doc, WebPage page, String url) {
...
    for (int i = 0; i < patterns.length; i++) {
      if (matcher.contains(contentDisposition.toString(), patterns[i])) {
...
        doc.add("title", result.group(1));
        break;
      }
    }

    return doc;
  }
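
A minimal sketch of the proposed fix, under stated assumptions: SketchDocument is a hypothetical stand-in for NutchDocument (a map from field name to a list of values), its removeField method is assumed to mirror what the real class offers, and java.util.regex with an illustrative filename pattern is swapped in for the matcher and patterns used in the excerpt above. It is not the actual MoreIndexingFilter code; it only shows that removing the old value before adding the new one leaves exactly one title.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for NutchDocument: field name -> list of values.
class SketchDocument {
  private final Map<String, List<String>> fields = new HashMap<>();

  // Appends a value; repeated calls make the field multivalued.
  void add(String name, String value) {
    fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
  }

  // Drops every value stored under the given field name.
  void removeField(String name) {
    fields.remove(name);
  }

  List<String> getFieldValues(String name) {
    return fields.getOrDefault(name, new ArrayList<>());
  }
}

class TitleResetSketch {
  // Illustrative pattern only; the patterns in the real filter may differ.
  private static final Pattern FILENAME =
      Pattern.compile("\\bfilename=['\"]?([^'\";]+)");

  static SketchDocument resetTitle(SketchDocument doc, String contentDisposition) {
    if (contentDisposition == null) {
      return doc;
    }
    Matcher m = FILENAME.matcher(contentDisposition);
    if (m.find()) {
      // Remove the old title first so the field stays single-valued.
      doc.removeField("title");
      doc.add("title", m.group(1));
    }
    return doc;
  }

  public static void main(String[] args) {
    SketchDocument doc = new SketchDocument();
    doc.add("title", "Old title");
    resetTitle(doc, "attachment; filename=\"report.pdf\"");
    System.out.println(doc.getFieldValues("title")); // prints [report.pdf]
  }
}
```

Without the removeField call, the same run would leave two values under "title", which is the situation a single-valued Solr field rejects.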


