Posted to dev@nutch.apache.org by "Nguyen Manh Tien (JIRA)" <ji...@apache.org> on 2013/11/24 09:42:35 UTC
[jira] [Created] (NUTCH-1673) Title isn't reset in MoreIndexingFilter
Nguyen Manh Tien created NUTCH-1673:
---------------------------------------
Summary: Title isn't reset in MoreIndexingFilter
Key: NUTCH-1673
URL: https://issues.apache.org/jira/browse/NUTCH-1673
Project: Nutch
Issue Type: Bug
Components: indexer
Affects Versions: 2.2.1
Reporter: Nguyen Manh Tien
In the resetTitle function, the new title is added to the doc without removing the old one. We need to remove the old title before adding the new one. Currently this results in an error when indexing to Solr if the title field is not a multivalued field.
  private NutchDocument resetTitle(NutchDocument doc, WebPage page, String url) {
    ...
    for (int i = 0; i < patterns.length; i++) {
      if (matcher.contains(contentDisposition.toString(), patterns[i])) {
        ...
        doc.add("title", result.group(1));
        break;
      }
    }
    return doc;
  }
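
A minimal sketch of the remove-before-add idea described above, not the actual patch. SimpleDoc is a hypothetical stand-in for NutchDocument (its removeField method is an assumption about what the real class offers and should be checked against the Nutch version in use), the FILENAME pattern is illustrative only, and java.util.regex is used here in place of the matcher/patterns objects in the original code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ResetTitleSketch {

    // Hypothetical stand-in for NutchDocument: each field holds a list of values.
    static class SimpleDoc {
        private final Map<String, List<String>> fields = new LinkedHashMap<>();

        // Appends a value; calling this twice yields a multivalued field.
        void add(String name, String value) {
            fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }

        // Drops every value stored under the given field name.
        void removeField(String name) {
            fields.remove(name);
        }

        List<String> getValues(String name) {
            return fields.getOrDefault(name, new ArrayList<>());
        }
    }

    // Illustrative pattern (an assumption, not the filter's own patterns).
    private static final Pattern FILENAME =
        Pattern.compile("\\bfilename=['\"]?([^'\";]+)['\"]?");

    // Replaces the title with the Content-Disposition file name, if present.
    static SimpleDoc resetTitle(SimpleDoc doc, String contentDisposition) {
        if (contentDisposition == null) {
            return doc;
        }
        Matcher m = FILENAME.matcher(contentDisposition);
        if (m.find()) {
            doc.removeField("title");      // remove the old title first
            doc.add("title", m.group(1));  // so the field stays single-valued
        }
        return doc;
    }

    public static void main(String[] args) {
        SimpleDoc doc = new SimpleDoc();
        doc.add("title", "Old title");
        resetTitle(doc, "attachment; filename=\"report.pdf\"");
        System.out.println(doc.getValues("title"));
    }
}
```

With the removeField call in place, the document carries exactly one title value after resetTitle, which is what a non-multivalued Solr field expects.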
--
This message was sent by Atlassian JIRA
(v6.1#6144)