Posted to dev@lucene.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2017/10/10 18:21:00 UTC
[jira] [Created] (SOLR-11462) TokenizerChain's normalize() doesn't work
Tim Allison created SOLR-11462:
----------------------------------
Summary: TokenizerChain's normalize() doesn't work
Key: SOLR-11462
URL: https://issues.apache.org/jira/browse/SOLR-11462
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Reporter: Tim Allison
Priority: Trivial
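TokenizerChain's {{normalize()}} is not currently used, so this bug has no negative effect on search today. It is still a bug, however, and we should fix it.
If applied to a TokenizerChain with {{filters.length > 1}}, only the last multi-term-aware filter would take effect. Each iteration wraps the original input stream {{in}} rather than the accumulated {{result}}, so the output of every earlier filter is discarded.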
{noformat}
@Override
protected TokenStream normalize(String fieldName, TokenStream in) {
  TokenStream result = in;
  for (TokenFilterFactory filter : filters) {
    if (filter instanceof MultiTermAwareComponent) {
      filter = (TokenFilterFactory) ((MultiTermAwareComponent) filter).getMultiTermComponent();
      // BUG: wraps the original "in", discarding any filter applied in an earlier iteration
      result = filter.create(in);
    }
  }
  return result;
}
{noformat}
The fix is trivial:
{noformat}
- result = filter.create(in);
+ result = filter.create(result);
{noformat}
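The following is a minimal, Lucene-free sketch that mirrors the two loops and shows the effect concretely. It assumes each filter factory can be modeled as a function from its upstream input to its output, with a plain String standing in for the TokenStream. {{FilterFactory}}, {{demoFilters()}} and the two filters are hypothetical. The {{MultiTermAwareComponent}} check is left out because it does not affect the chaining.

```java
import java.util.List;
import java.util.Locale;

public class NormalizeChainDemo {
    // Hypothetical stand-in for TokenFilterFactory: wraps its upstream input
    // (a String models the TokenStream) and returns the filtered output.
    interface FilterFactory {
        String create(String upstream);
    }

    // Two hypothetical filters, applied in this order.
    static List<FilterFactory> demoFilters() {
        FilterFactory lowercase = s -> s.toLowerCase(Locale.ROOT);
        FilterFactory stripHyphens = s -> s.replace("-", "");
        return List.of(lowercase, stripHyphens);
    }

    // Mirrors the current loop: every filter wraps the ORIGINAL input,
    // so only the last filter's output survives.
    static String normalizeBuggy(String in, List<FilterFactory> filters) {
        String result = in;
        for (FilterFactory filter : filters) {
            result = filter.create(in);
        }
        return result;
    }

    // Mirrors the fix: each filter wraps the previous filter's output.
    static String normalizeFixed(String in, List<FilterFactory> filters) {
        String result = in;
        for (FilterFactory filter : filters) {
            result = filter.create(result);
        }
        return result;
    }

    public static void main(String[] args) {
        List<FilterFactory> filters = demoFilters();
        System.out.println(normalizeBuggy("Wi-Fi", filters)); // WiFi (lowercasing lost)
        System.out.println(normalizeFixed("Wi-Fi", filters)); // wifi
    }
}
```

In this example, a lowercase filter runs first and a hyphen-stripping filter runs second. For "Wi-Fi", the current loop returns "WiFi" because the lowercasing is lost. The fixed loop returns "wifi".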
If you'd like to replace {{TextField#analyzeMultiTerm()}} with, say, the following, which simply delegates to {{Analyzer#normalize()}}:
{noformat}
public static BytesRef analyzeMultiTerm(String field, String part, Analyzer analyzerIn) {
  if (part == null || analyzerIn == null) return null;
  return analyzerIn.normalize(field, part);
}
{noformat}
I'm happy to submit a PR with unit tests. Let me know.