Posted to dev@lucene.apache.org by "Esther Quansah (JIRA)" <ji...@apache.org> on 2015/11/12 18:03:10 UTC

[jira] [Commented] (SOLR-8212) Standard Highlighter Inconsistent with NGram Tokenizer

    [ https://issues.apache.org/jira/browse/SOLR-8212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15002416#comment-15002416 ] 

Esther Quansah commented on SOLR-8212:
--------------------------------------

Update: problem identified. In TokenGroup.java, private static final int MAX_NUM_TOKENS_PER_GROUP = 50. Terms where the query match sits farther into the word (bronchos*co*py, blood *ca*ncer, etc.) end up producing 50+ tokens, so private int matchStartOffset and private int matchEndOffset are not calculated correctly in void addToken(), and the entire term is eventually returned with no formatting.
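To illustrate the mechanism, here is a minimal, self-contained sketch. It is not the Lucene source: the class TokenGroupCapSketch, its nested Gram and Group types, and the highlight() helper are all hypothetical, and it assumes gram sizes of 1 to 12 and that grams are emitted ordered by start offset and then by length. Under those assumptions, "co" in "bronchoscopy" is the 70th gram, so a group that ignores everything past 50 tokens never records its offsets.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenGroupCapSketch {

    /** One n-gram plus its character offsets in the source text. */
    static final class Gram {
        final String text;
        final int start;
        final int end;

        Gram(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    /** Emit grams ordered by start offset, then by length (assumed ordering). */
    static List<Gram> ngrams(String text, int minGram, int maxGram) {
        List<Gram> out = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= text.length(); len++) {
                out.add(new Gram(text.substring(start, start + len), start, start + len));
            }
        }
        return out;
    }

    /** Toy token group: tokens arriving after the cap are silently ignored. */
    static final class Group {
        final int maxTokens;
        int numTokens = 0;
        int matchStart = -1; // -1 means "no match recorded"
        int matchEnd = -1;

        Group(int maxTokens) {
            this.maxTokens = maxTokens;
        }

        void addToken(Gram g, boolean isMatch) {
            if (numTokens >= maxTokens) {
                return; // dropped: match offsets are never updated for this token
            }
            if (isMatch) {
                matchStart = (matchStart < 0) ? g.start : Math.min(matchStart, g.start);
                matchEnd = Math.max(matchEnd, g.end);
            }
            numTokens++;
        }
    }

    /** Feed every gram of the text into one group, flagging grams equal to the query. */
    static Group highlight(String text, String query, int minGram, int maxGram, int cap) {
        Group group = new Group(cap);
        for (Gram g : ngrams(text, minGram, maxGram)) {
            group.addToken(g, g.text.equals(query));
        }
        return group;
    }

    public static void main(String[] args) {
        Group capped = highlight("bronchoscopy", "co", 1, 12, 50);
        Group uncapped = highlight("bronchoscopy", "co", 1, 12, Integer.MAX_VALUE);
        System.out.println("capped:   " + capped.matchStart + ".." + capped.matchEnd);     // -1..-1
        System.out.println("uncapped: " + uncapped.matchStart + ".." + uncapped.matchEnd); // 8..10
    }
}
```

With the cap at 50 the match offsets stay unset, which corresponds to the term coming back without highlighting; without the cap the offsets 8..10 cover "co". The actual gram sizes of the title_contains field may differ from the ones assumed here.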

> Standard Highlighter Inconsistent with NGram Tokenizer
> ------------------------------------------------------
>
>                 Key: SOLR-8212
>                 URL: https://issues.apache.org/jira/browse/SOLR-8212
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Esther Quansah
>            Priority: Minor
>         Attachments: SOLR-8212.patch
>
>
> I am noticing inconsistent behavior when the Standard Highlighter is used on terms in a field that uses the NGram Tokenizer. Example:
> I created a field called "title_contains" which uses the NGram Tokenizer, and I indexed the term "bronchoscopy". Querying "co" on the title_contains field should return "bronchos<em>co</em>py", but the Standard Highlighter returns "bronchoscopy" without the highlighting information.
> I created a test called testNgram() which tests the above example using (1) the Standard Highlighter on the ngram field type and (2) the Fast Vector Highlighter on the ngram field type. The first fails and the second passes.


