Posted to dev@lucene.apache.org by "Georg Sorst (JIRA)" <ji...@apache.org> on 2017/01/17 22:20:26 UTC
[jira] [Commented] (SOLR-9968) Cannot use special characters in Suggester Context Query

    [ https://issues.apache.org/jira/browse/SOLR-9968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826939#comment-15826939 ] 

Georg Sorst commented on SOLR-9968:
-----------------------------------

I've implemented a fix ({{SOLR-9968-configurable-tokenizer.patch}}) that covers my use case: it makes the tokenizer used for context filter queries configurable. This makes it possible to use {{KeywordTokenizer}}, which emits the whole input as a single token and therefore keeps special characters intact.
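
To illustrate why the choice of tokenizer matters here, the following is a minimal, JDK-only sketch. It does not use Lucene: {{standardLike}} and {{keywordLike}} are hypothetical helper names, and the regular expression is only a rough approximation of {{StandardTokenizer}} (which actually follows Unicode text segmentation rules). It should still show the effect described in this issue for a value like '{{c#x}}'.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class TokenizerSketch {

    // Rough stand-in for StandardTokenizer (an assumption, not Lucene code):
    // split the input on every run of characters that are neither letters nor digits.
    static List<String> standardLike(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Rough stand-in for KeywordTokenizer: the whole input becomes one token.
    static List<String> keywordLike(String text) {
        return Collections.singletonList(text);
    }

    public static void main(String[] args) {
        System.out.println(standardLike("c#x")); // prints [c, x]
        System.out.println(keywordLike("c#x"));  // prints [c#x]
    }
}
```

With the standard-like behavior the '{{#}}' is lost and two separate terms remain, so the filter can no longer match the stored value '{{c#x}}' exactly; with the keyword-like behavior the value survives verbatim.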

The config setting is {{contextFilterQueryTokenizer}}; it defaults to {{StandardTokenizer}}.

The patch also contains a test case.

The configuration uses the registered name (eg. {{keyword}}, {{standard}}) of the Tokenizer instead of the class name (eg. {{solr.KeywordTokenizerFactory}}, {{solr.StandardTokenizerFactory}}). I would have preferred the latter but couldn't figure out how to do it properly.
I'll gladly change the behavior if it makes sense and someone can point me in the right direction.
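
As a rough picture of the name-based lookup described above, here is a JDK-only sketch. It is not the code from the patch: the class, the map and the {{resolve}} method are hypothetical, and the real implementation would resolve names through Lucene's tokenizer factories rather than a hand-written table. It only shows the intended behavior: a registered short name selects the tokenizer, and an unset value falls back to {{standard}}.

```java
import java.util.HashMap;
import java.util.Map;

class TokenizerNameLookupSketch {

    // Default used when contextFilterQueryTokenizer is not configured.
    static final String DEFAULT_NAME = "standard";

    // Hypothetical table from registered name to factory class name.
    static final Map<String, String> REGISTERED = new HashMap<>();
    static {
        REGISTERED.put("standard", "solr.StandardTokenizerFactory");
        REGISTERED.put("keyword", "solr.KeywordTokenizerFactory");
    }

    // Resolve the configured value to a factory class name, applying the default.
    static String resolve(String configured) {
        String name = (configured == null || configured.trim().isEmpty())
                ? DEFAULT_NAME
                : configured.trim();
        String factory = REGISTERED.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown tokenizer name: " + configured);
        }
        return factory;
    }

    public static void main(String[] args) {
        System.out.println(resolve(null));      // prints solr.StandardTokenizerFactory
        System.out.println(resolve("keyword")); // prints solr.KeywordTokenizerFactory
    }
}
```

Accepting class names instead would amount to skipping the table and loading the configured class directly, which is the variant mentioned above as the preferred one.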

> Cannot use special characters in Suggester Context Query
> --------------------------------------------------------
>
>                 Key: SOLR-9968
>                 URL: https://issues.apache.org/jira/browse/SOLR-9968
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: Suggester
>    Affects Versions: 6.0, 6.3
>            Reporter: Georg Sorst
>         Attachments: SOLR-9968-configurable-tokenizer.patch, test_context_query_with_special_characters.patch
>
>
> h4. Reproduce
> 1. Configure the Suggester to use a {{contextField}}, eg. {{context}}
> 2. Add a document containing special characters in that field, eg. '{{c#x}}'
> 3. Use a context query with the Suggester, eg. {noformat}suggest.cfq=context:c#x{noformat}
>   * Escaping the character makes no difference, eg. 
> {noformat}suggest.cfq=context:c\#x{noformat}
> h4. What happens
> The suggestions are not properly filtered
> h4. What should happen
> The suggestions should be limited to documents where the field {{context}} is '{{c#x}}'
> ----
> What happens is this:
> 1. {{SolrSuggester.contextFilterQueryAnalyzer}} is hardwired to use {{StandardTokenizer}}
> 2. The context query is parsed like this:
> {code:title=SolrSuggester.parseContextFilterQuery}
> query = new StandardQueryParser(contextFilterQueryAnalyzer).parse(contextFilter, CONTEXTS_FIELD_NAME);
> {code}
> 3. The {{StandardQueryParser}} together with {{StandardTokenizer}} will turn the context query into '{{context:c context:x}}'
> 4. This is used for filtering the suggestions
> 5. Thus, the suggestion where {{context}} is '{{c#x}}' is not returned
> Attached is an extension to {{SuggestComponentContextFilterQueryTest}} to reproduce this behavior.
> So, the question is, how to get the parser and tokenizer to use these special characters verbatim? Two ways I can think of:
> * Make {{contextFilterQueryAnalyzer}} configurable so {{KeywordTokenizer}} can be used
> * Use the analyzer defined for the context field in the schema


