Posted to dev@lucene.apache.org by "James Dyer (JIRA)" <ji...@apache.org> on 2017/03/13 15:01:41 UTC

[jira] [Commented] (SOLR-10256) Parentheses in SpellCheckCollator

    [ https://issues.apache.org/jira/browse/SOLR-10256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15907616#comment-15907616 ] 

James Dyer commented on SOLR-10256:
-----------------------------------

[~asingh2411] Assuming you can't use the collation provided because of the added parentheses, could you specify "spellcheck.collateExtendedResults=true" and have your application use the information it returns to craft a new query the way you want it? This might be a good workaround until (and unless) we decide to change the current behavior.
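
To make that workaround concrete, here is a minimal sketch of what the application side could look like. It is not part of Solr. It assumes the application has already copied the misspelling-to-correction pairs from the extended collation results into a plain Map. The class name RebuildCollation and the method rebuild are hypothetical. It only handles simple whitespace-separated queries with an optional leading + or - on each term. Fielded terms, phrases and existing parentheses are not parsed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RebuildCollation {

    // Hypothetical helper: rewrites the original query using misspelling -> correction
    // pairs that the application is assumed to have read from the extended collation
    // results. Multi-word corrections (e.g. word-break suggestions) are inserted as plain
    // space-separated terms with no surrounding parentheses; a leading '+' or '-' on the
    // original token is copied onto each resulting term.
    static String rebuild(String originalQuery, Map<String, String> corrections) {
        StringBuilder out = new StringBuilder();
        for (String token : originalQuery.trim().split("\\s+")) {
            String prefix = "";
            String term = token;
            if (!term.isEmpty() && (term.charAt(0) == '+' || term.charAt(0) == '-')) {
                prefix = term.substring(0, 1);
                term = term.substring(1);
            }
            // Fall back to the original term when there is no correction for it.
            String corrected = corrections.getOrDefault(term, term);
            for (String part : corrected.split(" ")) {
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(prefix).append(part);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> corrections = new LinkedHashMap<>();
        corrections.put("applejuice", "apple juice");
        System.out.println(rebuild("fresh applejuice", corrections)); // fresh apple juice
        System.out.println(rebuild("+applejuice", corrections));      // +apple +juice
    }
}
```

With "applejuice" mapped to "apple juice", "fresh applejuice" becomes "fresh apple juice" and "+applejuice" becomes "+apple +juice". The operator still carries over to each word, but no grouping parentheses are added.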

Really, rather than adding a flag, I'd prefer to either keep it as-is or fix it to work correctly for the most common use cases. I don't want an obscure flag that users need to worry about. You may want to review the "testCollate()" method in [WordBreakSolrSpellChecker's unit test|https://github.com/apache/lucene-solr/blob/master/solr/core/src/test/org/apache/solr/spelling/WordBreakSolrSpellCheckerTest.java] and suggest better outcomes for these queries, with a little discussion as to _why_ your suggestions would be better.

The idea here is that if "pine" is a required term and the spellchecker breaks it into "pi ne", then both "pi" and "ne" should also be required. But maybe this is not the best thing to try to enforce?
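
Here is a rough illustration of that rule. It is a hypothetical sketch, not the actual SpellCheckCollator code, and it makes no claim about the exact string Solr emits. The method copies the original term's operator onto each word of a multi-word correction. It can also wrap the words in parentheses, which is the grouping this issue is about. The names RequiredTermExpansion and expand are made up for illustration.

```java
public class RequiredTermExpansion {

    // Hypothetical illustration: op is the original term's operator ("+", "-", or "" for none).
    // Each word of a multi-word correction such as "pi ne" gets the same operator, so every
    // piece stays required (or prohibited). When wrap is true and there is more than one word,
    // the words are also grouped in parentheses and the operator is kept on the group.
    static String expand(String op, String correction, boolean wrap) {
        String[] words = correction.trim().split("\\s+");
        StringBuilder sb = new StringBuilder();
        for (String w : words) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(op).append(w);
        }
        if (wrap && words.length > 1) {
            return op + "(" + sb + ")";
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand("+", "pi ne", true));  // +(+pi +ne)
        System.out.println(expand("+", "pi ne", false)); // +pi +ne
        System.out.println(expand("", "pi ne", true));   // (pi ne)
    }
}
```

With wrapping, expand("+", "pi ne", true) gives +(+pi +ne). Without it, expand("+", "pi ne", false) gives +pi +ne. In both forms, each piece remains required.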

bq. And when surrounded by brackets, they represent the same position by EdismaxParser 

I'm trying to find some documentation that says this, or maybe a test case that demonstrates it. I apologize for my ignorance of the ins and outs of edismax here.

> Parentheses in SpellCheckCollator
> ---------------------------------
>
>                 Key: SOLR-10256
>                 URL: https://issues.apache.org/jira/browse/SOLR-10256
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: spellchecker
>            Reporter: Abhishek Kumar Singh
>         Attachments: SOLR-10256.patch
>
>
> SpellCheckCollator adds parentheses ( *'('* and *')'* ) around suggestions that contain a space.
> This should be configurable. If *_WordBreakSolrSpellChecker_* is being used, queries like *applejuice* will be broken down into *apple juice*, and the current *SpellCheckCollator* surrounds such suggestions with parentheses.
> And when surrounded by brackets, they represent the same position by _EdismaxParser_ , which is not required. 
> https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/spelling/SpellCheckCollator.java#L227  
> One solution would be a flag that disables this parenthesization of spellcheck suggestions.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
