Posted to dev@lucene.apache.org by "James Dyer (JIRA)" <ji...@apache.org> on 2018/04/27 14:44:00 UTC

[jira] [Commented] (SOLR-12284) WordBreakSolrSpellChecker incorrectly adds parenthesis when breaking words

    [ https://issues.apache.org/jira/browse/SOLR-12284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16456538#comment-16456538 ] 

James Dyer commented on SOLR-12284:
-----------------------------------

With this patch, parentheses are still added when the user's query uses boolean operators.  The behavior changes only for queries built from optional/required clauses.

So, _+pineapple_ can result in a collation like _+pine +apple_ instead of _(+pine +apple)_.

On the other hand, _pineapple OR goodness_ could still collate to _(pine AND apple) OR goodness_, the same as before.
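
As a rough illustration of the two cases above, here is a minimal Python sketch of how a broken word might be rendered into a collation fragment. It is not the code in the attached patch and not Solr's API: the function name `collate` and its parameters are hypothetical, and the handling of a word with no `+` prefix and no boolean operators is an assumption, since that case is not discussed here.

```python
def collate(parts, required=False, boolean_operators=False):
    """Render a word that was broken into `parts` as a collation fragment.

    Hypothetical helper for illustration only; it mirrors the behavior
    described in this comment, not the actual Solr implementation.
    """
    if boolean_operators:
        # AND/OR-style query: keep the parentheses so the surrounding
        # operators still apply to the broken word as a single unit.
        return "(" + " AND ".join(parts) + ")"
    # Optional/required clause style: emit each part as its own clause,
    # carrying over the "+" prefix, with no enclosing parentheses.
    prefix = "+" if required else ""
    return " ".join(prefix + part for part in parts)


# "+pineapple" broken into "pine" and "apple"
print(collate(["pine", "apple"], required=True))

# "pineapple OR goodness" with the first word broken
print(collate(["pine", "apple"], boolean_operators=True) + " OR goodness")
```

The first call yields _+pine +apple_ and the second yields _(pine AND apple) OR goodness_, matching the two examples above.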

I will commit this in a few days if there are no objections.

> WordBreakSolrSpellChecker incorrectly adds parenthesis when breaking words
> --------------------------------------------------------------------------
>
>                 Key: SOLR-12284
>                 URL: https://issues.apache.org/jira/browse/SOLR-12284
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: spellchecker
>    Affects Versions: 7.3
>            Reporter: James Dyer
>            Assignee: James Dyer
>            Priority: Minor
>         Attachments: SOLR-12284.patch
>
>
> When using WordBreakSolrSpellChecker to break single words into multiple words, the collation queries include parentheses around the original term.  In some cases, this causes required terms to become optional, and users get spurious, nonsensical collation results.
> For instance, if I search: +eward +smith 
> ...If +ward +smith is a match, it might give a collation like: (+e +ward) +smith
> ...This requires either the "e" or the "ward" to exist, but not both.  But users are more likely to want both terms to be required, so it would be better if it did not add parentheses.
> This might be the cause of SOLR-5995 and [this SO issue|https://stackoverflow.com/questions/23849747/solr-wordbreak-spellchecker-breaking-words-into-letters-excessive-breaking]


