Posted to dev@lucene.apache.org by "Uwe Schindler (JIRA)" <ji...@apache.org> on 2018/10/19 16:52:00 UTC

[jira] [Comment Edited] (SOLR-12243) Edismax missing phrase queries when phrases contain multiterm synonyms

    [ https://issues.apache.org/jira/browse/SOLR-12243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16657074#comment-16657074 ] 

Uwe Schindler edited comment on SOLR-12243 at 10/19/18 4:51 PM:
----------------------------------------------------------------

That's what I mean: it's still linked together. The main bug is still in Lucene: its query builder creates a query that does not correctly implement span queries on multi-term synonyms, because it uses the wrong query type. The issues here come from the fact that dismax relies on the internal implementation of the Lucene code, which is not a good thing. The Solr code should not do this; instead we should add something to Lucene that can create those pf auto-phrase queries. I was missing that in my own query parser, too. So basically it would be good to have an additional query builder method in Lucene that analyzes some text and then builds configurable shingles that are connected with span/phrase queries using a slop. This code should not depend on the structure of a span/boolean query that was parsed before.

I'd like to wait a few days until the Lucene issue is solved and then review the changes here and adapt them as necessary. In the longer term, I'd like to get rid of the query instanceof spaghetti code and move the query construction for dismax-like queries using term shingles (bigrams, trigrams) to a separate builder class, so that it is more reusable.
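
To make the builder idea above concrete, here is a minimal sketch of shingle-based phrase construction that works directly from an analyzed token list rather than from a previously parsed query. Everything in it is an assumption made for illustration: the class name ShinglePhraseSketch and its methods are hypothetical and are not Lucene or Solr APIs, the tokens are assumed to be already analyzed, synonyms are ignored, and the output is plain query-syntax strings where a real builder would presumably produce phrase or span query objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; not a Lucene or Solr class. */
final class ShinglePhraseSketch {

    /** Returns every run of `size` consecutive tokens, joined by single spaces. */
    static List<String> shingles(List<String> tokens, int size) {
        if (size < 1) {
            throw new IllegalArgumentException("size must be >= 1");
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i + size <= tokens.size(); i++) {
            out.add(String.join(" ", tokens.subList(i, i + size)));
        }
        return out;
    }

    /** Renders each shingle as a sloppy phrase clause, e.g. title:"a b"~11. */
    static List<String> phraseClauses(String field, List<String> tokens, int size, int slop) {
        List<String> out = new ArrayList<>();
        for (String shingle : shingles(tokens, size)) {
            out.add(field + ":\"" + shingle + "\"~" + slop);
        }
        return out;
    }

    public static void main(String[] args) {
        // Tokens are assumed to come out of the field's analyzer already.
        List<String> tokens = Arrays.asList("allergic", "reaction", "dog");
        System.out.println(phraseClauses("title", tokens, 2, 11)); // bigrams, like pf2
        System.out.println(phraseClauses("title", tokens, 3, 22)); // trigrams, like pf3
    }
}
```

For the tokens of "allergic reaction dog" this yields the bigram clauses for "allergic reaction" and "reaction dog" and the trigram clause for "allergic reaction dog", because the shingles are derived from token positions alone and never from the shape of another query.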


was (Author: thetaphi):
That's what I mean: it's still linked together. The main bug is still in Lucene: its query builder creates a query that does not correctly implement span queries on multi-term synonyms, because it uses the wrong query type. The issues here come from the fact that dismax relies on the internal implementation of the Lucene code, which is not a good thing. The Solr code should not do this; instead we should add something to Lucene that can create those pf auto-phrase queries. I was missing that in my own query parser, too. So basically it would be good to have an additional query builder method in Lucene that analyzes some text and then builds configurable shingles that are connected with span/phrase queries using a slop. This code should not depend on the structure of a span/boolean query that was parsed before.

I'd like to wait a few days until the Lucene issue is solved and then review the changes here and adapt them as necessary. In the longer term, I'd like to get rid of the query instance of shingling and move the query construction for dismax-like queries to a separate builder class, so that it is more reusable.

> Edismax missing phrase queries when phrases contain multiterm synonyms
> ----------------------------------------------------------------------
>
>                 Key: SOLR-12243
>                 URL: https://issues.apache.org/jira/browse/SOLR-12243
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: query parsers
>    Affects Versions: 7.1
>         Environment: RHEL, MacOS X
> Do not believe this is environment-specific.
>            Reporter: Elizabeth Haubert
>            Assignee: Uwe Schindler
>            Priority: Major
>         Attachments: SOLR-12243.patch, SOLR-12243.patch, SOLR-12243.patch, SOLR-12243.patch, SOLR-12243.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> synonyms.txt:
> {code}
> allergic, hypersensitive
> aspirin, acetylsalicylic acid
> dog, canine, canis familiris, k 9
> rat, rattus
> {code}
> request handler:
> {code:xml}
> <requestHandler name="/test_qparse_error" class="solr.SearchHandler">
>  <lst name="defaults">
> <!-- Query settings -->
>  <str name="defType">edismax</str>
>  <str name="tie"> 0.4</str>
>  <str name="qf">title^100</str>
>  <str name="pf">title~20^5000</str>
>  <str name="pf2">title~11</str>
>  <str name="pf3">title~22^1000</str>
>  <str name="df">text</str>
>  <!-- mm: If three or fewer clauses exist, they all must match. 
>  If four to six clauses exist, one can be missing. If seven to nine clauses exist, all but three must match. 
>  If more than nine clauses exist, only require 30% to match.-->
>  <str name="mm">3&lt;-1 6&lt;-3 9&lt;30%</str>
>  <str name="q.alt">*:*</str>
>  <str name="rows">25</str>
> </lst>
> </requestHandler>
> {code}
> Phrase queries (pf, pf2, pf3) containing "dog" or "aspirin" will not be generated when the above synonym list is used.
> "allergic reaction dog" will generate pf2: "allergic reaction", but not pf:"allergic reaction dog", pf2: "reaction dog", or pf3: "allergic reaction dog"
> "aspirin dose in rats" will generate pf3: "dose ? rats" but not pf2: "aspirin dose" or pf3:"aspirin dose ?"
>  
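
The mm value in the request handler above, 3<-1 6<-3 9<30% once the XML escapes are decoded, is a sequence of conditional rules. The sketch below evaluates such a spec following the semantics described in the Solr documentation as I understand them: a rule n<X applies X only when there are more than n optional clauses, a negative integer means "all but that many", and a positive percentage is rounded down. The class MinShouldMatchSketch is hypothetical and uses integer arithmetic; it is not Solr's implementation.

```java
/** Hypothetical illustration only; not Solr's implementation. */
final class MinShouldMatchSketch {

    /** Returns how many of `clauseCount` optional clauses must match under `spec`. */
    static int minShouldMatch(String spec, int clauseCount) {
        int result = clauseCount; // default: every clause is required
        for (String rule : spec.trim().split("\\s+")) {
            int lt = rule.indexOf('<');
            if (lt < 0) {
                // Unconditional expression such as "2", "-1" or "75%".
                result = apply(rule, clauseCount);
                break;
            }
            int bound = Integer.parseInt(rule.substring(0, lt));
            if (clauseCount <= bound) {
                break; // this rule and all later ones do not apply
            }
            result = apply(rule.substring(lt + 1), clauseCount);
        }
        // Keep the answer within 0..clauseCount.
        return Math.max(0, Math.min(result, clauseCount));
    }

    /** Applies one simple expression: an integer or a percentage, possibly negative. */
    private static int apply(String expr, int clauseCount) {
        boolean percent = expr.endsWith("%");
        int value = Integer.parseInt(percent ? expr.substring(0, expr.length() - 1) : expr);
        // Integer division truncates toward zero, so percentages round down in magnitude.
        int calc = percent ? (clauseCount * value) / 100 : value;
        return calc < 0 ? clauseCount + calc : calc;
    }

    public static void main(String[] args) {
        String mm = "3<-1 6<-3 9<30%";
        for (int n : new int[] {2, 3, 4, 6, 7, 9, 10, 20}) {
            System.out.println(n + " clauses -> at least " + minShouldMatch(mm, n));
        }
    }
}
```

Under these assumptions, up to three clauses must all match, four to six clauses may miss one, seven to nine may miss three, and above nine only 30% are required.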


