Posted to dev@lucene.apache.org by "Dominique Béjean (JIRA)" <ji...@apache.org> on 2018/02/11 09:31:00 UTC
[jira] [Updated] (SOLR-11968) Multi-words query time synonyms

     [ https://issues.apache.org/jira/browse/SOLR-11968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dominique Béjean updated SOLR-11968:
------------------------------------
    Environment: CentOS 7.x
    Description: 
I am trying multi-word query-time synonyms with Solr 6.6.2 and the SynonymGraphFilterFactory filter, as explained in this article:
 [https://lucidworks.com/2017/04/18/multi-word-synonyms-solr-adds-query-time-support/]
  
My field type is:
{code:xml}
<fieldType name="textSyn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ElisionFilterFactory" ignoreCase="true"
            articles="lang/contractions_fr.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.FrenchMinimalStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ElisionFilterFactory" ignoreCase="true"
            articles="lang/contractions_fr.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.FrenchMinimalStemFilterFactory"/>
  </analyzer>
</fieldType>{code}
 
synonyms.txt contains the line:
{code:java}
om, olympique de marseille{code}
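With expand="true", a comma-separated line such as this one is an equivalence. Each entry can be rewritten to every entry on the line, and a multi-word entry is a sequence of tokens. The sketch below models that interpretation for this single line in plain Java. It is not Solr's synonym parser: the class `SynonymLine` and its methods are hypothetical, and escaping and `=>` mapping rules are ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper illustrating how an equivalence line is read (not Solr code).
public class SynonymLine {
    // Splits "om, olympique de marseille" into token sequences:
    // one sequence per comma-separated entry, tokens split on whitespace.
    static List<List<String>> parse(String line) {
        List<List<String>> entries = new ArrayList<>();
        for (String entry : line.split(",")) {
            String trimmed = entry.trim().toLowerCase();
            if (!trimmed.isEmpty()) {
                entries.add(Arrays.asList(trimmed.split("\\s+")));
            }
        }
        return entries;
    }

    // With expand="true", every entry maps to all entries of the line (itself included),
    // so "om" and "olympique de marseille" become interchangeable alternatives.
    static Map<List<String>, List<List<String>>> expandAll(List<List<String>> entries) {
        Map<List<String>, List<List<String>>> rules = new LinkedHashMap<>();
        for (List<String> entry : entries) {
            rules.put(entry, entries);
        }
        return rules;
    }

    public static void main(String[] args) {
        List<List<String>> entries = parse("om, olympique de marseille");
        System.out.println(entries);
        System.out.println(expandAll(entries).get(List.of("om")));
    }
}
```

Because the mapping is symmetric, the one-word and three-word forms should yield the same alternatives regardless of which one appears in the query.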
 
stopwords.txt contains the word:
{code:java}
de{code}
 
The order of the words in my query changes the query generated by edismax:
{code:java}
q={!edismax qf='name_text_gp' v=$qq}
 &sow=false
 &qq=...{code}
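One way to build the same request programmatically is sketched below. Some details are assumptions not stated in the report: the host, the core name `mycore`, and the `debugQuery=true` parameter, which is what makes Solr return `parsedquery_toString`. The local-params prefix in `q` only selects edismax and `qf`. The user text travels in `qq` and is dereferenced through `v=$qq`. `sow=false` keeps the query text from being split on whitespace before analysis, so a multi-word synonym can match.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class EdismaxRequest {
    // Builds the query string for the request in the report (Java 10+ for URLEncoder(String, Charset)).
    static String queryString(String userText) {
        Map<String, String> params = new LinkedHashMap<>();
        // Local params: choose edismax, set qf, and read the user text from $qq.
        params.put("q", "{!edismax qf='name_text_gp' v=$qq}");
        // Do not split on whitespace, so "olympique de marseille" reaches the synonym filter whole.
        params.put("sow", "false");
        params.put("qq", userText);
        // Assumption: debug output is requested in order to see parsedquery_toString.
        params.put("debugQuery", "true");

        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Hypothetical Solr host and core name.
        String base = "http://localhost:8983/solr/mycore/select?";
        System.out.println(base + queryString("om maillot"));
        System.out.println(base + queryString("maillot om"));
    }
}
```

Only `qq` differs between the two requests, so any difference in `parsedquery_toString` comes from how the parser handles the word order.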

With "qq=om maillot" or "qq=olympique de marseille maillot", I can see the synonym expansion, and it works as expected:
{code:java}
"parsedquery_toString":"+(((+name_text_gp:olympiqu +name_text_gp:marseil +name_text_gp:maillot) name_text_gp:om))",
 "parsedquery_toString":"+((name_text_gp:om (+name_text_gp:olympiqu +name_text_gp:marseil +name_text_gp:maillot)))",{code}

With "qq=maillot om" or "qq=maillot olympique de marseille", both requests generate the same query:
{code:java}
"parsedquery_toString":"+((name_text_gp:maillot) (name_text_gp:om))",
 "parsedquery_toString":"+((name_text_gp:maillot) (name_text_gp:om))",{code}

I don't understand these generated queries. For "maillot om", the synonym expansion looks ignored; for "maillot olympique de marseille", the expansion is not ignored, but only the synonym term "om" is used.
  
When I test the analysis of the field type, the synonyms are correctly expanded for both expressions, in either word order:
{code:java}
om maillot  
 maillot om
 olympique de marseille maillot
 maillot olympique de marseille{code}

The resulting output always includes the following terms (obviously not always in the same order):
{code:java}
olympiqu om marseil maillot {code}
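The analysis output lists the terms but not how they connect. In a query-time synonym graph for "maillot om", "maillot" is followed either by "om" or by "olympiqu marseil". A query honoring both alternatives should therefore look the same whatever the word order. The sketch below enumerates the paths of small hand-built graphs. The node numbers are hypothetical, and the gap left by the removed stop word "de" is closed for simplicity. Rendering the paths as a disjunction of required-term groups is only one way to write the expected query, not necessarily what Lucene's graph query building produces.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenGraphPaths {
    // One arc of a token graph: `term` spans from node `from` to node `to`.
    record Arc(int from, int to, String term) {}

    // Depth-first walk collecting every term sequence from `node` to `end`.
    static void walk(List<Arc> arcs, int node, int end,
                     List<String> prefix, List<List<String>> out) {
        if (node == end) {
            out.add(new ArrayList<>(prefix));
            return;
        }
        for (Arc arc : arcs) {
            if (arc.from() == node) {
                prefix.add(arc.term());
                walk(arcs, arc.to(), end, prefix, out);
                prefix.remove(prefix.size() - 1); // backtrack
            }
        }
    }

    static List<List<String>> paths(List<Arc> arcs, int start, int end) {
        List<List<String>> out = new ArrayList<>();
        walk(arcs, start, end, new ArrayList<>(), out);
        return out;
    }

    // Each path becomes a group of required terms; the groups are alternatives.
    static String toQuery(String field, List<List<String>> paths) {
        StringBuilder sb = new StringBuilder();
        for (List<String> path : paths) {
            if (sb.length() > 0) sb.append(' ');
            sb.append('(');
            for (int i = 0; i < path.size(); i++) {
                if (i > 0) sb.append(' ');
                sb.append('+').append(field).append(':').append(path.get(i));
            }
            sb.append(')');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical graph for "maillot om" after analysis.
        List<Arc> maillotOm = List.of(
                new Arc(0, 1, "maillot"),
                new Arc(1, 3, "om"),
                new Arc(1, 2, "olympiqu"),
                new Arc(2, 3, "marseil"));
        // Hypothetical graph for "om maillot": same alternatives, other order.
        List<Arc> omMaillot = List.of(
                new Arc(0, 2, "om"),
                new Arc(0, 1, "olympiqu"),
                new Arc(1, 2, "marseil"),
                new Arc(2, 3, "maillot"));
        System.out.println(toQuery("name_text_gp", paths(maillotOm, 0, 3)));
        System.out.println(toQuery("name_text_gp", paths(omMaillot, 0, 3)));
    }
}
```

Under this rendering, both word orders keep "maillot" together with each synonym alternative. By contrast, the parsed query quoted above for "maillot om" contains neither olympiqu nor marseil.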
 
So I suspect an issue with the edismax query parser.



> Multi-words query time synonyms
> -------------------------------
>
>                 Key: SOLR-11968
>                 URL: https://issues.apache.org/jira/browse/SOLR-11968
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: query parsers, Schema and Analysis
>    Affects Versions: master (8.0), 6.6.2
>         Environment: CentOS 7.x
>            Reporter: Dominique Béjean
>            Priority: Major
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
