Posted to solr-user@lucene.apache.org by Manish Bafna <ma...@gmail.com> on 2020/09/08 11:52:17 UTC

Multi-word Synonyms not working properly with Edismax

Hi,
We are using the following configuration:

------------------------------------------
*Schema: *
<fieldType name="text_en_splitting_custom" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="true"
           omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LengthFilterFactory" min="1" max="100"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordRepeatFilterFactory"/>
    <filter class="solr.HunspellStemFilterFactory"
            dictionary="../hunspell_dictionary/en_US.dic"
            affix="../hunspell_dictionary/en_US.aff" ignoreCase="true"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LengthFilterFactory" min="1" max="100"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ManagedSynonymGraphFilterFactory"
            managed="${solr.core.name}_english"/>
    <filter class="solr.KeywordRepeatFilterFactory"/>
    <filter class="solr.HunspellStemFilterFactory"
            dictionary="../hunspell_dictionary/en_US.dic"
            affix="../hunspell_dictionary/en_US.aff" ignoreCase="true"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
--------------------------------
*Managed Synonyms:* "abc implement", "bike", "xyz traders", "xyz transport"
---------------------------------
*Query*: bike
*Parser type:* edismax
---------------------------------
*Parsed query (from debug):* +DisjunctionMaxQuery((((field1:"abc implement" field1:bike field1:"xyz traders" field1:"xyz trade"))
---------------------------------

Note that two of the multi-word synonyms start with xyz, but only one of
them makes it into the parsed query: "xyz traders" (along with "xyz trade")
is there, while "xyz transport" is missing. If we change "xyz transport" to
"xy transport", it works properly. The issue occurs only when two multi-word
synonyms start with the same word, even though we are using the graph
synonym filter.

Are we doing anything wrong here?

Thanks,
Manish.

Re: Multi-word Synonyms not working properly with Edismax

Posted by Manish Bafna <ma...@gmail.com>.
Yes, we tried that and it worked. We removed RemoveDuplicatesTokenFilterFactory
from the query analyzer only, and it is working properly now.
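
For reference, here is a sketch of what the query analyzer looks like after that change. It assumes the rest of the chain is exactly as in the original post and that the index analyzer is left untouched; the comment is only there to mark what was dropped.

```xml
<analyzer type="query">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.LengthFilterFactory" min="1" max="100"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.ManagedSynonymGraphFilterFactory"
          managed="${solr.core.name}_english"/>
  <filter class="solr.KeywordRepeatFilterFactory"/>
  <filter class="solr.HunspellStemFilterFactory"
          dictionary="../hunspell_dictionary/en_US.dic"
          affix="../hunspell_dictionary/en_US.aff" ignoreCase="true"/>
  <!-- RemoveDuplicatesTokenFilterFactory removed from the query chain only.
       Assumption: the index analyzer still ends with it. -->
</analyzer>
```

One likely side effect: with KeywordRepeatFilterFactory still in place, a term whose stem equals the original form is no longer deduplicated at query time, so the parsed query may show the same clause twice. That should not change which documents match, but it is worth checking in the debug output.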


On Wed, Sep 9, 2020 at 2:24 AM Dominique Bejean <do...@eolya.fr>
wrote:

> Hi,
>
> Can you try removing the RemoveDuplicatesTokenFilter?
>
> Dominique

Re: Multi-word Synonyms not working properly with Edismax

Posted by Dominique Bejean <do...@eolya.fr>.
Hi,

Can you try removing the RemoveDuplicatesTokenFilter?

Dominique
