Posted to solr-user@lucene.apache.org by Dominique Bejean <do...@eolya.fr> on 2018/02/09 17:25:23 UTC
Multi words query time synonyms
Hi,
I am trying out multi-word query-time synonyms with Solr 6.6.2 and the
SynonymGraphFilterFactory filter, as explained in this article:
https://lucidworks.com/2017/04/18/multi-word-synonyms-solr-adds-query-time-support/
My field type is:
<fieldType name="textSyn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ElisionFilterFactory" ignoreCase="true"
            articles="lang/contractions_fr.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt"
            ignoreCase="true"/>
    <filter class="solr.FrenchMinimalStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ElisionFilterFactory" ignoreCase="true"
            articles="lang/contractions_fr.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt"
            ignoreCase="true"/>
    <filter class="solr.FrenchMinimalStemFilterFactory"/>
  </analyzer>
</fieldType>
synonyms.txt contains the line
om, olympique de marseille
The order of words in my query has an impact on the generated query in
edismax
q={!edismax qf='name_text_gp' v=$qq}
&sow=false
&qq=...
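For anyone who wants to reproduce this, here is a minimal Python sketch that builds the request URL with debugQuery enabled, which is how the parsedquery_toString values below can be obtained. The host, port and collection name are placeholders (they are not given in this thread), and rows=0 is only added here to keep the response small.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint: host, port and collection are NOT from the thread.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycollection/select"


def build_debug_url(qq: str) -> str:
    """Build a select URL for the edismax query from the post, with
    debugQuery=true so the response carries parsedquery_toString."""
    params = {
        "q": "{!edismax qf='name_text_gp' v=$qq}",
        "sow": "false",        # pass the whole query text to the analyzer
        "qq": qq,
        "debugQuery": "true",  # ask Solr to include the debug section
        "rows": "0",           # assumption: only the debug output is of interest
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    for qq in ("om maillot", "maillot om"):
        print(build_debug_url(qq))
```

Fetching each printed URL and comparing the debug section for the two word orders should show the difference described in this message.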
With "qq=om maillot" or "qq=olympique de marseille maillot", I can see the
synonym expansion. It works as expected:
"parsedquery_toString":"+(((+name_text_gp:olympiqu +name_text_gp:marseil
+name_text_gp:maillot) name_text_gp:om))",
"parsedquery_toString":"+((name_text_gp:om (+name_text_gp:olympiqu
+name_text_gp:marseil +name_text_gp:maillot)))",
With "qq=maillot om" or "qq=maillot olympique de marseille", both inputs
produce the same generated query:
"parsedquery_toString":"+((name_text_gp:maillot) (name_text_gp:om))",
"parsedquery_toString":"+((name_text_gp:maillot) (name_text_gp:om))",
I don't understand these generated queries. The first one looks like the
synonym expansion is ignored, but the second one shows it is not ignored
and only the synonym term is used.
What is wrong with the way I am doing this?
Regards
Dominique
--
Dominique Béjean
06 08 46 12 43
Re: Multi words query time synonyms
Posted by Dominique Bejean <do...@eolya.fr>.
Steve,
Following your comment, I ran this test:
1/ put the SynonymGraphFilterFactory after the StopFilterFactory in the
query-time analysis chain
<analyzer type="query">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.ElisionFilterFactory" ignoreCase="true"
          articles="lang/contractions_fr.txt"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.ASCIIFoldingFilterFactory"/>
  <filter class="solr.StopFilterFactory" words="stopwords.txt"
          ignoreCase="true"/>
  <filter class="solr.SynonymGraphFilterFactory"
          synonyms="gosport_synonyms.txt"
          ignoreCase="true" expand="true"/>
  <filter class="solr.FrenchMinimalStemFilterFactory"/>
</analyzer>
2/ remove the stop word from the synonyms file
om, olympique marseille
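Step 2 can be done by hand for a single rule, but for a larger synonyms file it could be scripted. The following is only a rough Python sketch: the function name is made up for illustration, the stop list {"de"} is an assumption (the content of stopwords.txt is not shown here), and it only handles simple comma-separated rules, not explicit "=>" mappings.

```python
def strip_stopwords(rule: str, stopwords: set[str]) -> str:
    """Remove stop words from every comma-separated entry of one synonym rule."""
    cleaned = []
    for entry in rule.split(","):
        # Keep only the words that the stop filter would not remove.
        words = [w for w in entry.split() if w.lower() not in stopwords]
        if words:  # drop entries made up entirely of stop words
            cleaned.append(" ".join(words))
    return ", ".join(cleaned)


if __name__ == "__main__":
    # Assumed stop list; the real stopwords.txt may contain more entries.
    print(strip_stopwords("om, olympique de marseille", {"de"}))
    # prints: om, olympique marseille
```

The point is that the synonym rules then match the token stream as it looks after the StopFilterFactory has run.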
The parsed query strings are:
for "om maillot"
"parsedquery_toString":"+(((((+name_text_gp:olympiqu +name_text_gp:marseil)
name_text_gp:om)) (name_text_gp:maillot))~1)",
for "olympique de marseille maillot"
"parsedquery_toString":"+((((name_text_gp:om (+name_text_gp:olympiqu
+name_text_gp:marseil))) (name_text_gp:maillot))~1)",
for "maillot om"
"parsedquery_toString":"+(((name_text_gp:maillot) (((+name_text_gp:olympiqu
+name_text_gp:marseil) name_text_gp:om)))~1)",
for "maillot olympique de marseille"
"parsedquery_toString":"+(((name_text_gp:maillot) ((name_text_gp:om
(+name_text_gp:olympiqu +name_text_gp:marseil))))~1)",
The query results are the same for all four queries.
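As a quick sanity check on the four parsed queries above, this small Python sketch extracts the terms from each parsedquery_toString and compares them. It only compares the sets of terms, not the boolean structure, so it is a rough check rather than a proof that the queries are equivalent.

```python
import re

# parsedquery_toString values copied from the four test queries above.
PARSED = {
    "om maillot": (
        "+(((((+name_text_gp:olympiqu +name_text_gp:marseil) "
        "name_text_gp:om)) (name_text_gp:maillot))~1)"
    ),
    "olympique de marseille maillot": (
        "+((((name_text_gp:om (+name_text_gp:olympiqu "
        "+name_text_gp:marseil))) (name_text_gp:maillot))~1)"
    ),
    "maillot om": (
        "+(((name_text_gp:maillot) (((+name_text_gp:olympiqu "
        "+name_text_gp:marseil) name_text_gp:om)))~1)"
    ),
    "maillot olympique de marseille": (
        "+(((name_text_gp:maillot) ((name_text_gp:om "
        "(+name_text_gp:olympiqu +name_text_gp:marseil))))~1)"
    ),
}


def terms(parsed: str, field: str = "name_text_gp") -> frozenset[str]:
    """Return the set of terms that the parsed query searches in `field`."""
    return frozenset(re.findall(re.escape(field) + r":(\w+)", parsed))


if __name__ == "__main__":
    for query, parsed in PARSED.items():
        print(query, "->", sorted(terms(parsed)))
    # True when every word order yields the same set of terms.
    print(len({terms(p) for p in PARSED.values()}) == 1)
```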
It looks like this could be an acceptable workaround.
Thank you
Dominique
On Sun, Feb 11, 2018 at 10:31, Dominique Bejean <do...@eolya.fr>
wrote:
> Hi Steve,
>
> Thank you for your response.
> The Jira issue was created: SOLR-11968
>
> I'll let you add your comments.
>
> Regards.
>
> Dominique
--
Dominique Béjean
06 08 46 12 43
Re: Multi words query time synonyms
Posted by Dominique Bejean <do...@eolya.fr>.
Hi Steve,
Thank you for your response.
The Jira issue was created: SOLR-11968
I'll let you add your comments.
Regards.
Dominique
On Sat, Feb 10, 2018 at 20:30, Steve Rowe <sa...@gmail.com> wrote:
> Hi Dominique,
>
> Looks like it’s a bug, not sure where exactly though. Can you please
> create a JIRA?
>
> I can see the same behavior on master too, not just on the
> releases/lucene-solr/6.6.2 tag.
>
> One interesting thing I found is that if I remove the stop filter from the
> query analyzer, I get the following for qq=“maillot om”:
>
> +((name_text_gp:maillot) (((+name_text_gp:olympiqu +name_text_gp:de
> +name_text_gp:marseil) name_text_gp:om)))
>
> (btw my stop list only has “de” on it)
>
> Thanks,
>
> --
> Steve
> www.lucidworks.com
--
Dominique Béjean
06 08 46 12 43
Re: Multi words query time synonyms
Posted by Steve Rowe <sa...@gmail.com>.
Hi Dominique,
Looks like it’s a bug, not sure where exactly though. Can you please create a JIRA?
I can see the same behavior on master too, not just on the releases/lucene-solr/6.6.2 tag.
One interesting thing I found is that if I remove the stop filter from the query analyzer, I get the following for qq=“maillot om”:
+((name_text_gp:maillot) (((+name_text_gp:olympiqu +name_text_gp:de +name_text_gp:marseil) name_text_gp:om)))
(btw my stop list only has “de” on it)
Thanks,
--
Steve
www.lucidworks.com
> On Feb 10, 2018, at 2:12 AM, Dominique Bejean <do...@eolya.fr> wrote:
>
> Hi,
>
> More info.
>
> When I test the analysis for the field type, the synonyms are correctly
> expanded for all of these expressions
>
> om maillot
> maillot om
> olympique de marseille maillot
> maillot olympique de marseille
>
> The resulting outputs always include the following terms (obviously not
> always in the same order)
>
> olympiqu om marseil maillot
>
>
> So I suspect an issue with the edismax query parser.
>
> Regards.
>
> Dominique
Re: Multi words query time synonyms
Posted by Dominique Bejean <do...@eolya.fr>.
Hi,
More info.
When I test the analysis for the field type, the synonyms are correctly
expanded for all of these expressions
om maillot
maillot om
olympique de marseille maillot
maillot olympique de marseille
The resulting outputs always include the following terms (obviously not
always in the same order)
olympiqu om marseil maillot
So I suspect an issue with the edismax query parser.
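To make the observation concrete, here is a toy Python model of the query-time chain (lowercase, synonym expansion, stop filter, stemming). It is not Lucene code and it ignores token positions and the token graph entirely; the stop list and the stem table are assumptions chosen to match the terms reported above. It only illustrates that the set of analyzed terms does not depend on word order, so any order dependence has to come from a later stage.

```python
# Toy stand-ins for the Solr resources; the real files are not shown in the thread.
STOPWORDS = {"de"}  # assumed stop list
SYNONYM_GROUP = [("om",), ("olympique", "de", "marseille")]  # from synonyms.txt
STEMS = {"olympique": "olympiqu", "marseille": "marseil"}  # stems seen in the output


def analyze(text: str) -> set[str]:
    """Return the set of terms a simplified query-time chain would emit."""
    tokens = text.lower().split()
    expanded = list(tokens)
    # Synonym step (expand=true): if any variant occurs as a contiguous run
    # of tokens, add the words of every variant in the group.
    for variant in SYNONYM_GROUP:
        n = len(variant)
        if any(tuple(tokens[i:i + n]) == variant
               for i in range(len(tokens) - n + 1)):
            for other in SYNONYM_GROUP:
                expanded.extend(other)
    # Stop filter, then stemming through the lookup table.
    return {STEMS.get(t, t) for t in expanded if t not in STOPWORDS}


if __name__ == "__main__":
    for query in ("om maillot", "maillot om",
                  "olympique de marseille maillot",
                  "maillot olympique de marseille"):
        print(query, "->", sorted(analyze(query)))
```

All four inputs give the same four terms in this model, which matches what the analysis screen shows.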
Regards.
Dominique
--
Dominique Béjean
06 08 46 12 43