Posted to solr-user@lucene.apache.org by Bernd Fehling <be...@uni-bielefeld.de> on 2011/10/06 13:26:24 UTC

query synonym expansion howto?

Hi list,

Has anyone managed to get query-time synonym expansion working?

Synonym expansion itself is working but I get no search results.

synonyms_test.txt
erwachsenenbildung, adult education, educación de adultos, éducation des adultes

search for "erwachsenenbildung"   -->  8 hits
search for "adult education"      --> 13 hits
search for "educación de adultos" -->  3 hits

search for "adult education" with synonym expansion --> 0 hits.

RESULT:
-------
<str name="q">textth:"adult education"</str>
<str name="q.op">OR</str>

<result name="response" numFound="0" start="0" maxScore="0.0"/>
<lst name="debug">
<str name="rawquerystring">textth:"adult education"</str>
<str name="querystring">textth:"adult education"</str>
<str name="parsedquery">
+((textth:erwachsenenbildung textth:adult education textth:educación de adultos textth:éducation des adultes)~4)
</str>
<str name="parsedquery_toString">
+((textth:erwachsenenbildung textth:adult education textth:educación de adultos textth:éducation des adultes)~4)
</str>
<lst name="explain"/>
<str name="QParser">ExtendedDismaxQParser</str>


Can it be that the "q.op=OR" parameter is ignored?

Why is a slop of ~4 added to the parsedquery?

Regards,
Bernd



Re: query synonym expansion howto?

Posted by Bernd Fehling <be...@uni-bielefeld.de>.
OK, I have changed my synonyms_test.txt:
philosophie, philosophy, filosofia

So there are no multi-word synonyms any more, but it is still not working.
Even with qs=0 I still get a query slop.


search for "philosophie" --> 13 hits
search for "philosophy"  --> 21 hits
search for "filosofia"   --> 51 hits

search for "philosophy" with synonym expansion --> 0 hits.

<str name="q">textth:philosophy</str>
</lst>
</lst>
<result name="response" numFound="0" start="0" maxScore="0.0"/>
<lst name="debug">
<str name="rawquerystring">textth:philosophy</str>
<str name="querystring">textth:philosophy</str>
<str name="parsedquery">
+((textth:philosophie textth:philosophy textth:filosofia)~3)
</str>
<str name="parsedquery_toString">
+((textth:philosophie textth:philosophy textth:filosofia)~3)
</str>
<lst name="explain"/>
<str name="QParser">ExtendedDismaxQParser</str>


org.apache.solr.analysis.SynonymFilterFactory {tokenizerFactory=solr.WhitespaceTokenizerFactory, synonyms=synonyms_test.txt, expand=true, format=solr, ignoreCase=true, luceneMatchVersion=LUCENE_35}
position        1
term text       philosophie
                philosophy
                filosofia
type            SYNONYM
                SYNONYM
                SYNONYM
startOffset     0
                0
                0
endOffset       10
                10
                10


Very strange.
Anything else to try?
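
A possible reading of the debug output, offered as an assumption and not confirmed anywhere in this thread: the three synonyms sit at the same position, the parser turns them into three optional clauses, and the trailing ~3 is a "minimum should match" count rather than a slop. If so, a document would have to contain all three words to match. The sketch below only simulates that matching rule; the sample documents and the function name are made up for illustration.

```python
def matches(doc_terms, should_clauses, min_should_match):
    """Return True if at least min_should_match optional clauses hit the document."""
    hits = sum(1 for term in should_clauses if term in doc_terms)
    return hits >= min_should_match


# The three synonyms produced by the filter for "philosophy".
clauses = ["philosophie", "philosophy", "filosofia"]

# Hypothetical documents, each containing only one language variant.
docs = [
    {"philosophy", "of", "science"},
    {"storia", "della", "filosofia"},
]

# Assumed meaning of "~3": all three clauses must match.
strict = [matches(d, clauses, 3) for d in docs]

# With a requirement of one clause, each document matches.
relaxed = [matches(d, clauses, 1) for d in docs]

print(strict, relaxed)
```

Under this assumption the 13, 21 and 51 hits of the single-term searches would all be lost, because no document is likely to contain every variant.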

Regards
Bernd


On 06.10.2011 at 13:58, Ahmet Arslan wrote:
> Query-time synonym expansion has problems with multi-word synonyms.
> The query parser splits the query string on whitespace before it reaches the analysis chain.
>
>
> This is a known limitation, explained here:
>
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
>
> But I think using synonyms at index time has its problems as well. For example, you need to re-index whenever you add, remove, or edit entries in the synonym list. For some systems, re-indexing takes a long time.
>
> I wonder whether a "query expansion module" that injects synonyms into the initial query string (before the analysis chain) would make sense.
> For example, if the query string contains 'adult education', it would add the phrase "educación de adultos" as an injected optional clause.
>
> As for the query slop: since you are using the (e)dismax query parser, it is controlled via the qs parameter.
>
> http://wiki.apache.org/solr/DisMaxQParserPlugin#qs_.28Query_Phrase_Slop.29
>

-- 
*************************************************************
Bernd Fehling                Universitätsbibliothek Bielefeld
Dipl.-Inform. (FH)                        Universitätsstr. 25
Tel. +49 521 106-4060                   Fax. +49 521 106-4052
bernd.fehling@uni-bielefeld.de                33615 Bielefeld

BASE - Bielefeld Academic Search Engine - www.base-search.net
*************************************************************

Re: query synonym expansion howto?

Posted by Ahmet Arslan <io...@yahoo.com>.
Query-time synonym expansion has problems with multi-word synonyms.
The query parser splits the query string on whitespace before it reaches the analysis chain.


This is a known limitation, explained here:

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory

But I think using synonyms at index time has its problems as well. For example, you need to re-index whenever you add, remove, or edit entries in the synonym list. For some systems, re-indexing takes a long time.

I wonder whether a "query expansion module" that injects synonyms into the initial query string (before the analysis chain) would make sense.
For example, if the query string contains 'adult education', it would add the phrase "educación de adultos" as an injected optional clause.
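
A minimal sketch of what such a pre-analysis expansion step might look like, under stated assumptions: the function names are hypothetical, the synonym line uses the comma-separated format of synonyms_test.txt, and the sketch only handles the simple case where the whole user query equals one synonym entry. It does not escape special characters and is not part of Solr.

```python
def parse_synonym_line(line):
    """Split one comma-separated synonym line into a list of terms."""
    return [term.strip() for term in line.split(",") if term.strip()]


def expand_query(field, user_query, synonym_groups):
    """Return a query string in which every synonym of user_query is added
    as an optional (OR-ed) quoted phrase clause on the given field."""
    needle = user_query.strip().strip('"').lower()
    variants = [needle]
    for group in synonym_groups:
        lowered = [term.lower() for term in group]
        if needle in lowered:
            # Keep the user's own term first, then the other group members.
            variants += [term for term in lowered if term != needle]
            break
    # Quoting each variant keeps multi-word synonyms together as phrases,
    # so the parser's whitespace splitting does not break them apart.
    return " OR ".join('%s:"%s"' % (field, v) for v in variants)


groups = [parse_synonym_line(
    "erwachsenenbildung, adult education, educación de adultos, éducation des adultes"
)]
expanded = expand_query("textth", '"adult education"', groups)
print(expanded)
```

The expanded string would then be sent as the q parameter in place of the original query, with the synonym filter removed from the query-time analysis chain so that the terms are not expanded twice.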

As for the query slop: since you are using the (e)dismax query parser, it is controlled via the qs parameter.

http://wiki.apache.org/solr/DisMaxQParserPlugin#qs_.28Query_Phrase_Slop.29 
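
The qs parameter is passed with the request like any other parameter. A minimal sketch of building such a request URL follows. The base URL is a hypothetical local instance, and the mm parameter (the (e)dismax "minimum should match" setting) is not part of the original exchange; it is included only on the assumption that the ~N suffix on the parenthesized clause group is a minimum-should-match count rather than a phrase slop.

```python
from urllib.parse import urlencode, parse_qs


def build_select_url(base_url, query, qs=0, mm="1"):
    """Build a Solr select URL for the edismax parser with debug output on."""
    params = {
        "q": query,
        "defType": "edismax",
        "qs": qs,          # query phrase slop, as named in the reply above
        "mm": mm,          # assumption: one matching optional clause is enough
        "debugQuery": "true",
    }
    return base_url + "/select?" + urlencode(params)


# Hypothetical host and core path; adjust to the actual installation.
url = build_select_url("http://localhost:8983/solr", "textth:philosophy")
print(url)
```

Comparing the parsedquery in the debug output with and without the mm parameter would show whether the ~N suffix changes with it.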

