Posted to solr-user@lucene.apache.org by sivaprasad <si...@echidnainc.com> on 2010/11/15 16:24:57 UTC

Problem with synonyms

Hi,

I have a set of synonyms in the synonyms.txt file.

For example:
hdtv,High Definition Television, High Definition TV


In the admin screen, when I type "High Definition Television" as the query
term to analyze, I get hdtv as the result of the analysis.

But when I search for the term hdtv and for "High Definition Television", the
result counts do not match.

The analysis chain is given below:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

I enabled debugQuery=true on the search, and the parsed query comes out as
shown below.

+searchtext:high +searchtext:definit +searchtext:televis

But if I put the query term in double quotes (for example, "High Definition
Television"), it works fine.

What is the cause of this problem?

Regards,
Siva
-- 
View this message in context: http://lucene.472066.n3.nabble.com/Problem-with-synonyms-tp1905051p1905051.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Problem with synonyms

Posted by sivaprasad <si...@echidnainc.com>.

In the synonyms.txt file I have the synonyms below.

ipod, i-pod, i pod

If expand="false" at index time, is it going to replace all
occurrences of "i-pod" and "i pod" with "ipod"?
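
For reference, the Solr wiki's description of SynonymFilterFactory says that with expand="false" a comma-separated line is treated as a mapping of every listed term to the first one. A minimal sketch of such a filter, reusing the file name from this thread (the filter would sit in the index-time analyzer):

```xml
<!-- Sketch only. With expand="false", the comma-separated line
       ipod, i-pod, i pod
     behaves like the explicit mapping
       ipod, i-pod, i pod => ipod
     so matching tokens are replaced by "ipod" as they pass through
     this filter. -->
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="false"/>
```

Documents that were indexed before the change keep their old tokens until they are reindexed.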



Re: Problem with synonyms

Posted by sivaprasad <si...@echidnainc.com>.
Hi,
This looks like a bug. See the URL below.

https://issues.apache.org/jira/browse/LUCENE-1622


Re: Problem with synonyms

Posted by Robert Muir <rc...@gmail.com>.
On Tue, Nov 16, 2010 at 1:16 AM, sivaprasad <si...@echidnainc.com> wrote:
> Query 1: hdtv
>
> <str name="parsedquery">MultiPhraseQuery(searchtext:"high definit (televis tv tvs)")</str>
>
> and the number of results returned is ZERO.
>
> Query 2: High Definition Television
>
> The parsed query is given below.
> <str name="parsedquery">+searchtext:high +searchtext:definit +(searchtext:televis searchtext:tv searchtext:tvs)</str>
>
> And the number of results is 1.
>

Please see http://mail-archives.apache.org/mod_mbox/lucene-dev/201011.mbox/%3CAANLkTimaTGvpLPH_mGfbSUGhDOEDC8TC2bRRWxhiDO1K@mail.gmail.com%3E
which explains the problem, which is "autophrase" generation by the queryparser.

You will need to either use the workaround, or upgrade to an
unreleased version and manually turn off this *very bad* default.
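
The message does not name the setting to turn off. Assuming it is the autoGeneratePhraseQueries attribute that later Solr releases accept on a fieldType, disabling the behavior might look like the sketch below. The analysis chain is copied from the original post, and the filter classes may need updating on newer releases.

```xml
<!-- Sketch, assuming a Solr version that supports the
     autoGeneratePhraseQueries attribute on fieldType. -->
<fieldType name="text" class="solr.TextField" positionIncrementGap="100"
           autoGeneratePhraseQueries="false">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```

With the attribute set to false, a single query word that analyzes to several tokens should no longer be turned into a phrase query automatically. Explicitly quoted phrases are still parsed as phrases.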

Re: Problem with synonyms

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Mon, Nov 22, 2010 at 10:29 AM, Yonik Seeley
<yo...@lucidimagination.com> wrote:
> On Sat, Nov 20, 2010 at 5:59 AM, sivaprasad <si...@echidnainc.com> wrote:
>> Even after expanding the synonyms, I am unable to get the same results.
>
> What you are trying to do should work with index-time synonym expansion.
> Just make sure to remove the synonym filter at query time (or use a
> synonym filter w/o multi-word synonyms).

Actually, to be more precise, the current query-time restriction is
that you can't produce synonyms of different lengths.
Hence you could normalize "High Definition TV" to "hdtv" at both query
time and index time.

Optionally you can expand to both "High Definition TV" and "hdtv" at
index time (in which case you would normally turn off query time
synonym processing).

-Yonik
http://www.lucidimagination.com
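
The second option above (expand at index time, no synonym processing at query time) could be sketched roughly as follows. This is only an illustration: it reuses the filter chain and file names from the original post, and it assumes synonyms.txt holds a comma-separated line such as "hdtv, High Definition Television, High Definition TV".

```xml
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <!-- Index time: expand every synonym so each variant is stored. -->
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <!-- Query time: same chain, but without the synonym filter. -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```

Existing documents have to be reindexed before the expanded terms become searchable.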

Re: Problem with synonyms

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Sat, Nov 20, 2010 at 5:59 AM, sivaprasad <si...@echidnainc.com> wrote:
> Even after expanding the synonyms, I am unable to get the same results.

What you are trying to do should work with index-time synonym expansion.
Just make sure to remove the synonym filter at query time (or use a
synonym filter w/o multi-word synonyms).

What's the original text in the document you are trying to match?

-Yonik
http://www.lucidimagination.com

Re: Problem with synonyms

Posted by sivaprasad <si...@echidnainc.com>.
Even after expanding the synonyms, I am unable to get the same results.

Is there any other method to achieve this?

Re: Problem with synonyms

Posted by Ahmet Arslan <io...@yahoo.com>.
What happens when you use the synonym filter at index time only, with expand="true" and this synonym_index.txt?

I use only the comma operator:

hdtv, High Definition Television, High Definition TV, High Definition Televisions, High Definition TVs

Also, placing the synonym filter after the stem filter may be useful in your case: the Porter stemmer can then handle the televisions-to-television transformation.
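
A rough sketch of that ordering, as an index-time analyzer only. The filter classes are taken from the schema quoted in this thread. One assumption worth flagging: with this ordering the synonym filter sees lowercased, stemmed tokens, so the entries in synonym_index.txt would presumably have to be written in that form (for example "high definit televis", the stems that show up in the debugQuery output).

```xml
<!-- Sketch only: synonym filter moved after the stemmer, index time only. -->
<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true"
          words="stopwords.txt"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.EnglishPorterFilterFactory"
          protected="protwords.txt"/>
  <!-- Runs on stemmed tokens, so plural and singular forms arrive here
       already reduced to the same stem where the stemmer handles them. -->
  <filter class="solr.SynonymFilterFactory" synonyms="synonym_index.txt"
          ignoreCase="true" expand="true"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
```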

Re: Problem with synonyms

Posted by sivaprasad <si...@echidnainc.com>.
I changed the schema file as shown below.

<analyzer>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>

And I have an entry in the synonyms.txt file as shown below.

hdtv => High Definition Television, High Definition TV, High Definition Televisions, High Definition TVs

Then I submitted the queries with debugQuery=on.

Query 1: hdtv

The parsed query is given below.

<str name="rawquerystring">hdtv</str>
<str name="querystring">hdtv</str>
<str name="parsedquery">MultiPhraseQuery(searchtext:"high definit (televis tv tvs)")</str>
<str name="parsedquery_toString">searchtext:"high definit (televis tv tvs)"</str>

The number of results returned is ZERO.

Query 2: High Definition Television

The parsed query is given below.

<str name="rawquerystring">High Definition Television</str>
<str name="querystring">High Definition Television</str>
<str name="parsedquery">+searchtext:high +searchtext:definit +(searchtext:televis searchtext:tv searchtext:tvs)</str>
<str name="parsedquery_toString">+searchtext:high +searchtext:definit +(searchtext:televis searchtext:tv searchtext:tvs)</str>

The number of results is 1.

Why am I getting results like this even after expanding the synonyms?

Re: Problem with synonyms

Posted by Ahmet Arslan <io...@yahoo.com>.
> Do I need to expand the synonyms at index time?

Probably yes. You can play with its parameters and experiment. 


      

Re: Problem with synonyms

Posted by sivaprasad <si...@echidnainc.com>.
Do I need to expand the synonyms at index time?

Re: Problem with synonyms

Posted by Ahmet Arslan <io...@yahoo.com>.
Multi-word synonyms are meant to be used at index time. QueryParser will split your query on whitespace unless you use quotes.

"The Lucene QueryParser tokenizes on white space before giving any text to the Analyzer...." [1]

[1]http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
