Posted to solr-user@lucene.apache.org by "Nair, Manas" <Ma...@mtvnmix.com> on 2009/11/12 14:43:05 UTC

Multi word synonym problem

Hi Experts,
 
I would like help with multi-word synonyms. The scenario is as follows:
 
I have a name, Micheal Jackson (the wrong term), which has the synonym Michael Jackson, i.e.
 
Micheal Jackson => Michael Jackson
 
When I search for Micheal Jackson (not a phrase search), Solr searches for text:Micheal and text:Jackson, not for Michael Jackson.
But when I search for "Micheal Jackson" (phrase search), Solr searches for "Michael Jackson" (the correct term).
 
The schema.xml for the particular core contains the SynonymFilterFactory in the text analyzer, and it is enabled at both index and query time. At both index and query time the SynonymFilterFactory has the parameter expand=true.
 
Please help me understand how a multi-word synonym can be made effective, i.e. I want a search for
Micheal Jackson (not a phrase search) to return the results for Michael Jackson.
 
What should be done so that Micheal Jackson is treated as one search term instead of being split?
 
Any help is greatly appreciated.
 
Thank you,
Manas Nair

RE: Multi word synonym problem

Posted by Chris Hostetter <ho...@fucit.org>.
: The response is not searching for Michael Jackson. Instead it is 
: searching for (text:Micheal and text: Jackson).To monitor the parsed 
: query, i turned on debugQuery, but in the present case, the parsed query 
: string was searching Micheal and Jackson separately.

using index time synonyms isn't going to have any effect on how your 
query is parsed.  the Lucene/Solr query parsers use whitespace as 
"markup" and will still analyze each of the "words" in your input 
separately and build up a boolean query containing each of your words 
individually (the only way to change that is to use quotes to force 
"phrase query" behavior where everything in quotes is analyzed as one 
chunk, or pick a different query parser like the "field" parser)

...but none of that changes the point of *why* you can/should use index 
time synonyms for situations like this.  the point of doing that is that 
at index time the alternate versions of the multi-word sequences can all 
be expanded and all variants are put in the index ... so it doesn't matter 
if you use a phrase query or term queries, all of the synonyms are in the 
indexed document.



-Hoss


RE: Multi word synonym problem

Posted by "Nair, Manas" <Ma...@mtvnmix.com>.
Hi,
 
I tried the recommended approach, but to no avail. The multi-word synonyms are still not appearing in the results.
 
My schema.xml has the following fieldType:
 
 
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/> -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/> -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

This "text" field is the defaultSearchField too.
 
If I give the synonym for Micheal Jackson as Michael Jackson, i.e. in my synonyms.txt file the entry is:
Micheal Jackson => Michael Jackson
 
The query does not search for Michael Jackson. Instead it searches for (text:Micheal and text:Jackson). To monitor the parsed query, I turned on debugQuery, and in this case the parsed query string was searching for Micheal and Jackson separately.
 
I was able to get the correct response by modifying the synonyms.txt file. I changed the entry to:
Micheal Jackson, Michael Jackson  (replaced '=>' with ',').
 
Is there something that needs to be done in the schema shown above? I would like the synonyms to work when I map them using =>.
 
Kindly help.
 
Thank you,
Manas
________________________________

From: AHMET ARSLAN [mailto:iorixxx@yahoo.com]
Sent: Thu 11/12/2009 1:18 PM
To: solr-user@lucene.apache.org
Subject: Re: Multi word synonym problem



It is recommended [1] to use synonyms at index time only, for various reasons, especially with multi-word synonyms.

[1] http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory

Use the synonym filter only at index time, with expand=true and ignoreCase=true, and with synonyms.txt containing:

micheal, michael

OR:

micheal jackson, michael jackson

Note that it matters which filters you have before the synonym filter.
Be sure that you restart Tomcat and re-index.

Query Micheal Jackson (not phrase search) should return the results
for Michael Jackson.

Hope this helps.


Re: Multi word synonym problem

Posted by AHMET ARSLAN <io...@yahoo.com>.
It is recommended [1] to use synonyms at index time only, for various reasons, especially with multi-word synonyms.

[1] http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory

Use the synonym filter only at index time, with expand=true and ignoreCase=true, and with synonyms.txt containing:

micheal, michael

OR:

micheal jackson, michael jackson

Note that it matters which filters you have before the synonym filter.
Be sure that you restart Tomcat and re-index.

Query Micheal Jackson (not phrase search) should return the results
for Michael Jackson.
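
As a rough sketch of the index-time-only setup described above, the schema.xml entry might look something like the following. The field type name "text", the file names and the filter chain are taken from the field type posted elsewhere in this thread; the only intended change is that the query analyzer has no SynonymFilterFactory. It assumes synonyms.txt holds the comma-separated entry (micheal jackson, michael jackson) rather than a => mapping, and it only takes effect after a restart and a full re-index.

```xml
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <!-- Index time: expand every synonym variant into the index -->
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Synonym filter runs first, before any filter that rewrites tokens -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <!-- Query time: same chain, but no synonym filter -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```

With this arrangement the query parser can still split Micheal Jackson on whitespace, but each of the resulting terms should find a match, because both spellings were written into the index for documents containing either form.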

Hope this helps.
