Posted to solr-user@lucene.apache.org by Donald Organ <do...@donaldorgan.com> on 2012/03/02 17:39:49 UTC

Help with Synonyms

I am trying to get synonyms working correctly. I want to map "floor locker"
to "storage locker".

Currently, searching for "storage locker" produces results, whereas searching
for "floor locker" does not produce any results.
I have the following setup for index-time synonyms:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100"
           omitNorms="false">
  <analyzer type="index">
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"
            tokenizerFactory="KeywordTokenizerFactory"/>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" />
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" />
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
......



And my synonyms.txt looks like this:

floor locker=>storage locker



What am I doing wrong?

Re: Help with Synonyms

Posted by Donald Organ <do...@donaldorgan.com>.
Excellent, thank you. It is now working!

On Mon, Mar 5, 2012 at 9:37 PM, Koji Sekiguchi <ko...@r.email.ne.jp> wrote:

> (12/03/06 11:23), Donald Organ wrote:
>
>> Ok so do I need to use a different format in my synonyms.txt file in order
>> to do this at index time?
>>
>>
> Right, if you want to apply synonym rules to only index time.
> Use "," like this:
>
> floor locker, storage locker
>
> And don't forget to set expand="true" in your index time synonym
> definition.

Re: Help with Synonyms

Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.
(12/03/06 11:23), Donald Organ wrote:
> Ok so do I need to use a different format in my synonyms.txt file in order
> to do this at index time?
>

Right, if you want to apply synonym rules only at index time.
Use "," like this:

floor locker, storage locker

And don't forget to set expand="true" in your index-time synonym definition.
Then, if you have "floor locker" in your document, it will be indexed not only as
"floor locker" but also as "storage locker", so you can find
the document with either q=floor locker or q=storage locker.

koji
-- 
Query Log Visualizer for Apache Solr
http://soleami.com/
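A minimal schema.xml sketch of the index-time-only setup described above. It assumes the Solr 3.x factory names used elsewhere in this thread and a synonyms.txt containing the line "floor locker, storage locker"; the field type name "text_syn" is hypothetical, and the filter chain is cut down to the parts that matter for synonyms. Because expand="true" writes both phrases into the index, the query analyzer carries no synonym filter.

```xml
<!-- Hypothetical field type: synonyms applied at index time only.
     synonyms.txt is assumed to contain the equivalence line
       floor locker, storage locker -->
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- expand="true": a document containing either phrase is indexed
         with the terms of both phrases -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- No synonym filter: the index already holds both variants -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The trade-off of this design is that synonyms.txt changes only take effect for documents indexed after the change, so existing documents need to be reindexed.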

Re: Help with Synonyms

Posted by Donald Organ <do...@donaldorgan.com>.
OK, so do I need to use a different format in my synonyms.txt file in order
to do this at index time?

On Monday, March 5, 2012, Koji Sekiguchi <ko...@r.email.ne.jp> wrote:
> So that's the cause of the problem. Due to the definition "floor locker=>storage locker"
> on index time analysis, you got "storage" / "locker" in your index, no "floor" terms
> in your index at all. In general, if you use "=>" method in your synonyms.txt,
> you should apply same rule to both index and query time.

Re: Help with Synonyms

Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.
(12/03/06 11:07), Donald Organ wrote:
> No I do synonyms at index time.
>
:
>>> I am still getting results for storage locker  and no results for floor
>>> locker
>>>
>>> synonyms.txt still looks like this:
>>>
>>> floor locker=>storage locker

So that's the cause of the problem. Because of the rule "floor locker=>storage locker"
in your index-time analysis, your index contains "storage" / "locker" and no "floor" terms
at all. In general, if you use the "=>" syntax in your synonyms.txt,
you should apply the same rule at both index and query time.

koji
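A minimal sketch of the alternative described above: keep the one-way rule "floor locker=>storage locker" but apply it in both analyzers, so a query for "floor locker" is rewritten to the same "storage" / "locker" terms that were indexed. The field type name "text_map" is hypothetical and the chain is trimmed to the synonym-related parts. The expand attribute is left out because it only affects comma-separated (equivalence) lines, not explicit "=>" mappings. One caveat worth checking: the standard query parser splits unquoted input on whitespace before analysis, so a multi-word rule like this one may only fire at query time for a quoted phrase such as q="floor locker".

```xml
<!-- Hypothetical field type: the one-way rule
       floor locker=>storage locker
     is applied in BOTH analyzers, so index and query terms agree -->
<fieldType name="text_map" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- Same synonym rules as the index analyzer -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```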

Re: Help with Synonyms

Posted by Donald Organ <do...@donaldorgan.com>.
No, I only apply synonyms at index time.

On Monday, March 5, 2012, Koji Sekiguchi <ko...@r.email.ne.jp> wrote:
> Hi Donald,
>
> Do you use same SynonymFilter setting to the query analyzer part
> (<analyzer type="query">)?

Re: Help with Synonyms

Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.
(12/03/06 0:11), Donald Organ wrote:
> OK so I have updated my schema.xml to the following:
>
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100"
> omitNorms="false">
>            <analyzer type="index">
>              <charFilter class="solr.HTMLStripCharFilterFactory"/>
>              <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>              <filter class="solr.SynonymFilterFactory"
> synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
>              <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" />
>              <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" />
>              <filter class="solr.EnglishPorterFilterFactory"
> protected="protwords.txt" />
>              <filter class="solr.LowerCaseFilterFactory" />
>              <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
>            </analyzer>
>            .....
>
> I am still getting results for storage locker  and no results for floor
> locker
>
> synonyms.txt still looks like this:
>
> floor locker=>storage locker

Hi Donald,

Do you use the same SynonymFilter setting in the query analyzer part
(<analyzer type="query">)?

koji

Re: Help with Synonyms

Posted by Donald Organ <do...@donaldorgan.com>.
OK, so I have updated my schema.xml to the following:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100"
           omitNorms="false">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="false"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" />
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" />
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
  .....

I am still getting results for "storage locker" and no results for
"floor locker".

synonyms.txt still looks like this:

floor locker=>storage locker

Re: Help with Synonyms

Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.
(12/03/03 1:39), Donald Organ wrote:
> I am trying to get synonyms working correctly, I want to map  floor locker
>    to    storage locker
>
> currently searching for storage locker produces results were as searching
> for floor locker  does not produce any results.
> I have the following setup for index time synonyms:
>
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100"
> omitNorms="false">
>            <analyzer type="index">
>              <filter class="solr.SynonymFilterFactory"
> synonyms="synonyms.txt" ignoreCase="true" expand="true"
> tokenizerFactory="KeywordTokenizerFactory"/>
>              <charFilter class="solr.HTMLStripCharFilterFactory"/>
>              <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" />
>              <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" />
>              <filter class="solr.EnglishPorterFilterFactory"
> protected="protwords.txt" />
>              <filter class="solr.LowerCaseFilterFactory" />
>              <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
>              <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>            </analyzer>
>
> And my synonyms.txt looks like this:
>
> floor locker=>storage locker
>
> What am I doing wrong?

Hi Donald,

Try removing tokenizerFactory="KeywordTokenizerFactory" from your synonym filter
definition. I think you want the rules in synonyms.txt to be tokenized as
"floor" / "locker" => "storage" / "locker". If you set it to KeywordTokenizer,
the map will be from the single token "floor locker" to the single token
"storage locker". Since you are using WhitespaceTokenizer as the <tokenizer/>
in your <analyzer/>, indexing "floor locker" produces the two tokens
"floor" / "locker" (not "floor locker"), and as a result they will not match
your synonym map.

As an aside, I recommend putting the <charFilter/> - <tokenizer/> - <filter/>
chain in its natural order inside <analyzer/>, although the current order is
not the cause of this problem.

koji
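A sketch of the original index analyzer with both suggestions applied: the tokenizerFactory attribute is dropped, so the rules in synonyms.txt are split on whitespace (assumed here to be the Solr 3.x default when no tokenizerFactory is given), just like the field text, and the elements are listed in charFilter, tokenizer, filter order. The filters and their attributes are copied from the original post; the closing </fieldType> is added only so the fragment is well-formed.

```xml
<fieldType name="text" class="solr.TextField" positionIncrementGap="100"
           omitNorms="false">
  <analyzer type="index">
    <!-- 1. char filters run on the raw text -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <!-- 2. one tokenizer: "floor locker" becomes "floor" / "locker" -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- 3. token filters, applied in the order listed. With no
         tokenizerFactory attribute, the rule "floor locker=>storage locker"
         is assumed to be split into the same two-token shape, so it can match -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```

Whichever synonyms.txt syntax is used, the query analyzer still has to be consistent with what this index analyzer produces.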