Posted to solr-user@lucene.apache.org by David Philip <da...@gmail.com> on 2014/10/15 15:46:45 UTC

Solr Synonyms, Escape space in case of multi words

Hi All,

   I remember using multi-word synonyms in Solr 3.x. For multi-word
entries, I escaped the space with a backslash [\] and it worked as
intended. Ex: ride\ makers, riders, rider\ guards. Each one mapped to
the others, so when I searched for ride makers, I got the search
results for all of them. The field type was the same as below. I have the
same setup in Solr 4.10, but now the escaped space in multi-word entries
is ignored and the entries are tokenized on spaces.

 synonyms.txt
    ridemakers, ride makers, ridemakerz, ride makerz, ride\mark, ride\ care
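For reference, here is a rough Python model of how a line like this appears to be read by the Solr/Lucene 4.x synonym parser: the line is split on unescaped commas, each backslash escape is replaced by the escaped character, and each entry is trimmed. The function names are hypothetical and this is a simplification written for illustration, not Solr's code. It shows that "ride\mark" simply becomes "ridemark" and "ride\ care" becomes "ride care" with an ordinary space, which is what then gets handed to the synonym analyzer.

```python
def split_unescaped(line, sep=","):
    """Split on separators that are not preceded by a backslash.

    Backslash pairs are copied through untouched; unescaping happens later.
    """
    parts, current, i = [], [], 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line):
            current.append(line[i:i + 2])  # keep the escape pair as-is
            i += 2
            continue
        if ch == sep:
            if current:
                parts.append("".join(current))
                current = []
        else:
            current.append(ch)
        i += 1
    if current:
        parts.append("".join(current))
    return parts


def unescape(text):
    """Replace each backslash escape with the character it escapes."""
    out, i = [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            out.append(text[i + 1])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def parse_synonym_line(line):
    """Hypothetical helper: comma-split, unescape, then trim each entry."""
    return [unescape(part).strip() for part in split_unescaped(line)]


line = r"ridemakers, ride makers, ridemakerz, ride makerz, ride\mark, ride\ care"
print(parse_synonym_line(line))
# ['ridemakers', 'ride makers', 'ridemakerz', 'ride makerz', 'ridemark', 'ride care']
```

An escaped comma survives the split in this model, e.g. parse_synonym_line(r"a\,b, c") gives ['a,b', 'c'].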


Analysis page:

ridemakers | ride | ridemakerz | ride | ridemark | ride | makers | makerz | care
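The analysis page output above can be reproduced with a small model, assuming (1) that without a tokenizerFactory each synonym entry is tokenized on whitespace, and (2) that the analysis page lists tokens grouped by position. Under those assumptions the multi-word entries contribute "ride" at position 1 and "makers", "makerz", "care" at position 2. The entry list is hard-coded (already comma-split and unescaped) and the helper names are hypothetical.

```python
from itertools import zip_longest

# Entries after comma-splitting and unescaping (hard-coded for this sketch).
ENTRIES = ["ridemakers", "ride makers", "ridemakerz",
           "ride makerz", "ridemark", "ride care"]


def whitespace_tokenize(text):
    """Stand-in for the assumed default whitespace-based synonym analyzer."""
    return text.lower().split()


def stacked_positions(entries, tokenize):
    """Group the tokens of every synonym by position, column by column."""
    token_lists = [tokenize(entry) for entry in entries]
    columns = []
    for column in zip_longest(*token_lists):
        columns.append([tok for tok in column if tok is not None])
    return columns


columns = stacked_positions(ENTRIES, whitespace_tokenize)
print(" | ".join(tok for column in columns for tok in column))
# ridemakers | ride | ridemakerz | ride | ridemark | ride | makers | makerz | care
```

The second column is where "care" ends up on its own, which is the split the question is about.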

Field Type

    <fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                ignoreCase="true" expand="true"/>
      </analyzer>
    </fieldType>



Could you please tell me what the issue could be? How do I handle
multi-word cases?






Thanks - David

Re: Solr Synonyms, Escape space in case of multi words

Posted by Rajani Maski <ra...@gmail.com>.
Hi David,

  I think you need to specify the tokenizer on the synonym filter as well,
as shown below:

  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
          ignoreCase="true" expand="true"
          tokenizerFactory="solr.KeywordTokenizerFactory"/>



So your field type should be as shown below:

<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"
            tokenizerFactory="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
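To see why the extra tokenizerFactory matters, here is a toy model. It uses hypothetical helper names and is a deliberate simplification: Lucene's SynonymFilter matches token sequences anywhere in a stream, while this only looks up the whole input, which is enough for a field whose own tokenizer is KeywordTokenizer. That field tokenizer turns a query such as "ride care" into one token containing a space. If the synonym rules are tokenized on whitespace (assumed here to be the 4.x default when no tokenizerFactory is given), the rule for "ride care" is the two-token sequence "ride", "care" and can never match that single token. Tokenizing the rules with a keyword tokenizer puts both sides on the same footing.

```python
# Already-unescaped entries from the synonyms.txt line in this thread.
ENTRIES = ["ridemakers", "ride makers", "ridemakerz",
           "ride makerz", "ridemark", "ride care"]


def whitespace_tokenize(text):
    """Assumed default rule tokenizer: split on whitespace."""
    return text.split()


def keyword_tokenize(text):
    """KeywordTokenizer-like: the whole input is a single token."""
    return [text]


def build_rules(entries, rule_tokenize):
    """Map each entry's token sequence to every equivalent entry (expand="true")."""
    keys = [tuple(rule_tokenize(entry.lower())) for entry in entries]
    return {key: keys for key in keys}


def expand(text, field_tokenize, rules):
    """Whole-input lookup only; returns the input unchanged if no rule matches."""
    tokens = tuple(field_tokenize(text.lower()))
    return rules.get(tokens, [tokens])


default_rules = build_rules(ENTRIES, whitespace_tokenize)
keyword_rules = build_rules(ENTRIES, keyword_tokenize)

print(expand("ride care", keyword_tokenize, default_rules))
# [('ride care',)]  (no rule matches the single token "ride care")
print(expand("ride care", keyword_tokenize, keyword_rules))
# [('ridemakers',), ('ride makers',), ('ridemakerz',), ('ride makerz',), ('ridemark',), ('ride care',)]
```

One consequence in this model: with a keyword tokenizer on the rules, every entry is kept whole, including unescaped ones like "ride makers", so the backslash in front of the space is no longer what keeps an entry together.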



Re: Solr Synonyms, Escape space in case of multi words

Posted by David Philip <da...@gmail.com>.
Sorry, the analysis page output got trimmed and the indentation was
lost.

Here it is:

ridemakers | ride | ridemakerz | ride | ridemark | ride | makers | makerz | care

expected:

ridemakers | ride | ridemakerz | ride | ridemark | ride | makers | makerz | *ride care*
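The behaviour expected here, and recalled from 3.x, amounts to an escape-aware whitespace split: unescaped spaces separate tokens, escaped ones do not. The sketch below models only that rule. The function name is hypothetical, and whether 3.x implemented it exactly this way is an assumption based on the description in this thread. In 4.10 the escape appears to be decoded before the whitespace tokenizer runs, so the backslash no longer protects the space.

```python
def split_ws_escaped(entry):
    """Split on whitespace, except where a space is escaped with a backslash.

    Escapes are decoded while splitting, so 'ride\\ care' stays one token.
    """
    tokens, current, i = [], [], 0
    while i < len(entry):
        ch = entry[i]
        if ch == "\\" and i + 1 < len(entry):
            current.append(entry[i + 1])  # keep the escaped character, drop the backslash
            i += 2
            continue
        if ch.isspace():
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
        i += 1
    if current:
        tokens.append("".join(current))
    return tokens


# Raw entries with their backslashes still in place, as written in synonyms.txt.
raw_entries = ["ridemakers", "ride makers", "ridemakerz",
               "ride makerz", r"ride\mark", r"ride\ care"]
for entry in raw_entries:
    print(entry, "->", split_ws_escaped(entry))
# ride makers -> ['ride', 'makers']
# ride\ care -> ['ride care']
```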




Re: Solr Synonyms, Escape space in case of multi words

Posted by David Philip <da...@gmail.com>.
contd..

My expectation was that "ride care" would not be split into two tokens.

The output should have been as below. Please correct me or point out where I am wrong.


Input: ridemakers, ride makers, ridemakerz, ride makerz, ride\mark, ride\ care

o/p:

ridemakers | ride | ridemakerz | ride | ridemark | ride | makers | makerz

*ride care*



