Posted to solr-user@lucene.apache.org by elisabeth benoit <el...@gmail.com> on 2012/08/02 09:56:33 UTC

matching with whole field

Hello,

I am using Solr 3.4.

I'm trying to define a field type that matches only if the
request contains exactly the same words.

Let's say I have two different values for ONLY_EXACT_MATCH_FIELD:

ONLY_EXACT_MATCH_FIELD: salon de coiffure
ONLY_EXACT_MATCH_FIELD: salon de coiffure pour femmes

I would like to match only the first one when querying Solr with
fq=ONLY_EXACT_MATCH_FIELD:(salon de coiffure)

As far as I understand, the solution is to not tokenize on whitespace
and to use solr.KeywordTokenizerFactory instead.


My current type is defined as follows in schema.xml:

    <fieldType name="ONLY_EXACT_MATCH_FIELD" class="solr.TextField"
               omitNorms="true" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <charFilter class="solr.MappingCharFilterFactory"
                    mapping="mapping-ISOLatin1Accent.txt"/>
        <filter class="solr.ISOLatin1AccentFilterFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.LengthFilterFactory" min="1" max="100"/>
      </analyzer>
    </fieldType>

But matching against values with more than one word doesn't work. Does anyone
have a clue what I am doing wrong?

Thanks,
Elisabeth

Re: matching with whole field

Posted by elisabeth benoit <el...@gmail.com>.

Hello Chantal,

Thanks for your answer.

In fact, my analyzer has the same tokenizer chain for "query". I just
left it out of my email for readability (though maybe at the cost of clarity).
I did check with the admin interface, and it reports a match. But
with a real query to Solr, it doesn't match.

I once read on the mailing list that one should not always trust the
admin interface for analysis...

I don't think this should interfere, but my default request handler (the one
used by fq, I guess) is not edismax.


If you have more clues, I'd be glad to hear them.

Thanks again,
Elisabeth




Re: matching with whole field

Posted by Chantal Ackermann <c....@it-agenten.com>.

Hi Elisabeth,

Try adding the same tokenizer chain for "query" as well, or simply remove the type="index" attribute from the analyzer element.

Your chain analyzes the input to the indexer, removing diacritics and lowercasing. With your current setup, the search input is not analyzed the same way, so inputs that are not lowercased or that contain diacritics will not match.

You might want to use the analysis frontend in the Admin UI to see how input to the indexer and the searcher is transformed and matched.
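
To illustrate why both sides need the same analysis, here is a minimal Python sketch. It is not Solr code: `analyze` is a hypothetical stand-in that only approximates the chain above (the whole input is kept as one token, accents are folded, the result is lowercased), and the sample value is made up for illustration.

```python
import unicodedata


def analyze(text: str) -> str:
    """Hypothetical approximation of the chain: one token, accents folded, lowercased."""
    # Decompose accented characters into base character + combining mark.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks, which folds accents away.
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return folded.lower()


# What ends up in the index when the index-time chain runs.
indexed_term = analyze("Salon de Coiffure Élégance")

# The same text arriving at search time.
raw_query = "Salon de Coiffure Élégance"

# Without query-time analysis the raw text is compared to the indexed term.
matches_without_query_analysis = raw_query == indexed_term

# With the same chain applied at query time, both sides agree.
matches_with_query_analysis = analyze(raw_query) == indexed_term
```

In this sketch only the second comparison succeeds, which is the situation the analysis page in the Admin UI lets you inspect for the real chain.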

Cheers,
Chantal


Re: matching with whole field

Posted by elisabeth benoit <el...@gmail.com>.

Thank you so much, Franck Brisbart.

It's working!

Best regards,
Elisabeth


Re: matching with whole field

Posted by fbrisbart <fb...@bestofmedia.com>.

It's a parsing problem.
You must tell the query parser to treat spaces as literal characters.
This should work (escaping the spaces with backslashes):
fq=ONLY_EXACT_MATCH_FIELD:salon\ de\ coiffure

Or you can use something like this:
fq={!term f=ONLY_EXACT_MATCH_FIELD v=$qq}&qq=salon de coiffure
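
The underlying reason is that the default query parser splits the input on whitespace before the field's analyzer runs, so KeywordTokenizerFactory never sees the whole phrase. The two options above could be built programmatically along these lines. This is a minimal Python sketch: the function names are hypothetical and not part of any Solr client, and it escapes whitespace only, on the assumption that the value contains no other query-parser special characters.

```python
from urllib.parse import urlencode


def escape_spaces(value: str) -> str:
    """Backslash-escape spaces so the query parser keeps the value as one term."""
    return value.replace(" ", "\\ ")


def escaped_fq_params(field: str, value: str) -> dict:
    """Option 1: fq=FIELD:value with every space backslash-escaped."""
    return {"fq": f"{field}:{escape_spaces(value)}"}


def term_parser_params(field: str, value: str) -> dict:
    """Option 2: the {!term} parser, with the raw value passed in a separate qq parameter."""
    return {"fq": f"{{!term f={field} v=$qq}}", "qq": value}


escaped = escaped_fq_params("ONLY_EXACT_MATCH_FIELD", "salon de coiffure")
via_term = term_parser_params("ONLY_EXACT_MATCH_FIELD", "salon de coiffure")

# URL-encode the parameters before appending them to the select URL.
escaped_query_string = urlencode(escaped)
term_query_string = urlencode(via_term)
```

With the second form the value is sent unescaped in `qq`, so there is no escaping to get wrong on the client side.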


Hope it helps,
Franck Brisbart

