Posted to solr-user@lucene.apache.org by Gauri Dhawan <ga...@sheroes.in.INVALID> on 2018/10/23 14:20:21 UTC

Regarding multi keyword search

Hi!
I have been facing an issue for quite some time and haven't yet been able to
find a solution. We are trying to implement search on our platform, and all
our data is stored in Solr.

The field I need to search is `description`.
Its field type is `text_edit_suggest`, and it looks something like this:

<fieldType name="text_suggest_edge" class="solr.TextField">
  <analyzer type="index">
    <!--charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/-->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([\.,;:-_])" replacement=" " replace="all"/>
    <filter class="solr.EdgeNGramFilterFactory" maxGramSize="30" minGramSize="1"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^\w\d\*æøåÆØÅ ])" replacement="" replace="all"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
    <tokenizer class="solr.StandardTokenizerFactory "/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!--charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/-->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([\.,;:-_])" replacement=" " replace="all"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^\w\d\*æøåÆØÅ ])" replacement="" replace="all"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^(.{30})(.*)?" replacement="$1" replace="all"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
  </analyzer>



When I search for multiple keywords, the results are unexpected.
For example, I want to search for the words `first` and `post`. Both words
must be present in the description field of a document; otherwise the
document should not be returned.
I've tried some 50+ queries for this, including with the `edismax` parser,
but in vain.

I tried boosting as well, but most queries give weight to only one of the
keywords and return documents that contain that keyword but not the other.
Can you guys help? Thanks in advance!


Gauri Dhawan
Associate Software Engineer
SHEROES

Re: Regarding multi keyword search

Posted by Walter Underwood <wu...@wunderwood.org>.

Setting mm to 100% is dangerous. If there is even one misspelled or wrong word, there are zero matches.
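
One way to soften that risk, sketched here as an assumption rather than a recommendation from this thread, is a conditional mm spec that requires every clause only for very short queries. The `2<75%` value is illustrative; inside solrconfig.xml the `<` has to be written as `&lt;`.

```xml
<!-- Illustrative fragment for a handler's defaults in solrconfig.xml. -->
<lst name="defaults">
  <str name="defType">edismax</str>
  <!-- Field name taken from the original post. -->
  <str name="qf">description</str>
  <!-- "2<75%": with 1 or 2 optional clauses all must match;
       with 3 or more, 75% of them (rounded down) must match. -->
  <str name="mm">2&lt;75%</str>
</lst>
```

With this spec a two-word query such as `first post` still needs both words, while longer queries tolerate some non-matching words.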

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Oct 23, 2018, at 8:25 AM, ANNAMANENI RAVEENDRA <a....@gmail.com> wrote:
> 
> You should use the mm parameter, and it should be set to 100% if you use
> dismax or edismax.
> 
> 

Re: Regarding multi keyword search

Posted by ANNAMANENI RAVEENDRA <a....@gmail.com>.

You should use the mm parameter, and it should be set to 100% if you use
dismax or edismax.
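
For reference, a minimal sketch of how that could be set as a handler default in solrconfig.xml. The handler name `/select` and the use of `description` as the only `qf` field are assumptions based on the original post, not a confirmed setup. The same parameters can be passed on the request instead (`defType=edismax&qf=description&mm=100%`, URL-encoded as needed).

```xml
<!-- Sketch only: handler name and defaults are assumptions. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- mm is honoured by the dismax and edismax query parsers. -->
    <str name="defType">edismax</str>
    <str name="qf">description</str>
    <!-- Require every optional clause of the query to match. -->
    <str name="mm">100%</str>
  </lst>
</requestHandler>
```

Note that mm only controls how many query clauses must match. Whether an individual word can match at all still depends on how the field is tokenized at index and query time.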



Re: Regarding multi keyword search

Posted by Shawn Heisey <ap...@elyograg.org>.

On 10/23/2018 8:20 AM, Gauri Dhawan wrote:
> I have been facing an issue for quite some time and haven't been able to
> come to a solution as of yet. We are trying to implement search on our
> platform and all our data is stored in Solr.
>
> I have a field `description` which is the field where I have to search.
> It is of the field type `text_edit_suggest` and it looks something like this
>
> <fieldType name="text_suggest_edge" class="solr.TextField">
>>        <analyzer type="index">
<snip>
>>          <tokenizer class="solr.KeywordTokenizerFactory"/>
<snip>
>>            <tokenizer class="solr.StandardTokenizerFactory "/>
<snip>
>>       <analyzer type="query">
<snip>
>>          <tokenizer class="solr.KeywordTokenizerFactory"/>
<snip>
> When I search for multiple keywords, the results are unexpected.
> For example :
> I want to search for the words `first` and `post` and both these words
> should be present in the description field of the document else it
> shouldn't return the document.

Your index analysis has two tokenizers.  You can only have one.  There 
is at least one typo in the fieldType definition provided.  After I fix 
that, Solr 7.5.0 won't load the core, with this as the error:

Plugin init failure for [schema.xml] fieldType "text_suggest_edge": 
Plugin init failure for [schema.xml] analyzer/tokenizer: The schema 
defines multiple tokenizers for: [tokenizer: null]

What version of Solr are you running?  Have you explicitly included the 
"sow" parameter on your query, or in the handler definition?

The KeywordTokenizerFactory that you're using probably doesn't do what 
you think it does.  It preserves the entire input as a single token -- 
doesn't split it into separate words.  The kind of searching you 
mentioned likely isn't possible with the analysis chain you've got.  It 
might take a bunch of back and forth question/answer cycles to get to 
something useful.
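
As a rough illustration of the single-tokenizer point, and not a drop-in fix, a field type with exactly one tokenizer per analyzer that splits the input into words might look like the sketch below. The name `text_words_sketch` is hypothetical, and leaving out the synonym, pattern-replace, and edge n-gram filters from the original definition is a simplifying assumption. Whether this gives the intended search behaviour depends on requirements not yet established here.

```xml
<!-- Hypothetical field type: one tokenizer per analyzer, word-level tokens. -->
<fieldType name="text_words_sketch" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- StandardTokenizer splits text into separate word tokens,
         unlike KeywordTokenizer, which keeps the whole input as one token. -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- Same chain at query time so query terms line up with indexed terms. -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```

With word-level tokens in the index, a query for `first` and `post` can match each word independently, which is the precondition for requiring both. Changing a field's analysis requires reindexing the existing documents.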

In my strong opinion, that KeywordTokenizerFactory has a terrible name 
and needs a new one.  Anyone want to bikeshed the possibilities?

Thanks,
Shawn