Posted to solr-user@lucene.apache.org by Brian Narsi <bn...@gmail.com> on 2015/10/14 20:03:50 UTC

partial search EdgeNGramFilterFactory

I have the following fieldtype in my schema:

<fieldType name="text_edgngrm" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

and the following field:
<field name="SellerName" type="text_edgngrm" indexed="true" stored="true" required="true" multiValued="false"/>

With the following data:
 SellerName:CARDINAL HEALTH

When I do the following search

q:SellerName:cardinal

I get back the results with SellerName: CARDINAL HEALTH (correct)

or I do the search

q:SellerName:cardinal he

I get back the results with SellerName: CARDINAL HEALTH (correct)

But when I do the search

q:SellerName:cardinal hea

I am getting the results back with SellerName:INTEGRA RADIONICS

Why is that?

I need it to continue to return the correct results with CARDINAL HEALTH.
How do I make that happen?

Thanks in advance,

Re: partial search EdgeNGramFilterFactory

Posted by Alessandro Benedetti <be...@gmail.com>.
Let's analyse your query requirements:


On 15 October 2015 at 03:08, Brian Narsi <bn...@gmail.com> wrote:
>
> 1) cardinal healthcare products
> 2) cardinal healthcare
> 3) postoperative cardinal healthcare
> 4) surgical cardinal products
>
>

> q=SellerName:cardinal - all 4 records returned
> q=SellerName:healthcare - 1,2,3 returned
> q=SellerName:surgical cardinal - 4 returned
> q=SellerName:cardinal healthcare - 1,2,3 returned
> q=SellerName:products - 1,4 returned
>

These are easy to get with a standard tokeniser (for a typical Western
language) plus whatever analysis you want to add for your specific language
(stemming, lemmatisation, etc.).
Then at query time it is only a matter of using the PhraseQuery
functionality in Solr. You might want to allow some tolerance in the
distance between the terms (positional queries with ~N).
E.g.:
"cardinal healthcare"~2
This really depends on your use case.



> q=SellerName:car - nothing returned
> q=SellerName:card - all 4 returned
>
This is a little bit different, as we enter the autocompletion world, and
in a rather odd way:
why should "car" not return all the docs?
Only because of its length?
In that case it's easy: configure your edge n-gram token filter
with a minimum gram size of 4 and a maximum of N.
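
As a rough sketch of that configuration, assuming the field type from the original post is otherwise kept as-is, and picking 25 for the unspecified maximum N (that value is an assumption, simply carried over from the original schema):

```xml
<fieldType name="text_edgngrm" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- minGramSize="4": "cardinal" is indexed as card, cardi, ... cardinal,
         so the query term "car" finds no indexed gram.
         maxGramSize="25" is an assumed stand-in for N. -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="4" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <!-- No n-gramming at query time: the typed prefix is matched as-is -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

One thing to check with this setup: as far as I know, tokens shorter than the minimum gram size produce no grams at all, so words of three characters or fewer in SellerName would probably not be searchable through this field.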

Cheers






Re: partial search EdgeNGramFilterFactory

Posted by Brian Narsi <bn...@gmail.com>.
Thank you Erick. Yes it was the default search field.

So for the following SellerName values:

1) cardinal healthcare products
2) cardinal healthcare
3) postoperative cardinal healthcare
4) surgical cardinal products

My requirement is:
q=SellerName:cardinal - all 4 records returned
q=SellerName:healthcare - 1,2,3 returned
q=SellerName:surgical cardinal - 4 returned
q=SellerName:cardinal healthcare - 1,2,3 returned
q=SellerName:products - 1,4 returned
q=SellerName:car - nothing returned
q=SellerName:card - all 4 returned

How should I set up my fieldtype?

Thanks



Re: partial search EdgeNGramFilterFactory

Posted by Erick Erickson <er...@gmail.com>.
Try adding &debug=true to your query. The query
q=SellerName:cardinal he
actually parses as
q=SellerName:cardinal defaultSearchField:he

so I suspect you're getting matches on the default search field.

I'm not sure EdgeNGram is what you want here though.
That only grams individual tokens, so CARDINAL is grammed
totally separately from HEALTH. You might consider
a different tokenizer, say KeywordTokenizer, then LowerCaseFilter
followed by EdgeNGram, to treat the whole value as a unit. You'd have
to take some care to escape spaces in the query so that
the whole thing gets through the query parser as one term though.
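
A minimal sketch of what such a field type could look like, assuming the same gram sizes as the original schema. The name text_edgngrm_whole is hypothetical, and using KeywordTokenizer on the query side as well is an assumption made here so that an escaped multi-word query stays a single token:

```xml
<!-- Hypothetical field type name; gram sizes copied from the original schema -->
<fieldType name="text_edgngrm_whole" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Emit the entire field value as one token, e.g. "CARDINAL HEALTH" -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Index prefixes of the whole value: "car", "card", ... "cardinal health" -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <!-- Keep the typed text as one lowercased token; no grams at query time -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With something like this, a query such as q=SellerName:cardinal\ hea (note the escaped space) should reach the analyzer as the single term "cardinal hea" and match the corresponding prefix of "cardinal health". The trade-off is that only prefixes of the whole value match, so a search for "health" alone would not find CARDINAL HEALTH, and a query longer than maxGramSize would not match any indexed gram.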

Best,
Erick
