Posted to solr-user@lucene.apache.org by Brian Narsi <bn...@gmail.com> on 2015/10/26 19:24:51 UTC

Query differently or change fieldtype

I have the following field type on a field ClientName:

<fieldType name="txt_edgngrm" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>


For data where

ClientName = st jude medical inc

When querying I get the following:

1) st --> result = st jude medical inc (works correctly)
2) st j  --> No results are returned (NOT correct) - Expect to find st jude
medical inc
3) st ju m --> No results are returned (NOT correct) - Expect to find st
jude medical inc
4) st ju me --> result = st jude medical inc (works correctly)
5) st ju inc --> No results are returned (NOT correct) - Expect to find st
jude medical inc

Is my field type definition correct? Or do I need to query differently?

Thanks

Re: Query differently or change fieldtype

Posted by Upayavira <uv...@odoko.co.uk>.
Use the analysis tab on the admin UI to see what analysis is doing to
your terms.

Then bear in mind that the query parser splits on whitespace before any
analysis runs. So you might want to query ClientName:"st ju me" (a quoted
phrase), so that the tokenisation happens within the analysis chain rather
than in the query parser.
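
A minimal sketch of sending the input as one quoted phrase from client code,
so the query parser does not split it on spaces. The base URL, the core name
"clients", and the helper name build_select_url are assumptions for
illustration only; they do not come from this thread.

```python
from urllib.parse import urlencode

def build_select_url(base_url, field, user_input):
    """Build a Solr /select URL that sends user_input as one quoted phrase."""
    # Escape backslashes and double quotes so the input stays inside the phrase.
    escaped = user_input.replace("\\", "\\\\").replace('"', '\\"')
    params = {"q": f'{field}:"{escaped}"', "wt": "json"}
    # urlencode percent-encodes the colon and quotes and turns spaces into '+'.
    return f"{base_url}/select?{urlencode(params)}"

# Hypothetical host and core name; adjust to your own installation.
url = build_select_url("http://localhost:8983/solr/clients", "ClientName", "st ju me")
print(url)
```

Whether the phrase form returns the document you expect still depends on the
index-time analysis, so it is worth checking the same input in the analysis
tab.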

Upayavira


Re: Query differently or change fieldtype

Posted by Ray Niu <ne...@gmail.com>.
I think this is how StandardTokenizerFactory works. If you want different
behavior, you should try a different tokenizer. Also, as Upayavira said, use
the analysis tab on the admin UI to see what analysis is doing to your
terms.


Re: Query differently or change fieldtype

Posted by Brian Narsi <bn...@gmail.com>.
That is right Ray, that is exactly what I found out and that is why I am
asking the question.


Re: Query differently or change fieldtype

Posted by Ray Niu <ne...@gmail.com>.
I see your config has minGramSize="2", which means only grams of at least
2 characters are indexed, so a single-character term such as j will not match.
Also, StandardTokenizerFactory will tokenize "st j" into "st" and "j".
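
The effect can be approximated outside Solr. The sketch below is a simplified
model, not Solr's actual implementation: it assumes the tokenizer just splits
on non-alphanumeric characters, and that every query term must be present in
the index for a document to match. All function names are hypothetical.

```python
import re

def tokenize(text):
    """Rough stand-in for StandardTokenizer + LowerCaseFilter."""
    return re.findall(r"[a-z0-9]+", text.lower())

def edge_ngrams(token, min_gram=2, max_gram=25):
    """Prefixes of token whose length is between min_gram and max_gram."""
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]

def index_terms(text, min_gram=2, max_gram=25):
    """All terms the index-time chain would emit for text (simplified)."""
    terms = set()
    for tok in tokenize(text):
        terms.update(edge_ngrams(tok, min_gram, max_gram))
    return terms

def unmatched_terms(query, indexed):
    """Query-time terms (no n-gramming) that are absent from the index."""
    return [t for t in tokenize(query) if t not in indexed]

indexed = index_terms("st jude medical inc")
for q in ["st", "st j", "st ju m", "st ju me", "st ju inc"]:
    # An empty list means every query term was found among the indexed grams.
    print(repr(q), "->", unmatched_terms(q, indexed))
```

In this model "st j" and "st ju m" each leave one single-character term
unmatched, which fits cases 2 and 3. Every term of "st ju inc" is present,
so minGramSize alone does not account for case 5; the analysis tab is the
place to look at that one.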
