Posted to solr-user@lucene.apache.org by Teague James <te...@insystechinc.com> on 2014/02/05 22:52:49 UTC

Partial Word Search

I cannot get Solr 4.6.0 to do partial-word search on a particular field that
is used for faceting. Most of the information I have found suggests
modifying the fieldType of "text" to include either the NGramFilterFactory or
the EdgeNGramFilterFactory in its analyzer. However, since I am copying many
other fields to "text" for searching, I expect that the NGramFilterFactory
would create ngrams for everything sent to it, which is unnecessary and
probably costly, right?

To troubleshoot the issue, I created a new field in the schema and made it
stored so that I could see what was getting populated. However, no ngrams
seem to be generated; I only see the actual data that gets indexed from the
database.

Here's what my setup looks like:
NOTE: Every record in my test environment has the same value "Example"

<field name="PartialSubject" type="partialWord" indexed="true" stored="true" multiValued="true" />

<copyField source="PartialSubject" dest="text"/>

<fieldType name="partialWord" class="solr.TextField" positionIncrementGap="100">
	<analyzer>
		<tokenizer class="solr.StandardTokenizerFactory"/>
		<filter class="solr.LowerCaseFilterFactory"/>
		<filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="10" side="front"/>
	</analyzer>
</fieldType>

When I query Solr it reports:
<arr name="PartialSubject">
	<str>Example</str>
</arr>

I was expecting exa, exam, examp, exampl and example to be the values for
PartialSubject, so that a search for "exam" would return all of the records
in this test index. Instead, I get 0 results.

Can anyone provide any guidance on this please?

RE: Partial Word Search

Posted by Teague James <te...@insystechinc.com>.

Update: RESOLVED

On a hunch, I decided to forgo trying to restrict the EdgeNGramFilterFactory
to this one column and instead applied it to all columns that are copied into
the 'text' field that Solr uses for searching. I moved the filter factory into
the fieldType 'text_general', which is the type that 'text' uses. Everything
worked! Thanks for your help, Jack!

-Teague
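
A minimal sketch of that change, assuming the index/query split Jack recommended: the edge n-gram filter sits only in the index analyzer of 'text_general', so query terms are matched against the grams without being split themselves. The text_general type in the Solr 4.x example schema also includes stop-word handling (and query-time synonyms), which is left out here, and the gram sizes are simply the ones from the original post. Every field copied into 'text' will now be n-grammed, which is the index-size cost raised at the start of the thread, and a full reindex is needed after the change.

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
	<analyzer type="index">
		<tokenizer class="solr.StandardTokenizerFactory"/>
		<filter class="solr.LowerCaseFilterFactory"/>
		<!-- grams are built at index time only; side defaults to the front edge -->
		<filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="10"/>
	</analyzer>
	<analyzer type="query">
		<!-- no n-gram filter here, so "exam" stays a single query term -->
		<tokenizer class="solr.StandardTokenizerFactory"/>
		<filter class="solr.LowerCaseFilterFactory"/>
	</analyzer>
</fieldType>

One side effect of this setup: because only grams up to maxGramSize characters are indexed and the query is not n-grammed, a full query word longer than 10 characters will not match.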

-----Original Message-----
From: Jack Krupansky [mailto:jack@basetechnology.com] 
Sent: Wednesday, February 05, 2014 6:07 PM
To: solr-user@lucene.apache.org
Subject: Re: Partial Word Search

1. The ngramming occurs in the index, but does not modify the original,
"stored" value that a query will return. So, "Example" will be returned even
though the index will have all the sub-terms indexed (but not stored).

2. You need the ngram filters to be asymmetric with regard to indexing and
query: the index analyzer should do the ngramming, but the query analyzer
should not. You have a single analyzer, which means that the query will also
be expanded into a sequence of sub-terms, which will be ORed or ANDed
depending on your default query operator. OR will generally work since it
will query for all the sub-terms, but AND will only work if all the sub-terms
occur in the document field.

-- Jack Krupansky
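
Point 1 can be checked directly by looking at the indexed terms rather than the stored value. A sketch, assuming a solrconfig.xml that does not already define a terms handler (the Solr 4.x example configuration ships one much like it): TermsComponent lists the terms actually present in the index for a given field.

<searchComponent name="terms" class="solr.TermsComponent"/>

<requestHandler name="/terms" class="solr.SearchHandler" startup="lazy">
	<lst name="defaults">
		<!-- enable the terms component for every request to this handler -->
		<bool name="terms">true</bool>
	</lst>
	<arr name="components">
		<str>terms</str>
	</arr>
</requestHandler>

After reindexing, a request such as /solr/collection1/terms?terms.fl=PartialSubject&terms.prefix=exa (collection1 being the default core name in the 4.x example, an assumption here) should list exa, exam, examp, exampl and example if the index analyzer is producing grams, even though the stored value is still "Example".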

Re: Partial Word Search

Posted by Jack Krupansky <ja...@basetechnology.com>.

Did you remember to completely reindex your data after changing the 
analyzer?

Also, use the Solr admin UI Analysis page to verify the analysis for the 
test cases, both index and query.

The fact that you are using the default query operator of OR means that even 
the symmetric analysis should work, at least for some simple cases.

-- Jack Krupansky

-----Original Message----- 
From: Teague James
Sent: Thursday, February 6, 2014 11:11 AM
To: solr-user@lucene.apache.org
Subject: RE: Partial Word Search

Jack,

Thanks for responding! I had tried configuring this asymmetrically before
with no luck, so I tried it again, and still no luck. My understanding is
that Solr's default operator is "OR", and I do not have a 'q.op=' anywhere
that would change that. Since it is only a one-term search for 'exam', the
operator shouldn't matter, right? Here's my asymmetric config:

NOTE: Every record in my test environment has the same value for
PartialSubject "Example"

<field name="PartialSubject" type="partialWord" indexed="true" stored="true" multiValued="true" />

<copyField source="PartialSubject" dest="text"/>

<fieldType name="partialWord" class="solr.TextField" positionIncrementGap="100">
	<analyzer type="index">
		<tokenizer class="solr.StandardTokenizerFactory"/>
		<filter class="solr.LowerCaseFilterFactory"/>
		<filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="10" side="front"/>
	</analyzer>
	<analyzer type="query">
		<tokenizer class="solr.StandardTokenizerFactory"/>
		<filter class="solr.LowerCaseFilterFactory"/>
	</analyzer>
</fieldType>

Searching for 'exam' yields 0 results, even though every record has
'Example' in the PartialSubject field. Any thoughts on what my configuration
might be missing?

-Teague
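
One likely reason this configuration still returned nothing: copyField copies the original input value, not the analyzed tokens, so the grams built for PartialSubject never reach 'text', and an unqualified query for exam goes to the default field (in the Solr 4.x example configuration, the /select handler's df is 'text'). A sketch of one way to search the gram field alongside 'text' without n-gramming everything, assuming the edismax parser; these two entries would be merged into the existing defaults of the /select handler, and the field list is illustrative.

<requestHandler name="/select" class="solr.SearchHandler">
	<lst name="defaults">
		<!-- edismax can search several fields with one query string -->
		<str name="defType">edismax</str>
		<!-- full-text field plus the edge n-gram field -->
		<str name="qf">text PartialSubject</str>
	</lst>
</requestHandler>

A quicker check that needs no config change is to name the field in the query, for example q=PartialSubject:exam, which bypasses the default field and exercises the partialWord query analyzer directly.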
