Posted to solr-user@lucene.apache.org by "Van Tassell, Kristian" <kr...@siemens.com> on 2013/03/05 18:18:01 UTC

Unable to match partial word

I'm searching for "prod" and assumed it would return matches for "product", "production", etc., but I get zero hits. Any ideas?

Here is my field type:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
</fieldType>

RE: Unable to match partial word

Posted by "Van Tassell, Kristian" <kr...@siemens.com>.

Thank you!

-----Original Message-----
From: Walter Underwood [mailto:wunder@wunderwood.org] 
Sent: Tuesday, March 05, 2013 11:22 AM
To: solr-user@lucene.apache.org
Subject: Re: Unable to match partial word

Your assumption is wrong. Solr and Lucene match entire words.

You can use wildcards, but you need to be aware of the performance issues.

If the words are related forms, like singular and plural, you can use a stemmer to index a root form.

You can also configure synonyms at index time, for things like "TV" and "television".

wunder

Re: Unable to match partial word

Posted by Walter Underwood <wu...@wunderwood.org>.

Your assumption is wrong. Solr and Lucene match entire words.

You can use wildcards, but you need to be aware of the performance issues.

If the words are related forms, like singular and plural, you can use a stemmer to index a root form.

You can also configure synonyms at index time, for things like "TV" and "television".

wunder
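
The index-time synonym suggestion above could look roughly like the sketch below. It is only an illustration: the field type name "text_syn" is made up, and it assumes synonyms.txt contains a line such as "TV, television". The filter attributes are the ones already used in the posted query analyzer. Documents would need to be reindexed after a change like this.

```xml
<!-- Hypothetical field type: synonyms are expanded when documents are indexed,
     so the query analyzer no longer needs a synonym filter. -->
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Assumes synonyms.txt has an entry like: TV, television -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With expand="true", a document containing either term is indexed under both, so a query for one should find documents that use the other.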

Re: Unable to match partial word

Posted by Jack Krupansky <ja...@basetechnology.com>.

You could also consider using EdgeNGramFilterFactory at index time, which 
can index all or some of the prefixes for each term, so that a query of 
"prod" would find "product", "production", etc.

See:
http://lucene.apache.org/core/4_1_0/analyzers-common/org/apache/lucene/analysis/ngram/EdgeNGramFilterFactory.html

Set minGramSize to 3 or 4 (4 is enough for your "prod" case), or more in 
general, to avoid indexing lots of short prefixes.

-- Jack Krupansky
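
A minimal sketch of the EdgeNGramFilterFactory approach described above, under some assumptions: the field type name "text_prefix" and the maxGramSize value of 15 are illustrative choices, not from the original thread, and the stop word, word delimiter, and stemming filters from the posted field type are left out to keep the example short.

```xml
<!-- Hypothetical field type: prefixes are generated at index time only. -->
<fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- "production" is indexed as "pro", "prod", "produ", ... up to maxGramSize characters -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <!-- No n-gram filter here: the query term "prod" is matched as-is against the indexed prefixes -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

In this sketch, query terms shorter than minGramSize or longer than maxGramSize would not match any indexed gram, so those two values need to be chosen with the expected queries in mind. Existing documents would have to be reindexed for the new analyzer to take effect.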
