Posted to solr-user@lucene.apache.org by Pete Smith <pe...@lovefilm.com> on 2009/04/07 16:36:01 UTC

using NGramTokenizerFactory for partial matching

Hi,

I want to use the NGramTokenizerFactory tokeniser to enable partial
matching on a field in my index. For instance for the field:

"Lorem ipsum"

I want it to match "lor" "lorem" and "lorem i". However I am finding it
matches the first two but not the third - the white space is causing
problems. Here are the relevant parts of my config: 

        <fieldType name="text_substring" class="solr.TextField"
                   positionIncrementGap="100">
            <analyzer type="index">
                <tokenizer class="solr.NGramTokenizerFactory"
                           minGramSize="3" maxGramSize="15" />
                <filter class="solr.LowerCaseFilterFactory"/>
            </analyzer>
        </fieldType>

        <field name="title_partial" type="text_substring" indexed="true"
               stored="true" required="true" />

I believe it is due to the minGramSize setting, and that it is being
applied to each word. Can anyone tell me how I can support what I want
to do?

Cheers,
Pete

-- 
Pete Smith
Developer

No.9 | 6 Portal Way | London | W3 6RU |
T: +44 (0)20 8896 8070 | F: +44 (0)20 8896 8111

LOVEFiLM.com

Re: using NGramTokenizerFactory for partial matching

Posted by Chris Hostetter <ho...@fucit.org>.

: I want it to match "lor" "lorem" and "lorem i". However I am finding it
: matches the first two but not the third - the white space is causing
: problems. Here are the relevant parts of my config: 
: 
:         <fieldType name="text_substring" class="solr.TextField"
: positionIncrementGap="100">
:             <analyzer type="index">
:                 <tokenizer class="solr.NGramTokenizerFactory"
: minGramSize="3" maxGramSize="15" />  

NGramTokenizer doesn't do anything special with whitespace -- but the
QueryParser does ... what does your query for "lorem i" look like?
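
to see why the index side should be fine, here is a rough sketch in plain
Python (not Lucene's code) of character n-grams taken over the whole
string, spaces included. the function name char_ngrams is made up for
illustration, and the real tokenizer may emit the grams in a different
order; only the set of grams matters here.

```python
def char_ngrams(text, min_gram=3, max_gram=15):
    """Return every substring of text with length in [min_gram, max_gram].

    Whitespace is treated like any other character, so grams can span words.
    """
    grams = []
    # Gram sizes larger than the input simply produce nothing.
    for size in range(min_gram, min(max_gram, len(text)) + 1):
        for start in range(len(text) - size + 1):
            grams.append(text[start:start + size])
    return grams


# Lowercasing stands in for the LowerCaseFilterFactory step in the config.
grams = [g.lower() for g in char_ngrams("Lorem ipsum")]

for wanted in ("lor", "lorem", "lorem i"):
    print(wanted, wanted in grams)
```

all three strings show up as grams under these assumptions, so the term
"lorem i" should exist in the index; what matters is whether the query
side ever asks for it as a single term.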

if you're using the example query parser and request handler configs then
this won't work like you want...

   http://localhost:8963/select?q=lorem+i

...because the query parser will split on the whitespace.

try quoting your string, or using the FieldQParserPlugin.
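
a minimal sketch of what those two options could look like as request
URLs, built with Python's standard library so the escaping is right. the
host, port and path are copied from the example URL above, the field name
title_partial comes from the posted schema, and it assumes your request
handler accepts the {!field f=...} local-params form; select_url is just a
helper name invented for this example.

```python
from urllib.parse import urlencode

# Copied from the example URL above; adjust to your own Solr instance.
BASE = "http://localhost:8963/select"


def select_url(q):
    """Build a select URL with the q parameter URL-encoded."""
    return BASE + "?" + urlencode({"q": q})


# Option 1: quote the string so the query parser keeps it together
# instead of splitting on the space.
phrase_url = select_url('title_partial:"lorem i"')

# Option 2: hand the raw string to the field query parser, which runs it
# through the field's analyzer without query-syntax whitespace splitting.
field_url = select_url("{!field f=title_partial}lorem i")

print(phrase_url)
print(field_url)
```

either way the point is the same: the space has to reach the field's
analysis as part of the input instead of being treated as a separator
between two clauses.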



-Hoss