Posted to solr-user@lucene.apache.org by Tom Morton <ts...@gmail.com> on 2009/04/23 16:52:15 UTC

prefix matching

Hi all,
  I'm trying to use prefixes to match similar strings to a query string.  I
have the following field type:

  <fieldtype name="prefix" stored="true" indexed="true" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="10"/>
    </analyzer>
  </fieldtype>

field:
   <field name="wordPrefix" type="prefix" indexed="true" stored="true"/>

copyField:
<copyField source="word" dest="wordPrefix"/>

If I apply this to an indexed string: "ipod shuffle" and query string:
"shufle" (missing f), I get matching terms for "sh", "shu" and "shuf":

Index Analyzer
  ipod shuffle
  ipod shuffle
  ipod shuffle
  ip ipo ipod sh shu shuf shuff shuffl shuffle

Query Analyzer
  shufle
  shufle
  shufle
  sh shu shuf shufl shufle

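
As a rough illustration of the overlap described above, here is a minimal Python sketch of edge n-gram generation. It is not Solr's implementation; the function name edge_ngrams is hypothetical, and only the gram sizes (2 to 10) come from the field type above.

```python
def edge_ngrams(token, min_gram=2, max_gram=10):
    """Return the leading-edge n-grams of a token, shortest first.

    Hypothetical helper that mimics what an edge n-gram filter emits;
    min_gram/max_gram mirror minGramSize/maxGramSize in the field type.
    """
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


indexed = edge_ngrams("shuffle")  # grams for the indexed word
queried = edge_ngrams("shufle")   # grams for the misspelled query word

# Grams that appear on both sides, kept in query order.
shared = [g for g in queried if g in indexed]
print(shared)
```

Only the grams up to the point of the typo are shared, so any match has to come from those short prefixes.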
However, when I query with "shufle" I get no results:

http://localhost:8983/solr/select?q=wordPrefix%3Ashufle&fl=wordPrefix&qt=standard&debugQuery=on

<lst name="debug">
<str name="rawquerystring">wordPrefix:shufle</str>
<str name="querystring">wordPrefix:shufle</str>
<str name="parsedquery">
PhraseQuery(wordPrefix:"sh hu uf fl le shu huf ufl fle shuf hufl ufle shufl hufle shufle")
</str>
<str name="parsedquery_toString">
wordPrefix:"sh hu uf fl le shu huf ufl fle shuf hufl ufle shufl hufle shufle"
</str>
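
To show why a phrase query built from every gram can come back empty even though some grams overlap, here is a simplified Python model. It assumes each gram occupies its own consecutive position in the index and that a phrase matches only when all of its terms appear as a contiguous run; real Lucene positions and scoring are more involved. The helpers edge_ngrams, phrase_match and any_term_match are hypothetical names, and the sketch uses the edge grams from the analysis output rather than the longer gram list in the debug output.

```python
def edge_ngrams(token, min_gram=2, max_gram=10):
    """Hypothetical helper: leading-edge n-grams of a token."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def phrase_match(index_terms, query_terms):
    """True if query_terms occur as one contiguous run in index_terms."""
    n = len(query_terms)
    return any(index_terms[i:i + n] == query_terms
               for i in range(len(index_terms) - n + 1))


def any_term_match(index_terms, query_terms):
    """True if at least one query term was indexed (OR semantics)."""
    return any(term in index_terms for term in query_terms)


# Assumption: every gram gets its own position, in emission order.
index_terms = edge_ngrams("ipod") + edge_ngrams("shuffle")
query_terms = edge_ngrams("shufle")

phrase_hit = phrase_match(index_terms, query_terms)
or_hit = any_term_match(index_terms, query_terms)
print(phrase_hit, or_hit)
```

In this model the phrase fails because "shufl" and "shufle" were never indexed, while an OR over the same grams would still find the document through "sh", "shu" and "shuf".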

This post suggests that I need to set the position increment for my
token filter, but I'm not sure how to do that or whether it's possible.

http://www.lucidimagination.com/search/document/bc643c39f0b6e423/queryparser_and_ngrams#629b39ea39aa9cd4

Thoughts?  Thanks...Tom

Re: prefix matching

Posted by Grant Ingersoll <gs...@apache.org>.
Hmm, I did some poking around, and this conversation rang a bell from the
Lucene list. See:
http://www.lucidimagination.com/search/document/3e4ce083206664d2/ngrams_and_positions#3e4ce083206664d2

Looks like Lucene would need to solve LUCENE-1224 and LUCENE-1225.

https://issues.apache.org/jira/browse/LUCENE-1224
https://issues.apache.org/jira/browse/LUCENE-1225

-Grant



--------------------------
Grant Ingersoll
http://www.lucidimagination.com/
