Posted to solr-user@lucene.apache.org by Tom Morton <ts...@gmail.com> on 2009/04/23 16:52:15 UTC
prefix matching
Hi all,
I'm trying to use prefixes to match similar strings to a query string. I
have the following field type:
<fieldtype name="prefix" stored="true" indexed="true" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="10"/>
  </analyzer>
</fieldtype>
field:
<field name="wordPrefix" type="prefix" indexed="true" stored="true"/>
copyField:
<copyField source="word" dest="wordPrefix"/>
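If I apply this to the indexed string "ipod shuffle" and the query string
"shufle" (missing an f), I get matching terms for "sh", "shu" and "shuf":
Index Analyzer:
  StandardTokenizer: ipod shuffle
  LowerCaseFilter:   ipod shuffle
  StopFilter:        ipod shuffle
  EdgeNGramFilter:   ip ipo ipod sh shu shuf shuff shuffl shuffle
Query Analyzer:
  StandardTokenizer: shufle
  LowerCaseFilter:   shufle
  StopFilter:        shufle
  EdgeNGramFilter:   sh shu shuf shufl shufle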
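The three shared terms can be reproduced with a minimal Python sketch of front-edge n-gram generation, assuming grams are taken from the start of each token with minGramSize=2 and maxGramSize=10 as in the field type. The helpers `edge_ngrams` and `analyze` are hypothetical stand-ins, not Solr APIs, and the sketch splits on whitespace instead of running StandardTokenizer.

```python
def edge_ngrams(token: str, min_gram: int = 2, max_gram: int = 10) -> list[str]:
    """Front-edge n-grams of one token, shortest first (a rough model of EdgeNGramFilterFactory)."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def analyze(text: str) -> list[str]:
    """Hypothetical analyzer chain: whitespace split, lowercase, edge n-grams.
    The stop filter is skipped because neither example contains a stop word."""
    grams = []
    for token in text.lower().split():
        grams.extend(edge_ngrams(token))
    return grams


index_terms = analyze("ipod shuffle")
query_terms = analyze("shufle")
index_set = set(index_terms)
shared = [t for t in query_terms if t in index_set]

print(index_terms)
print(query_terms)
print(shared)  # ['sh', 'shu', 'shuf']
```

Whether those shared grams actually retrieve the document depends on how the query parser combines them into a query, not only on the analyzer output.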
However, when I query with "shufle" I get no results:
http://localhost:8983/solr/select?q=wordPrefix%3Ashufle&fl=wordPrefix&qt=standard&debugQuery=on
<lst name="debug">
<str name="rawquerystring">wordPrefix:shufle</str>
<str name="querystring">wordPrefix:shufle</str>
<str name="parsedquery">
PhraseQuery(wordPrefix:"sh hu uf fl le shu huf ufl fle shuf hufl ufle shufl hufle shufle")
</str>
<str name="parsedquery_toString">
wordPrefix:"sh hu uf fl le shu huf ufl fle shuf hufl ufle shufl hufle shufle"
</str>
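The term list in the parsed query above contains interior grams such as "hu" and "ufle", not only the prefixes listed by the query analyzer. A minimal Python sketch (the `all_ngrams` helper is hypothetical) shows that the list is exactly every 2- to 6-character substring of "shufle", grouped by gram size and then by start offset:

```python
def all_ngrams(token: str, min_gram: int = 2, max_gram: int = 10) -> list[str]:
    """Every substring of length min_gram..max_gram, grouped by length, then by start offset."""
    grams = []
    for n in range(min_gram, min(max_gram, len(token)) + 1):
        for start in range(len(token) - n + 1):
            grams.append(token[start:start + n])
    return grams


parsed = "sh hu uf fl le shu huf ufl fle shuf hufl ufle shufl hufle shufle"
print(all_ngrams("shufle") == parsed.split())  # True
```

An edge n-gram filter cannot produce interior grams, so this suggests (as an inference from the debug output only, not something confirmed in the thread) that the query-time chain behind this output applied a plain n-gram filter rather than the posted EdgeNGramFilterFactory.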
This post suggests that I need to set the position increment for my
token filter, but I'm not sure how to do that or whether it's possible:
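The position-increment suggestion can be illustrated with a simplified Python model of how a Lucene 2.x-era query parser turns the tokens produced for one query term into a query. This approximates that decision; it is not Lucene's code, and `query_type` and the string labels it returns are hypothetical.

```python
def query_type(position_increments: list[int]) -> str:
    """Simplified model of the query parser's choice for one analyzed query term.

    position_increments holds one increment per token emitted by the analyzer.
    """
    if not position_increments:
        return "no query"
    if len(position_increments) == 1:
        return "TermQuery"
    # Count distinct positions; a zero increment stacks a token on the previous position.
    position_count = sum(inc for inc in position_increments if inc != 0)
    several_at_same_position = any(inc == 0 for inc in position_increments)
    if several_at_same_position:
        # All tokens on one position are treated like synonyms and OR'ed together.
        return "BooleanQuery(SHOULD)" if position_count == 1 else "MultiPhraseQuery"
    # Every token advanced the position, so all grams must occur in sequence.
    return "PhraseQuery"


grams = ["sh", "shu", "shuf", "shufl", "shufle"]
print(query_type([1] * len(grams)))              # PhraseQuery
print(query_type([1] + [0] * (len(grams) - 1)))  # BooleanQuery(SHOULD)
```

As a phrase, every gram of "shufle" must be present in order, so a document indexed with "shuffle" cannot match; with all grams on one position, the query becomes an OR over the grams and could match on "sh", "shu" and "shuf".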
http://www.lucidimagination.com/search/document/bc643c39f0b6e423/queryparser_and_ngrams#629b39ea39aa9cd4
Thoughts? Thanks...Tom
Re: prefix matching
Posted by Grant Ingersoll <gs...@apache.org>.
Hmm, I did some poking around, and this conversation rang a bell from the
Lucene list; see http://www.lucidimagination.com/search/document/3e4ce083206664d2/ngrams_and_positions#3e4ce083206664d2
Looks like Lucene would need to solve LUCENE-1224 and LUCENE-1225.
https://issues.apache.org/jira/browse/LUCENE-1224
https://issues.apache.org/jira/browse/LUCENE-1225
-Grant
--------------------------
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
using Solr/Lucene:
http://www.lucidimagination.com/search