Posted to solr-user@lucene.apache.org by Robert Gründler <ro...@dubture.com> on 2010/11/18 16:36:06 UTC
Respect token order in matches
Hi,
Is there a way to make Solr respect the order of token matches when the query is a multi-term string?
Here's an example:
Query String: "John C"
Indexed Strings:
- "John Cage"
- "Cargill John"
This will return both indexed strings as a result. However, "Cargill John" should not match in that case, because the order
of the tokens is not the same as in the query.
Here's the fieldtype:
<fieldType name="edgytext" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all" />
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all" />
  </analyzer>
</fieldType>
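To make the problem concrete, the analysis chain above can be roughly simulated outside Solr. This is only a sketch under stated assumptions: the stop word list is treated as empty (the contents of stopwords.txt are not shown here), empty tokens are dropped, and every edge n-gram is assumed to share the position of the token it came from, which may not hold for every Lucene version. The function names are made up for illustration.

```python
import re

MIN_GRAM, MAX_GRAM = 1, 25  # values taken from the edgytext field type


def base_tokens(text):
    """Whitespace split, lowercase, then strip everything outside a-z."""
    tokens = []
    for raw in text.split():
        cleaned = re.sub(r"[^a-z]", "", raw.lower())
        if cleaned:  # simplification: drop tokens that end up empty
            tokens.append(cleaned)
    return tokens


def index_terms(text):
    """Map each token position to the set of edge n-grams of that token."""
    positions = {}
    for pos, token in enumerate(base_tokens(text)):
        upper = min(len(token), MAX_GRAM)
        positions[pos] = {token[:n] for n in range(MIN_GRAM, upper + 1)}
    return positions


def matches_any_order(query, doc):
    """True if every query term occurs somewhere in the document."""
    all_terms = set().union(*index_terms(doc).values())
    return all(term in all_terms for term in base_tokens(query))


def matches_in_order(query, doc):
    """True if the query terms occur at consecutive positions, in order."""
    positions = index_terms(doc)
    terms = base_tokens(query)
    for start in range(len(positions) - len(terms) + 1):
        if all(terms[i] in positions[start + i] for i in range(len(terms))):
            return True
    return False


for doc in ("John Cage", "Cargill John"):
    print(doc, matches_any_order("John C", doc), matches_in_order("John C", doc))
```

Under these assumptions both strings match when order is ignored, and only "John Cage" matches when order is enforced, which is the behaviour I'm after.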
Is there a way to achieve this using this fieldtype?
thanks!
Re: Respect token order in matches
Posted by Markus Jelsma <ma...@openindex.io>.
Hi,
I'm not sure which QParser you're using, but with the DisMaxQParser you can
specify slop on explicit phrase queries. Did you set it? It can make a
difference. Check it out:
http://wiki.apache.org/solr/DisMaxQParserPlugin#qs_.28Query_Phrase_Slop.29
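As a rough illustration of what that could look like, here is a small sketch that builds the request parameters for an explicit phrase query with qs set to 0. The host, port, handler path and the field name name_edgy are assumptions, not taken from this thread. Whether a phrase query behaves as expected against edge n-grams also depends on how the n-gram filter assigns token positions, so treat it as something to try rather than a confirmed fix.

```python
from urllib.parse import urlencode

# Request parameters for a dismax query. The double quotes inside q make it
# an explicit phrase query, which is what the qs (query phrase slop) applies to.
params = {
    "defType": "dismax",
    "q": '"John C"',
    "qf": "name_edgy",  # hypothetical field using the edgytext type
    "qs": 0,            # no slop: terms must be adjacent and in order
}

query_string = urlencode(params)

# Hypothetical Solr location; adjust to your own installation.
url = "http://localhost:8983/solr/select?" + query_string
print(url)
```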
Cheers,