
Posted to solr-user@lucene.apache.org by Robert Gründler <ro...@dubture.com> on 2010/11/18 16:36:06 UTC

Respect token order in matches

Hi,

Is there a way to make Solr respect the order of token matches when the query is a multi-term string?

Here's an example:

Query String: "John C"

Indexed Strings:

- "John Cage"
- "Cargill John"

This returns both indexed strings. However, "Cargill John" should not match in that case, because the order of its tokens differs from the order in the query.

Here's the fieldtype:

  <fieldType name="edgytext" class="solr.TextField" positionIncrementGap="100">

   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />     
     <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all" />
     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" />
   </analyzer>

   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
     <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all" />
   </analyzer>

  </fieldType>

Is there a way to achieve this using this fieldtype?


Thanks!

Re: Respect token order in matches

Posted by Markus Jelsma <ma...@openindex.io>.

Hi,

I'm not sure which QParser you're using, but with the DisMaxQParser you can specify slop on explicit phrase queries through the qs parameter. Did you set it? It can make a difference. Check it out:

http://wiki.apache.org/solr/DisMaxQParserPlugin#qs_.28Query_Phrase_Slop.29
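A minimal sketch of the qs approach in solrconfig.xml, assuming the field that uses the edgytext type is called `name` and the handler is registered as `/names` (both names are hypothetical). Keep in mind that qs only applies to phrases quoted explicitly in q, so the client has to send "John C" with the quotes. In Lucene's phrase slop, swapping the order of two terms costs two moves, so a qs of 0 (or 1) should exclude "Cargill John". Whether the prefix "c" then matches "Cage" also depends on the positions the EdgeNGramFilterFactory assigns to the grams it emits, so it is worth verifying the behavior on your Solr version.

```xml
<!-- Hypothetical handler "/names" and field "name"; adjust to your schema. -->
<requestHandler name="/names" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Parse q with the DisMax query parser -->
    <str name="defType">dismax</str>
    <!-- Search the field that uses the "edgytext" fieldType -->
    <str name="qf">name</str>
    <!-- Query Phrase Slop: slop applied to phrases quoted explicitly in q.
         0 requires the phrase terms to be adjacent and in query order. -->
    <str name="qs">0</str>
  </lst>
</requestHandler>
<!-- Example request; the quotes matter because qs only affects explicit phrases:
     q="John C"   (URL-encoded: q=%22John+C%22) -->
```

Putting qs in the handler defaults saves clients from sending it, but it can also be passed per request (for example &qs=0), which overrides the default.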

Cheers,
