
Posted to dev@lucene.apache.org by "Lance Norskog (JIRA)" <ji...@apache.org> on 2010/07/01 05:38:50 UTC

[jira] Commented: (SOLR-1980) Implement boundary match support

    [ https://issues.apache.org/jira/browse/SOLR-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884147#action_12884147 ] 

Lance Norskog commented on SOLR-1980:
-------------------------------------

Another use case is phrases, especially sloppy phrases.
"^hello kitty" would find "hello kitty" at the beginning of the text.
"^hello"~5 would find "hello" among the first 5 words, and the closer it is to the beginning, the better the match. This is especially interesting for consumer searches: people tend to type the first word of a movie title first.

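The sloppy-phrase idea above can be illustrated with a small Python sketch. This is not Solr or Lucene code. The function name anchored_sloppy_score and the 1 / (1 + distance) weighting are assumptions chosen only to show "closer to the beginning scores better".

```python
def anchored_sloppy_score(text, term, slop):
    """Score `term` by how close it sits to the start of `text`.

    Hypothetical model of "^term"~slop: the term must appear among the
    first `slop` words, and the score decays with distance from the start.
    Returns 0.0 when the term is not found in that window.
    """
    tokens = text.lower().split()
    wanted = term.lower()
    # Only the first `slop` tokens are eligible for a match.
    for distance, token in enumerate(tokens[:slop]):
        if token == wanted:
            # Assumed weighting: 1.0 at the very start, smaller further in.
            return 1.0 / (1 + distance)
    return 0.0
```

With this model, "Hello Kitty" scores higher for "hello" than "Say Hello Again", and a title where "hello" is the sixth word does not match at all with a slop of 5.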
> Implement boundary match support
> --------------------------------
>
>                 Key: SOLR-1980
>                 URL: https://issues.apache.org/jira/browse/SOLR-1980
>             Project: Solr
>          Issue Type: New Feature
>          Components: Schema and Analysis
>            Reporter: Jan Høydahl
>
> Sometimes you need to specify that a query should match only at the start or end of a field, or be an exact match.
> Example content:
> 1) a quick fox is brown
> 2) quick fox is brown
> Example queries:
> "^quick fox" -> should only match 2)
> "brown$" -> should match 1) and 2)
> "^quick fox is brown$" -> should only match 2)
> The proposed implementation is a new BoundaryMatchTokenFilter that behaves as follows:
> On the index side, it inserts special unique tokens at the beginning and end of the field. These could be some unusual Unicode sequence.
> On the query side, it looks for a leading "^" or a trailing "$" and replaces them with the special tokens.

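The proposal quoted above can be sketched in a few lines of Python. This is a toy model, not the actual BoundaryMatchTokenFilter (which would be a Java TokenFilter). The sentinel values, the whitespace tokenizer, and all function names here are assumptions made for illustration.

```python
# Hypothetical sentinel tokens taken from the Unicode private use area,
# so they are unlikely to collide with real field content.
START_TOKEN = "\ue000"
END_TOKEN = "\ue001"


def index_tokens(field_value):
    """Index side: wrap the field's tokens in start and end sentinels."""
    return [START_TOKEN] + field_value.lower().split() + [END_TOKEN]


def query_tokens(query):
    """Query side: turn a leading '^' or trailing '$' into sentinels."""
    anchored_start = query.startswith("^")
    if anchored_start:
        query = query[1:]
    anchored_end = query.endswith("$")
    if anchored_end:
        query = query[:-1]
    tokens = query.lower().split()
    if anchored_start:
        tokens.insert(0, START_TOKEN)
    if anchored_end:
        tokens.append(END_TOKEN)
    return tokens


def phrase_matches(doc_tokens, phrase):
    """Exact phrase match: `phrase` occurs as a contiguous run of tokens."""
    n = len(phrase)
    return any(doc_tokens[i:i + n] == phrase
               for i in range(len(doc_tokens) - n + 1))


def matches(field_value, query):
    """Combine both sides: analyze the field and the query, then compare."""
    return phrase_matches(index_tokens(field_value), query_tokens(query))
```

Because the anchors become ordinary tokens, an ordinary phrase match is enough to enforce the boundary. Under these assumptions, the three example queries from the issue behave as described for documents 1) and 2).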
-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

Re: [jira] Commented: (SOLR-1980) Implement boundary match support

Posted by Jan Høydahl / Cominvent <ja...@cominvent.com>.

I think the TokenFilter approach is the easiest. Another option would be to go deeper and introduce it as a native query language syntax in some way and add boundarymatch="true" as a parameter in the schema. Any opinions?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com


