Posted to solr-user@lucene.apache.org by Jan Høydahl / Cominvent <ja...@cominvent.com> on 2010/03/10 16:51:24 UTC

Boundary match as part of query language?

Hi,

Sometimes you need to anchor your search to the start/end of the field.

Example:
1. title=New York Yankees
2. title=New York
3. title=York

If I search title:"New York" or title:"York", more than one of these documents matches, but I'd like to anchor my search to the beginning and/or end of the field, e.g. with regex syntax: title:"^New York$"

Now, I know how to work around this by appending some unique character sequence at each end of the field and then including it in my search in the front end. However, I wonder if any of you have been planning a patch to add a native boundary match feature to Solr that would automagically add the tokens (also for multi-value fields!) and extend the query language to allow querying for starts-with(), ends-with() and equals().
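A minimal sketch of the sentinel-token workaround, written in plain Python rather than as a real Solr analyzer chain. The token names `_BEGIN`/`_END`, the lowercase-and-whitespace tokenizer and the gap of 100 positions between values are assumptions for illustration. Each value of a (possibly multi-valued) field is wrapped in boundary tokens at index time, and starts-with, ends-with and equals then become ordinary exact-phrase matches that include those tokens.

```python
from collections import defaultdict

BEGIN, END = "_BEGIN", "_END"  # assumed sentinel tokens
POSITION_GAP = 100  # assumed gap between values of a multi-valued field


def analyze(value):
    """Lowercase, split on whitespace, and wrap the value in boundary tokens."""
    return [BEGIN] + value.lower().split() + [END]


def index_field(values):
    """Map each term to the positions it occupies in one document's field."""
    postings = defaultdict(list)
    pos = 0
    for value in values:
        for token in analyze(value):
            postings[token].append(pos)
            pos += 1
        pos += POSITION_GAP  # keep phrases from spanning two values
    return postings


def phrase_match(postings, terms):
    """True if the terms occur at consecutive positions (an exact phrase)."""
    return any(
        all(start + i in postings.get(term, ()) for i, term in enumerate(terms))
        for start in postings.get(terms[0], ())
    )


def starts_with(text):
    return [BEGIN] + text.lower().split()


def ends_with(text):
    return text.lower().split() + [END]


def equals(text):
    return [BEGIN] + text.lower().split() + [END]


docs = {1: ["New York Yankees"], 2: ["New York"], 3: ["York"]}
index = {doc_id: index_field(values) for doc_id, values in docs.items()}
print([d for d in index if phrase_match(index[d], equals("New York"))])  # [2]
print([d for d in index if phrase_match(index[d], starts_with("New York"))])  # [1, 2]
print([d for d in index if phrase_match(index[d], ends_with("York"))])  # [2, 3]
```

The gap between values plays roughly the role of Solr's `positionIncrementGap`, so an anchored phrase cannot straddle two values of a multi-valued field.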

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com


Re: Boundary match as part of query language?

Posted by "David Smiley @MITRE.org" <DS...@mitre.org>.
By the way, you'll probably want to shingle or use CommonGrams (with _BEGIN &
_END being "common") for acceptable performance.
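A rough illustration of why CommonGrams helps here: `_BEGIN` and `_END` occur in every document, so a plain phrase query has to walk their very long postings lists. The sketch below is a simplified pure-Python imitation of the idea behind Lucene's CommonGramsFilter (index side) and CommonGramsQueryFilter (query side), not their actual code; the function names are hypothetical, and bigrams are joined with `_`.

```python
COMMON = {"_BEGIN", "_END"}  # boundary tokens treated as "common" words


def forms_gram(a, b):
    """A bigram is emitted whenever either word of the pair is common."""
    return a in COMMON or b in COMMON


def index_grams(tokens):
    """Index side: keep every unigram and add a bigram next to common words."""
    out = []
    for i, tok in enumerate(tokens):
        out.append((i, tok))
        if i + 1 < len(tokens) and forms_gram(tok, tokens[i + 1]):
            out.append((i, f"{tok}_{tokens[i + 1]}"))
    return out


def query_grams(tokens):
    """Query side: use the bigrams, keep a unigram only if no bigram covers it."""
    out, covered = [], set()
    for i in range(len(tokens) - 1):
        if forms_gram(tokens[i], tokens[i + 1]):
            out.append((i, f"{tokens[i]}_{tokens[i + 1]}"))
            covered.update((i, i + 1))
    out.extend((i, tok) for i, tok in enumerate(tokens) if i not in covered)
    return sorted(out)


def positional_match(indexed, query):
    """True if every (position, term) in the query lines up in the document."""
    postings = {}
    for pos, term in indexed:
        postings.setdefault(term, set()).add(pos)
    base, first = query[0]
    return any(
        all(start + pos - base in postings.get(term, set()) for pos, term in query)
        for start in postings.get(first, set())
    )


query = query_grams(["_BEGIN", "new", "york", "_END"])
print(query)  # [(0, '_BEGIN_new'), (2, 'york__END')]
```

The anchored query then only touches bigram terms such as `_BEGIN_new` and `york__END`, which are far rarer than `_BEGIN` or `_END` on their own.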

I'm wondering if Lucene's new payload features might provide an alternative
mechanism to mark the first and last terms.
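One way to picture the payload idea, purely as a hypothetical simulation in Python (it does not use Lucene's actual payload API): instead of adding extra tokens, each position carries a small flags value saying whether it is the first and/or last token of the value, and the anchored check inspects those flags at the ends of the matched phrase. `FIRST`, `LAST` and both function names are assumptions.

```python
FIRST, LAST = 1, 2  # hypothetical flag bits stored per position


def index_with_flags(value):
    """Map term -> {position: flags}; no sentinel tokens are added."""
    tokens = value.lower().split()
    postings = {}
    for i, tok in enumerate(tokens):
        flags = (FIRST if i == 0 else 0) | (LAST if i == len(tokens) - 1 else 0)
        postings.setdefault(tok, {})[i] = flags
    return postings


def anchored_match(postings, phrase, at_start=False, at_end=False):
    """Exact phrase match, optionally requiring FIRST/LAST flags at its ends."""
    terms = phrase.lower().split()
    for start in postings.get(terms[0], {}):
        flags = [postings.get(t, {}).get(start + k) for k, t in enumerate(terms)]
        if None in flags:
            continue  # not a consecutive phrase at this start position
        if at_start and not flags[0] & FIRST:
            continue
        if at_end and not flags[-1] & LAST:
            continue
        return True
    return False


title = index_with_flags("New York Yankees")
print(anchored_match(title, "New York", at_start=True))  # True
print(anchored_match(title, "New York", at_start=True, at_end=True))  # False
```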

~ David Smiley


hossman wrote:
> 
> well, if you *always* want boundary rules to be applied, that can be done
> as simply as adding your boundary tokens automatically in both the index and
> query time analyzers ... then a search for q="New York" can
> automatically be translated into a PhraseQuery for "_BEGIN New York _END"
> 
> [...]
> 
> -Hoss

-- 
View this message in context: http://old.nabble.com/Boundary-match-as-part-of-query-language--tp27851560p27976989.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Boundary match as part of query language?

Posted by Chris Hostetter <ho...@fucit.org>.
: Now, I know how to work-around this, by appending some unique character 
: sequence at each end of the field and then include this in my search in 
: the front end. However, I wonder if any of you have been planning a 
: patch to add a native boundary match feature to Solr that would 
: automagically add tokens (also for multi-value fields!), and expand the 
: query language to allow querying for starts-with(), ends-with() and 
: equals()

well, if you *always* want boundary rules to be applied, that can be done
as simply as adding your boundary tokens automatically in both the index and
query time analyzers ... then a search for q="New York" can
automatically be translated into a PhraseQuery for "_BEGIN New York _END"

If you want special QueryParser markup to specify when you want specific
boundary conditions, that can also be done with a custom QParser, and
automatically applying the boundary tokens in your indexing analyzer (but not
the query analyzer -- the QParser would take care of that part). In
general, though, it's hard to see how something like q=begin(New York) is
easier syntax than q="_BEGIN New York".
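To make the comparison concrete, here is a hypothetical rewrite step of the kind a custom QParser might perform before handing the string to the normal parser. The `begin()`, `end()` and `equals()` syntax is only the one floated in this thread, the function name `rewrite` is made up, and the sketch assumes the field's index analyzer adds `_BEGIN`/`_END` while its query analyzer leaves those tokens intact.

```python
import re

# Hypothetical syntax: [field:]begin(...), [field:]end(...) or [field:]equals(...)
ANCHORED = re.compile(r"^(?:(\w+):)?(begin|end|equals)\((.*)\)$")


def rewrite(q):
    """Translate anchored-function syntax into a plain phrase query string."""
    q = q.strip()
    m = ANCHORED.match(q)
    if m is None:
        return q  # leave ordinary queries untouched
    field, func, text = m.groups()
    words = text.split()
    if func in ("begin", "equals"):
        words.insert(0, "_BEGIN")
    if func in ("end", "equals"):
        words.append("_END")
    phrase = '"' + " ".join(words) + '"'
    return f"{field}:{phrase}" if field else phrase


print(rewrite("title:begin(New York)"))  # title:"_BEGIN New York"
print(rewrite("equals(New York)"))  # "_BEGIN New York _END"
```

As the comparison suggests, the gain is mostly cosmetic: the rewritten string is exactly the phrase one could have typed by hand.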

The point is it's relatively easy to implement something like this when
meeting specific needs, but I don't know of anyone working on a truly
generalized QParser that deals with this -- largely because most people
who care about this sort of thing either have really complicated use cases
(i.e. not just begin/end boundary markers, but also sentence,
paragraph, page, chapter, section, etc.) or want extremely specific
query syntax (i.e. they're trying to recreate the syntax of an existing
system they are replacing), so a general solution doesn't work well.

The closest I've ever seen is Mark Miller's QSolr parser, which actually
went in a completely different direction, using a home-grown syntax to
generate Span queries ... if that slacker ever gets off his butt and
starts running his webserver again, you could download it and try it out,
and probably find that it would be trivial to turn it into a QParser.


-Hoss


Re: Boundary match as part of query language?

Posted by Jan Høydahl / Cominvent <ja...@cominvent.com>.
Sure, this is how we do it now. But wouldn't it be nice to have native support for it? I could start coding it myself, but wanted to know if there is already a patch out there or something...

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com

On 14 March 2010, at 00:17, Lance Norskog wrote:

> One way is to add magic 'beginning' and 'end' terms, then do phrase
> searches with those terms.
> 
> -- 
> Lance Norskog
> goksron@gmail.com


Re: Boundary match as part of query language?

Posted by Lance Norskog <go...@gmail.com>.
One way is to add magic 'beginning' and 'end' terms, then do phrase
searches with those terms.

On Wed, Mar 10, 2010 at 7:51 AM, Jan Høydahl / Cominvent
<ja...@cominvent.com> wrote:
> Sometimes you need to anchor your search to start/end of field.
> [...]



-- 
Lance Norskog
goksron@gmail.com