Posted to java-user@lucene.apache.org by Daniel Bigham <da...@wolfram.com> on 2016/06/24 14:06:59 UTC

Favoring Terms Occurring in Close Proximity

Something significant that I've noticed about using the default Lucene 
query parser is that if your user enters a query like:

"temperate climates"

... it will get turned into an OR query:

temperate OR climates

This means that a document that contains the literal substring 
"temperate climates" will be on equal footing with a document that 
contains "temperate emotions may go a long way to keeping the peace as 
we continue to discuss climate change".

So far as I know, your typical search engine definitely does not ignore 
the relative positions of terms.

And so my question is -- how do people typically deal with this when 
using Lucene?  What is wanted is a query that desires search terms to be 
close together, but failing that, is ok with the terms simply occurring 
in the document.

And again -- the ultimate desire isn't just to construct a Query object 
to accomplish that, but to hook things up in such a way that a user can 
enter a query in an input box and have the system take their flat string 
and turn it into an intelligent query that acts somewhat like today's 
modern search engines in terms of wanting terms to be close to each other.

This is such a "basic" use case of a search system that I'm tempted to 
think there must be well-worn paths for doing this in Lucene.

Thanks,
Daniel

Re: Favoring Terms Occurring in Close Proximity

Posted by Ian Lea <ia...@gmail.com>.
No, it implies that Lucene is a low level library that allows people like
you and me, application developers, to develop applications that meet our
business and technical needs.

Like you, most of the things I work with prefer documents where the search
terms are close together, often preferably in the right order.  They also
rate certain fields as being more important than others.  So we build
fairly complex boolean queries with an assortment of queries - lots of
spans - with boosting, to try and get the results we need.

Some of the projects built on top of Lucene may provide some built-in
support for this, but as far as I'm aware Lucene itself doesn't, nor do I
expect it to.


--
Ian.


On Mon, Jun 27, 2016 at 1:55 PM, Daniel Bigham <da...@wolfram.com> wrote:

> Hi Ahmet,
>
> Yes, thanks... that did come to mind and is the strategy I'm playing with.
>
> However, if you are giving a user a plain text field and using the Lucene
> query parser, it doesn't create optional clauses for boosting purposes.
>
> Does this imply that anyone wanting to use Lucene in conjunction with an
> input field needs to write a custom query parser if they want reasonable
> results?

Re: Favoring Terms Occurring in Close Proximity

Posted by Ahmet Arslan <io...@yahoo.com.INVALID>.
Hi Daniel,

Solr has (e)dismax just for the purpose you described.
https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser

Please see the pf, pf2, and pf3 parameters.

Ahmet
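
As a rough illustration of the parameters mentioned above, the sketch below assembles an (e)dismax request in which qf matches the individual terms while pf, pf2, and pf3 boost documents where the whole query, word pairs, and word triples occur close together, with ps as the phrase slop. The field names (title, body), the boost values, and the class and method names are assumptions made for the example; none of them come from this thread.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class EdismaxParams {

    // Builds request parameters for a flat user query.
    // Field names and boosts are hypothetical; adjust them to your schema.
    static Map<String, String> proximityParams(String userInput) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("defType", "edismax");  // select the Extended DisMax parser
        params.put("q", userInput);        // the user's text, passed through as typed
        params.put("qf", "title^2 body");  // fields searched for the individual terms
        params.put("pf", "body^10");       // boost when all terms appear as a (sloppy) phrase
        params.put("pf2", "body^5");       // boost for adjacent word pairs
        params.put("pf3", "body^7");       // boost for adjacent word triples
        params.put("ps", "5");             // slop allowed for the pf phrase
        return params;
    }

    // Encodes the parameters as a URL query string, keeping insertion order.
    static String toQueryString(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> encode(e.getKey()) + "=" + encode(e.getValue()))
                .collect(Collectors.joining("&"));
    }

    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(toQueryString(proximityParams("temperate climates")));
    }
}
```

The resulting string would be appended to a Solr select URL; a document containing only one of the terms still matches through qf, and the pf* clauses only affect ranking.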


On Monday, June 27, 2016 3:55 PM, Daniel Bigham <da...@wolfram.com> wrote:
Hi Ahmet, 

Yes, thanks... that did come to mind and is the strategy I'm playing with. 

However, if you are giving a user a plain text field and using the Lucene query parser, it doesn't create optional clauses for boosting purposes. 

Does this imply that anyone wanting to use Lucene in conjunction with an input field needs to write a custom query parser if they want reasonable results? 




Re: Favoring Terms Occurring in Close Proximity

Posted by Daniel Bigham <da...@wolfram.com>.
Hi Ahmet, 

Yes, thanks... that did come to mind and is the strategy I'm playing with. 

However, if you are giving a user a plain text field and using the Lucene query parser, it doesn't create optional clauses for boosting purposes. 

Does this imply that anyone wanting to use Lucene in conjunction with an input field needs to write a custom query parser if they want reasonable results? 

----- On Jun 24, 2016, at 12:25 PM, Ahmet Arslan <io...@yahoo.com.INVALID> wrote: 

> Hi Daniel,

> You can add optional clauses to your query for boosting purposes.

> for example,

> temperate OR climates OR "temperate climates"~5^100

> ahmet


Re: Favoring Terms Occurring in Close Proximity

Posted by Ahmet Arslan <io...@yahoo.com.INVALID>.
Hi Daniel,

You can add optional clauses to your query for boosting purposes.

for example, 

temperate OR climates OR "temperate climates"~5^100

ahmet
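
One possible way to apply this suggestion to a plain input box, without writing a full custom query parser, is to rewrite the user's flat string into the syntax above before handing it to the query parser. The sketch below is only that: the class and method names are hypothetical, the sanitization (drop everything except letters and digits, lowercase so that a typed AND/OR/NOT is not read as an operator) is deliberately crude, and the slop and boost are simply the values from the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class ProximityBoost {

    // Turns "temperate climates" into
    //   temperate OR climates OR "temperate climates"~5^100
    // so every term is optional, but documents with the terms near each other score higher.
    static String rewrite(String userInput, int slop, int boost) {
        List<String> terms = new ArrayList<>();
        for (String raw : userInput.trim().split("\\s+")) {
            // Crude sanitization (an assumption, not Lucene's own escaping):
            // keep letters and digits only, and lowercase the result.
            String term = raw.replaceAll("[^\\p{L}\\p{N}]", "").toLowerCase(Locale.ROOT);
            if (!term.isEmpty()) {
                terms.add(term);
            }
        }
        if (terms.isEmpty()) {
            return "";
        }
        if (terms.size() == 1) {
            return terms.get(0); // a single term has no proximity to reward
        }
        String anyTerm = String.join(" OR ", terms);
        String nearPhrase = "\"" + String.join(" ", terms) + "\"~" + slop + "^" + boost;
        return anyTerm + " OR " + nearPhrase;
    }

    public static void main(String[] args) {
        // prints: temperate OR climates OR "temperate climates"~5^100
        System.out.println(rewrite("temperate climates", 5, 100));
    }
}
```

The returned string can then be passed to the query parser in place of the raw input. The boost of 100 is arbitrary and would normally need tuning against real result lists.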



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org