Posted to dev@lucene.apache.org by Preetam Rao <bl...@gmail.com> on 2008/07/14 15:07:47 UTC

Query to match sub phrases

Hi,

Is there a query in Lucene that matches sub-phrases?

For example, if the document text is "new york existing homes 3 bed 2
bath homes 3 miles from city center 2 rooms" and the user enters
"Brooklyn homes with 3 bed rooms and swimming pools", I would like to
recognize that the document contains a sub-phrase of the user query
("3 bed") and give it a higher score than a document that contains the
same terms but not in the query's order, for example, "new york 2 beds
3 baths".

This kind of query would be useful when we do not interpret or parse the
user query. As the example shows, it is especially useful when numbers are
involved, since a number usually only makes sense together with the term
immediately following it.

This is a middle ground between a pure Boolean OR query and an exact
phrase query, as far as using the user query directly is concerned.

I have documented my thoughts in the document below. If nothing similar
is already implemented, I will open a JIRA issue and work on it.

http://docs.google.com/Doc?id=dgzj3nsp_0z2j48hc6

Please let me know your thoughts on alternative solutions and approaches,
since this will be very useful for my current project.

Thanks
Preetam
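The ranking asked for above can be made concrete with a minimal plain-Java sketch that uses no Lucene classes. It scores a document by the length of the longest run of consecutive query tokens that also appears, in the same order, in the document, and treats an exact phrase match as the special case where that run covers the whole query. The class and method names are hypothetical, and the tokenizer is an assumed naive lowercase whitespace split with no stemming, so "beds" does not match "bed". This only illustrates the idea; it is not the approach described in the linked document.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration: score a document by the longest contiguous
 *  run of query tokens that also occurs, in order, in the document. */
public class SubPhraseScore {

    // Naive tokenizer assumed for this sketch: lowercase, split on whitespace.
    static List<String> tokens(String text) {
        return Arrays.asList(text.toLowerCase().trim().split("\\s+"));
    }

    /** Longest common substring over tokens: the length of the longest
     *  sequence of consecutive query tokens found consecutively in the doc. */
    static int longestSharedRun(String doc, String query) {
        List<String> d = tokens(doc);
        List<String> q = tokens(query);
        int best = 0;
        // cur[j] = length of the common run ending at doc token i-1 and
        // query token j-1; prev holds the same values for the previous doc token.
        int[] prev = new int[q.size() + 1];
        for (int i = 1; i <= d.size(); i++) {
            int[] cur = new int[q.size() + 1];
            for (int j = 1; j <= q.size(); j++) {
                if (d.get(i - 1).equals(q.get(j - 1))) {
                    cur[j] = prev[j - 1] + 1;
                    best = Math.max(best, cur[j]);
                }
            }
            prev = cur;
        }
        return best;
    }

    /** Exact phrase match: the shared run covers the entire query. */
    static boolean containsPhrase(String doc, String query) {
        return longestSharedRun(doc, query) == tokens(query).size();
    }

    public static void main(String[] args) {
        String query = "Brooklyn homes with 3 bed rooms and swimming pools";
        String doc1 = "new york existing homes 3 bed 2 bath homes 3 miles from city center 2 rooms";
        String doc2 = "new york 2 beds 3 baths";
        System.out.println(longestSharedRun(doc1, query)); // 2 ("3 bed")
        System.out.println(longestSharedRun(doc2, query)); // 1 ("3")
        System.out.println(containsPhrase(doc1, query));   // false
    }
}
```

With the example texts, the first document shares the two-token run "3 bed" with the query and the second shares only the single token "3", while an exact phrase match fails for both. That is the middle ground between OR and phrase matching described in the message.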

Re: Query to match sub phrases

Posted by Preetam Rao <bl...@gmail.com>.
Thanks for the response. My apologies for misusing this list.

I will look into it, and if I have further comments I will continue the
discussion on the lucene-user list :-)

On Mon, Jul 14, 2008 at 9:42 PM, Steven A Rowe <sa...@syr.edu> wrote:

> Hi Preetam,
>
> Questions like yours are better served in the java-user mailing list, which
> is devoted to Q&A about *using* Lucene, rather than here in the java-dev
> list, which is reserved for discussions concerning Lucene's *development*.
>
> In the contrib/ area on the trunk (not yet part of any release), there is a
> class called ShingleFilter that might be useful to you - here is a link to
> the nightly javadoc for it:
>
> <
> http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/contrib-analyzers/org/apache/lucene/analysis/shingle/ShingleFilter.html
> >
>
> If you create a field containing token n-grams (a.k.a. shingles) and use it
> as a component of your queries, I think you can achieve something similar to
> what you want.
>
> Steve

RE: Query to match sub phrases

Posted by Steven A Rowe <sa...@syr.edu>.
Hi Preetam,

Questions like yours are better served in the java-user mailing list, which is devoted to Q&A about *using* Lucene, rather than here in the java-dev list, which is reserved for discussions concerning Lucene's *development*.

In the contrib/ area on the trunk (not yet part of any release), there is a class called ShingleFilter that might be useful to you - here is a link to the nightly javadoc for it:

<http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/contrib-analyzers/org/apache/lucene/analysis/shingle/ShingleFilter.html>

If you create a field containing token n-grams (a.k.a. shingles) and use it as a component of your queries, I think you can achieve something similar to what you want.

Steve
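Steve's suggestion can be sketched in plain Java without any Lucene classes: build word n-grams ("shingles") from the document as an index-time field would, build the same shingles from the user query, and count how many query shingles the document contains. In a real index the document side would come from an analyzer chain that includes ShingleFilter, and the query side could be term queries on the shingle field OR-ed together with the ordinary query; that wiring, the helper names, the fixed shingle size of 2 and the single-space separator are all assumptions of this sketch, so check the ShingleFilter javadoc for its actual constructors and options.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Plain-Java sketch of what a word-bigram "shingle" field holds and how a
 *  query's shingles could be matched against it. Not the Lucene API. */
public class ShingleSketch {

    // Naive tokenizer assumed for the sketch: lowercase, split on whitespace.
    static String[] tokens(String text) {
        return text.toLowerCase().trim().split("\\s+");
    }

    /** Word n-grams of exactly size n, joined by a single space. */
    static List<String> shingles(String text, int n) {
        String[] t = tokens(text);
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= t.length; i++) {
            out.add(String.join(" ", Arrays.copyOfRange(t, i, i + n)));
        }
        return out;
    }

    /** Number of distinct query shingles that also occur in the document. */
    static int sharedShingles(String doc, String query, int n) {
        Set<String> docSet = new HashSet<>(shingles(doc, n));
        Set<String> matched = new HashSet<>();
        for (String s : shingles(query, n)) {
            if (docSet.contains(s)) {
                matched.add(s);
            }
        }
        return matched.size();
    }

    public static void main(String[] args) {
        String query = "Brooklyn homes with 3 bed rooms and swimming pools";
        String doc1 = "new york existing homes 3 bed 2 bath homes 3 miles from city center 2 rooms";
        String doc2 = "new york 2 beds 3 baths";
        System.out.println(shingles(query, 2));
        System.out.println(sharedShingles(doc1, query, 2)); // 1 ("3 bed")
        System.out.println(sharedShingles(doc2, query, 2)); // 0
    }
}
```

On the example from the thread, the first document matches the query shingle "3 bed" and the second matches none, so adding the shingle field as an optional clause would tend to rank the first document higher. Using shingles alone would drop documents that share no word pair with the query, which is why the sketch assumes they are combined with ordinary term matching.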

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org