Posted to java-user@lucene.apache.org by Kapil Chhabra <ka...@naukri.com> on 2006/12/26 10:33:41 UTC
Nested Queries
Hi,
Please see the following data-structure
+--------+----------+
| FIELD1 | FIELD2   |
+--------+----------+
| 1      | 2,3,4,6, |
| 2      | 3,1,5,7, |
| 3      | 1,2,     |
| 4      | 1,8,10,  |
| 5      | 2,9,     |
| 6      | 1,       |
| 7      | 2,9,     |
| 8      | 4,9,     |
| 9      | 5,7,8,   |
| 10     | 4,       |
+--------+----------+
My requirement is to find all values in FIELD1 where FIELD2 contains all
values of FIELD1 where FIELD2 contains 3
Which means something like
FIELD2:(FIELD2:3)
Is it possible to achieve this in a single query? If yes, then how
should I go about it?
Thanks in anticipation,
kapilChhabra
Re: Nested Queries
Posted by Steven Rowe <sa...@syr.edu>.
Hi Kapil,
Kapil Chhabra wrote:
> Hi Steve,
> Thanks for the response.
> Actually I am not looking for a query language. My question is, whether
> Lucene supports Nested Queries or self joins?
> As per
> http://lucene.apache.org/java/docs/api/org/apache/lucene/queryParser/QueryParser.html
>
> In BNF, the query grammar is:
>
> Query ::= ( Clause )*
> Clause ::= ["+", "-"] [<TERM> ":"] ( <TERM> | "(" Query ")" )
>
> Which means that FIELD2:(FIELD2:3) is a correct query. Correct me if I
> am wrong.
>
> What will this query translate into? Will it be same as FIELD2: 1 OR
> FIELD2: 2
"FIELD2:(FIELD2:3)" translates to "FIELD2:3". This is because the
FieldX in "FieldX:(TermA OR TermB)" is interpreted distributively - this
query is equivalent to "FieldX:TermA OR FieldX:TermB". A field
specifier on a nested query term or clause overrides the containing
field specifier, so "FIELD1:(FIELD2:3)" translates to "FIELD2:3".
A more complicated example:
"Field2:(Field3:TermA OR (Field4:TermB AND TermC))"
translates to:
"Field3:TermA OR (Field4:TermB AND Field2:TermC)"
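The scoping rule described above can be sketched with a toy query tree. This is only an illustration written for this thread, not Lucene's QueryParser: the class FieldScopeSketch, its node types, and the default field name "contents" are all hypothetical. Each term uses its own field prefix if it has one, and otherwise inherits the nearest enclosing one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy query tree (hypothetical, NOT Lucene's parser) showing field scoping. */
class FieldScopeSketch {

    /** A node is either a single term or a parenthesized group of clauses. */
    interface Node {
        /** Renders the node with every term's field made explicit. */
        String resolve(String inheritedField, boolean topLevel);
    }

    static final class TermNode implements Node {
        final String field; // null when the term carries no field prefix
        final String text;

        TermNode(String field, String text) {
            this.field = field;
            this.text = text;
        }

        public String resolve(String inheritedField, boolean topLevel) {
            // A field written on the term itself overrides the enclosing field.
            String effective = (field != null) ? field : inheritedField;
            return effective + ":" + text;
        }
    }

    static final class GroupNode implements Node {
        final String field;    // null when the group carries no field prefix
        final String operator; // "AND" or "OR"
        final List<Node> children;

        GroupNode(String field, String operator, Node... children) {
            this.field = field;
            this.operator = operator;
            this.children = Arrays.asList(children);
        }

        public String resolve(String inheritedField, boolean topLevel) {
            // The group's field is only a default for children without their own.
            String effective = (field != null) ? field : inheritedField;
            List<String> parts = new ArrayList<String>();
            for (Node child : children) {
                parts.add(child.resolve(effective, false));
            }
            String joined = String.join(" " + operator + " ", parts);
            return topLevel ? joined : "(" + joined + ")";
        }
    }

    /** FIELD1:(FIELD2:3) */
    static String simpleExample() {
        Node query = new GroupNode("FIELD1", "OR", new TermNode("FIELD2", "3"));
        return query.resolve("contents", true);
    }

    /** Field2:(Field3:TermA OR (Field4:TermB AND TermC)) */
    static String nestedExample() {
        Node inner = new GroupNode(null, "AND",
                new TermNode("Field4", "TermB"), new TermNode(null, "TermC"));
        Node query = new GroupNode("Field2", "OR",
                new TermNode("Field3", "TermA"), inner);
        return query.resolve("contents", true);
    }

    public static void main(String[] args) {
        System.out.println(simpleExample()); // FIELD2:3
        System.out.println(nestedExample()); // Field3:TermA OR (Field4:TermB AND Field2:TermC)
    }
}
```

Note that the outer field never reaches a term that names its own field, which is why the nested form cannot express a join.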
Lucene does have nested queries, but these are not the same thing as SQL
nested queries. Unlike SQL nested queries, in which the nested query is
evaluated and the *results* of the nested query are used as input to the
containing query, Lucene's queries are evaluated all at once.
Of course, you could achieve (self) joins with Lucene manually, by
submitting two queries serially, first the nested query, and then the
containing query, constructed with results returned from the nested
query. But I know of no built-in Lucene functionality that will invoke
the search machinery for you in this fashion[1].
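A minimal sketch of the two serial queries, using plain Java collections as a stand-in for the index (no Lucene calls are made; TwoStepJoinSketch and its methods are hypothetical names). It assumes the OR reading from Kapil's messages, that is, the containing query matches rows whose FIELD2 holds any value returned by the nested query.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** In-memory stand-in for an index; illustrates the manual two-step join only. */
class TwoStepJoinSketch {

    /** The table from the original post: FIELD1 -> tokenized FIELD2 values. */
    static Map<Integer, List<Integer>> table() {
        Map<Integer, List<Integer>> rows = new LinkedHashMap<Integer, List<Integer>>();
        rows.put(1, Arrays.asList(2, 3, 4, 6));
        rows.put(2, Arrays.asList(3, 1, 5, 7));
        rows.put(3, Arrays.asList(1, 2));
        rows.put(4, Arrays.asList(1, 8, 10));
        rows.put(5, Arrays.asList(2, 9));
        rows.put(6, Arrays.asList(1));
        rows.put(7, Arrays.asList(2, 9));
        rows.put(8, Arrays.asList(4, 9));
        rows.put(9, Arrays.asList(5, 7, 8));
        rows.put(10, Arrays.asList(4));
        return rows;
    }

    /** One "search": FIELD1 of every row whose FIELD2 holds any wanted value. */
    static Set<Integer> field1WhereField2ContainsAny(
            Map<Integer, List<Integer>> rows, Set<Integer> wanted) {
        Set<Integer> hits = new TreeSet<Integer>();
        for (Map.Entry<Integer, List<Integer>> row : rows.entrySet()) {
            for (Integer value : row.getValue()) {
                if (wanted.contains(value)) {
                    hits.add(row.getKey());
                    break;
                }
            }
        }
        return hits;
    }

    /** Text of the containing query, built from the nested query's results. */
    static String containingQuery(Set<Integer> nestedResults) {
        StringBuilder sb = new StringBuilder();
        for (Integer value : nestedResults) {
            if (sb.length() > 0) {
                sb.append(" OR ");
            }
            sb.append("FIELD2:").append(value);
        }
        return sb.toString();
    }

    /** Step 1 runs the nested query; step 2 feeds its results to the outer one. */
    static Set<Integer> selfJoin(int seed) {
        Map<Integer, List<Integer>> rows = table();
        Set<Integer> nested = field1WhereField2ContainsAny(
                rows, new TreeSet<Integer>(Arrays.asList(seed)));
        return field1WhereField2ContainsAny(rows, nested);
    }

    public static void main(String[] args) {
        Set<Integer> nested = field1WhereField2ContainsAny(
                table(), new TreeSet<Integer>(Arrays.asList(3)));
        System.out.println(nested);                  // [1, 2]
        System.out.println(containingQuery(nested)); // FIELD2:1 OR FIELD2:2
        System.out.println(selfJoin(3));             // [1, 2, 3, 4, 5, 6, 7]
    }
}
```

With a real index, each call to the stand-in search method would become one search request, and the second query would be built from the stored FIELD1 values of the first result set.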
From <http://lucene.apache.org/java/docs/scoring.html>:
Lucene scoring uses a combination of the Vector Space
Model (VSM) of Information Retrieval[2] and the
Boolean model[3] to determine how relevant a given
Document is to a User's query. In general, the idea
behind the VSM is the more times a query term appears
in a document relative to the number of times the
term appears in all the documents in the collection,
the more relevant that document is to the query. It
uses the Boolean model to first narrow down the
documents that need to be scored based on the use of
boolean logic in the Query specification.
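To make the quoted idea concrete, here is a textbook tf-idf weighting in plain Java. It is an assumption-laden sketch, not Lucene's actual scoring formula: the class TfIdfSketch, the sample documents, and the choice of idf = ln(N / df) + 1 are made up for illustration. Documents that do not contain the term at all score zero, which loosely mirrors the Boolean narrowing step.

```java
import java.util.Arrays;
import java.util.List;

/** Textbook tf-idf (hypothetical sketch; Lucene's real formula has more factors). */
class TfIdfSketch {

    /** Number of times the term occurs in one tokenized document. */
    static int termFrequency(List<String> doc, String term) {
        int count = 0;
        for (String token : doc) {
            if (token.equals(term)) {
                count++;
            }
        }
        return count;
    }

    /** Number of documents in the collection that contain the term. */
    static int documentFrequency(List<List<String>> docs, String term) {
        int count = 0;
        for (List<String> doc : docs) {
            if (doc.contains(term)) {
                count++;
            }
        }
        return count;
    }

    /** tf * idf, where idf grows as the term gets rarer in the collection. */
    static double score(List<List<String>> docs, int docIndex, String term) {
        int tf = termFrequency(docs.get(docIndex), term);
        int df = documentFrequency(docs, term);
        if (tf == 0 || df == 0) {
            return 0.0; // the document does not match, so it is not scored
        }
        double idf = Math.log((double) docs.size() / df) + 1.0;
        return tf * idf;
    }

    /** Three tiny made-up documents. */
    static List<List<String>> sampleDocs() {
        return Arrays.asList(
                Arrays.asList("lucene", "search", "lucene"),
                Arrays.asList("lucene", "index"),
                Arrays.asList("database", "index"));
    }

    public static void main(String[] args) {
        List<List<String>> docs = sampleDocs();
        for (int i = 0; i < docs.size(); i++) {
            System.out.println("doc " + i + " score for 'lucene': " + score(docs, i, "lucene"));
        }
    }
}
```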
Hope it helps,
Steve
[1] There is a tradition of using something like joins in Information
Retrieval: (Pseudo-)Relevance Feedback, in which a subset of the terms
found in a subset of the documents of an initial query's result set are
combined with the initial query's terms to produce an augmented query.
See Grant Ingersoll's ApacheCon 2005 presentation and code at
<http://www.cnlp.org/apachecon2005/> for an implementation of
Pseudo-Relevance Feedback using Lucene.
[2] <http://en.wikipedia.org/wiki/Vector_Space_Model>
[3] <http://en.wikipedia.org/wiki/Standard_Boolean_model>
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Nested Queries
Posted by Erick Erickson <er...@gmail.com>.
No. Lucene is a text search engine, NOT an RDBMS. Whenever you think of joins
(self or otherwise), you're thinking in RDBMS terms, and Lucene is not an
RDBMS. At best, you'll have to use one of the DB integrations that Steve
mentioned (assuming they work). But I wouldn't keep looking for any magic
internal to Lucene to solve this problem.
On 12/28/06, Kapil Chhabra <ka...@naukri.com> wrote:
>
> Hi Steve,
> Thanks for the response.
> Actually I am not looking for a query language. My question is, whether
> Lucene supports Nested Queries or self joins?
> As per
>
> http://lucene.apache.org/java/docs/api/org/apache/lucene/queryParser/QueryParser.html
>
> In BNF, the query grammar is:
>
> Query ::= ( Clause )*
> Clause ::= ["+", "-"] [<TERM> ":"] ( <TERM> | "(" Query ")" )
>
> Which means that
> FIELD2:(FIELD2:3)
> is a correct query. Correct me if I am wrong.
>
> What will this query translate into? Will it be same as
> FIELD2: 1 OR FIELD2: 2
>
>
> Thanks
> kapilChhabra
Re: Nested Queries
Posted by Kapil Chhabra <ka...@naukri.com>.
Hi Steve,
Thanks for the response.
Actually I am not looking for a query language. My question is, whether
Lucene supports Nested Queries or self joins?
As per
http://lucene.apache.org/java/docs/api/org/apache/lucene/queryParser/QueryParser.html
In BNF, the query grammar is:
Query ::= ( Clause )*
Clause ::= ["+", "-"] [<TERM> ":"] ( <TERM> | "(" Query ")" )
Which means that
FIELD2:(FIELD2:3)
is a correct query. Correct me if I am wrong.
What will this query translate into? Will it be same as
FIELD2: 1 OR FIELD2: 2
Thanks
kapilChhabra
Re: Nested Queries
Posted by Steven Rowe <sa...@syr.edu>.
Hi Kapil,
Kapil Chhabra wrote:
> Just to mention, I have tokenized FIELD2 on "," and indexed it.
>
> FIELD2:3 should return 1,2
> FIELD2:(FIELD2:3) should return something like the output of:
>
> FIELD2: 1 OR FIELD2: 2
Given your data table, I assume you mean:
FIELD1:3 should return 1,2
FIELD1:(FIELD2:3) should return something like the output of:
FIELD1: 1 OR FIELD1: 2
If you make FIELD1 stored, and search using "FIELD2:3", you can iterate
through the hits and return the values for FIELD1.
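The idea of searching one field and reading back another, stored, field can be sketched without Lucene. The class below is a hypothetical in-memory model (StoredFieldSketch and its methods are not Lucene API): FIELD2 is tokenized on "," into a small inverted index, FIELD1 is kept as a stored value per document, and a lookup on a FIELD2 token returns the stored FIELD1 values of the matching documents.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model: FIELD2 is indexed, FIELD1 is only stored. */
class StoredFieldSketch {

    // Document id -> stored FIELD1 value.
    private final List<String> storedField1 = new ArrayList<String>();
    // FIELD2 token -> ids of the documents containing it.
    private final Map<String, List<Integer>> field2Postings =
            new HashMap<String, List<Integer>>();

    /** Stores FIELD1 and indexes FIELD2 after splitting it on commas. */
    void addDocument(String field1, String rawField2) {
        int docId = storedField1.size();
        storedField1.add(field1);
        for (String token : rawField2.split(",")) {
            String trimmed = token.trim();
            if (trimmed.isEmpty()) {
                continue; // skip empty tokens left by stray commas
            }
            List<Integer> postings = field2Postings.get(trimmed);
            if (postings == null) {
                postings = new ArrayList<Integer>();
                field2Postings.put(trimmed, postings);
            }
            // Record each document at most once per token.
            if (postings.isEmpty() || postings.get(postings.size() - 1) != docId) {
                postings.add(docId);
            }
        }
    }

    /** "Search FIELD2:token", then read the stored FIELD1 of each hit. */
    List<String> field1ForField2(String token) {
        List<String> values = new ArrayList<String>();
        List<Integer> postings = field2Postings.get(token);
        if (postings != null) {
            for (int docId : postings) {
                values.add(storedField1.get(docId));
            }
        }
        return values;
    }

    /** Loads the ten rows from the table in the original post. */
    static StoredFieldSketch fromPostedTable() {
        String[] field2 = {"2,3,4,6,", "3,1,5,7,", "1,2,", "1,8,10,", "2,9,",
                           "1,", "2,9,", "4,9,", "5,7,8,", "4,"};
        StoredFieldSketch index = new StoredFieldSketch();
        for (int i = 0; i < field2.length; i++) {
            index.addDocument(String.valueOf(i + 1), field2[i]);
        }
        return index;
    }

    public static void main(String[] args) {
        StoredFieldSketch index = fromPostedTable();
        System.out.println(index.field1ForField2("3")); // [1, 2]
        System.out.println(index.field1ForField2("9")); // [5, 7, 8]
    }
}
```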
Are you looking for a query language that does this work for you? That
is, one which can query on any field and then return information from
other field(s) in matching documents? If so, I don't know of such a
query language that exists for Lucene.
There have been several integrations of Lucene with databases, some of
which enable SQL queries something like:
SELECT FIELD1 WHERE FIELD2 CONTAINS('3');
1. Marcelo Ochoa's Oracle JVM implementation for Lucene DataStore:
<http://issues.apache.org/jira/browse/LUCENE-724>; see also these
threads on the Lucene Java-Users list:
<http://www.nabble.com/Oracle-and-Lucene-Integration-tf2689965.html>
<http://www.nabble.com/Oracle-Lucene-integration--status--tf2865873.html>
2. Mark Harwood's Lucene database bindings for HSQLDB and Derby:
<http://issues.apache.org/jira/browse/LUCENE-434>; see also this thread
on the Lucene Java-Users list:
<http://www.nabble.com/Lucene-database-bindings-tf316816.html>
3. Hibernate (as of v3.1, I think) Lucene Integration:
<http://www.hibernate.org/hib_docs/annotations/reference/en/html/lucene.html>
4. DBSight enables Lucene search with databases: <http://www.dbsight.net/>
Hope it helps,
Steve
Re: Nested Queries
Posted by Grant Ingersoll <gs...@apache.org>.
Hi Kapil,
I am not sure exactly what you are asking. Could you give an example of
the correct response? Also, are you truly using numbers, or are they
just substitutes for text? And are they part of a bigger problem
requiring Lucene? If it is just numbers, maybe a DB might be the
better way to go, since you would have SET operations that may make
this easier. Not saying Lucene can't do what you want, just thinking
there are other ways.
-Grant
On Dec 26, 2006, at 4:47 AM, Kapil Chhabra wrote:
> Just to mention, I have tokenized FIELD2 on "," and indexed it.
>
> FIELD2:3 should return 1,2
> FIELD2:(FIELD2:3) should return something like the output of:
>
> FIELD2: 1 OR FIELD2: 2
--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org
Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ
Re: Nested Queries
Posted by Kapil Chhabra <ka...@naukri.com>.
Hi All,
Any pointers in this direction?
Thanks in advance.
Kapil
Kapil Chhabra wrote:
> Just to mention, I have tokenized FIELD2 on "," and indexed it.
>
> FIELD2:3 should return 1,2
> FIELD2:(FIELD2:3) should return something like the output of:
>
> FIELD2: 1 OR FIELD2: 2
Re: Nested Queries
Posted by Kapil Chhabra <ka...@naukri.com>.
Just to mention, I have tokenized FIELD2 on "," and indexed it.
FIELD2:3 should return 1,2
FIELD2:(FIELD2:3) should return something like the output of:
FIELD2: 1 OR FIELD2: 2
Regards,
kapilChhabra