Posted to java-user@lucene.apache.org by Kapil Chhabra <ka...@naukri.com> on 2006/12/26 10:33:41 UTC

Nested Queries

Hi,

Please see the following data-structure
+--------+----------+
| FIELD1 | FIELD2   |
+--------+----------+
| 1      | 2,3,4,6, |
| 2      | 3,1,5,7, |
| 3      | 1,2,     |
| 4      | 1,8,10,  |
| 5      | 2,9,     |
| 6      | 1,       |
| 7      | 2,9,     |
| 8      | 4,9,     |
| 9      | 5,7,8,   |
| 10     | 4,       |
+--------+----------+

My requirement is to find all values in FIELD1 where FIELD2 contains all 
values of FIELD1 where FIELD2 contains 3
Which means something like
FIELD2:(FIELD2:3)

Is it possible to achieve this in a single query? If yes, then how 
should I go about it?



Thanks in anticipation,
kapilChhabra

Re: Nested Queries

Posted by Steven Rowe <sa...@syr.edu>.
Hi Kapil,

Kapil Chhabra wrote:
> Hi Steve,
> Thanks for the response.
> Actually I am not looking for a query language. My question is, whether
> Lucene supports Nested Queries or self joins?
> As per
> http://lucene.apache.org/java/docs/api/org/apache/lucene/queryParser/QueryParser.html
> 
> In BNF, the query grammar is:
> 
>   Query  ::= ( Clause )*
>   Clause ::= ["+", "-"] [<TERM> ":"] ( <TERM> | "(" Query ")" )
> 
> Which means that FIELD2:(FIELD2:3) is a correct query. Correct me if I
> am wrong.
> 
> What will this query translate into? Will it  be same as FIELD2: 1 OR
> FIELD2: 2

"FIELD2:(FIELD2:3)" translates to "FIELD2:3".  This is because the
FieldX in "FieldX:(TermA OR TermB)" is interpreted distributively - this
query is equivalent to "FieldX:TermA OR FieldX:TermB".  A field
specifier on a nested query term or clause overrides the containing
field specifier, so "FIELD1:(FIELD2:3)" translates to "FIELD2:3".

A more complicated example:

    "Field2:(Field3:TermA OR (Field4:TermB AND TermC))"

translates to:

    "Field3:TermA OR (Field4:TermB AND Field2:TermC)"
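The inheritance rule above can be sketched in plain Java. This is a toy model and not Lucene's QueryParser: the Node, Term and Group classes, and the default field name "contents", are hypothetical names made up for the illustration. Each node takes its own field specifier if it has one, and otherwise inherits the field from the enclosing group.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of field-prefix inheritance in a query tree.
// This is NOT Lucene's QueryParser; all class names here are hypothetical.
class FieldResolution {

    // A query node; field is null when no "Field:" prefix was written on it.
    static abstract class Node {
        final String field;
        Node(String field) { this.field = field; }
        abstract String resolve(String inheritedField);
    }

    static final class Term extends Node {
        final String text;
        Term(String field, String text) { super(field); this.text = text; }

        @Override
        String resolve(String inheritedField) {
            // An explicit field on the term overrides the enclosing field.
            String effective = (field != null) ? field : inheritedField;
            return effective + ":" + text;
        }
    }

    static final class Group extends Node {
        final String operator; // "AND" or "OR"
        final List<Node> children;

        Group(String field, String operator, Node... children) {
            super(field);
            this.operator = operator;
            this.children = Arrays.asList(children);
        }

        @Override
        String resolve(String inheritedField) {
            // The group's own field (if any) becomes the default for its children.
            String effective = (field != null) ? field : inheritedField;
            List<String> parts = new ArrayList<>();
            for (Node child : children) {
                String resolved = child.resolve(effective);
                boolean needsParens = child instanceof Group
                        && ((Group) child).children.size() > 1;
                parts.add(needsParens ? "(" + resolved + ")" : resolved);
            }
            return String.join(" " + operator + " ", parts);
        }
    }

    public static void main(String[] args) {
        // Field2:(Field3:TermA OR (Field4:TermB AND TermC))
        Node query = new Group("Field2", "OR",
                new Term("Field3", "TermA"),
                new Group(null, "AND",
                        new Term("Field4", "TermB"),
                        new Term(null, "TermC")));
        // prints: Field3:TermA OR (Field4:TermB AND Field2:TermC)
        System.out.println(query.resolve("contents"));
    }
}
```

Under the same model, a Group with field "FIELD1" holding the single term FIELD2:3 resolves to "FIELD2:3", which is the override behaviour described above.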

Lucene does have nested queries, but these are not the same thing as SQL
nested queries.  Unlike SQL nested queries, in which the nested query is
evaluated and the *results* of the nested query are used as input to the
containing query, Lucene's queries are evaluated all at once.

Of course, you could achieve (self) joins with Lucene manually, by
submitting two queries serially, first the nested query, and then the
containing query, constructed with results returned from the nested
query.  But I know of no built-in Lucene functionality that will invoke
the search machinery for you in this fashion[1].
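As a rough illustration of the manual two-step approach, here is a sketch in plain Java that runs the nested query first and then builds the containing query from its results. It does not use the Lucene API: the in-memory map stands in for the index, and the TwoStepJoin class, its method names and the requireAll flag are assumptions made for the example. The data is the table from the original post.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Sketch of a manual self join done as two serial searches.
// Plain collections stand in for a Lucene index; names are hypothetical.
class TwoStepJoin {

    // Sample data from the original post: FIELD1 -> tokens of FIELD2.
    static Map<Integer, Set<Integer>> sampleData() {
        Map<Integer, Set<Integer>> docs = new LinkedHashMap<>();
        docs.put(1, setOf(2, 3, 4, 6));
        docs.put(2, setOf(3, 1, 5, 7));
        docs.put(3, setOf(1, 2));
        docs.put(4, setOf(1, 8, 10));
        docs.put(5, setOf(2, 9));
        docs.put(6, setOf(1));
        docs.put(7, setOf(2, 9));
        docs.put(8, setOf(4, 9));
        docs.put(9, setOf(5, 7, 8));
        docs.put(10, setOf(4));
        return docs;
    }

    static Set<Integer> setOf(Integer... values) {
        return new TreeSet<>(Arrays.asList(values));
    }

    // Returns FIELD1 of every record whose FIELD2 contains any (or all) wanted values.
    static SortedSet<Integer> search(Map<Integer, Set<Integer>> docs,
                                     Set<Integer> wanted, boolean requireAll) {
        SortedSet<Integer> hits = new TreeSet<>();
        if (wanted.isEmpty()) {
            return hits; // nothing to look for, so nothing matches
        }
        for (Map.Entry<Integer, Set<Integer>> entry : docs.entrySet()) {
            Set<Integer> field2 = entry.getValue();
            boolean match = requireAll
                    ? field2.containsAll(wanted)
                    : !Collections.disjoint(field2, wanted);
            if (match) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }

    // Step 1: nested query (FIELD2 contains seed). Step 2: containing query
    // built from the FIELD1 values that step 1 returned.
    static SortedSet<Integer> join(Map<Integer, Set<Integer>> docs,
                                   int seed, boolean requireAll) {
        SortedSet<Integer> inner = search(docs, Collections.singleton(seed), false);
        return search(docs, inner, requireAll);
    }

    public static void main(String[] args) {
        Map<Integer, Set<Integer>> docs = sampleData();
        System.out.println(join(docs, 3, false)); // FIELD2:1 OR FIELD2:2
        System.out.println(join(docs, 3, true));  // FIELD2:1 AND FIELD2:2
    }
}
```

With the sample table, the nested query for 3 yields FIELD1 values 1 and 2. The OR form of the containing query then matches FIELD1 values 1 through 7, while the AND form ("contains all values") matches only 3.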

From <http://lucene.apache.org/java/docs/scoring.html>:

    Lucene scoring uses a combination of the Vector Space
    Model (VSM) of Information Retrieval[2] and the
    Boolean model[3] to determine how relevant a given
    Document is to a User's query. In general, the idea
    behind the VSM is the more times a query term appears
    in a document relative to the number of times the
    term appears in all the documents in the collection,
    the more relevant that document is to the query. It
    uses the Boolean model to first narrow down the
    documents that need to be scored based on the use of
    boolean logic in the Query specification.

Hope it helps,
Steve

[1] There is a tradition of using something like joins in Information
Retrieval: (Pseudo-)Relevance Feedback, in which a subset of the terms
found in a subset of the documents of an initial query's result set are
combined with the initial query's terms to produce an augmented query.
See Grant Ingersoll's ApacheCon 2005 presentation and code at
<http://www.cnlp.org/apachecon2005/> for an implementation of
Pseudo-Relevance Feedback using Lucene.
[2] <http://en.wikipedia.org/wiki/Vector_Space_Model>
[3] <http://en.wikipedia.org/wiki/Standard_Boolean_model>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Nested Queries

Posted by Erick Erickson <er...@gmail.com>.
No. Lucene is a text search engine, NOT an RDBMS. Whenever you think of joins
(self or otherwise), you're thinking in RDBMS terms, which Lucene is not. At
best, you'll have to use one of the DB integrations that Steve mentioned
(assuming they work). But I wouldn't keep looking for any magic internal to
Lucene to solve this problem.

On 12/28/06, Kapil Chhabra <ka...@naukri.com> wrote:
>
> Hi Steve,
> Thanks for the response.
> Actually I am not looking for a query language. My question is, whether
> Lucene supports Nested Queries or self joins?
> As per
>
> http://lucene.apache.org/java/docs/api/org/apache/lucene/queryParser/QueryParser.html
>
> In BNF, the query grammar is:
>
>    Query  ::= ( Clause )*
>    Clause ::= ["+", "-"] [<TERM> ":"] ( <TERM> | "(" Query ")" )
>
> Which means that
> FIELD2:(FIELD2:3)
> is a correct query. Correct me if I am wrong.
>
> What will this query translate into? Will it  be same as
> FIELD2: 1 OR FIELD2: 2
>
>
> Thanks
> kapilChhabra

Re: Nested Queries

Posted by Kapil Chhabra <ka...@naukri.com>.
Hi Steve,
Thanks for the response.
Actually I am not looking for a query language. My question is, whether 
Lucene supports Nested Queries or self joins?
As per 
http://lucene.apache.org/java/docs/api/org/apache/lucene/queryParser/QueryParser.html

In BNF, the query grammar is:

   Query  ::= ( Clause )*
   Clause ::= ["+", "-"] [<TERM> ":"] ( <TERM> | "(" Query ")" )
 
Which means that 
FIELD2:(FIELD2:3) 
is a correct query. Correct me if I am wrong.

What will this query translate into? Will it  be same as 
FIELD2: 1 OR FIELD2: 2


Thanks
kapilChhabra



Re: Nested Queries

Posted by Steven Rowe <sa...@syr.edu>.
Hi Kapil,

Kapil Chhabra wrote:
> Just to mention, I have tokenized FIELD2 on "," and indexed it.
> 
> FIELD2:3 should return 1,2
> FIELD2:(FIELD2:3) should return something like the output of:
> 
> *FIELD2: 1 OR FIELD2: 2

Given your data table, I assume you mean:

   FIELD1:3 should return 1,2
   FIELD1:(FIELD2:3) should return something like the output of:

   *FIELD1: 1 OR FIELD1: 2

If you make FIELD1 stored, and search using "FIELD2:3", you can iterate
through the hits and return the values for FIELD1.
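To make the "stored field" idea concrete, here is a small plain-Java sketch, not the Lucene API. It assumes FIELD2 is tokenized on "," into an inverted index (term to document ids) and FIELD1 is kept as a stored value per document; the StoredFieldLookup class and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: search on one field, return the stored value of another.
// Plain collections stand in for a Lucene index; names are hypothetical.
class StoredFieldLookup {

    // doc id -> stored FIELD1 value
    private final List<String> storedField1 = new ArrayList<>();
    // FIELD2 term -> ids of the documents containing it (in insertion order)
    private final Map<String, List<Integer>> field2Postings = new HashMap<>();

    // Adds one record, tokenizing FIELD2 on ",".
    void add(String field1, String field2) {
        int docId = storedField1.size();
        storedField1.add(field1);
        for (String raw : field2.split(",")) {
            String token = raw.trim();
            if (token.isEmpty()) {
                continue; // skip empty tokens such as one after a trailing comma
            }
            List<Integer> postings =
                    field2Postings.computeIfAbsent(token, k -> new ArrayList<>());
            // Avoid listing the same document twice for a repeated token.
            if (postings.isEmpty() || postings.get(postings.size() - 1) != docId) {
                postings.add(docId);
            }
        }
    }

    // Equivalent of searching "FIELD2:term" and reading FIELD1 from each hit.
    List<String> field1WhereField2Contains(String term) {
        List<String> values = new ArrayList<>();
        for (int docId : field2Postings.getOrDefault(term, Collections.emptyList())) {
            values.add(storedField1.get(docId));
        }
        return values;
    }

    public static void main(String[] args) {
        StoredFieldLookup index = new StoredFieldLookup();
        index.add("1", "2,3,4,6,");
        index.add("2", "3,1,5,7,");
        index.add("3", "1,2,");
        System.out.println(index.field1WhereField2Contains("3")); // prints [1, 2]
    }
}
```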

Are you looking for a query language that does this work for you?  That
is, one which can query on any field and then return information from
other field(s) in matching documents?  If so, I don't know of such a
query language that exists for Lucene.

There have been several integrations of Lucene with databases, some of
which enable SQL queries something like:

    SELECT FIELD1 WHERE FIELD2 CONTAINS('3');

1. Marcelo Ochoa's Oracle JVM implementation for Lucene DataStore:
<http://issues.apache.org/jira/browse/LUCENE-724>; see also these
threads on the Lucene Java-Users list:
<http://www.nabble.com/Oracle-and-Lucene-Integration-tf2689965.html>
<http://www.nabble.com/Oracle-Lucene-integration--status--tf2865873.html>

2. Mark Harwood's Lucene database bindings for HSQLDB and Derby:
<http://issues.apache.org/jira/browse/LUCENE-434>; see also this thread
on the Lucene Java-Users list:
<http://www.nabble.com/Lucene-database-bindings-tf316816.html>

3. Hibernate (as of v3.1, I think) Lucene Integration:
<http://www.hibernate.org/hib_docs/annotations/reference/en/html/lucene.html>

4. DBSight enables Lucene search with databases: <http://www.dbsight.net/>


Hope it helps,
Steve





Re: Nested Queries

Posted by Grant Ingersoll <gs...@apache.org>.
Hi Kapil,

I am not sure exactly what you are asking; could you give an example of
the correct response?  Also, are you truly using numbers or are they
just substitutes for text?  And are they part of a bigger problem
requiring Lucene?  If it is just numbers, maybe a DB might be the
better way to go, since you would have SET operations that may make
this easier.  Not saying Lucene can't do what you want, just thinking
there are other ways.

-Grant

On Dec 26, 2006, at 4:47 AM, Kapil Chhabra wrote:

> Just to mention, I have tokenized FIELD2 on "," and indexed it.
>
> FIELD2:3 should return 1,2
> FIELD2:(FIELD2:3) should return something like the output of:
>
> *FIELD2: 1 OR FIELD2: 2

--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org

Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ





Re: Nested Queries

Posted by Kapil Chhabra <ka...@naukri.com>.
Hi All,
Any pointers in this direction?

Thanks in advance.

Kapil

Kapil Chhabra wrote:
> Just to mention, I have tokenized FIELD2 on "," and indexed it.
>
> FIELD2:3 should return 1,2
> FIELD2:(FIELD2:3) should return something like the output of:
>
> *FIELD2: 1 OR FIELD2: 2




Re: Nested Queries

Posted by Kapil Chhabra <ka...@naukri.com>.
Just to mention, I have tokenized FIELD2 on "," and indexed it.

FIELD2:3 should return 1,2
FIELD2:(FIELD2:3) should return something like the output of:

FIELD2: 1 OR FIELD2: 2

Regards,
kapilChhabra
