Posted to java-user@lucene.apache.org by Donal Murtagh <do...@yahoo.co.uk> on 2009/07/29 22:42:21 UTC

Querying across object relationships

Hi,

I'm trying to use Lucene to query a domain that has the following structure

    Student 1-------* Attendance *---------1 Course

The data in the domain is summarised below

    Course.name   Attendance.mandatory   Student.name
    -------------------------------------------------
    cooking       N                      Bob
    art           Y                      Bob

If I execute the query "+courseName:cooking AND +mandatory:Y", it returns
Bob, because Bob is attending the cooking course, and Bob is also
attending a mandatory course. However, what I *really* want to query for
is "students attending a mandatory cooking course", which in this case
would return nobody. Is it possible to formulate this as a Lucene query?

For the sake of completeness, the domain classes
themselves are shown below. These classes are Grails domain classes,
but I'm using the standard Compass annotations and Lucene query syntax.

Thanks!
- Don

    @Searchable
    class Student {
    
        @SearchableProperty(accessor = 'property')
        String name
        
        static hasMany = [attendances: Attendance]
    
        @SearchableId(accessor = 'property')
        Long id
    
        @SearchableComponent
        Set<Attendance> getAttendances() {
            return attendances
        }
    }
    
    @Searchable(root = false)
    class Attendance {
    
        static belongsTo = [student: Student, course: Course]
    
        @SearchableProperty(accessor = 'property')
        String mandatory = "Y"
    
        @SearchableId(accessor = 'property')
        Long id
    
        @SearchableComponent
        Course getCourse() {
            return course
        }
    }
    
    @Searchable(root = false)
    class Course {
    
        @SearchableProperty(accessor = 'property', name = "courseName")
        String name  
    
        @SearchableId(accessor = 'property')
        Long id
    }


      

Re: Problems with IndexWriter.commit()

Posted by Michael McCandless <lu...@mikemccandless.com>.
Phew :)  Thanks for bringing closure!

Mike

On Thu, Jul 30, 2009 at 3:57 PM, Woolf, Ross<Ro...@bmc.com> wrote:

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


RE: Problems with IndexWriter.commit()

Posted by "Woolf, Ross" <Ro...@BMC.com>.
This turned out to be my own problem, but using infoStream helped me to discover where my problem was.

Thanks

-----Original Message-----
From: Michael McCandless [mailto:lucene@mikemccandless.com] 
Sent: Wednesday, July 29, 2009 6:11 PM
To: java-user@lucene.apache.org
Subject: Re: Problems with IndexWriter.commit()



Re: Problems with IndexWriter.commit()

Posted by Michael McCandless <lu...@mikemccandless.com>.
This is certainly not expected.

Can you turn on IndexWriter's infoStream and post the result?

Mike

On Wed, Jul 29, 2009 at 7:03 PM, Woolf, Ross<Ro...@bmc.com> wrote:


Problems with IndexWriter.commit()

Posted by "Woolf, Ross" <Ro...@BMC.com>.
I'm seeing a problem/unexpected behavior with IndexWriter.commit(). I have an open IndexWriter and I am adding a lot of documents to the index (addDocument). I call commit() and the data is committed as expected, but as I continue to add documents to the index I have lost all caching from the writer (I have kept the same writer open). Every single addDocument thereafter behaves as if it were immediately followed by a commit, and indexing slows to a crawl.

I'm assuming this is not expected behavior, since the point of commit() is to allow committing while keeping the IndexWriter open. Is anyone aware of this problem? Does anyone know how I can fix it so that subsequent addDocument calls are again processed using the IndexWriter's caching? I am on Lucene 2.4.1.

Thanks,
Ross



Re: Querying across object relationships

Posted by Paolo DiCanio <do...@yahoo.co.uk>.
The domain classes are defined as Groovy classes with Compass annotations
(see my original post).
Each class maps directly to a DB table, and when the application starts up,
Compass automatically reads the relevant tables and adds the data to the
index.



Lukáš Vlček wrote:

-- 
View this message in context: http://www.nabble.com/Querying-across-object-relationships-tp24727196p24747695.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.




Re: Querying across object relationships

Posted by Lukáš Vlček <lu...@gmail.com>.
Don,
To me it seems as if there is only one document in your index, and moreover
that single document has multi-valued courseName and mandatory fields (this
means you will get the same result even if you query +courseName:art
+mandatory:N). Could you share how you create your domain objects and how
you push them into the index?

Did you check your transaction logic? Are you sure you indexed all the
domain objects you wanted?

Lukas

http://blog.lukas-vlcek.com/


On Thu, Jul 30, 2009 at 9:10 PM, Donal Murtagh <do...@yahoo.co.uk> wrote:


Re: Querying across object relationships

Posted by Lukáš Vlček <lu...@gmail.com>.
Hi,

A Lucene Document is a set of fields. Each field has a name and a textual
value. There is no notion of nested fields (a field inside a field). Do not
focus too much on the XML representation of the index obtained from Luke;
read the Lucene documentation instead.
When indexing a Java bean, what in fact has to happen is that you transform
a tree-like data structure into a linear form: one vector (the document) of
vectors (the fields). This means that you lose some information; in this
case you are losing the mandatory - courseName relation. So you have to keep
this relation in some additional field in the index, as long as you want to
have just ONE student Bob in your index. One option here is what I
described before: adding new fields to Student that somehow keep the
mandatory - courseName relation.

If you can have many Bobs in the index, then the Student class cannot be a
searchable root and you have to change your Compass mappings.
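The "many Bobs" alternative can be illustrated with a minimal plain-Java sketch (no Lucene or Compass dependency). It assumes Attendance, rather than Student, is mapped as the searchable root, so each attendance becomes its own flat document carrying the student's name; the studentName field and the in-memory list of maps are hypothetical stand-ins for a real index. Because mandatory and courseName now sit side by side in one document, requiring both can only succeed when they belong to the same attendance.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class AttendanceDocs {

    // Hypothetical index: one single-valued document per attendance,
    // each carrying the (assumed) studentName field alongside its own course.
    static List<Map<String, String>> index() {
        return List.of(
                Map.of("studentName", "Bob", "courseName", "cooking", "mandatory", "N"),
                Map.of("studentName", "Bob", "courseName", "art", "mandatory", "Y"));
    }

    // Evaluate +courseName:<course> +mandatory:<flag> per document,
    // then collapse the matching attendances back to distinct students.
    static List<String> studentsAttending(List<Map<String, String>> docs, String course, String flag) {
        return docs.stream()
                .filter(d -> course.equals(d.get("courseName")) && flag.equals(d.get("mandatory")))
                .map(d -> d.get("studentName"))
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = index();
        System.out.println(studentsAttending(docs, "cooking", "Y")); // []
        System.out.println(studentsAttending(docs, "art", "Y"));     // [Bob]
    }
}
```

The trade-off is that hits are attendances rather than students, so the application has to collapse them back to distinct students, which is what the distinct() step stands in for.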

Regards,
Lukas

http://blog.lukas-vlcek.com/


On Thu, Jul 30, 2009 at 11:36 PM, Paolo DiCanio <do...@yahoo.co.uk>wrote:


RE: Querying across object relationships

Posted by Paolo DiCanio <do...@yahoo.co.uk>.
Thanks Steven,

I guess the index structure that I need in order to perform my query is:

<doc id='1'>
    <field name='courseName'>
        <val>cooking</val>
        <field name='mandatory'>
            <val>N</val>
        </field>
    </field>
    <field name='courseName'>
        <val>art</val>
        <field name='mandatory'>
            <val>Y</val>
        </field>
    </field>
    <field name='name'>
        <val>Bob</val>
    </field>
</doc>

But I'm not sure how to map my domain classes in order to achieve this (or
even if it's possible)


Steven A Rowe wrote:

-- 
View this message in context: http://www.nabble.com/Querying-across-object-relationships-tp24727196p24747745.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.




Re: Querying across object relationships

Posted by Lukáš Vlček <lu...@gmail.com>.
Don,
maybe you could try this:

   @Searchable
   @SearchableDynamicMetaData(name="noAttending", converter="groovy",
       expression=" ... here iterate over all attendances where mandatory = N and output all course names")
   @SearchableDynamicMetaData(name="yesAttending", converter="groovy",
       expression=" ... here iterate over all attendances where mandatory = Y and output all course names")
   class Student {

       @SearchableProperty(accessor = 'property')
       String name

       static hasMany = [attendances: Attendance]

       @SearchableId(accessor = 'property')
       Long id

       @SearchableComponent
       Set<Attendance> getAttendances() {
           return attendances
       }
   }

In essence this will add two new fields to each student: yesAttending will
contain the names of all courses the student attends as mandatory, and
noAttending the names of all courses the student attends as non-mandatory.
I think that using Groovy you should be able to write a suitable expression
(maybe something like: data.attendances.findAll { it.mandatory == "Y" }.collect
{ it.course.name }.join(' ')). Then you can query like: +yesAttending:cooking

Anyway, you should consider moving this conversation to the Compass forum.

Lukas

http://blog.lukas-vlcek.com/


On Thu, Jul 30, 2009 at 10:00 PM, Lukáš Vlček <lu...@gmail.com> wrote:


Re: Querying across object relationships

Posted by Lukáš Vlček <lu...@gmail.com>.
Don,
in order to use such a query you have to keep the mandatory - courseName
relation in your index. In Compass you could use dynamic metadata (
http://www.compass-project.org/docs/2.2.0/reference/html/core-osem.html#core-osem-dynamic).
This way you can add additional fields to your document. You can create an
artificial field which is a combination of mandatory and courseName. But
maybe rethinking your object model would be better in this case.
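An "artificial field" of this kind can be sketched in plain Java: one combined token per attendance, so the pairing survives flattening. The field name mandatoryCourse and the underscore separator are assumptions for illustration only. In a real index the token would also have to reach Lucene as a single term (for example via an un-tokenized field); otherwise the analyzer might split or lowercase it, and the query term would need to be written to match.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class CompositeField {

    record Course(String name) {}
    record Attendance(String mandatory, Course course) {}

    // Hypothetical "mandatoryCourse" field: one combined token per attendance,
    // e.g. "Y_art", so the flag stays attached to its course after flattening.
    static Set<String> mandatoryCourseTokens(List<Attendance> attendances) {
        return attendances.stream()
                .map(a -> a.mandatory() + "_" + a.course().name())
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<Attendance> bob = List.of(
                new Attendance("N", new Course("cooking")),
                new Attendance("Y", new Course("art")));
        Set<String> tokens = mandatoryCourseTokens(bob);

        // +mandatoryCourse:Y_cooking -> false: Bob has no mandatory cooking course
        System.out.println(tokens.contains("Y_cooking"));
        // +mandatoryCourse:N_cooking -> true
        System.out.println(tokens.contains("N_cooking"));
    }
}
```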

Lukas

http://blog.lukas-vlcek.com/


On Thu, Jul 30, 2009 at 9:42 PM, Steven A Rowe <sa...@syr.edu> wrote:


RE: Querying across object relationships

Posted by Steven A Rowe <sa...@syr.edu>.
Hi Donal,

I looked at the XML index dump you provided, and I can see that there is only one document in the index.  This document matches your query.  I've pasted it below, without the "$/*"-named fields I'm assuming Compass adds to manage Lucene document -> Grails object mapping, and with just the "name" attribute on the field elements:

  <doc id='1'>
    <field name='courseName'>
      <val>cooking</val>
      <val>art</val>
    </field>
    <field name='mandatory'>
      <val>N</val>
      <val>Y</val>
    </field>
    <field name='name'>
     <val>Bob</val>
    </field>
  </doc>

Compass's Lucene document to Grails object mapping is your problem here.

In Lucene-land, the query (+courseName:cooking +mandatory:Y) matches the above document, because the document contains those values in those fields.

So with that query, based on the Lucene document structure, you seem to be asking the question: "Which student attends a cooking course and also attends a mandatory course?".  Bob is a match.
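Steve's point can be reproduced without Lucene at all. The following plain-Java sketch (an illustration, not Compass's actual mapping code) flattens Bob's object graph the way the Luke dump shows it, into one document whose fields are bags of values, and evaluates each term clause independently. Exact string comparison stands in for analysis, which is a simplification.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FlattenedStudent {

    record Course(String name) {}
    record Attendance(String mandatory, Course course) {}
    record Student(String name, List<Attendance> attendances) {}

    // Flatten the object graph into one document: every field becomes a bag of
    // values, so which mandatory flag belonged to which course is no longer recorded.
    static Map<String, List<String>> flatten(Student s) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.computeIfAbsent("name", k -> new ArrayList<>()).add(s.name());
        for (Attendance a : s.attendances()) {
            doc.computeIfAbsent("courseName", k -> new ArrayList<>()).add(a.course().name());
            doc.computeIfAbsent("mandatory", k -> new ArrayList<>()).add(a.mandatory());
        }
        return doc;
    }

    // A term clause such as courseName:cooking matches if ANY value of the field equals the term.
    static boolean hasTerm(Map<String, List<String>> doc, String field, String term) {
        return doc.getOrDefault(field, List.of()).contains(term);
    }

    public static void main(String[] args) {
        Student bob = new Student("Bob", List.of(
                new Attendance("N", new Course("cooking")),
                new Attendance("Y", new Course("art"))));
        Map<String, List<String>> doc = flatten(bob);
        System.out.println(doc); // {name=[Bob], courseName=[cooking, art], mandatory=[N, Y]}

        // +courseName:cooking +mandatory:Y -> true, although no single attendance is a mandatory cooking course
        System.out.println(hasTerm(doc, "courseName", "cooking") && hasTerm(doc, "mandatory", "Y"));
        // +courseName:art +mandatory:N -> true as well
        System.out.println(hasTerm(doc, "courseName", "art") && hasTerm(doc, "mandatory", "N"));
    }
}
```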

Steve

> -----Original Message-----
> From: Donal Murtagh [mailto:domurtag@yahoo.co.uk]
> Sent: Thursday, July 30, 2009 3:10 PM
> To: java-user@lucene.apache.org
> Subject: Re: Querying across object relationships

Re: Querying across object relationships

Posted by Donal Murtagh <do...@yahoo.co.uk>.
Basically the classes I'm indexing have the following relationships:

Student 1------* Attendance *------1 Course

The only root class is Student, i.e. only instances of this class can be
returned from a search. I have a Student object graph that could be
represented in JSON as follows:

{
  "name": "Bob",
  "attendances": [
    {"mandatory": "N", "course": {"name": "cooking"}},
    {"mandatory": "Y", "course": {"name": "art"}}
  ]
}

When I search for an instance of Student using the query:

  "+courseName:cooking +mandatory:Y"

Bob
is returned because, because he attends a course named "cooking" and he
attends a mandatory course (named "art).. But what I really want to
search for is students that attend a mandatory cooking course. It
doesn't appear to be possible to do this based on the responses
provided here:
http://stackoverflow.com/questions/1202422/lucene-query-syntax/1203186#1203186

I opened the Student index in Luke, exported it to XML and have appended
the results here:
http://pastebin.com/m6e5bbcf3

I don't really know how to interpret this myself, but thanks in advance
for any further help you can provide.

- Don



      

RE: Querying across object relationships

Posted by Steven A Rowe <sa...@syr.edu>.
Hi Donal,

I'm not familiar with Compass annotations, so forgive my ignorance, but it's not clear to me what your documents look like, or how a Lucene document corresponds to your objects.

What does the document you get as a hit when you search look like?  That is, what fields are defined on it?

When you look in Luke, what do you see?  Are the documents heterogeneous?  That is, do some documents have different fields defined on them than others?

Also, you wrote:
> The only reason I was using "AND" in my query, was to be explicit about
> how the predicates should be combined, rather than relying on the
> default (which is in fact "AND").

What you're missing is that Lucene's query syntax does not exactly incorporate Boolean logic.  See <http://lucene.apache.org/java/2_4_1/queryparsersyntax.html#Boolean%20operators> for more information; also, there is a discussion on the Lucene wiki that may be useful to you: <http://wiki.apache.org/lucene-java/BooleanQuerySyntax>.  In particular, using the "+" operator in front of a term means that it MUST be present; a match will not occur unless the specified field with a term matching the specified value is present on a document.

So, somehow, the hit you're getting (even though you don't think you should be) matches *both* courseName:cooking and mandatory:Y.

That is, your problem is one of document structure, not query syntax.  I suspect you'll need to further investigate how Compass annotations affect document structure to resolve this issue.
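To make the document-structure point concrete, here is a minimal, stdlib-only Java sketch. It is not Compass or Lucene code. It assumes that Compass copies every component's properties into the single Student document as repeated values of the same field. That assumption fits the behaviour Don reports but is not confirmed in this thread. The class FlattenedDocDemo and its methods are hypothetical. Each MUST clause is checked against the whole document, so once the values are flattened nothing ties "cooking" to "N" rather than to "Y".

```java
import java.util.*;

// Hypothetical model of a flattened Student document: field -> all values
// contributed by the student and by every attendance/course component.
class FlattenedDocDemo {

    // ASSUMPTION: components are flattened into repeated fields on the root document.
    static Map<String, List<String>> flatten(String name, String[][] attendances) {
        Map<String, List<String>> doc = new HashMap<>();
        doc.computeIfAbsent("name", k -> new ArrayList<>()).add(name);
        for (String[] a : attendances) {
            // a[0] = course name, a[1] = mandatory flag; both land on the SAME document
            doc.computeIfAbsent("courseName", k -> new ArrayList<>()).add(a[0]);
            doc.computeIfAbsent("mandatory", k -> new ArrayList<>()).add(a[1]);
        }
        return doc;
    }

    // "+f1:v1 +f2:v2": each MUST clause is satisfied independently by any value of its field.
    static boolean mustAll(Map<String, List<String>> doc, String[][] clauses) {
        for (String[] c : clauses) {
            List<String> values = doc.getOrDefault(c[0], Collections.emptyList());
            if (!values.contains(c[1])) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, List<String>> bob = flatten("Bob",
            new String[][] {{"cooking", "N"}, {"art", "Y"}});
        System.out.println(bob.get("courseName") + " " + bob.get("mandatory"));
        // prints [cooking, art] [N, Y]
        boolean hit = mustAll(bob,
            new String[][] {{"courseName", "cooking"}, {"mandatory", "Y"}});
        System.out.println("+courseName:cooking +mandatory:Y matches Bob: " + hit);
        // prints true: the pairing of "cooking" with "N" is lost after flattening
    }
}
```

Under that assumption, replacing AND with "+" (or the reverse) cannot change the result. Only a different document structure can.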

Steve


Re: Querying across object relationships

Posted by Donal Murtagh <do...@yahoo.co.uk>.
Hi,

I tried your suggestion:

"+courseName:cooking +mandatory:Y"

but it still matches the student who attends a non-mandatory cooking
course and another, mandatory course, which is not what I want. The
only reason I was using "AND" in my query was to be explicit about how
the predicates should be combined, rather than relying on the default
(which is in fact "AND"). Incidentally, I also posted a description of
my problem to Stack Overflow. The most useful response I got was:

"What you are trying to do is sometimes known as "scoped search" or
"xml search" - the ability to search based on a set of related
sub-elements. Lucene does not support this natively, but there are
some tricks you can do to get it to work. You can put all of the course
data associated with a student in a single field. Then bump the term
position by a fixed amount (like 100) between the terms for each
course. You can then do a proximity search with phrase queries or span
queries to force a match for attributes of a single course. This is how
Solr supports multi-valued fields."

So it seems like Lucene simply doesn't support this kind of query. This
is surprising to me, but based on my experience over the last few days,
it appears to be true. I've tried using Luke to check the data in the
index, but I'm not entirely sure what to look for. Any help you could
provide with this would be much appreciated.
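The position-gap trick from the quoted Stack Overflow answer can be modelled without Lucene. What follows is a hedged, stdlib-only Java simulation, not Lucene code. The class PositionGapDemo, its methods, and the slop of 10 are hypothetical. Tokens are not analyzed (no lower-casing), and the proximity test is a simplified, order-independent distance check rather than Lucene's exact phrase/span semantics. Positions advance by one per token, and a gap of 100 is added between attendances. A slop smaller than the gap can therefore only pair tokens from the same attendance.

```java
import java.util.*;

// Hypothetical simulation of one multi-valued field with a position gap between values.
class PositionGapDemo {

    static final int GAP = 100; // gap between attendances, as suggested in the quoted answer

    // Builds term -> positions for a single field made of several values (one per attendance).
    static Map<String, List<Integer>> index(List<List<String>> values) {
        Map<String, List<Integer>> postings = new HashMap<>();
        int pos = -1;
        for (int v = 0; v < values.size(); v++) {
            if (v > 0) pos += GAP; // bump the position between consecutive values
            for (String token : values.get(v)) {
                pos++;
                postings.computeIfAbsent(token, k -> new ArrayList<>()).add(pos);
            }
        }
        return postings;
    }

    // True if some occurrence of a and some occurrence of b are at most `slop` positions apart.
    static boolean near(Map<String, List<Integer>> postings, String a, String b, int slop) {
        for (int pa : postings.getOrDefault(a, Collections.emptyList())) {
            for (int pb : postings.getOrDefault(b, Collections.emptyList())) {
                if (Math.abs(pa - pb) <= slop) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> bob = index(Arrays.asList(
            Arrays.asList("cooking", "N"),  // positions 0, 1
            Arrays.asList("art", "Y")));    // positions 102, 103
        System.out.println(near(bob, "cooking", "Y", 10)); // prints false: different attendances
        System.out.println(near(bob, "cooking", "N", 10)); // prints true: same attendance
    }
}
```

In Lucene itself, the gap would come from overriding `Analyzer.getPositionIncrementGap(String fieldName)` for the combined field. The proximity test would be a sloppy `PhraseQuery` or a `SpanNearQuery`. How to wire that through Compass and the Grails Searchable Plugin is not covered in this thread.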

Cheers,
Don


      

Re: Querying across object relationships

Posted by Lukáš Vlček <lu...@gmail.com>.
Hi,

this is interesting, but why do you use "AND" in your query when both
terms are a MUST (they have +)? See
http://lucene.apache.org/java/2_4_1/queryparsersyntax.html for more details
about Lucene query syntax.

Try dropping the AND and use the following query:
+courseName:cooking +mandatory:Y
This query means that both terms must be matched.

If this does not work, check that the default operator is set to AND; you
can also check directly what data is in your index (using Luke, for example).

Let me know if this helps.

Regards,
Lukas

http://blog.lukas-vlcek.com/


On Thu, Jul 30, 2009 at 4:10 PM, Donal Murtagh <do...@yahoo.co.uk> wrote:

> Hi Phil,
>
> I don't really have any query parsing/generation code to send you, because
> I'm not using Lucene directly. I'm using the Grails Searchable Plugin,
> which builds on both Lucene and Compass. The only relevant information
> I can give you is my Grails domain classes which show how I've mapped
> my classes to the Lucene search index.
>
> @Searchable
> class Student {
>
>    @SearchableProperty(accessor = 'property')
>    String name
>
>    static hasMany = [attendances: Attendance]
>
>    @SearchableId(accessor = 'property')
>    Long id
>
>    @SearchableComponent
>    Set<Attendance> getAttendances() {
>        return attendances
>    }
> }
>
> @Searchable(root = false)
> class Attendance {
>
>     static searchable = true
>     static belongsTo = [student: Student, course: Course]
>
>    @SearchableProperty(accessor = 'property')
>    String mandatory = "Y"
>
>    @SearchableId(accessor = 'property')
>    Long id
>
>    @SearchableComponent
>    Course getCourse() {
>        return course
>    }
> }
>
> @Searchable(root = false)
> class Course {
>
>    @SearchableProperty(accessor = 'property', name = "courseName")
>    String name
>
>    @SearchableId(accessor = 'property')
>    Long id
> }
>
> In order to execute a search I simply provide a Lucene query string such as
> "+courseName:cooking AND +mandatory:Y"
>
> Cheers,
> Don
>
>
>

Re: Querying across object relationships

Posted by Donal Murtagh <do...@yahoo.co.uk>.
Hi Phil,

I don't really have any query parsing/generation code to send you, because I'm not using Lucene directly. I'm using the Grails Searchable Plugin,
which builds on both Lucene and Compass. The only relevant information
I can give you is my Grails domain classes which show how I've mapped
my classes to the Lucene search index.

@Searchable
class Student {

    @SearchableProperty(accessor = 'property')
    String name
    
    static hasMany = [attendances: Attendance]

    @SearchableId(accessor = 'property')
    Long id

    @SearchableComponent
    Set<Attendance> getAttendances() {
        return attendances
    }
}

@Searchable(root = false)
class Attendance {

    static searchable = true
    static belongsTo = [student: Student, course: Course]

    @SearchableProperty(accessor = 'property')
    String mandatory = "Y"

    @SearchableId(accessor = 'property')
    Long id

    @SearchableComponent
    Course getCourse() {
        return course
    }
} 

@Searchable(root = false)
class Course {

    @SearchableProperty(accessor = 'property', name = "courseName")
    String name  

    @SearchableId(accessor = 'property')
    Long id
}

In order to execute a search I simply provide a Lucene query string such as "+courseName:cooking AND +mandatory:Y"

Cheers,
Don


      

Re: Querying across object relationships

Posted by Phil Whelan <ph...@gmail.com>.
Hi Don,

On Wed, Jul 29, 2009 at 1:42 PM, Donal Murtagh<do...@yahoo.co.uk> wrote:
>    Course.name   Attendance.mandatory   Student.name
>    -------------------------------------------------
>    cooking       N                      Bob
>    art           Y                      Bob
>
> If I execute the query "+courseName:cooking AND +mandatory:Y"
> it  returns Bob, because Bob is attending the cooking course, and Bob is
> also attending a mandatory course.

What you're describing is "+courseName:cooking OR +mandatory:Y".

What you want is what you've written above, "+courseName:cooking AND
+mandatory:Y". I'm not sure why you have a result for this AND query,
when it does not match any of the documents listed.

Can you send the code for the query parsing / generation you're using?

Thanks,
Phil

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Querying across object relationships

Posted by Renaud Delbru <re...@deri.org>.
Hi Donal,

A few days ago we released SIREn [1], a plugin for Lucene that allows
indexing and querying of semi-structured data. Your use case seems to be
a perfect match for what SIREn can do.

SIREn enables the indexing of semi-structured data into a Lucene field,
and offers additional query components to build semi-structured queries
programmatically. SIREn currently indexes tabular data, i.e. data
composed of rows and columns.

For example, for your use case, you can create a SIREn field that will
contain the following table:

    Course.name   Attendance.mandatory
    ----------------------------------
    cooking       N
    art           Y

Course.name is the first column of the SIREn table and
Attendance.mandatory the second. SIREn does not limit the number of
columns, which means that you can index more information than
Course.name or Attendance.mandatory, for example Course.location or
Professor.name. Each row (or SIREn tuple) corresponds to one of your
database entries. For example, the first row is {cooking, N}.
Student.name is indexed into a normal Lucene field so that it can be
retrieved. To summarize, your Lucene document schema will look like:

Doc {
 - name: Bob
 - content: {cooking, N}, {art, Y}
}

The 'content' field is created using SIREn and indexes two tuples:
{cooking, N} and {art, Y}. Then, using SIREn's query components, you
can retrieve all documents that match certain tuples, such as
{cooking, Y}. In this example, nothing will be returned, since there is
no tuple containing {cooking, Y}.
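The tuple idea can be illustrated with a short, stdlib-only Java model. This is not SIREn's API; see [2] for its real indexing and querying examples. TupleMatchDemo, Doc, rowMatches and search are hypothetical names, and treating a null pattern entry as a wildcard is an assumed convention. The sketch shows the property that matters here: each row is kept intact, so every column constraint must hold within a single tuple.

```java
import java.util.*;

// Hypothetical model of a document whose 'content' field stores whole tuples.
class TupleMatchDemo {

    static final class Doc {
        final String name;           // stored like a normal Lucene field (Student.name)
        final List<String[]> content; // one String[] per tuple: {Course.name, Attendance.mandatory}
        Doc(String name, List<String[]> content) { this.name = name; this.content = content; }
    }

    // A row matches if every non-null pattern column equals the row's value in that column.
    static boolean rowMatches(String[] row, String[] pattern) {
        if (row.length < pattern.length) return false;
        for (int col = 0; col < pattern.length; col++) {
            if (pattern[col] != null && !pattern[col].equals(row[col])) return false;
        }
        return true;
    }

    // A document is a hit if at least one of its tuples satisfies the whole pattern.
    static List<String> search(List<Doc> docs, String[] pattern) {
        List<String> hits = new ArrayList<>();
        for (Doc d : docs) {
            for (String[] row : d.content) {
                if (rowMatches(row, pattern)) { hits.add(d.name); break; }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Doc> docs = Collections.singletonList(new Doc("Bob", Arrays.asList(
            new String[] {"cooking", "N"},
            new String[] {"art", "Y"})));
        System.out.println(search(docs, new String[] {"cooking", "Y"})); // prints []
        System.out.println(search(docs, new String[] {"art", "Y"}));     // prints [Bob]
        System.out.println(search(docs, new String[] {null, "Y"}));      // prints [Bob]
    }
}
```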

You can have a look at the IMDB indexing and querying example [2]. It 
shows how to index and query tabular data of this kind. If you need some 
help, feel free to ask your questions on our mailing list.

[1] http://siren.sindice.com
[2] https://dev.deri.ie/confluence/display/SIREn/Indexing+and+Searching+Tabular+Data

Best Regards,
-- 
Renaud Delbru

Donal Murtagh wrote:
> Hi,
>
> I'm trying to use Lucene to query a domain that has the following structure
>
>     Student 1-------* Attendance *---------1 Course
>
> The data in the domain is summarised below
>
>     Course.name   Attendance.mandatory   Student.name
>     -------------------------------------------------
>     cooking       N                      Bob
>     art           Y                      Bob
>
> If I execute the query "+courseName:cooking AND +mandatory:Y" 
>
> it
> returns Bob, because Bob is attending the cooking course, and Bob is
> also attending a mandatory course. However, what I *really* want to
> query for is "students attending a mandatory cooking course", which in
> this case would return nobody. Is it possible to formulate this as a
> Lucene query?
>
> For the sake of completeness, the domain classes
> themselves are shown below. These classes are Grails domain classes,
> but I'm using the standard Compass annotations and Lucene query syntax.
>
> Thanks!
> - Don
>
>     @Searchable
>     class Student {
>     
>         @SearchableProperty(accessor = 'property')
>         String name
>         
>         static hasMany = [attendances: Attendance]
>     
>         @SearchableId(accessor = 'property')
>         Long id
>     
>         @SearchableComponent
>         Set<Attendance> getAttendances() {
>             return attendances
>         }
>     }
>     
>     @Searchable(root = false)
>     class Attendance {
>     
>         static belongsTo = [student: Student, course: Course]
>     
>         @SearchableProperty(accessor = 'property')
>         String mandatory = "Y"
>     
>         @SearchableId(accessor = 'property')
>         Long id
>     
>         @SearchableComponent
>         Course getCourse() {
>             return course
>         }
>     }
>     
>     @Searchable(root = false)
>     class Course {
>     
>         @SearchableProperty(accessor = 'property', name = "courseName")
>         String name  
>     
>         @SearchableId(accessor = 'property')
>         Long id
>     }
>
>
>       
>   

