Posted to java-user@lucene.apache.org by hasghari <ha...@gmail.com> on 2012/02/10 23:31:10 UTC

Nested BlockJoinQuery

I'm trying to learn more about using BlockJoinQuery in our search application
and I came across this blog post by Mike McCandless:
http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html

The blog post mentions that it is possible to do joins that can be nested
(parent to child to grandchild) but does not elaborate further.

Could someone please explain how to formulate such a query for the following
use case?

Let's say we want to create a music search application in which the Lucene
index documents are nested as follows:

music genre -> band -> band members

Some sample data:

Rock -> Pink Floyd -> Roger Waters, David Gilmour, Richard Wright, Nick
Mason

Pop -> Michael Jackson -> Michael Jackson

Alternative/Indie -> Waters -> Van Pierszalowski

We would like to search for the term "waters" and find out which genre and
band each match belongs to. In the case of the sample data above, we would
expect the result set to include 'Rock/Pink Floyd' because of Roger Waters and
'Alternative/Indie' because of the Waters band name.

It seems like this would be a good candidate for using nested BlockJoinQuery
queries.

Thanks,
Hamed

--
View this message in context: http://lucene.472066.n3.nabble.com/Nested-BlockJoinQuery-tp3733885p3733885.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Nested BlockJoinQuery

Posted by Mark Harwood <ma...@yahoo.co.uk>.
Your requirement does not sound like a good fit for the nested stuff; it is probably better served by conventional faceting.

I would characterise the uses for nested blocks as follows:

1) The parent of a nested block is typically the "item of interest" that is returned, i.e. the search results are a list of the parent items.
2) The children (and grandchildren) of the parent must all fit comfortably into RAM (an index-time restriction).
3) There is typically more than one child doc of each child type (otherwise we could happily accommodate the single child's fields on the parent document).
4) The set of children for a parent is typically not updated frequently, as any change to the membership of the set requires rewriting the whole block of parent plus children.
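
To make the block layout behind these points concrete, here is a minimal toy model in plain Java. It is not Lucene's API: the class BlockJoinSketch, the Doc type and all method names are hypothetical, and substring matching stands in for a real query. It only assumes what is described above, namely that a block is stored contiguously with children before their parent, so a join amounts to walking forward from a matching child to the next parent document. Applying the join twice gives the nested member -> band -> genre case from the original question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy in-memory model of block-join indexing. This is NOT Lucene's API.
class BlockJoinSketch {

    // Hypothetical document: a type tag ("member", "band" or "genre") plus a name.
    static final class Doc {
        final String type;
        final String name;

        Doc(String type, String name) {
            this.type = type;
            this.name = name;
        }
    }

    // Each block is written children first, parent last, with no gaps in between.
    static List<Doc> sampleIndex() {
        List<Doc> docs = new ArrayList<>();
        docs.add(new Doc("member", "Roger Waters"));
        docs.add(new Doc("member", "David Gilmour"));
        docs.add(new Doc("member", "Richard Wright"));
        docs.add(new Doc("member", "Nick Mason"));
        docs.add(new Doc("band", "Pink Floyd"));
        docs.add(new Doc("genre", "Rock"));
        docs.add(new Doc("member", "Michael Jackson"));
        docs.add(new Doc("band", "Michael Jackson"));
        docs.add(new Doc("genre", "Pop"));
        docs.add(new Doc("member", "Van Pierszalowski"));
        docs.add(new Doc("band", "Waters"));
        docs.add(new Doc("genre", "Alternative/Indie"));
        return docs;
    }

    // Doc ids of the given type whose name contains the term (case-insensitive).
    static SortedSet<Integer> match(List<Doc> docs, String type, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        SortedSet<Integer> hits = new TreeSet<>();
        for (int i = 0; i < docs.size(); i++) {
            Doc d = docs.get(i);
            if (d.type.equals(type) && d.name.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(i);
            }
        }
        return hits;
    }

    // Join step: from each hit, walk forward to the next doc of the parent type.
    static SortedSet<Integer> joinToParent(List<Doc> docs, SortedSet<Integer> hits, String parentType) {
        SortedSet<Integer> parents = new TreeSet<>();
        for (int hit : hits) {
            for (int i = hit + 1; i < docs.size(); i++) {
                if (docs.get(i).type.equals(parentType)) {
                    parents.add(i);
                    break;
                }
            }
        }
        return parents;
    }

    // Nested join: member -> band, then (joined bands + directly matching bands) -> genre.
    static List<String> genresMatching(List<Doc> docs, String term) {
        SortedSet<Integer> bands = joinToParent(docs, match(docs, "member", term), "band");
        bands.addAll(match(docs, "band", term));
        List<String> genres = new ArrayList<>();
        for (int id : joinToParent(docs, bands, "genre")) {
            genres.add(docs.get(id).name);
        }
        return genres;
    }

    public static void main(String[] args) {
        System.out.println(genresMatching(sampleIndex(), "waters"));
    }
}
```

Running main prints [Rock, Alternative/Indie]. Because the join depends purely on document order within a block, adding or removing a single member in this model means rebuilding the whole genre block, which is the cost that point 4 refers to.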

Examples of things that fit this model are:
a) Resumes of people with many sections on work and education
b) Books with many chapters
c) Products with many components
d) XML documents

Your example is not a good fit because it breaks several of the characteristics I outlined. A "genre" is an expansive item, so its block would not fit in RAM, and it undergoes constant change as new "children" are added to the set. Check out Solr faceting for your requirement.
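
As a rough illustration of the faceting idea, here is a toy sketch in plain Java. It is not Solr's API: the class FacetSketch, the Row type and the method names are hypothetical, and it assumes one flat document per band member with the genre and band copied onto it. The search runs over the band and member fields, and the facet is simply a count of matching documents per genre/band value. In Solr itself this would correspond roughly to a request with facet=true and facet.field set to the relevant field.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy in-memory model of facet counting. This is NOT Solr's API.
class FacetSketch {

    // Hypothetical flat document: one per band member, with genre and band copied onto it.
    static final class Row {
        final String genre;
        final String band;
        final String member;

        Row(String genre, String band, String member) {
            this.genre = genre;
            this.band = band;
            this.member = member;
        }
    }

    static List<Row> sampleRows() {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row("Rock", "Pink Floyd", "Roger Waters"));
        rows.add(new Row("Rock", "Pink Floyd", "David Gilmour"));
        rows.add(new Row("Rock", "Pink Floyd", "Richard Wright"));
        rows.add(new Row("Rock", "Pink Floyd", "Nick Mason"));
        rows.add(new Row("Pop", "Michael Jackson", "Michael Jackson"));
        rows.add(new Row("Alternative/Indie", "Waters", "Van Pierszalowski"));
        return rows;
    }

    // Case-insensitive substring test; the needle must already be lower-cased.
    static boolean contains(String field, String needle) {
        return field.toLowerCase(Locale.ROOT).contains(needle);
    }

    // Count matching rows per "genre/band" value, in first-seen order.
    static Map<String, Integer> facetCounts(List<Row> rows, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Row r : rows) {
            if (contains(r.band, needle) || contains(r.member, needle)) {
                counts.merge(r.genre + "/" + r.band, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(facetCounts(sampleRows(), "waters"));
    }
}
```

Running main prints {Rock/Pink Floyd=1, Alternative/Indie/Waters=1}. Each row stands alone in this model, so adding a new band or member is a single document insert rather than a rewrite of an entire genre block.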

Cheers,
Mark 

 



