Posted to solr-user@lucene.apache.org by Eric Van Dewoestine <er...@gmail.com> on 2006/12/11 20:12:00 UTC

How to query a parent child relationship returning result set of parents?

We are currently using solr to index various types of content in our
system, several of which users can comment on.  What we would
like to do is issue a query on the top level content which also
searches the attached comments but only returns unique top level
documents as results, while still maintaining the option to search and
return comments as an alternative type of search for the user.

The simplest example would probably be that of a blog.  The blog could
be indexed as follows:

id: blog_intId
title: blog title
content: blog content

And the associated comments:

id: comment_intId
title: comment title
content: comment content
parentId: blog_intId

Given this type of layout, how would I go about querying and returning
a list of blogs which contain text in either the blog content or any
of the comments' content?

The only solutions I can come up with would be to:
1) aggregate comment content into the blog content index, allowing me
to query directly on the blog.  However we are expecting the site to
generate many comments, along the lines of hundreds and possibly
thousands.  This also has the downside of requiring duplicate content
in the index if we want to still permit users to search on and return
comments.

2) Use facets to get a list of parent items and issue an additional
query (or hit the database) to pull in the parent content.  Again,
this isn't an ideal solution since we would have to page the results
ourselves since solr's facet parameters don't support an offset.  This
possibly negates any optimizations solr may have for paging regular
queries.  Also, it forces us to issue a second round trip to either
solr or the database to get summary content to display in the search
results list.  It also seems like a poor use case for the facet
functionality in general.

3) Plug into the solr code and implement a custom request handler,
HitCollector, or ...?  I've spent some time digging into the solr code
and I don't see any obvious place to plug this type of functionality
in.  A major concern of mine is performance as well, so I want to
ensure that I can get at and modify the results prior to solr loading
any unnecessary content into memory.
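
To make the goal concrete, the collapsing step that options 2 and 3
both need can be sketched outside of solr.  This is only an
illustration in Python, not solr code: the hit tuples, the
collapse_to_parents helper, and the scoring rule (a parent takes the
best score among itself and its comments) are all assumptions.  The
id and parentId fields follow the layout above.

```python
from typing import Iterable, List, Optional, Tuple

# A hit is (id, parentId, score); parentId is None for a top-level blog.
# This shape is an assumption made for the sketch, not a solr structure.
Hit = Tuple[str, Optional[str], float]


def collapse_to_parents(hits: Iterable[Hit], offset: int = 0,
                        rows: int = 10) -> List[Tuple[str, float]]:
    """Collapse blog and comment hits into unique parents, then page."""
    best = {}
    for doc_id, parent_id, score in hits:
        # A comment counts toward its parent; a blog counts toward itself.
        key = parent_id if parent_id is not None else doc_id
        if key not in best or score > best[key]:
            best[key] = score
    # Highest score first; ties are broken by id so paging is stable.
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[offset:offset + rows]


hits = [
    ("comment_7", "blog_1", 0.9),
    ("blog_2", None, 0.8),
    ("comment_8", "blog_1", 0.4),
    ("blog_1", None, 0.3),
    ("comment_9", "blog_3", 0.5),
]
print(collapse_to_parents(hits, offset=0, rows=2))
# prints [('blog_1', 0.9), ('blog_2', 0.8)]
```

Note that every matching document has to be visited before the first
page can be returned, which is the paging cost I describe in option 2.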

Any thoughts on this are very appreciated.  Any kind of kick start,
pointer, or places to dig into would be very helpful.

-- 
eric

Re: How to query a parent child relationship returning result set of parents?

Posted by Eric Van Dewoestine <er...@gmail.com>.
> You can do pretty much anything you want in a custom request handler, but
> i must admit that off the top of my head i can't think of any elegant way
> to solve your problem.
>
> Most people i know are happy with option #1 :)
>
> -Hoss

I appreciate the input Hoss.  Unfortunately, I don't see option 1
working for us given the number of comments we expect our site to
generate.  If solr had some sort of append command to only index
appended content, then this may be a more viable solution. However,
I'm afraid of the performance impact that will result from re-indexing
the parent content and all child content every time a new child is
added.
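
To illustrate the concern, here is a rough Python sketch of what
option 1 implies each time a comment is added.  Nothing here is a solr
API: build_blog_doc and the "comments" field are hypothetical names,
and the only point is that the whole aggregated document has to be
rebuilt and resubmitted.

```python
from typing import Dict, List


def build_blog_doc(blog: Dict[str, str],
                   comments: List[Dict[str, str]]) -> Dict[str, str]:
    """Build one aggregated document: blog fields plus all comment text."""
    return {
        "id": blog["id"],
        "title": blog["title"],
        "content": blog["content"],
        # Hypothetical field holding the text of every comment on the blog.
        "comments": "\n".join(c["content"] for c in comments),
    }


blog = {"id": "blog_1", "title": "blog title", "content": "blog content"}
comments = [{"id": "comment_%d" % i, "content": "comment %d" % i}
            for i in range(1000)]

doc = build_blog_doc(blog, comments)

# One new comment: the existing 1000 comments are sent along again.
comments.append({"id": "comment_1000", "content": "one more comment"})
doc_after = build_blog_doc(blog, comments)
print(len(doc["comments"]), len(doc_after["comments"]))
```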

It looks as though option 3 is the 'proper' solution and it's just a
matter of determining what and how to plug it in.  I've seen a couple
topics on the lucene mailing list which seem promising, so now I just
have to figure out how to fit that into the solr environment.

If you or anyone else have any more suggestions, tips, etc. I'd
appreciate the help as I'm a bit time constrained.

Thanks again for the response.

-- 
eric

Re: How to query a parent child relationship returning result set of parents?

Posted by Chris Hostetter <ho...@fucit.org>.
: Given this type of layout, how would I go about querying and returning
: a list of blogs which contain text in either the blog content or any
: of the comments' content?

a big issue is how timely do your comments have to show up in the index
... for some people an acceptable tradeoff is that new/edited "blogs" get
sent to the index immediately, but a cron running at fixed regular
intervals indexes the comments ... in that approach your first idea is
usually the most straightforward...
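
a rough sketch of that batching idea in Python, in case it helps ...
this is not solr code: CommentBatcher and the reindex_blog callback are
made-up names, and the cron job is assumed to simply call flush() on
each run, so a blog with many new comments is only re-indexed once per
interval.

```python
from typing import Callable, List, Set


class CommentBatcher:
    """Remember which blogs got new comments; re-index each once per run."""

    def __init__(self, reindex_blog: Callable[[str], None]) -> None:
        # reindex_blog is a hypothetical callback that rebuilds and
        # resubmits the aggregated document for one blog id.
        self._reindex_blog = reindex_blog
        self._dirty: Set[str] = set()

    def comment_added(self, parent_id: str) -> None:
        # Cheap bookkeeping only; no index work happens here.
        self._dirty.add(parent_id)

    def flush(self) -> int:
        """Re-index every dirty blog once and return how many were done."""
        pending, self._dirty = self._dirty, set()
        for blog_id in sorted(pending):
            self._reindex_blog(blog_id)
        return len(pending)


reindexed: List[str] = []
batcher = CommentBatcher(reindexed.append)
for parent in ["blog_1", "blog_2", "blog_1", "blog_1"]:
    batcher.comment_added(parent)
print(batcher.flush(), reindexed)
# prints 2 ['blog_1', 'blog_2']
```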

: 1) aggregate comment content into the blog content index, allowing me
: to query directly on the blog.  However we are expecting the site to


A hybrid of your second and third suggestions is a much more involved
approach that might also work...

: 2) Use facets to get a list of parent items and issue an additional
: query (or hit the database) to pull in the parent content.  Again,
	...
: 3) Plug into the solr code and implement a custom request handler,
: HitCollector, or ...?  I've spent some time digging into the solr code
: and I don't see any obvious place to plug this type of functionality

You can do pretty much anything you want in a custom request handler, but
i must admit that off the top of my head i can't think of any elegant way
to solve your problem.

Most people i know are happy with option #1 :)

-Hoss