Posted to solr-user@lucene.apache.org by Yonik Seeley <ys...@gmail.com> on 2016/04/18 00:50:31 UTC

block join rollups

Hey folks, we're at the point of figuring out the API for block join
child rollups for the JSON Facet API.
We already have simple block join faceting:
http://yonik.com/solr-nested-objects/
So now we need an API to carry over more information from children to
parents (say rolling up average rating of all the reviews to the
corresponding parent book objects).

I've gathered some of my notes/thoughts on the API here:
https://issues.apache.org/jira/browse/SOLR-8998

Feedback welcome, and we can discuss here in this thread rather than
cluttering the JIRA.

-Yonik

Re: block join rollups

Posted by Nick Vasilyev <ni...@gmail.com>.
Hi Yonik,

Well, no one has replied to this yet, so I thought I'd chime in with some of
the use cases that I am working with. Please note that I am lagging a bit
behind the last few releases, so I haven't had time to experiment with Solr
5.3+. I am sure that some of this is included in there already, and I am
very excited to play around with the new streaming API, JSON facets and SQL
interface when I have a bit more time.

I am indexing click stream data into Solr. Each set of records represents a
user's unique visit to our website. They all share a common session id, as
well as several session attributes, such as IP and user attributes if they
log in. Each record represents an individual action, such as a search,
product view or a visit to a particular page. All attributes and data
elements of each request are stored with each record. Additionally, session
attributes get copied down to each event item. The current goal of this
system is to provide less tech-savvy users with easy access to this data in
a way they can explore it and drill down on particular elements; we are
using Banana for this.

Currently, I have to copy a lot of session fields to each event so I can
filter on them, for example, show all searches for users associated with
organization X. This is super redundant and I am really looking for a
better way. It would be great if I could make parent document fields appear
as if they are a part of child documents.
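
One possible direction, assuming sessions and events were indexed as nested parent/child blocks, is Solr's block join child query parser, `{!child of=...}`, which returns the children of matching parents. The sketch below only builds the request parameters and is untested against a real index; the function name and the field names (doc_type_s, org_s, event_type_s) are hypothetical.

```python
def child_events_params(session_filter, all_sessions="doc_type_s:session",
                        event_filter=None):
    """Build Solr query params that select events whose parent session matches.

    session_filter must match only parent (session) documents, and
    all_sessions must match every parent document in the index.
    All field names here are hypothetical.
    """
    params = {"q": '{!child of="%s"}%s' % (all_sessions, session_filter)}
    if event_filter:
        # Narrow the returned child documents, e.g. to searches only.
        params["fq"] = event_filter
    return params


if __name__ == "__main__":
    # All searches for users associated with organization X.
    print(child_events_params("org_s:X", event_filter="event_type_s:search"))
```

This would filter events by a session field without copying that field to each event, but it would not make the parent fields appear in the returned child documents.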

Additionally, I am counting various events for each session during
processing. For example, I count the number of searches, product views,
add-to-cart events, etc. This information is also indexed in each record. This
allows me to pull up specific events (like product views) where the number
of searches in a given session is greater than X. However, again, indexing
this information for each event creates a lot of redundancy.
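
As a rough illustration of that query, here is a small in-memory Python sketch that counts one event type per session and then selects events of another type from sessions above a threshold. It only shows the logic that a rollup would ideally replace; the data layout and field names (session, type) are hypothetical.

```python
from collections import Counter

# Hypothetical click stream events, each tagged with its session id.
EVENTS = [
    {"session": "s1", "type": "search"},
    {"session": "s1", "type": "search"},
    {"session": "s1", "type": "product_view"},
    {"session": "s2", "type": "search"},
    {"session": "s2", "type": "product_view"},
]


def events_in_busy_sessions(events, wanted_type, counted_type, threshold):
    """Return events of wanted_type from sessions that contain more than
    `threshold` events of counted_type."""
    counts = Counter(e["session"] for e in events if e["type"] == counted_type)
    return [
        e for e in events
        if e["type"] == wanted_type and counts[e["session"]] > threshold
    ]


if __name__ == "__main__":
    # Product views from sessions with more than one search.
    print(events_in_busy_sessions(EVENTS, "product_view", "search", 1))
```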

Finally, a slightly different use case involves running functions on items
in a group (even if they aren't a part of the result set) and returning
that as a part of the document. Almost like a dynamically generated
document, based on aggregations from child documents. This is currently
somewhat available, but I can't include it in sort. For example, I am
grouping items on a field, I want to get the minimum value of a field per
group and sort the result (of groups) on that calculated value.
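
In plain Python terms, the behavior I am after looks like the sketch below: compute the minimum of a field per group, then order the groups by that computed value. This is in-memory illustration only, not a Solr feature; the field names (group, price) are hypothetical.

```python
# Hypothetical items, each belonging to a group.
ITEMS = [
    {"group": "a", "price": 30},
    {"group": "a", "price": 10},
    {"group": "b", "price": 5},
    {"group": "c", "price": 20},
]


def groups_sorted_by_min(items, group_field, value_field):
    """Return (group, minimum value) pairs ordered by the minimum, ascending."""
    minimums = {}
    for item in items:
        key = item[group_field]
        value = item[value_field]
        if key not in minimums or value < minimums[key]:
            minimums[key] = value
    return sorted(minimums.items(), key=lambda pair: pair[1])


if __name__ == "__main__":
    print(groups_sorted_by_min(ITEMS, "group", "price"))
```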

I am not sure if this helps you at all, but I wanted to share some of my pain
points.
