Posted to solr-user@lucene.apache.org by Darin Amos <da...@gmail.com> on 2014/12/08 19:50:44 UTC

Custom Rollup (Join) Query

Hello,

I posted this question in another thread, but I think it got lost, so I am starting a new thread for it. I have built a small proof of concept (POC) for a customization and would like some validation, in case what I have built is a really bad implementation. I have spent the last two weeks digging into this, and SOLR does not offer exactly what I need out of the box, so I decided to build a custom (extended) solution.

I work in ecommerce. When you type a search on the website, we want to execute it against the available SKUs (e.g. small red shirt, dark 34x34 jeans) and then perform a rollup (or join) from those SKUs to their parent products (shirt, jeans). We don’t want to return the item documents at all, but we do want the facets to represent the child document set, not the final parent document set. For example, if you’re viewing a “shirt” category, we don’t want to show the “Green” facet unless some of the shirt SKUs you are viewing are actually green and were part of the original search results (maybe a shopper searched “blue shirts”, in which case green variants should not count).
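To make the facet requirement concrete, here is a minimal in-memory sketch in plain Java (16+, for records) showing the difference between counting a color facet over the matching child SKUs and counting it over every SKU of the rolled-up parents. It uses no Solr or Lucene APIs; the `Sku` record, its fields, and the sample data are hypothetical, and in the real implementation these counts would come from Solr faceting over a docset.

```java
import java.util.*;
import java.util.stream.Collectors;

/** Hypothetical illustration: color facet over matching child SKUs vs. over rolled-up parents. */
final class RollupFacetSketch {
    /** A child SKU document that points at its parent product through parentId. */
    record Sku(String id, String parentId, String color) {}

    /** The rollup: distinct parent IDs of the matching SKUs, in first-seen order. */
    static List<String> rollup(List<Sku> matchedChildren) {
        return matchedChildren.stream()
                .map(Sku::parentId)
                .distinct()
                .collect(Collectors.toList());
    }

    /** Facet built from the matching children only (the desired behavior). */
    static Map<String, Long> childColorFacet(List<Sku> matchedChildren) {
        return matchedChildren.stream()
                .collect(Collectors.groupingBy(Sku::color, TreeMap::new, Collectors.counting()));
    }

    /** Facet built from every SKU of the rolled-up parents (the unwanted behavior). */
    static Map<String, Long> parentColorFacet(List<Sku> allSkus, List<String> parentIds) {
        Set<String> parents = new HashSet<>(parentIds);
        return allSkus.stream()
                .filter(s -> parents.contains(s.parentId()))
                .collect(Collectors.groupingBy(Sku::color, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Sku> all = List.of(
                new Sku("shirt-0001-01", "shirt-0001", "Blue"),
                new Sku("shirt-0001-02", "shirt-0001", "Green"),
                new Sku("shirt-0002-01", "shirt-0002", "Blue"));
        // Pretend the search "blue shirts" matched only the blue SKUs.
        List<Sku> matched = List.of(all.get(0), all.get(2));
        List<String> parents = rollup(matched);
        System.out.println(parents);                        // [shirt-0001, shirt-0002]
        System.out.println(childColorFacet(matched));       // {Blue=2}
        System.out.println(parentColorFacet(all, parents)); // {Blue=2, Green=1}
    }
}
```

Both facets see the same two parent products, but only the child-based count omits “Green”, which is the behavior the rollup is meant to preserve.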

It is also important to note that we are on SOLR 4.3.0, which does not have parent/child support yet; upgrading to 4.10.2 could be tricky because our SOLR instance is embedded in another piece of software.

Grouping, which was suggested to me in the past, won’t work here: it would throw off our pagination, and we need grouping for other things. For example, when viewing a “shirt” category, we want to group the final product set by another field, such as the shirt “type” or sub-category (t-shirt, sweater, “tank tops”, etc.).

To support our needs, I built a small POC over the weekend (and thanks to everyone for putting up with all my random emails while I learned SOLR/Lucene and got my head around the internals). The POC consists of a custom query parser, query, search component, and document transformer. I uploaded all my code to GitHub this morning (https://github.com/damos/SolrRollupQuery/blob/master/src/com/vast/solr/rollup/). It still needs tuning and is definitely not finished; most of it is based on existing code such as the join query, the dismax query parser, and some of the 4.10.2 parent/child code.

I want my customer to be able to build a SOLR request like the following:
select?q={!rollup from=parent to=id}name:(*Shirt*)&cfq=color:green&cfq=size:small&facet=true&facet.field=color&facet.field=size&fl=field1,field2,field3,[ru fl=childField1,childField2]child_*

Where:
q= a query against item documents that rolls up into parent documents
cfq= (child filter query) a query that filters the child document set before the rollup happens. You can still use the fq parameter to filter the parent document set afterwards.
[ru]= a document transformer that brings fields from the child documents into the parent document’s dynamic field child_*
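Because the rollup local-params syntax contains spaces, braces, and brackets, a client has to URL-encode each parameter value (the PS example below uses `%20`). The following is a minimal sketch using only `java.net.URLEncoder` (Java 10+ for the `Charset` overload); the class and the `buildQuery` helper are hypothetical, while the parameter names `q`, `cfq`, `facet.field`, and `fl` are the ones described above. Note that `URLEncoder` encodes spaces as `+`, which Solr's query-string parsing normally decodes back to a space.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.*;

/** Hypothetical client-side helper for building a rollup request query string. */
final class RollupRequestSketch {
    /** Joins name=value pairs with '&', URL-encoding every value; repeated names (cfq, facet.field) are kept. */
    static String buildQuery(List<Map.Entry<String, String>> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> p : params) {
            joiner.add(p.getKey() + "=" + URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> params = List.of(
                Map.entry("q", "{!rollup from=parent to=id}name:(*Shirt*)"),
                Map.entry("cfq", "color:green"),
                Map.entry("cfq", "size:small"),
                Map.entry("facet", "true"),
                Map.entry("facet.field", "color"),
                Map.entry("facet.field", "size"),
                Map.entry("fl", "field1,field2,field3,[ru fl=childField1,childField2]child_*"));
        System.out.println("select?" + buildQuery(params));
    }
}
```

A list of entries is used instead of a map so that repeated parameters such as `cfq` and `facet.field` survive in order.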


The implementation includes 4 main components:

1) New “rollup” query parser: parses the incoming request, builds the rollup query, and puts the query into the request context.
2) New “rollup” query: I modelled this after the code in JoinUtil. The constructor executes the primary query with a custom collector that collects the join terms and their scores as well as the entire child docset. The query exposes the child docset through an accessor method.
3) An extended QueryComponent that checks the request context for the rollup query; if it exists, it replaces rb.getResult().docset with the child docset so the facets are built from the children, not the parents. (This part feels very clumsy, but I have reasons for not completely overriding QueryComponent.process().)
4) A custom document transformer that adds child fields to a parent dynamic field. I DON’T want to return fields for all children, only the ones that matched the main query, so this also needs the rollup query’s child docset.
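The collecting step in component 2, which components 3 and 4 depend on, can be illustrated without Lucene: while the child query runs, remember every matching child doc ID in a bitset (the child docset later used for faceting and by the transformer) and keep one score per parent key, then resolve those keys to parent doc IDs through the `to` field. This is a hypothetical in-memory sketch, not Lucene's `Collector` API and not the code in the linked repository: the `fromValues` array and `toFieldIndex` map stand in for the `from` and `to` field values, applying `cfq` as a bitset check is an assumption, and keeping the maximum child score is only one choice (JoinUtil supports several score modes).

```java
import java.util.*;

/**
 * Hypothetical in-memory model of a rollup collector: it records the child docset
 * and one score per parent key, then maps parent keys to parent doc IDs.
 */
final class RollupCollectorSketch {
    private final String[] fromValues;   // docId -> value of the child's "from" field (parent key)
    private final BitSet childFilter;    // docs allowed by cfq; null means no child filter
    final BitSet childDocs = new BitSet();                         // matched (and filtered) children
    final Map<String, Float> parentScores = new LinkedHashMap<>(); // parent key -> max child score

    RollupCollectorSketch(String[] fromValues, BitSet childFilter) {
        this.fromValues = fromValues;
        this.childFilter = childFilter;
    }

    /** Called once per child matching the main query, like a collector's collect(doc). */
    void collect(int doc, float score) {
        if (childFilter != null && !childFilter.get(doc)) {
            return; // cfq removes the child before the rollup happens
        }
        childDocs.set(doc);
        parentScores.merge(fromValues[doc], score, Float::max);
    }

    /** Resolves parent keys to parent doc IDs through the "to" field (key -> docId). */
    BitSet parentDocs(Map<String, Integer> toFieldIndex) {
        BitSet parents = new BitSet();
        for (String key : parentScores.keySet()) {
            Integer doc = toFieldIndex.get(key);
            if (doc != null) {
                parents.set(doc);
            }
        }
        return parents;
    }

    public static void main(String[] args) {
        // Docs 0 and 3 are parents; 1, 2 and 4 are children pointing at a parent SKU.
        String[] fromValues = {null, "shirt-0001", "shirt-0001", null, "jeans-0001"};
        Map<String, Integer> toFieldIndex = Map.of("shirt-0001", 0, "jeans-0001", 3);
        BitSet smallOnly = new BitSet(); // pretend cfq=size:small matches docs 1 and 4
        smallOnly.set(1);
        smallOnly.set(4);

        RollupCollectorSketch c = new RollupCollectorSketch(fromValues, smallOnly);
        c.collect(1, 1.5f);
        c.collect(2, 2.0f); // large shirt: dropped by cfq
        c.collect(4, 0.7f);
        System.out.println(c.childDocs);                // {1, 4}
        System.out.println(c.parentDocs(toFieldIndex)); // {0, 3}
        System.out.println(c.parentScores);             // {shirt-0001=1.5, jeans-0001=0.7}
    }
}
```

Keeping `childDocs` as a separate structure is what lets the facet step and the transformer work on the children while the main result list holds only parents.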


Sorry for the long email. I wanted to open this discussion because I am pretty sure we are not the first people in ecommerce with a very specific use case like this. If you have taken the time to read this far, thank you very much; it is much appreciated.

Cheers!

Darin


PS:

The following query:
http://localhost:8983/solr/testcore/select?q={!rollup%20from=parentSku%20to=sku}name:(*Awesome*)&facet=true&facet.mincount=1&facet.field=size&fl=id,sku,name,[ru%20fl=name,sku]child_*&cfq=size:small

Returns the following results:
<result name="response" numFound="2" start="0">
<doc>
<str name="id">0001</str>
<str name="sku">shirt-0001</str>
<str name="name">Awesome Shirt</str>
<int name="child_count">1</int>
<str name="child_name">Small Awesome Shirt</str>
<str name="child_sku">shirt-0001-01</str>
</doc>
<doc>
<str name="id">0002</str>
<str name="sku">jeans-0001</str>
<str name="name">Awesome Jeans</str>
<int name="child_count">1</int>
<str name="child_name">Small Awesome Jeans</str>
<str name="child_sku">jeans-0001-01</str>
</doc>
</result>
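The output above suggests that the `[ru]` transformer emits `child_count` plus one `child_<field>` entry per requested field, restricted to the children in the rollup query's child docset. Here is a hypothetical sketch over plain maps rather than Solr's `DocTransformer` API: children are matched to the parent through the `from` field (`parentSku` in this query), and storing several values as a list when a parent has more than one matching child is an assumption, since the example only shows one child per parent.

```java
import java.util.*;

/**
 * Hypothetical model of the [ru] transformer: copies selected fields of the matched
 * children of one parent into child_* fields on the parent document.
 */
final class RollupTransformerSketch {
    static Map<String, Object> transform(Map<String, Object> parent, String parentKey,
                                         List<Map<String, Object>> matchedChildren,
                                         String fromField, List<String> childFields) {
        Map<String, Object> out = new LinkedHashMap<>(parent);
        // Only children from the rollup's child docset that point at this parent.
        List<Map<String, Object>> mine = new ArrayList<>();
        for (Map<String, Object> child : matchedChildren) {
            if (parentKey.equals(child.get(fromField))) {
                mine.add(child);
            }
        }
        out.put("child_count", mine.size());
        for (String field : childFields) {
            List<Object> values = new ArrayList<>();
            for (Map<String, Object> child : mine) {
                Object value = child.get(field);
                if (value != null) {
                    values.add(value);
                }
            }
            // Assumption: a single value is stored as-is, several values as a list.
            if (!values.isEmpty()) {
                out.put("child_" + field, values.size() == 1 ? values.get(0) : values);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> parent = new LinkedHashMap<>();
        parent.put("id", "0001");
        parent.put("sku", "shirt-0001");
        parent.put("name", "Awesome Shirt");
        List<Map<String, Object>> matched = List.of(
                Map.<String, Object>of("parentSku", "shirt-0001", "sku", "shirt-0001-01", "name", "Small Awesome Shirt"),
                Map.<String, Object>of("parentSku", "jeans-0001", "sku", "jeans-0001-01", "name", "Small Awesome Jeans"));
        // {id=0001, sku=shirt-0001, name=Awesome Shirt, child_count=1, child_name=Small Awesome Shirt, child_sku=shirt-0001-01}
        System.out.println(transform(parent, "shirt-0001", matched, "parentSku", List.of("name", "sku")));
    }
}
```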