Posted to solr-user@lucene.apache.org by Jeff Schmidt <ja...@535consulting.com> on 2011/12/02 05:47:20 UTC

Possible to facet across two indices, or document types in single index?

Hello:

I'm trying to relate two different types of documents. Currently I have 'node' documents that reside in one index (core), and 'product mapping' documents that are in another index. The product mapping index is used to map tenant products to nodes. The nodes are canonical content that gets updated every quarter, whereas the product mappings can change at any time.

I put them in two indexes because (1) canonical content changes rarely, and I don't want product mapping changes to affect it (commit, re-open searchers etc.), and (2) I would like to support multiple tenants mapping products to the same canonical content to avoid duplication (a few GB).

This arrangement has worked well thus far, but only in the sense that for each node result returned, I can query the product mapping index to determine the products mapped to the node. I combine this information within my application and return it to the client. This works okay because there are only 5-20 results returned per page (start, rows). But now I'm being asked to facet the product categories (a multi-valued field within a product mapping document) along with other facets defined in the canonical content.

Can this be done with Solr 3.5.0? I've been looking into sub-queries, function queries etc. Also, I've seen various postings indicating that one needs to denormalize more. I don't want to add product information as fields to the canonical content. Not only does that defeat my objective (1) above, but Solr does not support incremental updates of document fields.

So, one approach is to issue my query to the canonical index and get all of the document IDs (could be 1000s), and then issue a filter query to the product mapping index with all of these IDs and have Solr facet the product categories. Is that efficient? I suppose I could use HTTP POST (via SolrJ) to convey that payload of IDs? I could then take the facet results of that query and combine them with the canonical index results and return them to the client.
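
To make the request shape concrete, here is a rough sketch of the form-encoded POST body such a facet query might carry. It uses only the JDK rather than SolrJ, and the field names nodeId and productCategory are hypothetical placeholders, not names from my actual schema. One caveat worth checking: the maxBooleanClauses setting in solrconfig.xml (1024 in the stock configuration, as far as I know) limits the number of OR clauses, so thousands of IDs would likely need batching or a higher limit.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Arrays;
import java.util.List;

// Sketch only: builds the body for a POST to the product mapping core's
// select handler. Field names (nodeId, productCategory) are assumptions.
class ProductFacetRequest {

    // Joins IDs into a Lucene-syntax filter: nodeId:("a" OR "b" OR "c")
    static String buildFilter(List<String> nodeIds) {
        StringBuilder fq = new StringBuilder("nodeId:(");
        for (int i = 0; i < nodeIds.size(); i++) {
            if (i > 0) {
                fq.append(" OR ");
            }
            // Quote each ID, escaping embedded backslashes and double quotes.
            String escaped = nodeIds.get(i).replace("\\", "\\\\").replace("\"", "\\\"");
            fq.append('"').append(escaped).append('"');
        }
        return fq.append(')').toString();
    }

    // rows=0 because only the facet counts are wanted, not the mapping documents.
    static String buildBody(List<String> nodeIds) {
        return "q=" + enc("*:*")
             + "&fq=" + enc(buildFilter(nodeIds))
             + "&rows=0"
             + "&facet=true"
             + "&facet.field=" + enc("productCategory");
    }

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println(buildBody(Arrays.asList("node-1", "node-2")));
    }
}
```

With SolrJ the same parameters would go on a SolrQuery and be sent with the POST method; the string form above is just to show what travels over the wire.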

That may be doable, but then let's say the user clicks on a product category facet value to narrow the node results to only those mapped to category XYZ. This will not affect the query issued against the canonical content index. Instead, I think I'd have to go through the canonical results and eliminate the nodes that are not associated with product category XYZ. Then, if the current page of results is inadequate (rows=10, but 3 nodes were eliminated), I'd have to go back to the canonical index to get more rows, eliminate some again perhaps, get more etc. That sounds unappealing and low performing.
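
For illustration, the fetch-and-eliminate loop described above might look roughly like this. NodeSource is a hypothetical stand-in for a query against the canonical index, and nodesInCategory stands for the set of node IDs that the product mapping index reports for the selected category; neither is a real Solr or SolrJ type. The sketch only fills the first page; later pages would also need to remember the canonical offset where the previous page stopped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class FilteredPager {

    // Hypothetical stand-in for the canonical index: returns up to `rows`
    // node IDs starting at offset `start`, or an empty list when exhausted.
    interface NodeSource {
        List<String> fetch(int start, int rows);
    }

    // Keeps fetching batches until `rows` surviving nodes are collected
    // or the canonical results run out.
    static List<String> firstPage(NodeSource source, Set<String> nodesInCategory, int rows) {
        List<String> kept = new ArrayList<String>();
        int start = 0;
        while (kept.size() < rows) {
            List<String> batch = source.fetch(start, rows);
            if (batch.isEmpty()) {
                break; // nothing left in the canonical index
            }
            for (String id : batch) {
                // Drop nodes not mapped to the chosen category.
                if (kept.size() < rows && nodesInCategory.contains(id)) {
                    kept.add(id);
                }
            }
            start += batch.size();
        }
        return kept;
    }

    public static void main(String[] args) {
        final List<String> all = Arrays.asList("n1", "n2", "n3", "n4", "n5", "n6", "n7", "n8");
        NodeSource source = (start, rows) ->
                all.subList(Math.min(start, all.size()), Math.min(start + rows, all.size()));
        Set<String> mapped = new HashSet<String>(Arrays.asList("n2", "n5", "n7", "n8"));
        System.out.println(firstPage(source, mapped, 3));
    }
}
```

In this toy run, filling a page of 3 takes three round trips to the canonical source, which is the cost that worries me.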

Is there a Solr way to do this? My Packt "Apache Solr 3 Enterprise Search Server" book (page 34) states regarding separate indices:

"If you do develop separate schemas and if you need to search across your indices in one search then you must perform a distributed search, described in the last chapter. A distributed search is usually a feature employed for a large corpus but it applies here too."

But in the chapter it goes on to talk about dealing with sharding, replication etc. to support a large corpus, not necessarily tying together two different indexes.

Is it possible to accomplish my goal in a less ugly way than I outlined above? Since we only have a single tenant to worry about, I could use a combined index at least for a few months (separate fields per document type, IDs are unique among them all) if that makes a difference.

Thanks!

Jeff
--
Jeff Schmidt
535 Consulting
jas@535consulting.com
http://www.535consulting.com
(650) 423-1068