Posted to solr-user@lucene.apache.org by Vlad Beznosov <vb...@ritchiebros.com.INVALID> on 2019/08/14 21:43:11 UTC

Multiple documents with different fields in the same collection

Hello SOLR Users.

I am new to SOLR, so please forgive me if something in this email does not make sense to some of you.

Here is the problem I am trying to solve:

We have a collection of documents of type A that has a corresponding configuration set with a schema.xml file in it.
We need to add another core to that collection, which will contain documents of type B. Their fields are mostly different from those of type A, except for the field "key", which is present in both types.
Indexing of these two cores should be done independently: core A stores dynamic data, while the data in core B is largely static.
But for search purposes these two cores should be combined to produce results based on criteria built from the fields of both cores.

I have found a post that suggests creating a separate schema that will unite the two documents: https://stackoverflow.com/questions/19313910/query-multiple-collections-with-different-fields-in-solr

So far so good, but now I am trying to figure out how to put it all together: can I define three different document schemas in the same schema.xml (and if yes, how can that be done), or should I create a separate schema.xml file for each document type (and if yes, where should they be placed)?

Ideally it would be nice to have this configured within the same collection to make this transparent for the search.

Any help would be greatly appreciated.

Thank you,
Vlad.

Re: Multiple documents with different fields in the same collection

Posted by Erick Erickson <er...@gmail.com>.
Vlad:

“different schemas in one schema” is really just multiple <field> and <fieldType> definitions. There’s no requirement at all that two Solr documents share any fields (well, except if you have ‘required=“true”’ set for a field, for instance the “id” field).
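
To make the point above concrete, here is a rough sketch in Python of two documents of different types that could sit side by side in one collection. Only "id" and "key" come from this thread; "doc_type", "fieldA1", "fieldA2", "fieldB1" and all the values are placeholders invented for illustration. The list is serialized as a JSON array, which is one of the body formats Solr's update handler accepts.

```python
import json

# Hypothetical documents: every field except "id" and "key" is a placeholder.
doc_a = {"id": "A-1", "doc_type": "A", "key": "k42", "fieldA1": "val1", "fieldA2": 7}
doc_b = {"id": "B-1", "doc_type": "B", "key": "k42", "fieldB1": "val2"}

docs = [doc_a, doc_b]

# Fields the two document types have in common; everything else is type-specific.
shared_fields = sorted(set(doc_a) & set(doc_b))

# Serialize both documents as one JSON array, ready to be sent as an update body.
payload = json.dumps(docs)
```

The two documents overlap only in the fields they need to share; the schema just has to declare every field that either type uses.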

I’d also include a “doc_type” field set to A or B to allow you to restrict searches to a single type if that’s desirable, by adding “fq=doc_type:A” for instance.
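
A minimal sketch of how a client might attach such a filter, assuming the type field is named "doc_type" as suggested above. The helper name select_params and the query values are made up for this example; only the q and fq parameters are Solr's own.

```python
from urllib.parse import urlencode

def select_params(query, doc_type=None):
    """Build a URL-encoded query string, optionally limited to one doc type."""
    params = [("q", query)]
    if doc_type is not None:
        # fq restricts the result set without contributing to relevance scores.
        params.append(("fq", f"doc_type:{doc_type}"))
    return urlencode(params)

# Search type-A documents only.
query_string = select_params("fieldA1:val1", doc_type="A")
```

Leaving doc_type unset searches across both types.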

Then it’s just a matter of convention. For searching docs of type A, you search in fieldA1, fieldA2, etc. For type B, you search in fieldB1, fieldB2, and so on. You can set up different request handlers in solrconfig.xml (e.g. like the “select” or “query” handlers) if you’d like different defaults for the two.
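
The idea of per-type defaults can be sketched on the client side as well. This is only an illustration of the merging behaviour, not a replacement for request handlers in solrconfig.xml: HANDLER_DEFAULTS and with_defaults are hypothetical names, and the field names are placeholders. “df” (default field) and “rows” are standard Solr query parameters.

```python
# Hypothetical per-type defaults, similar in spirit to what the "defaults"
# section of a request handler might hold for each document type.
HANDLER_DEFAULTS = {
    "A": {"df": "fieldA1", "rows": 10},
    "B": {"df": "fieldB1", "rows": 10},
}

def with_defaults(doc_type, params):
    """Merge caller params over the defaults for a doc type; caller values win."""
    merged = dict(HANDLER_DEFAULTS[doc_type])
    merged.update(params)
    return merged

params_b = with_defaults("B", {"q": "val2"})
```

As with handler defaults, anything the caller passes explicitly overrides the default value.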

Unused fields add very, very little to search times due to how the inverted index is structured. When you get to 100s of fields, it might be noticeable with careful measurements. I know of systems with over 1,000 fields that perform acceptably though.

You have to define what “…these two cores should be combined to produce….” means though. Unless the documents share some fields, you’d get disjoint sets of documents back. “q=fieldA1:val1 AND fieldB1:val2” would produce no documents at all if no documents had both fields.
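
The AND case is easy to see with a toy model. The snippet below is plain Python over a two-document list, not Solr's actual query evaluation, and the ids, field names and values are placeholders. It shows why requiring both clauses on one document matches nothing when no document carries both fields, while OR returns documents of both types.

```python
# Toy stand-in for an index; ids, field names and values are placeholders.
docs = [
    {"id": "A-1", "fieldA1": "val1"},
    {"id": "B-1", "fieldB1": "val2"},
]

def matches(doc, field, value):
    """True if the document has the field and it equals the value."""
    return doc.get(field) == value

# AND: a single document must satisfy both clauses.
and_hits = [d["id"] for d in docs
            if matches(d, "fieldA1", "val1") and matches(d, "fieldB1", "val2")]

# OR: either clause is enough, so documents of both types come back.
or_hits = [d["id"] for d in docs
           if matches(d, "fieldA1", "val1") or matches(d, "fieldB1", "val2")]
```

Here and_hits is empty and or_hits contains both ids, which is the “disjoint sets” situation described above.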

This sounds like stand-alone Solr if you’re talking about “cores”. Is that a conscious choice? SolrCloud gives you high availability and disaster recovery (HA/DR) even with single-shard collections.

Finally, you say one type of doc is relatively static and one more dynamic. How dynamic? The one drawback in the above is that if your commit rate is quite high, the static portions of your index won’t be cached as usefully as they would if they were, indeed, in separate cores (collections in SolrCloud).

Best,
Erick


