Posted to solr-user@lucene.apache.org by "Olson, Ron" <RO...@lbpc.com> on 2010/10/19 22:57:43 UTC

Documents and Cores, take 2

Hi all-

I have a newbie design question about documents, especially with SQL databases. I am trying to set up Solr to go against a database that, for example, has "items" and "people". The way I see it (and I don't know whether this is right, thus the question), these are two separate kinds of document: an item may contain a list of parts, which the user may want to search, and, as part of the "item", the user may want to view the list of people who have ordered it.

Then there's the actual "people", who the user might want to search to find a name and, consequently, what items they ordered. To me they are both "top level" things, with some overlap of fields. If I'm searching for "people", I'm likely not going to be interested in the parts of the item, while if I'm searching for "items" the likelihood is that I may want to search for "42532" which is, in this instance, a SKU, and not get hits on the zip code section of the "people".

Does it make sense, then, to separate these two out as separate documents? I believe so, because the documentation I've read suggests that a document should be analogous to a row in a table (in this case, a very de-normalized one). What is tripping me up is that, as far as I can tell, you can have only one document type per index, and thus one document type per core. So in this example, I would have two cores, "items" and "people". Is this correct? Should I embrace the idea of having many cores, or am I supposed to have a single, unified index with all documents (which Solr doesn't seem to support)?

The ultimate question comes down to the search interface. I don't necessarily want to have the user explicitly state which document they want to search; I'd like them to simply type "42532" and get documents from both cores, and then possibly allow for filtering results after the fact, not before. As I've only used the admin site so far (which is core-specific), does the client API allow for unified searching across all cores? Assuming it does, I'd think my idea of multiple-documents is okay, but I'd love to hear from people who actually know what they're doing. :)

Thanks,

Ron

BTW: Sorry about the problem with the previous message; I didn't know about thread hijacking.

DISCLAIMER: This electronic message, including any attachments, files or documents, is intended only for the addressee and may contain CONFIDENTIAL, PROPRIETARY or LEGALLY PRIVILEGED information.  If you are not the intended recipient, you are hereby notified that any use, disclosure, copying or distribution of this message or any of the information included in or with it is  unauthorized and strictly prohibited.  If you have received this message in error, please notify the sender immediately by reply e-mail and permanently delete and destroy this message and its attachments, along with any copies thereof. This message does not create any contractual obligation on behalf of the sender or Law Bulletin Publishing Company.
Thank you.

Re: Documents and Cores, take 2

Posted by Ken Stanley <do...@gmail.com>.
Ron,

In the past I've worked with SOLR for a product that required the ability to
search - separately - for companies, people, business lists, and a
combination of the previous three. In designing this in SOLR, I found that
using a combination of explicit field definitions and dynamic fields (
http://wiki.apache.org/solr/SchemaXml#Dynamic_fields) gave me the best
possible solution for the problem.

In essence, I created explicit fields that would be shared among all
document "types": a unique id, a document type, an indexed date, a modified
date, and maybe a couple of other fields that are common to all document
types (e.g., name, a "market" specific to our business, etc). The unique id
was built as a string: it was prefixed with the document type and ended
with the unique id from the database.
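To make the layout above concrete, here is a minimal sketch of how one
database row might be turned into a flat document. It is only an
illustration: the "id", "indexedDate" and "name" field names, the "-"
separator in the id, and the to_solr_doc helper are all assumptions, not
anything prescribed by Solr; "docType" and the "df_" prefix follow the
query examples in this message.

```python
from datetime import datetime, timezone


def to_solr_doc(doc_type, db_id, shared, type_specific, prefix="df_"):
    """Build one flat document (a plain dict) from a database row.

    Hypothetical helper: the field names and the "-" separator are
    assumptions made for this sketch.
    """
    doc = {
        # Prefixing the database id with the document type keeps ids unique
        # when several document types live in the same index.
        "id": f"{doc_type}-{db_id}",
        "docType": doc_type,
        "indexedDate": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    # Explicit fields shared by every document type.
    doc.update(shared)
    # Type-specific values go into dynamic fields, marked by the prefix.
    for name, value in type_specific.items():
        doc[prefix + name] = value
    return doc


person = to_solr_doc(
    "people", 17,
    {"name": "John Hancock"},
    {"firstName": "John", "lastName": "Hancock"},
)
item = to_solr_doc("items", 17, {"name": "Widget"}, {"sku": "42532"})
```

Both rows have database id 17 here, yet their document ids differ
("people-17" and "items-17"), which is the point of the type prefix.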

The dynamic fields can be configured to be as flexible as you need, and in
my experience I would strongly recommend documenting each type of dynamic
field for each of your document types as a reference for your developers
(and yourself). :)

This allows us to build queries that focus on specific document types, or
that combine all of the types into a "super" search. For example, you
could do something to the effect of: (docType:people) AND
(df_firstName:John AND df_lastName:Hancock), (docType:companies) AND
(df_BusinessName:Acme+Inc), or even ((df_firstName:John AND
df_lastName:Hancock) OR (df_BusinessName:Acme+Inc)).
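As a rough sketch of how a client might assemble such queries, the
snippet below builds a type-restricted query string and, separately, the
request parameters for a "search everything, filter afterwards" flow
using Solr's q and fq parameters. The helper names are hypothetical, the
phrase quoting is a simplification, and a bare q term is matched against
whatever default search field the schema defines, which is an assumption
about your setup.

```python
from urllib.parse import urlencode


def quote_value(value):
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def typed_query(doc_type, **fields):
    """Query restricted to one document type, all field clauses ANDed."""
    clauses = " AND ".join(
        f"{name}:{quote_value(value)}" for name, value in fields.items()
    )
    return f"(docType:{doc_type}) AND ({clauses})"


def search_params(user_text, doc_type=None):
    """URL-encoded parameters: search all types, optionally filter by type."""
    params = [("q", user_text)]
    if doc_type is not None:
        # fq narrows the result set without changing the main query.
        params.append(("fq", f"docType:{doc_type}"))
    return urlencode(params)


people_query = typed_query("people", df_firstName="John", df_lastName="Hancock")
everything = search_params("42532")
items_only = search_params("42532", doc_type="items")
```

With this split, the first request can omit fq so that "42532" hits
every document type, and a follow-up request can add fq once the user
picks a type to narrow to.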

I hope this helps!

- Ken

It looked like something resembling white marble, which was
probably what it was: something resembling white marble.
                -- Douglas Adams, "The Hitchhiker's Guide to the Galaxy"

