Posted to solr-user@lucene.apache.org by Vijay Bhoomireddy <vi...@whishworks.com> on 2015/03/29 22:04:25 UTC
Structured and Unstructured data indexing in SolrCloud
Hi,
We have a requirement where both structured and unstructured data come into
the system. We need to index both and then enable search
functionality over them. We are using SolrCloud on a Hadoop platform. For
structured data, we plan to put the data into HBase; for
unstructured data, directly into HDFS.
My question is how to index these sources under a single Solr core. Would
it be possible to index both structured and unstructured data under a
single core/collection in SolrCloud and then enable search functionality
over that index?
Thanks in advance.
--
The contents of this e-mail are confidential and for the exclusive use of
the intended recipient. If you receive this e-mail in error please delete
it from your system immediately and notify us either by e-mail or
telephone. You should not copy, forward or otherwise disclose the content
of the e-mail. The views expressed in this communication may not
necessarily be the view held by WHISHWORKS.
RE: Structured and Unstructured data indexing in SolrCloud
Posted by "Reitzel, Charles" <Ch...@tiaa-cref.org>.
Hi Vijay,
The short answer is yes, you can combine almost anything you want into a single collection. But, in addition to working out your queries, you might want to work out your data life cycle.
In our application, we have commingled the structured and unstructured documents into a single collection for initial development purposes. The only field they have in common is the unique ID. Works fine.
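For illustration only, a minimal Python sketch of mixing both kinds of record in one collection so that they share nothing but the unique key. The `doc_type` discriminator, the `struct-`/`file-` ID prefixes, and the `*_s`, `*_i` and `content_t` field names are hypothetical; they assume a schema with matching dynamic fields (such as the `*_s`/`*_i`/`*_t` patterns in Solr's example configsets). Reading rows from HBase and files from HDFS is left out.

```python
import json
from typing import Any, Dict, List

def structured_to_doc(row_key: str, columns: Dict[str, Any]) -> Dict[str, Any]:
    """Map one structured record (e.g. an HBase row flattened to a dict) to a Solr doc."""
    doc: Dict[str, Any] = {"id": f"struct-{row_key}", "doc_type": "structured"}
    for name, value in columns.items():
        # Hypothetical convention: integers go to *_i, everything else to *_s.
        is_int = isinstance(value, int) and not isinstance(value, bool)
        doc[f"{name}{'_i' if is_int else '_s'}"] = value
    return doc

def unstructured_to_doc(path: str, text: str) -> Dict[str, Any]:
    """Map one unstructured file (e.g. plain text read from HDFS) to a Solr doc."""
    return {"id": f"file-{path}", "doc_type": "unstructured", "content_t": text}

def build_update_body(docs: List[Dict[str, Any]]) -> str:
    """Serialize docs as a JSON array, the shape Solr's /update handler accepts."""
    return json.dumps(docs)

# Usage: two very different records end up in the same update request.
docs = [
    structured_to_doc("row1", {"customer": "ACME", "amount": 42}),
    unstructured_to_doc("/data/notes/a.txt", "Meeting notes about ACME"),
]
body = build_update_body(docs)
```

A discriminator field like `doc_type` lets a query restrict itself to one kind of document with a filter such as `fq=doc_type:structured`, while a plain `q` can still search across both.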
In production, however, we expect that factors such as query rates, access controls, load balancing, availability, shard keys, overall document counts, and update frequency will drive us to use separate collections. For us, the deciding factor is less about "structured vs. unstructured" and more about "public vs. private". We have designed our app to execute separate queries in parallel at runtime, so splitting the collection will have minimal impact.
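A minimal Python sketch of running separate queries in parallel at runtime, one per collection. The collection names `public` and `private`, the node URL `http://localhost:8983/solr`, and the use of the standard `/select` handler with `wt=json` are assumptions for illustration. The `fetch` parameter is injectable so the fan-out logic can be exercised without a live cluster.

```python
import json
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]
Fetch = Callable[[str, str], List[Doc]]

def http_fetch(collection: str, query: str,
               base_url: str = "http://localhost:8983/solr") -> List[Doc]:
    """Run one query against a collection's /select handler and return its docs."""
    params = urllib.parse.urlencode({"q": query, "wt": "json"})
    url = f"{base_url}/{collection}/select?{params}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["response"]["docs"]

def search_all(collections: List[str], query: str,
               fetch: Fetch = http_fetch) -> Dict[str, List[Doc]]:
    """Query every collection concurrently; return results keyed by collection."""
    with ThreadPoolExecutor(max_workers=max(1, len(collections))) as pool:
        futures = {c: pool.submit(fetch, c, query) for c in collections}
        return {c: f.result() for c, f in futures.items()}
```

For example, `search_all(["public", "private"], "acme")` would issue both queries at once. Results are kept per collection rather than merged, because relevance scores computed in different collections may not be directly comparable; how to combine them is left to the application.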
Of course, your application is different. YMMV, etc.
hth,
Charlie
-----Original Message-----
From: Jack Krupansky [mailto:jack.krupansky@gmail.com]
Sent: Sunday, March 29, 2015 4:26 PM
To: solr-user@lucene.apache.org
Subject: Re: Structured and Unstructured data indexing in SolrCloud
The first step is to work out the queries that you wish to perform - that will determine how the data should be organized in the Solr schema.
-- Jack Krupansky
*************************************************************************
This e-mail may contain confidential or privileged information.
If you are not the intended recipient, please notify the sender immediately and then delete it.
TIAA-CREF
*************************************************************************
Re: Structured and Unstructured data indexing in SolrCloud
Posted by Jack Krupansky <ja...@gmail.com>.
The first step is to work out the queries that you wish to perform - that
will determine how the data should be organized in the Solr schema.
-- Jack Krupansky