Posted to solr-user@lucene.apache.org by Vijay Bhoomireddy <vi...@whishworks.com> on 2015/03/29 22:04:25 UTC

Structured and Unstructured data indexing in SolrCloud

Hi,

We have a requirement where both structured and unstructured data come into
the system. We need to index both and then enable search over them. We are
running SolrCloud on a Hadoop platform. For structured data, we are planning
to put the data into HBase, and for unstructured data, directly into HDFS.

My question is how to index these sources under a single Solr core. Would
it be possible to index both structured and unstructured data under a
single core/collection in SolrCloud and then enable search functionality
over that index?

Thanks in advance.


-- 
The contents of this e-mail are confidential and for the exclusive use of 
the intended recipient. If you receive this e-mail in error please delete 
it from your system immediately and notify us either by e-mail or 
telephone. You should not copy, forward or otherwise disclose the content 
of the e-mail. The views expressed in this communication may not 
necessarily be the view held by WHISHWORKS.

RE: Structured and Unstructured data indexing in SolrCloud

Posted by "Reitzel, Charles" <Ch...@tiaa-cref.org>.
Hi Vijay, 

The short answer is yes, you can combine almost anything you want into a single collection. But in addition to working out your queries, you might want to work out your data life cycle.

In our application, we have commingled the structured and unstructured documents into a single collection for initial development purposes. The only field they have in common is the unique ID. It works fine.
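
A minimal sketch of what documents in such a commingled collection could look like, written in Python for brevity. Everything here is an assumption for illustration and not taken from the thread: the helper names, the "hbase:"/"hdfs:" ID prefixes, the field names with dynamic-field style suffixes (_s, _d, _txt), and the extra doc_type discriminator field (the application described above shares only the unique ID). The resulting JSON array is the general shape accepted by Solr's JSON update handler; actually sending it is left out.

```python
import json


def structured_doc(row_key, columns):
    """Build a document from an HBase-style row (hypothetical field names)."""
    # Prefixing the ID with its source keeps IDs unique across both sources.
    doc = {"id": f"hbase:{row_key}", "doc_type": "structured"}
    doc.update(columns)
    return doc


def unstructured_doc(path, text):
    """Build a document from the extracted text of a file stored in HDFS."""
    return {"id": f"hdfs:{path}", "doc_type": "unstructured", "content_txt": text}


docs = [
    structured_doc("cust-42", {"name_s": "Acme Ltd", "balance_d": 1250.0}),
    unstructured_doc("/data/reports/q1.txt", "Quarterly report text ..."),
]

# One JSON array holding both kinds of document, destined for one collection.
payload = json.dumps(docs)
```

With a discriminator like doc_type, a query can be narrowed to one kind of document using a filter query, or left unfiltered to search both.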

In production, however, we expect that factors such as query rates, access controls, load balancing, availability, shard keys, overall document counts, and update frequency will drive us to use separate collections. For us, the deciding factor is less about "structured vs. unstructured" and more about "public vs. private". We have developed our app so that splitting the collection will have minimal impact: it executes separate queries, in parallel, at runtime.
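
The "separate queries, in parallel" approach could be sketched roughly as follows, again in Python. This is not the code from the application described above. The search callable is a hypothetical stand-in for whatever client actually queries one collection, and merging by score is a simplifying assumption: relevance scores computed in separate collections are not strictly comparable.

```python
from concurrent.futures import ThreadPoolExecutor


def search_collections(search, collections, query, rows=10):
    """Query each collection in parallel and merge the hits by score.

    `search(collection, query, rows)` is a hypothetical callable that
    returns a list of dicts, each with at least "id" and "score".
    """
    with ThreadPoolExecutor(max_workers=max(1, len(collections))) as pool:
        # Submit one query per collection so they run concurrently.
        futures = {name: pool.submit(search, name, query, rows)
                   for name in collections}
        merged = []
        for name, future in futures.items():
            for doc in future.result():
                # Record which collection each hit came from.
                merged.append({**doc, "collection": name})
    # Naive merge: highest score first, truncated to the requested size.
    merged.sort(key=lambda d: d["score"], reverse=True)
    return merged[:rows]
```

Because the caller only passes a list of collection names, moving from one commingled collection to two separate ones changes the list, not the calling code.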

Of course, your application is different.  YMMV, etc.

hth,
Charlie


*************************************************************************
This e-mail may contain confidential or privileged information.
If you are not the intended recipient, please notify the sender immediately and then delete it.

TIAA-CREF
*************************************************************************

Re: Structured and Unstructured data indexing in SolrCloud

Posted by Jack Krupansky <ja...@gmail.com>.
The first step is to work out the queries that you wish to perform - that
will determine how the data should be organized in the Solr schema.

-- Jack Krupansky
