Posted to solr-user@lucene.apache.org by Yavar Husain <ya...@gmail.com> on 2014/07/21 14:37:46 UTC

Solr Cassandra MySQL Best Practice Indexing

My full-text data is stored in Cassandra along with an ID. I also have a lot
of structured data linked to that ID in an RDBMS (MySQL). I need the
structured data for faceting and other needs. What is the best practice for
indexing in this scenario?
My thoughts (maybe weird):

1. Read the data from Cassandra; for each ID, read the corresponding row
from MySQL, form an XML document on the fly, and send it to Solr for
indexing without storing anything.
2. I do not know much about Solandra. However, even if I use it, I will
still have to go to MySQL to fetch the structured data.
3. Copy all of the Cassandra data to MySQL or vice versa, but then the data
would be duplicated.

I will think about incremental indexing for the new records later.

Bit confused. Any help would be appreciated.
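
For illustration, option 1 could look roughly like the sketch below. It is
only a sketch under stated assumptions: the field names (body, category,
tags, price) and the sample rows are hypothetical, and the rows are plain
dicts standing in for whatever the Cassandra and MySQL drivers return. It
merges one row from each store and serializes the result as a Solr XML
<add> message, repeating the <field> element for multi-valued data.

```python
import xml.etree.ElementTree as ET


def build_add_xml(text_row, facet_row):
    """Merge one Cassandra row and one MySQL row into a Solr <add> message."""
    # text_row is applied last, so it wins if both rows carry the same key (e.g. "id").
    merged = {**facet_row, **text_row}
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in merged.items():
        if value is None:
            continue  # skip NULL columns instead of indexing empty fields
        # A list or tuple becomes one <field> element per value (multi-valued field).
        values = value if isinstance(value, (list, tuple)) else [value]
        for v in values:
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(v)  # ElementTree escapes &, < and > on output
    return ET.tostring(add, encoding="unicode")


# Hypothetical rows standing in for real query results.
cassandra_row = {"id": "42", "body": "full text of the document"}
mysql_row = {"id": "42", "category": "news", "tags": ["solr", "cassandra"], "price": None}

xml_message = build_add_xml(cassandra_row, mysql_row)
print(xml_message)
```

The resulting string would then be POSTed to Solr's update handler,
followed by a commit. In practice, several <doc> elements per <add> message
keep the number of HTTP round trips down.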

Re: Solr Cassandra MySQL Best Practice Indexing

Posted by Yavar Husain <ya...@gmail.com>.

Exactly. Thanks a lot Jack. +1 for "Your best bet is to get that RDBMS data
moved to Cassandra or DSE ASAP."


On Tue, Jul 22, 2014 at 5:15 PM, Jack Krupansky <ja...@basetechnology.com>
wrote:

> I don't think the Solr Data Import Handler has a Cassandra plugin (entity
> processor) yet, so the most straightforward approach is to write a Java
> app that reads from Cassandra, then reads the corresponding RDBMS data,
> combines the data, and then uses SolrJ to add documents to Solr.
>
> Your best bet is to get that RDBMS data moved to Cassandra or DSE ASAP.
> All you have until then is a stopgap measure rather than a robust
> architecture.
>
>
> -- Jack Krupansky

Re: Solr Cassandra MySQL Best Practice Indexing

Posted by Jack Krupansky <ja...@basetechnology.com>.

I don't think the Solr Data Import Handler has a Cassandra plugin (entity
processor) yet, so the most straightforward approach is to write a Java app
that reads from Cassandra, then reads the corresponding RDBMS data, combines
the data, and then uses SolrJ to add documents to Solr.
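
The read-and-combine step might be shaped like the sketch below. It is
written in Python for brevity rather than Java, and everything specific in
it is an assumption: the table name item_facets, its columns, the sample
rows, and the use of an in-memory SQLite database as a stand-in for MySQL.
The one design choice it illustrates is fetching the RDBMS rows for a whole
batch of IDs with a single IN query instead of one query per ID.

```python
import sqlite3
from itertools import islice


def batches(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def combined_docs(text_rows, conn, batch_size=500):
    """Yield one merged dict per Cassandra row, using one SQL query per batch."""
    for chunk in batches(text_rows, batch_size):
        ids = [row["id"] for row in chunk]
        placeholders = ",".join("?" * len(ids))
        cur = conn.execute(
            f"SELECT id, category, author FROM item_facets WHERE id IN ({placeholders})",
            ids,
        )
        facets = {r[0]: {"category": r[1], "author": r[2]} for r in cur}
        for row in chunk:
            # Rows with no match in the RDBMS are still emitted, without facet fields.
            yield {**row, **facets.get(row["id"], {})}


# SQLite stands in for MySQL here; the table and its contents are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item_facets (id TEXT PRIMARY KEY, category TEXT, author TEXT)")
conn.executemany(
    "INSERT INTO item_facets VALUES (?, ?, ?)",
    [("1", "news", "ann"), ("2", "blog", "bob")],
)

# Stand-in for rows streamed from Cassandra.
text_rows = [
    {"id": "1", "body": "first"},
    {"id": "2", "body": "second"},
    {"id": "3", "body": "no facets"},
]

docs = list(combined_docs(text_rows, conn, batch_size=2))
for d in docs:
    print(d)
```

Each merged dict corresponds to one Solr document; in the Java version
described above, this is the point where the fields would be copied into a
SolrJ document and added to Solr. Note that Python MySQL drivers generally
use %s rather than ? as the parameter placeholder.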

Your best bet is to get that RDBMS data moved to Cassandra or DSE ASAP. All 
you have until then is a stopgap measure rather than a robust architecture.

-- Jack Krupansky

-----Original Message----- 
From: Yavar Husain
Sent: Tuesday, July 22, 2014 2:22 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr Cassandra MySQL Best Practice Indexing

Thanks, Jack, for your guidance on DSE. However, it would be great if somebody
could help me solve my use case:

My full-text data is stored in Cassandra along with an ID. I also have a lot
of structured data linked to that ID in an RDBMS (MySQL). I need the
structured data for faceting and other needs. What is the best practice for
indexing in this scenario?

I will think about incremental indexing for the new records later.

Bit confused. Any help would be appreciated.

Re: Solr Cassandra MySQL Best Practice Indexing

Posted by Yavar Husain <ya...@gmail.com>.

Thanks, Jack, for your guidance on DSE. However, it would be great if somebody
could help me solve my use case:

My full-text data is stored in Cassandra along with an ID. I also have a lot
of structured data linked to that ID in an RDBMS (MySQL). I need the
structured data for faceting and other needs. What is the best practice for
indexing in this scenario?

I will think about incremental indexing for the new records later.

Bit confused. Any help would be appreciated.


On Mon, Jul 21, 2014 at 6:51 PM, Jack Krupansky <ja...@basetechnology.com>
wrote:

> Solandra is not a supported product. DataStax Enterprise (DSE) supersedes
> it. With DSE, just load your data into a Solr-enabled Cassandra data center
> and it will be indexed automatically in the embedded Solr within DSE, as
> per a Solr schema that you provide. Then use any of the nodes in that
> Solr-enabled Cassandra data center just the same as with normal Solr.
>
> -- Jack Krupansky

Re: Solr Cassandra MySQL Best Practice Indexing

Posted by Jack Krupansky <ja...@basetechnology.com>.

Solandra is not a supported product. DataStax Enterprise (DSE) supersedes 
it. With DSE, just load your data into a Solr-enabled Cassandra data center 
and it will be indexed automatically in the embedded Solr within DSE, as per 
a Solr schema that you provide. Then use any of the nodes in that 
Solr-enabled Cassandra data center just the same as with normal Solr.

-- Jack Krupansky
