Posted to user@cassandra.apache.org by Bhuvan Rawal <bh...@gmail.com> on 2016/09/18 15:43:06 UTC

Having secondary indices limited to analytics dc

Hi,

Is it possible to have secondary indices (SASI or native ones) defined on a
table but restricted to a particular DC? For instance, in MySQL it is quite
possible to have a parent server that takes writes without any indices
(other than the required ones) and to keep the indices on the replica
databases. This keeps the parent database lightweight and frees it from
building a secondary index entry on every write.

For analytics and auditing purposes it is essential to serve access
patterns other than those modeled around partition key fetches. Users need
only a limited number of such reads, but if the index is enabled
cluster-wide, every row written to that table requires an index write on
every single node in every DC, even the ones that may be serving read
operations.
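
To put rough numbers on that overhead, here is a back-of-the-envelope sketch
in Python. It assumes that every replica maintains its own local index entry
for each row it stores and that there is one index entry per indexed column.
The DC names and replication factors are hypothetical and do not come from a
real cluster.

```python
def index_writes_per_row(replication, indexed_dcs, indexed_columns=1):
    """Count the local index updates triggered by one row write.

    replication: dict mapping DC name -> replication factor (hypothetical).
    indexed_dcs: set of DC names whose replicas maintain the index.
    """
    replicas_with_index = sum(
        rf for dc, rf in replication.items() if dc in indexed_dcs
    )
    return replicas_with_index * indexed_columns


# Hypothetical topology: two transactional DCs and one analytics DC.
replication = {"dc_txn_1": 3, "dc_txn_2": 3, "dc_analytics": 2}

# Today: the index exists on every DC, so every replica pays for it.
cluster_wide = index_writes_per_row(replication, set(replication))

# Wished-for behaviour: only the analytics DC maintains the index.
analytics_only = index_writes_per_row(replication, {"dc_analytics"})

print(cluster_wide, analytics_only)
```

Under these assumptions the transactional DCs account for most of the index
maintenance work without ever reading from the index.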

What are the potential ways to solve this problem within Cassandra (that
is, without having to ship the data off to Elasticsearch etc.)?

Best Regards,
Bhuvan

Re: Having secondary indices limited to analytics dc

Posted by Andres de la Peña <ad...@stratio.com>.

Hi,

Stratio's Lucene index uses a workaround to offer this feature. The index
can be created with a configuration option
<https://github.com/Stratio/cassandra-lucene-index/blob/branch-3.0.8/doc/documentation.rst#indexing>
specifying a list of data centers to be excluded from indexing. The index
is actually created on these data centers, but all write operations to it
are silently ignored there. It would be really nice to have something
similar at the Cassandra level.
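
As a rough illustration, the Python sketch below only builds the CQL text for
such an index; it does not talk to a cluster. The option name
`excluded_data_centers`, the index class name, and the `ON table ()` form are
my reading of the linked documentation and should be checked against the
plugin version in use. The keyspace, table, index name, field mapping, and DC
names are hypothetical.

```python
import json


def cql_quote(value):
    """Quote a string as a CQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def lucene_index_cql(keyspace, table, index_name, schema, excluded_dcs):
    """Build a CREATE CUSTOM INDEX statement that skips the given DCs."""
    options = {
        "refresh_seconds": "1",
        "schema": json.dumps(schema),
        # Assumed option name: comma-separated DCs where writes are ignored.
        "excluded_data_centers": ",".join(excluded_dcs),
    }
    rendered = ", ".join(
        f"{cql_quote(key)}: {cql_quote(val)}" for key, val in options.items()
    )
    return (
        f"CREATE CUSTOM INDEX {index_name} ON {keyspace}.{table} () "
        f"USING 'com.stratio.cassandra.lucene.Index' "
        f"WITH OPTIONS = {{{rendered}}};"
    )


# Hypothetical table with one text field, indexed only outside dc_txn_1/2.
statement = lucene_index_cql(
    keyspace="shop",
    table="orders",
    index_name="orders_lucene_idx",
    schema={"fields": {"note": {"type": "text"}}},
    excluded_dcs=["dc_txn_1", "dc_txn_2"],
)
print(statement)
```

The printed statement could then be run through cqlsh or a driver once the
option names have been confirmed.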

-- 
Andrés de la Peña

Vía de las dos Castillas, 33, Ática 4, 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 828 6473 // www.stratio.com // *@stratiobd
<https://twitter.com/StratioBD>*

Re: Having secondary indices limited to analytics dc

Posted by Bhuvan Rawal <bh...@gmail.com>.

Created CASSANDRA-12663
<https://issues.apache.org/jira/browse/CASSANDRA-12663>; please feel free to
make edits. From a bird's-eye view it seems a bit inefficient to keep doing
computations and generating data that may never be put to use. (A user may
never read via secondary indices on the primary transactional DC, but is
currently forced to create them on every DC in the cluster.)

Re: Having secondary indices limited to analytics dc

Posted by Jonathan Haddad <jo...@jonhaddad.com>.

I don't see why having per-DC indexes would be an issue from a technical
standpoint. It's a good idea; I suggest putting in a JIRA for it (if one
doesn't exist already). Post back to the ML with the issue #.

Re: Having secondary indices limited to analytics dc

Posted by Bhuvan Rawal <bh...@gmail.com>.

Would this be possible with the change data capture (CDC) feature
implemented in CASSANDRA-8844
<https://issues.apache.org/jira/browse/CASSANDRA-8844>? That is, have two
clusters (with different schema definitions for secondary indices) and
segregate the analytics workload onto the other cluster, with a CDC log
shipper enabled on the parent DC that handles the transactional workload?
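
A minimal sketch of what such a log shipper could look like, in Python. It
assumes CDC is enabled on the source nodes so that commit log segments for
CDC-enabled tables end up in a local directory (the `cdc_raw_directory`
setting in cassandra.yaml), and that something on the analytics side replays
the shipped segments. The `ship_segments` helper and both paths are
hypothetical, a real shipper would copy to a remote host, and parsing the
segments is out of scope here.

```python
import shutil
from pathlib import Path


def ship_segments(cdc_raw_dir, staging_dir):
    """Move commit log segments from the CDC directory to a staging area.

    Returns the names of the shipped segments in sorted order. Files that
    do not end in .log are left untouched.
    """
    source = Path(cdc_raw_dir)
    target = Path(staging_dir)
    target.mkdir(parents=True, exist_ok=True)

    shipped = []
    for segment in sorted(source.glob("*.log")):
        # Moving (not copying) frees space in the CDC directory, which has
        # to be drained by the consumer.
        shutil.move(str(segment), str(target / segment.name))
        shipped.append(segment.name)
    return shipped
```

Such a helper would be called periodically, for example as
`ship_segments("/var/lib/cassandra/cdc_raw", "/mnt/analytics/staging")`
(both paths are placeholders).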

Re: Having secondary indices limited to analytics dc

Posted by Dorian Hoxha <do...@gmail.com>.

The only way I know is with Elassandra <https://github.com/vroyer/elassandra>.
You spin up the nodes in dc1 as Elassandra (holding data + indexes) and the
nodes in dc2 as plain Cassandra (holding only data).
