Posted to user@cassandra.apache.org by Bhuvan Rawal <bh...@gmail.com> on 2016/01/05 12:20:53 UTC

Requesting some details for my use case

Hi All,

I'm planning to shift from a SQL database to a columnar NoSQL database, and
we have narrowed our choices down to Cassandra and HBase. I would really
appreciate it if someone with decent experience of both could give me an
honest comparison on the parameters below (links to neutral
benchmarks/blogs are also appreciated):

1. Data Consistency (eventual consistency is allowed, but define "eventual")
2. Ease of Scaling Up
3. Manageability
4. Failure Recovery options
5. Secondary Indexing
6. Data Aggregation
7. Query Language (3rd-party wrapper solutions are also allowed)
8. Security
9. *Commercial Support for quick solutions to issues*.
10. Running batch jobs on the data, such as MapReduce or common aggregation
functions using row scans. Are there any other packages for Cassandra to
achieve this?
11. Triggering specific updates on tables used as secondary indexes.
12. Please consider that our DB will be the source of truth, with no
specific requirement for immediate data consistency amongst nodes.

Regards,
Bhuvan Rawal
SDE

Re: Requesting some details for my use case

Posted by Bhuvan Rawal <bh...@gmail.com>.
Hi Jack,

We value reliability and consistency over performance right now. In the
e-commerce industry we have to be ready for unexpected spikes at odd times.

I'll be grateful if you could tell me about reliability and failover scenarios.

On Wed, Jan 6, 2016 at 2:59 AM, Jack Krupansky <ja...@gmail.com>
wrote:

> DataStax has documented quite a few customers/case studies:
> http://www.datastax.com/resources/casestudies
>
> Materialized Views should be considered if you can go straight to 3.0, but
> you can always do the same synthesized views yourself in your app, which is
> the current standard best practice anyway. MV is just a way to automate that
> best practice.
>
> The key to performance is to characterize your load requirements and then
> make sure to provision your cluster with enough nodes to support that load.
> You'll have to do a proof-of-concept implementation to verify your own
> requirements. For example, start with a 6- or 8-node cluster for a subset
> of the data and add nodes as needed to accommodate load. The trick is to limit the
> amount of data on each node so that incoming requests can be processed as
> rapidly as possible to meet latency requirements, and then to scale up load
> capacity by adding nodes.
>
> -- Jack Krupansky

Re: Requesting some details for my use case

Posted by Jack Krupansky <ja...@gmail.com>.
DataStax has documented quite a few customers/case studies:
http://www.datastax.com/resources/casestudies

Materialized Views should be considered if you can go straight to 3.0, but
you can always do the same synthesized views yourself in your app, which is
the current standard best practice anyway. MV is just a way to automate that
best practice.
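A minimal sketch of what doing the synthesized views "yourself in your app" can involve, written in Java since the application in this thread is a Java app. Everything here is hypothetical: the class name, the tables `orders` and `orders_by_status`, and the status values are illustrative, and plain in-memory maps stand in for Cassandra tables (this is not the DataStax Java Driver). What it shows is the bookkeeping an automated materialized view takes over: when the view's key column changes, the stale view row has to be removed as well as the new one written.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-ins for two tables (NOT the Java Driver):
//   orders           -> PRIMARY KEY (order_id)
//   orders_by_status -> PRIMARY KEY ((status), order_id)
class OrderViews {
    // Base table: order_id -> current status.
    final Map<String, String> orders = new HashMap<>();
    // Application-maintained "view": status -> order_ids in that status.
    final Map<String, Set<String>> ordersByStatus = new HashMap<>();

    // Write to the base table and keep the view in sync, which is the
    // bookkeeping an automated materialized view would do for you.
    void upsert(String orderId, String newStatus) {
        String oldStatus = orders.put(orderId, newStatus);
        if (oldStatus != null && !oldStatus.equals(newStatus)) {
            // The view key changed, so the old view row is now stale.
            Set<String> stale = ordersByStatus.get(oldStatus);
            if (stale != null) {
                stale.remove(orderId);
                if (stale.isEmpty()) {
                    ordersByStatus.remove(oldStatus); // drop the empty partition
                }
            }
        }
        ordersByStatus.computeIfAbsent(newStatus, k -> new LinkedHashSet<>()).add(orderId);
    }

    // Query served entirely by the view: one key lookup, no scan, no JOIN.
    Set<String> byStatus(String status) {
        return ordersByStatus.getOrDefault(status, Set.of());
    }
}
```

In a real cluster the base-table write and the view writes land in different tables, so a failure between them can leave the view stale; a logged BATCH is a common way to make such paired writes apply all-or-nothing, at some cost in latency.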

The key to performance is to characterize your load requirements and then
make sure to provision your cluster with enough nodes to support that load.
You'll have to do a proof-of-concept implementation to verify your own
requirements. For example, start with a 6- or 8-node cluster for a subset
of the data and add nodes as needed to accommodate load. The trick is to limit the
amount of data on each node so that incoming requests can be processed as
rapidly as possible to meet latency requirements, and then to scale up load
capacity by adding nodes.
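To make "characterize your load requirements" concrete, here is a back-of-envelope sketch using the volumes quoted in this thread (10 million create/update requests and ~500 million reads a day). Averaged over a day those work out to roughly 116 writes/s and 5,800 reads/s, but spikes matter more than the average. The peak multiplier, the per-node throughput, and the 6-node floor are hypothetical placeholders, not benchmark results; the per-node figure is exactly what a proof of concept should measure. The sketch also ignores replication overhead and the different costs of reads and writes.

```java
// Back-of-envelope load sizing. PEAK_FACTOR and OPS_PER_NODE are
// hypothetical assumptions, not measured numbers.
class LoadSizing {
    static final long SECONDS_PER_DAY = 86_400L;

    // Average operations per second implied by a daily volume.
    static double avgPerSecond(long opsPerDay) {
        return (double) opsPerDay / SECONDS_PER_DAY;
    }

    // Nodes needed to absorb a peak rate if each node sustains opsPerNode,
    // never going below a minimum cluster size.
    static int nodesFor(double peakOpsPerSecond, double opsPerNode, int minNodes) {
        int n = (int) Math.ceil(peakOpsPerSecond / opsPerNode);
        return Math.max(n, minNodes);
    }

    public static void main(String[] args) {
        double writes = avgPerSecond(10_000_000L);   // daily writes from the thread
        double reads = avgPerSecond(500_000_000L);   // daily reads from the thread
        double peakFactor = 10.0;                    // hypothetical spike multiplier
        double opsPerNode = 5_000.0;                 // hypothetical; measure in a PoC
        double peak = (writes + reads) * peakFactor;
        System.out.printf("avg writes/s=%.1f reads/s=%.1f peak ops/s=%.0f nodes=%d%n",
                writes, reads, peak, nodesFor(peak, opsPerNode, 6));
    }
}
```

The useful part is the shape of the calculation, not the numbers: replace both placeholders with figures observed on your own proof-of-concept cluster and re-run it as the traffic forecast changes.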

-- Jack Krupansky

On Tue, Jan 5, 2016 at 4:02 PM, Bhuvan Rawal <bh...@gmail.com> wrote:

> *Thanks, Jack, for the detailed advice*.
>
> Yes, it is a Java application.
>
> We already have a denormalized view of our data in place, which we store in
> MongoDB as a cache; however, we will get our hands dirty before
> implementation. We would like to have a single DB view and replace MongoDB
> & MySQL with a single data store. In terms of numbers, we can expect 10
> million create/update requests a day and ~500 million read requests.
>
> The question here is not "should I or should I not", but "which one".
>
> A lot of the features you have mentioned are supported but not advisable
> *(the automated Materialized View feature; "Triggers are supported, but not
> advised"; "Secondary indexes are supported, but not advised")*. By when do
> you believe these will be stable enough for an enterprise implementation?
>
> We have made up our minds as far as the shift to NoSQL is concerned, as
> MySQL is not able to serve our purpose and is currently a bottleneck in the
> design.
>
> From all the benchmarks we have analyzed for our use case, Cassandra seems
> to be doing better as far as performance is concerned. Our only concern is
> how Cassandra compares with HBase as a primary database. By "primary
> database" I mean these attributes: data consistency, transaction management
> and rollback, brisk failure recovery, cross-datacenter replication, and
> partition-aware sharding.
>
> The general opinion of Cassandra is that it's more of a cache, and as we are
> going to be replacing our primary data store, we need something fast but
> not at the expense of reliability. Can you guide me towards a case study
> where someone has tuned it to perform reliably for most use cases?
>
> Also, I'll be grateful if someone directs me to a repository where I can
> find the major customers of these DBs and their case studies.
>
> Thanks & Regards,
> Bhuvan

Re: Requesting some details for my use case

Posted by Bhuvan Rawal <bh...@gmail.com>.
*Thanks, Jack, for the detailed advice*.

Yes, it is a Java application.

We already have a denormalized view of our data in place, which we store in
MongoDB as a cache; however, we will get our hands dirty before
implementation. We would like to have a single DB view and replace MongoDB
& MySQL with a single data store. In terms of numbers, we can expect 10
million create/update requests a day and ~500 million read requests.

The question here is not "should I or should I not", but "which one".

A lot of the features you have mentioned are supported but not advisable
*(the automated Materialized View feature; "Triggers are supported, but not
advised"; "Secondary indexes are supported, but not advised")*. By when do
you believe these will be stable enough for an enterprise implementation?

We have made up our minds as far as the shift to NoSQL is concerned, as
MySQL is not able to serve our purpose and is currently a bottleneck in the
design.

From all the benchmarks we have analyzed for our use case, Cassandra seems
to be doing better as far as performance is concerned. Our only concern is
how Cassandra compares with HBase as a primary database. By "primary
database" I mean these attributes: data consistency, transaction management
and rollback, brisk failure recovery, cross-datacenter replication, and
partition-aware sharding.

The general opinion of Cassandra is that it's more of a cache, and as we are
going to be replacing our primary data store, we need something fast but
not at the expense of reliability. Can you guide me towards a case study
where someone has tuned it to perform reliably for most use cases?

Also, I'll be grateful if someone directs me to a repository where I can
find the major customers of these DBs and their case studies.

Thanks & Regards,
Bhuvan

On Tue, Jan 5, 2016 at 9:56 PM, Jack Krupansky <ja...@gmail.com>
wrote:

> Bear in mind that you won't be able to merely "tune" your schema - you
> will need to completely redesign your data model. Step one is to look at
> all of the queries you need to perform and get a handle on what flat,
> denormalized data model they will need to execute performantly in a NoSQL
> database. No JOINs. No ad hoc queries. Secondary indexes are supported, but
> not advised. The general model is that you have a "query table" for each
> form of query, with the primary key adapted to the needs of the query. That
> means a lot of denormalization and repetition of data. The new, automated
> Materialized View feature of Cassandra 3.0 can help with that a lot, but is
> a new feature and not quite stable enough for production (there is no
> DataStax Enterprise (DSE) release with 3.0 yet). Triggers are supported, but
> not advised; it is better to do that processing at the application level.
> DSE also supports Hadoop and Spark for batch/analytics and Solr for search
> and ad hoc queries (or use Stratio or Stargate for Lucene queries).
>
> Best to start with a basic proof of concept implementation to get your
> feet wet and learn the ins and outs before making a full commitment.
>
> Is this a Java app? The Java Driver is where you need to get started in
> terms of ingesting and querying data. It's a bit more sophisticated than
> just a simple JDBC interface. Most of your queries will need to be
> rewritten anyway: even though the CQL syntax does indeed look a lot like
> SQL, much of the rewriting will be needed because your data model will
> have to be made NoSQL-compatible.
>
> That should get you started.
>
>
> -- Jack Krupansky

Re: Requesting some details for my use case

Posted by Jack Krupansky <ja...@gmail.com>.
Bear in mind that you won't be able to merely "tune" your schema - you will
need to completely redesign your data model. Step one is to look at all of
the queries you need to perform and get a handle on what flat, denormalized
data model they will need to execute performantly in a NoSQL database. No
JOINs. No ad hoc queries. Secondary indexes are supported, but not advised.
The general model is that you have a "query table" for each form of query,
with the primary key adapted to the needs of the query. That means a lot of
denormalization and repetition of data. The new, automated Materialized
View feature of Cassandra 3.0 can help with that a lot, but is a new
feature and not quite stable enough for production (there is no DataStax
Enterprise (DSE) release with 3.0 yet). Triggers are supported, but not
advised; it is better to do that processing at the application level. DSE
also supports Hadoop and Spark for batch/analytics and Solr for search and
ad hoc queries (or use Stratio or Stargate for Lucene queries).
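The "query table per form of query" idea can be sketched in Java as follows. This is a hypothetical illustration: the table `orders_by_customer`, its columns, and the query "latest N orders for a customer" are invented for the example, and in-memory maps stand in for Cassandra (this is not the Java Driver). It shows why the primary key is chosen from the query: the partition key selects one partition, and the clustering column keeps rows in the order the query wants, so the read needs no JOIN, no secondary index, and no sort.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of one query table, roughly:
//   CREATE TABLE orders_by_customer (customer_id text, order_time bigint,
//     order_id text, total_cents bigint,
//     PRIMARY KEY ((customer_id), order_time))
//   WITH CLUSTERING ORDER BY (order_time DESC);
// Plain maps stand in for Cassandra; this is not the Java Driver.
class OrdersByCustomer {
    static final class Order {
        final String orderId;
        final long orderTime;
        final long totalCents;

        Order(String orderId, long orderTime, long totalCents) {
            this.orderId = orderId;
            this.orderTime = orderTime;
            this.totalCents = totalCents;
        }
    }

    // Partition key (customer_id) -> rows sorted by clustering key, newest first.
    private final Map<String, TreeMap<Long, Order>> partitions = new HashMap<>();

    // Writes are upserts: the same (customer_id, order_time) key overwrites.
    // A real schema would likely add order_id as a second clustering column
    // so two orders with the same timestamp cannot collide.
    void insert(String customerId, Order o) {
        partitions
            .computeIfAbsent(customerId, k -> new TreeMap<Long, Order>(Comparator.reverseOrder()))
            .put(o.orderTime, o);
    }

    // "Latest N orders for a customer": a single-partition read that returns
    // rows already in clustering order.
    List<Order> latest(String customerId, int limit) {
        List<Order> out = new ArrayList<>();
        TreeMap<Long, Order> partition = partitions.get(customerId);
        if (partition == null) {
            return out;
        }
        for (Order o : partition.values()) {
            if (out.size() >= limit) {
                break;
            }
            out.add(o);
        }
        return out;
    }
}
```

A second query form, say "orders by status", would get its own table with its own primary key, and the same order data would be written to both; that repetition is the denormalization described above.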

Best to start with a basic proof of concept implementation to get your feet
wet and learn the ins and outs before making a full commitment.

Is this a Java app? The Java Driver is where you need to get started in
terms of ingesting and querying data. It's a bit more sophisticated than
just a simple JDBC interface. Most of your queries will need to be
rewritten anyway: even though the CQL syntax does indeed look a lot like
SQL, much of the rewriting will be needed because your data model will
have to be made NoSQL-compatible.

That should get you started.


-- Jack Krupansky

On Tue, Jan 5, 2016 at 10:52 AM, Bhuvan Rawal <bh...@gmail.com> wrote:

> I understand, Ravi. We have our application layers well defined. The major
> changes will be in the database access layers, and the entities will be
> changed. The schema will be modified and tuned for the efficiency of the
> chosen data store.
>
> We have been using MongoDB as a cache for a long time now, but as it's a
> document store and we have a crisp, well-defined schema, we chose to go
> with a columnar database.
>
> Our data size has been growing very rapidly. It is currently 200 GB
> including indexes, and in a couple of years it will grow to approximately
> 5 TB. We may also need to run procedures to aggregate data and update tables.

Re: Requesting some details for my use case

Posted by Bhuvan Rawal <bh...@gmail.com>.
I understand, Ravi. We have our application layers well defined. The major
changes will be in the database access layers, and the entities will be
changed. The schema will be modified and tuned for the efficiency of the
chosen data store.

We have been using MongoDB as a cache for a long time now, but as it's a
document store and we have a crisp, well-defined schema, we chose to go
with a columnar database.

Our data size has been growing very rapidly. It is currently 200 GB
including indexes, and in a couple of years it will grow to approximately
5 TB. We may also need to run procedures to aggregate data and update tables.
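The growth figures above imply a steep compound rate, which matters for the "limit the amount of data on each node" advice. A small Java sketch of the arithmetic, assuming "a couple of years" means 24 months, a replication factor of 3, and a hypothetical comfort target of about 1 TB of data per node; none of these three values comes from this thread. Under those assumptions, 200 GB to 5 TB is a 25x increase, roughly 14% per month or 5x per year, and 5 TB stored three times is about 15 TB, i.e. around 15 nodes before any compaction headroom.

```java
// Rough storage projection. The 24-month horizon, RF = 3 and the
// 1 TB-per-node target are hypothetical assumptions.
class StorageProjection {
    // Compound monthly growth rate implied by going from startGb to endGb.
    static double monthlyGrowthRate(double startGb, double endGb, int months) {
        return Math.pow(endGb / startGb, 1.0 / months) - 1.0;
    }

    // Nodes needed so that replicated data stays under a per-node target.
    // Ignores compaction headroom, which needs extra free disk on top.
    static int nodesForData(double liveGb, int replicationFactor, double targetGbPerNode) {
        return (int) Math.ceil(liveGb * replicationFactor / targetGbPerNode);
    }

    public static void main(String[] args) {
        double rate = monthlyGrowthRate(200.0, 5_000.0, 24);
        System.out.printf("implied growth: %.1f%% per month%n", rate * 100);
        System.out.println("nodes for 5 TB at RF 3: " + nodesForData(5_000.0, 3, 1_000.0));
    }
}
```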

On Tue, Jan 5, 2016 at 6:54 PM, Ravi Krishna <sr...@gmail.com>
wrote:

> You are moving from a SQL database to C* ??? I hope you are aware of the
> differences between a NoSQL database like C* and an RDBMS. To keep it
> short, the app has to change significantly.
>
> Please read the documentation on the differences between NoSQL and RDBMS.
>
> thanks.

Re: Requesting some details for my use case

Posted by Ravi Krishna <sr...@gmail.com>.
You are moving from a SQL database to C* ??? I hope you are aware of the
differences between a NoSQL database like C* and an RDBMS. To keep it short,
the app has to change significantly.

Please read the documentation on the differences between NoSQL and RDBMS.

thanks.

On Tue, Jan 5, 2016 at 6:20 AM, Bhuvan Rawal <bh...@gmail.com> wrote:

> Hi All,
>
> I'm planning to shift from a SQL database to a columnar NoSQL database, and
> we have narrowed our choices down to Cassandra and HBase. I would really
> appreciate it if someone with decent experience of both could give me an
> honest comparison on the parameters below (links to neutral
> benchmarks/blogs are also appreciated):
>
> 1. Data Consistency (eventual consistency is allowed, but define "eventual")
> 2. Ease of Scaling Up
> 3. Manageability
> 4. Failure Recovery options
> 5. Secondary Indexing
> 6. Data Aggregation
> 7. Query Language (3rd-party wrapper solutions are also allowed)
> 8. Security
> 9. *Commercial Support for quick solutions to issues*.
> 10. Running batch jobs on the data, such as MapReduce or common aggregation
> functions using row scans. Are there any other packages for Cassandra to
> achieve this?
> 11. Triggering specific updates on tables used as secondary indexes.
> 12. Please consider that our DB will be the source of truth, with no
> specific requirement for immediate data consistency amongst nodes.
>
> Regards,
> Bhuvan Rawal
> SDE
>

Re: Requesting some details for my use case

Posted by Bhuvan Rawal <bh...@gmail.com>.
Thanks for pointing out the typo, Jonathan. Our use case calls for a Column
Family store. :)

On Wed, Jan 6, 2016 at 2:38 AM, Jonathan Haddad <jo...@jonhaddad.com> wrote:

> Sorry to nitpick, but Cassandra is not a columnar database.  If you're
> looking for columnar because you have an analytics need, Cassandra is not
> what you want.  If you've just made the same mistake that 99% of people
> make, well, now you know.  Cassandra historically has been referred to as a
> "Column Family" data store, which is easily mistaken for columnar.

Re: Requesting some details for my use case

Posted by Jonathan Haddad <jo...@jonhaddad.com>.
Sorry to nitpick, but Cassandra is not a columnar database.  If you're
looking for columnar because you have an analytics need, Cassandra is not
what you want.  If you've just made the same mistake that 99% of people
make, well, now you know.  Cassandra historically has been referred to as a
"Column Family" data store, which is easily mistaken for columnar.
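The distinction can be seen in a toy contrast of the two layouts. This is a hypothetical illustration in plain Java, not how Cassandra, HBase, or any columnar engine actually stores data on disk. In a column-family style layout, data is grouped by row key and each row carries its own (possibly sparse) set of named columns, so reading one row is cheap but aggregating one column means visiting every row. In a columnar layout each column is stored contiguously, which is what makes full-column scans for analytics cheap.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy contrast between the two layouts; not either engine's real format.
class Layouts {
    // Column-family style: row key -> (column name -> value).
    // Summing one column must walk every row and look the column up.
    static long sumColumnRowOriented(Map<String, TreeMap<String, Long>> rows, String column) {
        long sum = 0;
        for (TreeMap<String, Long> row : rows.values()) {
            Long v = row.get(column); // sparse: a row may not have this column
            if (v != null) {
                sum += v;
            }
        }
        return sum;
    }

    // Columnar style: one contiguous array per column, position i = row i.
    // Summing a column touches only that column's values.
    static long sumColumnColumnar(long[] column) {
        long sum = 0;
        for (long v : column) {
            sum += v;
        }
        return sum;
    }
}
```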

On Tue, Jan 5, 2016 at 3:21 AM Bhuvan Rawal <bh...@gmail.com> wrote:

> Hi All,
>
> I'm planning to shift from a SQL database to a columnar NoSQL database, and
> we have narrowed our choices down to Cassandra and HBase. I would really
> appreciate it if someone with decent experience of both could give me an
> honest comparison on the parameters below (links to neutral
> benchmarks/blogs are also appreciated):
>
> 1. Data Consistency (eventual consistency is allowed, but define "eventual")
> 2. Ease of Scaling Up
> 3. Manageability
> 4. Failure Recovery options
> 5. Secondary Indexing
> 6. Data Aggregation
> 7. Query Language (3rd-party wrapper solutions are also allowed)
> 8. Security
> 9. *Commercial Support for quick solutions to issues*.
> 10. Running batch jobs on the data, such as MapReduce or common aggregation
> functions using row scans. Are there any other packages for Cassandra to
> achieve this?
> 11. Triggering specific updates on tables used as secondary indexes.
> 12. Please consider that our DB will be the source of truth, with no
> specific requirement for immediate data consistency amongst nodes.
>
> Regards,
> Bhuvan Rawal
> SDE
>