Posted to user@cassandra.apache.org by Anuj Wadehra <an...@yahoo.co.in> on 2015/03/17 04:06:55 UTC

Run Mixed Workload using two instances on one node


 Hi,

We are trying to decouple our Reporting DB from OLTP and need urgent help on the feasibility of the proposed solution for PRODUCTION.

Use Case: Currently, OLTP and Reporting share the same application and DB. Some CFs are used for both OLTP and Reporting, while others are used solely for Reporting. Every business transaction synchronously updates the main OLTP CF and asynchronously updates the other Reporting CFs.

Problem Statement:
1. Decouple Reporting and OLTP so that Reporting load cannot impact OLTP performance.
2. Reporting and OLTP modules must scale independently.
3. The OLTP client should not update all the Reporting CFs. We generate data records on a file system/shared disk, and Reporting should use these records to build the Reporting DB.
4. Small customers may run OLTP and Reporting on the same 3-node cluster. Bigger customers can be given the option of dedicated OLTP and Reporting nodes. So the standard hardware box should be usable for 3 deployments (OLTP, Reporting, or OLTP+Reporting).
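Item 3 relies on the OLTP side publishing data records as files that the Reporting side consumes later. A minimal sketch of the publishing step, assuming one JSON-lines file per batch and a hypothetical `publish_records` helper (neither the file format nor the name comes from this thread): the batch is written to a temporary file in the same directory and then renamed with `os.replace`, so a reader that only picks up `*.jsonl` files never sees a half-written batch, because a rename within one POSIX file system is atomic.

```python
import json
import os
import tempfile
import time
import uuid


def publish_records(records, outbox_dir):
    """Write one batch of records to outbox_dir as a JSON-lines file and return its path.

    The batch goes to a temporary file in the same directory first, is flushed
    to disk, and is then renamed into place. A reader that only picks up
    *.jsonl files never sees a partially written batch.
    """
    os.makedirs(outbox_dir, exist_ok=True)
    # Time-prefixed names sort roughly in creation order; the uuid part avoids collisions.
    final_name = f"batch-{time.time_ns():020d}-{uuid.uuid4().hex[:8]}.jsonl"
    fd, tmp_path = tempfile.mkstemp(dir=outbox_dir, prefix=".tmp-", suffix=".part")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            for record in records:
                tmp.write(json.dumps(record, sort_keys=True) + "\n")
            tmp.flush()
            os.fsync(tmp.fileno())
        final_path = os.path.join(outbox_dir, final_name)
        os.replace(tmp_path, final_path)  # atomic rename within one POSIX file system
        return final_path
    except BaseException:
        # Leave no stray temp file behind if the write or the rename failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Writing to the same directory matters: a rename across file systems is a copy, not an atomic switch.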

Note: Reporting is ad hoc, may involve full table scans, and does not involve analytics. Data size is large: 2TB (OLTP+Reporting) per node.
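Full table scans on a large Cassandra table are commonly run as many smaller token-range queries instead of one unbounded SELECT, so the scan can be paged, parallelized, or throttled. A minimal sketch that only builds the CQL strings and does not talk to a cluster, assuming the default Murmur3Partitioner (token range -2^63 to 2^63 - 1) and a hypothetical table `reporting.txn_by_id` with partition key `txn_id`; a ring using a different partitioner has a different token space.

```python
MIN_TOKEN = -(2**63)   # Murmur3Partitioner minimum token (ring boundary, never a key's token)
MAX_TOKEN = 2**63 - 1  # Murmur3Partitioner maximum token


def split_token_range(num_splits, lo=MIN_TOKEN, hi=MAX_TOKEN):
    """Split the range (lo, hi] into num_splits contiguous (start, end] pieces."""
    if num_splits < 1:
        raise ValueError("num_splits must be >= 1")
    width = hi - lo
    bounds = [lo + (width * i) // num_splits for i in range(num_splits + 1)]
    return [(bounds[i], bounds[i + 1]) for i in range(num_splits)]


def scan_statements(keyspace, table, partition_key, columns, num_splits):
    """One SELECT per token sub-range; together they visit every partition exactly once."""
    cols = ", ".join(columns)
    template = (
        f"SELECT {cols} FROM {keyspace}.{table} "
        f"WHERE token({partition_key}) > {{start}} AND token({partition_key}) <= {{end}}"
    )
    return [template.format(start=s, end=e) for s, e in split_token_range(num_splits)]


if __name__ == "__main__":
    for stmt in scan_statements("reporting", "txn_by_id", "txn_id", ["txn_id", "amount"], 4):
        print(stmt)
```

Running the sub-range queries with a small, fixed concurrency is one way to keep an ad hoc scan from saturating the nodes it reads from.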

Hardware: The standard deployment is a 3-node cluster, each node having 24 cores, 64GB RAM, and 6 × 400GB SSDs in RAID5.

Proposed Solution:
1. Split OLTP and Reporting clients into two application components.
2. For small deployments where no more than 3 nodes are required:
    A. Install 2 Cassandra instances on each node, one for OLTP and the other for Reporting.
    B. To distribute I/O load 2:1, remove RAID5 (as Cassandra offers replication) and assign 4 disks as JBOD to OLTP and 2 disks to Reporting.
    C. RAM is abundant and often under-utilized, so assign 8GB to each of the 2 Cassandra instances.
    D. To make sure Reporting cannot overload the CPU, tune concurrent_reads and concurrent_writes.
    The OLTP client will only write to the OLTP DB and generate DB records. The Reporting client will poll the FS and populate the Reporting DB in the required format.
3. Larger customers can have Reporting clients and DB on dedicated physical nodes with all resources.
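Items A to D boil down to giving each instance its own disks, ports, heap, and thread-pool sizes. The sketch below only plans and checks those values; it does not start Cassandra. It assumes hypothetical mount points `/data1` to `/data6`, illustrative port numbers (the Reporting cluster would use its own storage_port on all of its nodes), that the 8GB in item C means the JVM heap (MAX_HEAP_SIZE in cassandra-env.sh), and a hypothetical 16/8 split of the 24 cores. The yaml keys are standard cassandra.yaml settings of the Cassandra 2.x line, and JMX_PORT is set in cassandra-env.sh. The concurrent_reads and concurrent_writes starting values follow the rules of thumb in the cassandra.yaml comments (16 × data drives, 8 × cores), applied here to each instance's share rather than to the whole machine.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class InstancePlan:
    """One Cassandra instance on a shared host (all values are illustrative)."""
    name: str
    disks: List[str]            # mount points used only by this instance (JBOD)
    heap_gb: int                # intended MAX_HEAP_SIZE in cassandra-env.sh
    cores: int                  # cores this instance is budgeted to use
    storage_port: int           # cassandra.yaml: inter-node traffic
    native_transport_port: int  # cassandra.yaml: CQL clients
    rpc_port: int               # cassandra.yaml: Thrift clients
    jmx_port: int               # JMX_PORT in cassandra-env.sh (used by nodetool)

    def yaml_overrides(self) -> Dict[str, object]:
        """cassandra.yaml values that must differ between the two instances."""
        return {
            "data_file_directories": [f"{d}/{self.name}/data" for d in self.disks],
            "commitlog_directory": f"{self.disks[0]}/{self.name}/commitlog",
            "saved_caches_directory": f"{self.disks[0]}/{self.name}/saved_caches",
            "storage_port": self.storage_port,
            "native_transport_port": self.native_transport_port,
            "rpc_port": self.rpc_port,
            # Starting points from the cassandra.yaml comments, not tuned values.
            "concurrent_reads": 16 * len(self.disks),
            "concurrent_writes": 8 * self.cores,
        }


def check_host(plans: List[InstancePlan], host_ram_gb: int, host_cores: int) -> List[str]:
    """Return conflicts that would prevent the instances from sharing one host."""
    problems = []
    ports = [p for pl in plans
             for p in (pl.storage_port, pl.native_transport_port, pl.rpc_port, pl.jmx_port)]
    if len(ports) != len(set(ports)):
        problems.append("duplicate port across the planned instances")
    disks = [d for pl in plans for d in pl.disks]
    if len(disks) != len(set(disks)):
        problems.append("a disk is assigned to more than one instance")
    if sum(pl.heap_gb for pl in plans) > host_ram_gb:
        problems.append("combined heaps exceed host RAM")
    if sum(pl.cores for pl in plans) > host_cores:
        problems.append("core budgets exceed host cores")
    return problems


def page_cache_gb(plans: List[InstancePlan], host_ram_gb: int) -> int:
    """RAM left for the OS page cache (and everything else) after the heaps."""
    return host_ram_gb - sum(pl.heap_gb for pl in plans)


if __name__ == "__main__":
    oltp = InstancePlan("oltp", ["/data1", "/data2", "/data3", "/data4"], 8, 16, 7000, 9042, 9160, 7199)
    reporting = InstancePlan("reporting", ["/data5", "/data6"], 8, 8, 7010, 9043, 9161, 7299)
    print(check_host([oltp, reporting], host_ram_gb=64, host_cores=24))
    print(page_cache_gb([oltp, reporting], host_ram_gb=64))
    print(reporting.yaml_overrides())
```

With two 8GB heaps, about 48GB of the 64GB is left for the OS page cache, which both instances then share. Thread-pool sizes cap concurrency rather than CPU time, so they soften but do not strictly enforce the split; pinning each JVM to its own set of cores (for example with taskset or cgroups) is the OS-level way to make the CPU split hard.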

Key Questions:
Is it OK to run 2 Cassandra instances on one node in a production system and limit CPU usage, disk I/O, and RAM as suggested above?
Is there any other solution to the problem statement above?



Thanks
Anuj


   

RE: Run Mixed Workload using two instances on one node

Posted by SE...@homedepot.com.
Yes, for over 2 years.

As for #2: you would keep all CFs in both DCs, but perhaps use RF=2 in the OLTP DC and RF=3 in the Reporting DC. I'm not sure of all your requirements. Writes are fast and cheap in Cassandra, so I wouldn't be concerned about “extra” writes in the OLTP DC.
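Per-DC replication factors translate directly into per-node storage and into how many replicas receive each write. A rough sketch, assuming data is spread evenly across the nodes of each DC and ignoring compression, compaction headroom, and snapshots; the 3-node DC sizes and the 600GB of unique data are hypothetical inputs, not figures from this thread.

```python
from typing import Dict


def per_node_load(unique_data_gb: float,
                  replication: Dict[str, int],
                  nodes_per_dc: Dict[str, int]) -> Dict[str, float]:
    """Approximate data held by each node in each DC: unique data x RF / nodes in that DC.

    Assumes an even token distribution and ignores compression, compaction
    headroom and snapshots, all of which change the real on-disk figure.
    """
    load = {}
    for dc, rf in replication.items():
        nodes = nodes_per_dc[dc]
        if rf > nodes:
            raise ValueError(f"RF {rf} in DC {dc} exceeds its {nodes} nodes")
        load[dc] = unique_data_gb * rf / nodes
    return load


def replicas_per_write(replication: Dict[str, int]) -> int:
    """Every write is sent to RF replicas in every DC that holds the keyspace."""
    return sum(replication.values())


if __name__ == "__main__":
    rf = {"OLTP": 2, "Reporting": 3}     # the RFs suggested above
    nodes = {"OLTP": 3, "Reporting": 3}  # hypothetical DC sizes
    print(per_node_load(600, rf, nodes))  # {'OLTP': 400.0, 'Reporting': 600.0}
    print(replicas_per_write(rf))         # 5
```

The consistency level only decides how many of those replicas must acknowledge; the coordinator still sends the write to all of them.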


Sean Durity – Cassandra Admin, Big Data Team
From: Anuj Wadehra [mailto:anujw_2003@yahoo.co.in]
Sent: Tuesday, March 17, 2015 1:29 PM
To: user@cassandra.apache.org
Subject: Re: Run Mixed Workload using two instances on one node

Thanks Sean. Are you using 2 Cassandra instances on a single node in a PRODUCTION environment?

Yes. We considered having a separate virtual DC for OLTP and Reporting, similar to the approach described at http://www.datastax.com/docs/datastax_enterprise3.1/solutions/dse_search_cluster
That approach assumes dedicated nodes for each workload.

With the virtual DC approach, we have the following concerns:
1. We can't afford dedicated Reporting nodes for small customers. Bigger clusters may have dedicated Reporting nodes.
2. When the OLTP DC replicates transaction data to the Reporting DC in real time, what would trigger population of the other reporting tables in the Reporting DC? I am not sure whether triggers are stable yet.

Anuj


________________________________

The information in this Internet Email is confidential and may be legally privileged. It is intended solely for the addressee. Access to this Email by anyone else is unauthorized. If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited and may be unlawful. When addressed to our clients any opinions or advice contained in this Email are subject to the terms and conditions expressed in any applicable governing The Home Depot terms of business or client engagement letter. The Home Depot disclaims all responsibility and liability for the accuracy and content of this attachment and for any damages or losses arising from any inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or other items of a destructive nature, which may be contained in this attachment and shall not be liable for direct, indirect, consequential or special damages in connection with this e-mail message or its attachment.



Re: Run Mixed Workload using two instances on one node

Posted by Anuj Wadehra <an...@yahoo.co.in>.
Thanks Sean. Are you using 2 Cassandra instances on a single node in a PRODUCTION environment?

Yes. We considered having a separate virtual DC for OLTP and Reporting, similar to the approach described at http://www.datastax.com/docs/datastax_enterprise3.1/solutions/dse_search_cluster
That approach assumes dedicated nodes for each workload.

With the virtual DC approach, we have the following concerns:
1. We can't afford dedicated Reporting nodes for small customers. Bigger clusters may have dedicated Reporting nodes.
2. When the OLTP DC replicates transaction data to the Reporting DC in real time, what would trigger population of the other reporting tables in the Reporting DC? I am not sure whether triggers are stable yet.
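Concern 2 asks what populates the derived Reporting tables without triggers. Item 3 of the original proposal already points at one answer: the Reporting client polls the record files generated on the OLTP side. A minimal sketch of such a poller, assuming one JSON-lines file per batch in a hypothetical outbox directory and a hypothetical `apply_record` callback that writes one record to the Reporting DB; a file is moved to a `done` subdirectory only after all of its records were applied.

```python
import json
import os
import time
from typing import Callable


def poll_once(outbox_dir: str, apply_record: Callable[[dict], None]) -> int:
    """Apply every complete batch file in outbox_dir, in name order; return records applied.

    Only *.jsonl files are considered, so temp files that are still being
    written under another name are skipped. If the process dies midway
    through a file, that file is picked up again on the next poll.
    """
    done_dir = os.path.join(outbox_dir, "done")
    os.makedirs(done_dir, exist_ok=True)
    applied = 0
    for name in sorted(os.listdir(outbox_dir)):
        path = os.path.join(outbox_dir, name)
        if not name.endswith(".jsonl") or not os.path.isfile(path):
            continue
        with open(path, encoding="utf-8") as batch:
            for line in batch:
                if line.strip():
                    apply_record(json.loads(line))
                    applied += 1
        os.replace(path, os.path.join(done_dir, name))
    return applied


def run_forever(outbox_dir: str, apply_record: Callable[[dict], None], interval_s: float = 5.0) -> None:
    """Poll at a fixed interval (no back-off, error handling, or shutdown signal)."""
    while True:
        poll_once(outbox_dir, apply_record)
        time.sleep(interval_s)
```

Because a crash can cause a batch to be applied twice, `apply_record` should be idempotent. Plain (non-counter) Cassandra writes to the same primary key overwrite each other, so re-applying the same values is harmless; counter updates are not.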
Anuj
 




  

RE: Run Mixed Workload using two instances on one node

Posted by SE...@homedepot.com.
We run two Cassandra nodes on the same host for a use case that requires a random-ordered ring and a byte-ordered ring. It is technically feasible. However, it makes administration of the rings a bit tougher (different ports, for one). OpsCenter agents can only connect to one of the rings at a time. (Perhaps we could also run 2 agents, on different ports…)

Have you considered using a separate, logical data center for the reporting use cases? OLTP clients use the OLTP DC; reporting clients use the Reporting DC. The reporting CFs could have a smaller replication factor in the OLTP DC, if needed, to keep writes and data size there to a minimum.
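A logical-DC layout is expressed per keyspace with NetworkTopologyStrategy, which takes a replication factor per data center name. A small sketch that only builds the CQL text, assuming a hypothetical keyspace `reporting_ks` and DC names `OLTP` and `Reporting`; the DC names must match what the snitch reports for each node (for example the `dc=` entry in cassandra-rackdc.properties when using GossipingPropertyFileSnitch).

```python
from typing import Dict


def create_keyspace_cql(keyspace: str, rf_per_dc: Dict[str, int]) -> str:
    """Build a CREATE KEYSPACE statement using NetworkTopologyStrategy.

    rf_per_dc maps data center name -> replication factor. A DC that is not
    listed stores no replicas of this keyspace.
    """
    if not rf_per_dc:
        raise ValueError("at least one data center is required")
    parts = ["'class': 'NetworkTopologyStrategy'"]
    parts += [f"'{dc}': {rf}" for dc, rf in rf_per_dc.items()]
    return f"CREATE KEYSPACE IF NOT EXISTS {keyspace} WITH replication = {{{', '.join(parts)}}};"


if __name__ == "__main__":
    # Reporting CFs: a single replica in the OLTP DC, three in the Reporting DC.
    print(create_keyspace_cql("reporting_ks", {"OLTP": 1, "Reporting": 3}))
```

On the client side, each application would typically use a DC-aware load-balancing policy with its own DC as the local one, plus LOCAL_ONE or LOCAL_QUORUM consistency, so reporting queries are coordinated and answered inside the Reporting DC.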


Sean Durity – Cassandra Admin, Big Data Team

Re: Run Mixed Workload using two instances on one node

Posted by Anuj Wadehra <an...@yahoo.co.in>.
I understand that 2 instances on one node looks like a weird solution. But while we can have dedicated reporting nodes for big customers, we cannot for small customers.

My questions would be:
1. What is the technical reasoning? What problems do you foresee if we use 2 C* instances on one node in production? We have ample HW on each server and it is mostly under-utilized. We just want heavy reporting not to impact OLTP, and OLTP and reporting to be individually scalable.

2. I think we don't need Elasticsearch. We just need a plain Reporting DB that can answer reporting queries. We can create our own CFs as indexes. We don't need the overhead of another 3PP for our current reporting needs.
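Point 2 mentions maintaining custom CFs as indexes instead of adding a search engine. The usual pattern is a second table keyed by the attribute being queried, written alongside the base row. A toy in-memory sketch of that pattern, assuming hypothetical `txn_by_id` and `txn_by_customer` tables modeled as Python dicts; it shows the bookkeeping, not a Cassandra client.

```python
from collections import defaultdict


class IndexedTable:
    """Toy model of a base CF plus a hand-maintained index CF.

    base:  txn_id -> row            (like a table keyed by txn_id)
    index: customer_id -> {txn_id}  (like a table with partition key customer_id
                                     and clustering column txn_id)
    """

    def __init__(self):
        self.base = {}
        self.index = defaultdict(set)

    def upsert(self, txn_id, row):
        old = self.base.get(txn_id)
        if old is not None and old["customer_id"] != row["customer_id"]:
            # The indexed value changed, so the stale index entry must be removed.
            self.index[old["customer_id"]].discard(txn_id)
        self.base[txn_id] = row
        self.index[row["customer_id"]].add(txn_id)

    def by_customer(self, customer_id):
        """Index lookup followed by base-table reads, in txn_id order."""
        return [self.base[t] for t in sorted(self.index.get(customer_id, ()))]
```

The stale-entry cleanup needs a read before the write when the indexed value can change. In Cassandra, the base write and the index write could be sent in one logged batch, which guarantees that if one of them is applied the other eventually is too.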
Thanks
Anuj





 




    


  

Re: Run Mixed Workload using two instances on one node

Posted by Ali Akhtar <al...@gmail.com>.
I don't think it's recommended to have two instances on the same node.

Have you considered using something like Elasticsearch for the reports? It's designed for that sort of thing.