Posted to user@spark.apache.org by vi...@wipro.com on 2016/01/23 02:37:48 UTC

Spark Cassandra clusters

Hi All,
What is the right Spark and Cassandra cluster setup: should the Cassandra cluster and the Spark cluster run on different nodes, or on the same nodes?
We currently have them on different nodes, and our performance tests show very poor results for the Spark Streaming jobs.
Please let us know.

Regards
Vivek


The information contained in this electronic message and any attachments to this message are intended for the exclusive use of the addressee(s) and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately and destroy all copies of this message and any attachments. WARNING: Computer viruses can be transmitted via email. The recipient should check this email and any attachments for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email. www.wipro.com

Re: Spark Cassandra clusters

Posted by Ted Yu <yu...@gmail.com>.
Vivek:
I searched for 'cassandra gc pause' and found a few hits.
For example:
http://search-hadoop.com/m/qZFqM1c5nrn1Ihwf6&subj=Re+GC+pauses+affecting+entire+cluster+

Keep in mind the effect of GC on shared nodes.

FYI

On Fri, Jan 22, 2016 at 7:09 PM, Mohammed Guller <mo...@glassbeam.com>
wrote:

> For data locality, it is recommended to run the Spark workers and
> Cassandra on the same nodes.
>
>
>
> Mohammed
>
> Author: Big Data Analytics with Spark
> <http://www.amazon.com/Big-Data-Analytics-Spark-Practitioners/dp/1484209656/>

RE: Spark Cassandra clusters

Posted by Mohammed Guller <mo...@glassbeam.com>.
For data locality, it is recommended to run the Spark workers and Cassandra on the same nodes.

Mohammed
Author: Big Data Analytics with Spark<http://www.amazon.com/Big-Data-Analytics-Spark-Practitioners/dp/1484209656/>

From: vivek.meghanathan@wipro.com [mailto:vivek.meghanathan@wipro.com]
Sent: Friday, January 22, 2016 5:38 PM
To: user@spark.apache.org
Subject: Spark Cassandra clusters


Hi All,
What is the right spark Cassandra cluster setup - having Cassandra cluster and spark cluster in different nodes or they should be on same nodes.
We are having them in different nodes and performance test shows very bad result for the spark streaming jobs.
Please let us know.

Regards
Vivek

Re: Spark Cassandra clusters

Posted by vi...@wipro.com.
Thanks, Mohammed and Ted.

I will try out the options and let you all know how it goes. I had also posted to the Spark Cassandra Connector community and got a similar response.

Regards
Vivek
On Sat, Jan 23, 2016 at 11:37 am, Mohammed Guller <mo...@glassbeam.com> wrote:

Vivek,

By default, Cassandra uses ¼ of the system memory, so in your case, it will be around 8GB, which is fine.

If you have more Cassandra related question, it is better to post it on the Cassandra mailing list. Also feel free to email me directly.

Mohammed
Author: Big Data Analytics with Spark<http://www.amazon.com/Big-Data-Analytics-Spark-Practitioners/dp/1484209656/>


RE: Spark Cassandra clusters

Posted by Mohammed Guller <mo...@glassbeam.com>.
Vivek,

By default, Cassandra sets its JVM heap to ¼ of the system memory, so in your case it will be around 8 GB, which is fine.
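
As a rough check of that figure, the default heap sizing can be sketched as below. This is an illustration rather than part of the original reply: the function name `default_max_heap_mb` is hypothetical, and it assumes the DataStax bundle keeps the stock `cassandra-env.sh` rule of `max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB))`. The memory the OS actually reports is usually a little below the nominal 30 GB, so the real value may come out slightly lower.

```python
def default_max_heap_mb(system_memory_mb: int) -> int:
    """Approximate the default max heap (MB) under the assumed stock rule.

    Assumed rule: max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB)).
    """
    half_capped = min(system_memory_mb // 2, 1024)      # 1/2 RAM, capped at 1 GB
    quarter_capped = min(system_memory_mb // 4, 8192)   # 1/4 RAM, capped at 8 GB
    return max(half_capped, quarter_capped)


if __name__ == "__main__":
    node_memory_mb = 30 * 1024  # nominal 30 GB node from the thread
    heap_mb = default_max_heap_mb(node_memory_mb)
    print(f"default heap: {heap_mb} MB ({heap_mb / 1024:.1f} GB)")
```

Under these assumptions a nominal 30 GB node gets a heap of about 7.5 GB, which matches the "around 8 GB" estimate.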

If you have more Cassandra-related questions, it is better to post them on the Cassandra mailing list. Also, feel free to email me directly.

Mohammed
Author: Big Data Analytics with Spark<http://www.amazon.com/Big-Data-Analytics-Spark-Practitioners/dp/1484209656/>

From: Ted Yu [mailto:yuzhihong@gmail.com]
Sent: Friday, January 22, 2016 6:37 PM
To: vivek.meghanathan@wipro.com
Cc: user
Subject: Re: Spark Cassandra clusters

I am not Cassandra developer :-)

Can you use http://search-hadoop.com/ or ask on Cassandra mailing list.

Cheers



Re: Spark Cassandra clusters

Posted by Ted Yu <yu...@gmail.com>.
I am not a Cassandra developer :-)

Can you use http://search-hadoop.com/ or ask on the Cassandra mailing list?

Cheers

On Fri, Jan 22, 2016 at 6:35 PM, <vi...@wipro.com> wrote:

> Thanks Ted, also what is the suggested memory setting for Cassandra
> process?
>
> Regards
> Vivek

Re: Spark Cassandra clusters

Posted by vi...@wipro.com.
Thanks, Ted. Also, what is the suggested memory setting for the Cassandra process?

Regards
Vivek

On Sat, Jan 23, 2016 at 7:57 am, Ted Yu <yu...@gmail.com> wrote:

From your description, putting Cassandra daemon on Spark cluster should be feasible.

One aspect to be measured is how much locality can be achieved in this setup - Cassandra is distributed NoSQL store.

Cheers


Re: Spark Cassandra clusters

Posted by Ted Yu <yu...@gmail.com>.
From your description, putting the Cassandra daemon on the Spark cluster should be feasible.

One aspect to measure is how much data locality can be achieved in this setup, since Cassandra is a distributed NoSQL store.
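
One crude way to start measuring this is to count how many Spark worker hosts also run a Cassandra node. The sketch below is only an illustration: the function name and the host names are hypothetical, and host overlap is just an upper bound on locality, since a task is local only if the co-located Cassandra node also holds a replica of the data being read.

```python
def colocated_fraction(spark_worker_hosts, cassandra_hosts):
    """Return the fraction of Spark worker hosts that also run Cassandra."""
    workers = set(spark_worker_hosts)
    if not workers:
        return 0.0
    shared = workers & set(cassandra_hosts)
    return len(shared) / len(workers)


if __name__ == "__main__":
    # Hypothetical host names for a 10-node Spark / 9-node Cassandra layout
    # on separate machines.
    spark_hosts = [f"spark-{i}" for i in range(1, 11)]
    cassandra_nodes = [f"cass-{i}" for i in range(1, 10)]
    print(colocated_fraction(spark_hosts, cassandra_nodes))
```

With fully separate clusters the fraction is zero, so every Cassandra read from Spark crosses the network.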

Cheers

On Fri, Jan 22, 2016 at 6:13 PM, <vi...@wipro.com> wrote:

> + spark standalone cluster

Re: Spark Cassandra clusters

Posted by vi...@wipro.com.
+ spark standalone cluster

On Sat, Jan 23, 2016 at 7:33 am, Vivek Meghanathan (WT01 - NEP) <vi...@wipro.com> wrote:


We have the setup on Google cloud platform. Each node has 8 CPU + 30GB memory. 10 nodes for spark another 9nodes for Cassandra.
We are using spark 1.3.0 and Datastax bundle 4.5.9(which has 2.0.x Cassandra).
Spark master and worker daemon uses Xmx & Xms 4G. We have not changed the default setting of Cassandra, should we be increasing the JVM memory?

we have 9 streaming jobs the core usage varies from 2-6 and memory usage from 1 - 4 gb.

We have budget to use higher CPU or higher memory systems hence was planning to have them together on more efficient nodes.

Regards
Vivek

Re: Spark Cassandra clusters

Posted by vi...@wipro.com.
We have the setup on Google Cloud Platform. Each node has 8 CPUs and 30 GB of memory: 10 nodes for Spark and another 9 nodes for Cassandra.
We are using Spark 1.3.0 and the DataStax bundle 4.5.9 (which includes Cassandra 2.0.x).
The Spark master and worker daemons use Xmx and Xms of 4G. We have not changed Cassandra's default settings; should we increase its JVM memory?

We have 9 streaming jobs; their core usage varies from 2 to 6 and their memory usage from 1 to 4 GB.

We have the budget for higher-CPU or higher-memory systems, hence we were planning to run them together on more capable nodes.
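
To judge whether the current 30 GB nodes could host both services, a back-of-the-envelope memory budget might look like the sketch below. It is an illustration only: the function name is hypothetical, the 7.5 GB Cassandra heap assumes the default of a quarter of system memory, and the 4 GB set aside for the OS and page cache is an arbitrary placeholder, not a recommendation.

```python
def executor_memory_budget_gb(node_memory_gb, cassandra_heap_gb,
                              spark_daemon_heap_gb, os_reserve_gb):
    """Return the memory (GB) left for Spark executors on a shared node."""
    remaining = (node_memory_gb - cassandra_heap_gb
                 - spark_daemon_heap_gb - os_reserve_gb)
    return max(remaining, 0.0)


if __name__ == "__main__":
    budget = executor_memory_budget_gb(
        node_memory_gb=30.0,       # from the thread
        cassandra_heap_gb=7.5,     # assumed default: 1/4 of system memory
        spark_daemon_heap_gb=4.0,  # worker daemon Xmx from the thread
        os_reserve_gb=4.0,         # placeholder for OS and page cache
    )
    print(f"left for executors: {budget} GB")
```

Under these assumptions about 14.5 GB per node would remain for executors, to be compared against the 1 to 4 GB that each streaming job uses.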

Regards
Vivek
On Sat, Jan 23, 2016 at 7:13 am, Ted Yu <yu...@gmail.com> wrote:

Can you give us a bit more information ?

How much memory does each node have ?
What's the current heap allocation for Cassandra process and executor ?
Spark / Cassandra release you are using

Thanks


Re: Spark Cassandra clusters

Posted by Ted Yu <yu...@gmail.com>.
Can you give us a bit more information?

How much memory does each node have?
What is the current heap allocation for the Cassandra process and the Spark executors?
Which Spark and Cassandra releases are you using?

Thanks

On Fri, Jan 22, 2016 at 5:37 PM, <vi...@wipro.com> wrote:

> Hi All,
> What is the right spark Cassandra cluster setup - having Cassandra cluster
> and spark cluster in different nodes or they should be on same nodes.
> We are having them in different nodes and performance test shows very bad
> result for the spark streaming jobs.
> Please let us know.
>
> Regards
> Vivek

Re: Spark Cassandra clusters

Posted by vi...@wipro.com.
Thanks.

We are using the Spark Cassandra Connector version aligned with Spark 1.3.

Regards
Vivek
On Sat, Jan 23, 2016 at 7:27 am, Durgesh Verma <dv...@gmail.com> wrote:

This may be useful, you can try connectors.
https://academy.datastax.com/demos/getting-started-apache-spark-and-cassandra

https://spark-summit.org/2015/events/cassandra-and-spark-optimizing-for-data-locality/

Thanks,
-Durgesh


Re: Spark Cassandra clusters

Posted by Durgesh Verma <dv...@gmail.com>.
These may be useful; you can try the Spark Cassandra connector.
https://academy.datastax.com/demos/getting-started-apache-spark-and-cassandra

https://spark-summit.org/2015/events/cassandra-and-spark-optimizing-for-data-locality/

Thanks,
-Durgesh

> On Jan 22, 2016, at 8:37 PM, <vi...@wipro.com> <vi...@wipro.com> wrote:
> 
> Hi All,
> What is the right spark Cassandra cluster setup - having Cassandra cluster and spark cluster in different nodes or they should be on same nodes.
> We are having them in different nodes and performance test shows very bad result for the spark streaming jobs.
> Please let us know.
> 
> Regards
> Vivek
> 