Posted to user@cassandra.apache.org by kooljava2 <ko...@yahoo.com.INVALID> on 2018/04/11 20:55:18 UTC

Solr/DSE Spark

Hello,
We are exploring configuring Solr/Spark and wanted to get input on this.
1) How do we decide which one to use?
2) Do we run this on a DC where there is less workload?
Any other suggestions or comments are appreciated.
Thank you.

Re: Solr/DSE Spark

Posted by Niclas Hedhman <ni...@apache.org>.
On Fri, Apr 13, 2018, 18:40 Ben Bromhead <be...@instaclustr.com> wrote:

>
> DSE is literally in the title.
>

:-D who reads the title???

Sorry...

Re: Solr/DSE Spark

Posted by Ben Bromhead <be...@instaclustr.com>.
On Thu, Apr 12, 2018, 21:23 Niclas Hedhman <ni...@apache.org> wrote:

> Ben,
>
> 1. I don't see anything in this thread that is DSE specific, so I think it
> belongs here.
>
DSE is literally in the title.


> 2. Careful when you say that Datastax produces Cassandra. Cassandra is a
> product of Apache Software Foundation, and no one else. You, Ben, should be
> very well aware of this, to avoid further trademark issues between Datastax
> and ASF.
>
Given the context and subject, the software I was referring to is DSE.

Mind you, it would be hilarious if this email caused more trademark issues
with Datastax.



> Cheers
> Niclas Hedhman
> Member of ASF
>
-- 
Ben Bromhead
CTO | Instaclustr <https://www.instaclustr.com/>
+1 650 284 9692
Reliability at Scale
Cassandra, Spark, Elasticsearch on AWS, Azure, GCP and Softlayer

Re: Solr/DSE Spark

Posted by Ben Bromhead <be...@instaclustr.com>.
Thanks Jeff.

On Thu, Apr 12, 2018, 21:37 Jeff Jirsa <jj...@gmail.com> wrote:

> Pretty sure Ben meant that datastax produces DSE, not Cassandra, and since
> the questions specifically mentions DSE in the subject (implying that the
> user is going to be running either solr or spark within DSE to talk to
> cassandra), Ben’s recommendation seems quite reasonable to me.

Re: Solr/DSE Spark

Posted by Jeff Jirsa <jj...@gmail.com>.
Pretty sure Ben meant that Datastax produces DSE, not Cassandra, and since the question specifically mentions DSE in the subject (implying that the user is going to be running either Solr or Spark within DSE to talk to Cassandra), Ben’s recommendation seems quite reasonable to me.



-- 
Jeff Jirsa


> On Apr 12, 2018, at 6:23 PM, Niclas Hedhman <ni...@apache.org> wrote:
> 
> Ben,
> 
> 1. I don't see anything in this thread that is DSE specific, so I think it belongs here.
> 
> 2. Careful when you say that Datastax produces Cassandra. Cassandra is a product of Apache Software Foundation, and no one else. You, Ben, should be very well aware of this, to avoid further trademark issues between Datastax and ASF.
> 
> Cheers
> Niclas Hedhman
> Member of ASF

Re: Solr/DSE Spark

Posted by Niclas Hedhman <ni...@apache.org>.
Ben,

1. I don't see anything in this thread that is DSE specific, so I think it
belongs here.

2. Careful when you say that Datastax produces Cassandra. Cassandra is a
product of Apache Software Foundation, and no one else. You, Ben, should be
very well aware of this, to avoid further trademark issues between Datastax
and ASF.

Cheers
Niclas Hedhman
Member of ASF

On Thu, Apr 12, 2018 at 9:57 PM, Ben Bromhead <be...@instaclustr.com> wrote:

> Folks this is the user list for Apache Cassandra. I would suggest
> redirecting the question to Datastax the commercial entity that produces
> the software.



-- 
Niclas Hedhman, Software Developer
http://zest.apache.org - New Energy for Java

Re: Solr/DSE Spark

Posted by Ben Bromhead <be...@instaclustr.com>.
Folks, this is the user list for Apache Cassandra. I would suggest
redirecting the question to Datastax, the commercial entity that produces
the software.

On Thu, Apr 12, 2018 at 9:51 AM vincent gromakowski <
vincent.gromakowski@gmail.com> wrote:

>> Best practice is to use a dedicated DC for analytics, separated from the
>> hot DC.

Re: Solr/DSE Spark

Posted by vincent gromakowski <vi...@gmail.com>.
Best practice is to use a dedicated DC for analytics, separated from the hot
DC.
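A sketch of how that separation is typically wired up; the keyspace and DC names ("my_app", "hot_dc", "analytics_dc") and replication factors here are purely illustrative:

```shell
# Illustrative sketch: give the keyspace replicas in a separate analytics DC
# so Spark/Solr reads land on different nodes than the hot (operational) DC.
# Names and replication factors are examples, not a recommendation.
cqlsh -e "ALTER KEYSPACE my_app WITH replication = {
    'class': 'NetworkTopologyStrategy', 'hot_dc': 3, 'analytics_dc': 2};"
# Then point the analytics job's contact points at analytics_dc nodes and
# read at LOCAL_ONE / LOCAL_QUORUM so requests stay inside that DC.
```

The key design point is that LOCAL_* consistency levels never touch replicas outside the coordinator's DC, so analytics reads can't steal I/O from the hot DC's replicas.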

On Thu, 12 Apr 2018 at 15:45, sha p <sh...@gmail.com> wrote:

> Got it.
> Thank you so much for your detailed explanation.
>
> Regards,
> Shyam
>

Re: Solr/DSE Spark

Posted by sha p <sh...@gmail.com>.
Got it.
Thank you so much for your detailed explanation.

Regards,
Shyam


Re: Solr/DSE Spark

Posted by Evelyn Smith <u5...@gmail.com>.
Cassandra tends to be used in a lot of web applications. Its load is more natural and evenly distributed, like people logging on throughout the day, and the people operating it tend to be latency sensitive.

Spark, on the other hand, will try to complete its tasks as quickly as possible. This might mean bulk reading from Cassandra at 10 times the usual operations load, but for only, say, 5 minutes every half hour (however long it takes to read in the data for a job, and whenever that job is run). In this case, during those 5 minutes your normal operations work (customers) is going to experience a lot of latency.
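To put rough numbers on that pattern (using the 10x burst and 5-minutes-per-half-hour figures above, which are of course just examples):

```shell
# Rough arithmetic for the burst pattern: a 10x read burst for 5 of every
# 30 minutes looks small averaged out, but the cluster still has to absorb
# the full 10x peak while the job runs.
awk 'BEGIN {
  peak = 10          # burst read load, as a multiple of normal load
  duty = 5 / 30      # fraction of each half hour the burst is active
  printf "average extra load: %.2fx of normal\n", peak * duty
  printf "peak load while the job runs: %dx of normal\n", peak
}'
```

So capacity planning by average utilisation hides the problem; it is the 10x peak that your customers feel.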

This even happens with streaming jobs: every time Spark goes to interact with Cassandra it does so very quickly, hammers it for reads, and then does its own stuff until it needs to write things out. This might equate to intermittent latency spikes.

In theory, you can throttle your reads and writes, but I don’t know much about this and don’t see people actually doing it.
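For what it's worth, the open-source Spark Cassandra Connector does expose throttling knobs along these lines; a sketch only, since property names vary by connector version and the values and DC name here are illustrative:

```shell
# Sketch: rate-limiting a Spark job's Cassandra traffic via connector
# properties (check the connector reference for your version; the numbers
# and the "analytics_dc" name are illustrative).
spark-submit \
  --conf spark.cassandra.input.readsPerSec=2000 \
  --conf spark.cassandra.output.throughputMBPerSec=5 \
  --conf spark.cassandra.connection.localDC=analytics_dc \
  my-job.jar
```

Throttling trades longer job runtimes for flatter load on the cluster, which is usually the right trade when the same nodes also serve latency-sensitive traffic.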

Regards,
Evelyn.

> On 12 Apr 2018, at 4:30 pm, sha p <sh...@gmail.com> wrote:
> 
> Evelyn,
> Can you please elaborate on below
> Spark is notorious for causing latency spikes in Cassandra which is not great if you are sensitive to that.
> 
> 


Re: Solr/DSE Spark

Posted by sha p <sh...@gmail.com>.
Evelyn,
Can you please elaborate on the below?
Spark is notorious for causing latency spikes in Cassandra which is not
great if you are sensitive to that.



Re: Solr/DSE Spark

Posted by Evelyn Smith <u5...@gmail.com>.
Are you building a search engine -> Solr
Are you building an analytics function -> Spark

I feel they are used in significantly different use cases; what are you trying to build?

If it’s an analytics functionality that’s separate from your operations functionality, I’d build it in its own DC. Spark is notorious for causing latency spikes in Cassandra, which is not great if you are sensitive to that.

Regards,
Evelyn.