Posted to user@spark.apache.org by "aditya.athalye" <ad...@gmail.com> on 2014/12/08 15:43:30 UTC

Locking for shared RDDs

I am relatively new to Spark. I am planning to use Spark Streaming for my
OLAP use case, but I would like to know how RDDs are shared between multiple
workers.
If I need to constantly compute some stats on the streaming data, presumably
shared state would have to be updated serially by different Spark workers. Is
this managed by Spark automatically, or does the application need to ensure
that distributed locks are acquired?

Thanks



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Locking-for-shared-RDDs-tp20578.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org

Re: Locking for shared RDDs

Posted by Tathagata Das <ta...@gmail.com>.

Aditya, I think your mental model of Spark Streaming is a little
off the mark. Unlike traditional streaming systems, where any kind of
state is mutable, Spark Streaming is built on Spark's immutable RDDs.
Streaming data is received and divided into immutable blocks, which
form immutable RDDs, and transformations on those form new immutable RDDs.
It's best that you first read the Spark paper and then the Spark
Streaming paper to understand the model. Once you understand that, you will
realize that since everything is immutable, the question of
consistency does not even arise :)
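
A minimal sketch of this model in plain Python, assuming nothing beyond the
standard library. It is not the Spark API: the names Batch, count_batch,
update_state and run are hypothetical and exist only for illustration. Each
micro-batch is an immutable tuple, every transformation returns a new value,
and running state is replaced rather than modified in place. Stateful
operations in Spark Streaming such as updateStateByKey have a similar shape
(old state plus new values gives new state), though the details differ.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# One micro-batch, frozen as an immutable tuple (stand-in for an immutable RDD).
Batch = Tuple[str, ...]


def count_batch(batch: Batch) -> Dict[str, int]:
    """Transformation: derive per-key counts from a batch without modifying it."""
    return dict(Counter(batch))


def update_state(state: Dict[str, int], batch_counts: Dict[str, int]) -> Dict[str, int]:
    """Return a NEW state built from the old state and the latest batch counts.

    The previous state object is copied, never mutated, so any reader still
    holding it sees a consistent snapshot.
    """
    merged = dict(state)
    for key, n in batch_counts.items():
        merged[key] = merged.get(key, 0) + n
    return merged


def run(batches: Iterable[Batch]) -> Dict[str, int]:
    """Fold a sequence of batches into a final state, one batch at a time."""
    state: Dict[str, int] = {}
    for batch in batches:
        state = update_state(state, count_batch(batch))
    return state
```

Because update_state never modifies its inputs, there is nothing for two
workers to contend over; each step simply produces the next value.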

TD

On Mon, Dec 8, 2014 at 9:44 PM, Raghavendra Pandey
<ra...@gmail.com> wrote:
> You don't need to worry about locks as such, because one thread/worker is
> exclusively responsible for one partition of the RDD. You can use the
> Accumulator variables that Spark provides to collect state updates.

Re: Locking for shared RDDs

Posted by Raghavendra Pandey <ra...@gmail.com>.

You don't need to worry about locks as such, because one thread/worker is
exclusively responsible for one partition of the RDD. You can use the
Accumulator variables that Spark provides to collect state updates.
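
The idea can be sketched in plain Python, assuming only the standard library.
This is a model of the pattern, not Spark code: split, partial_stats, merge
and mean are hypothetical names. Each task owns one disjoint partition and
produces a partial result, and the partial results are combined afterwards
with an associative merge, which is roughly how accumulator updates from
tasks end up as a single value readable on the driver.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Partial = Tuple[int, float]  # (count, sum) for one partition


def split(data: Sequence[float], num_partitions: int) -> List[Sequence[float]]:
    """Split data into disjoint partitions; each is owned by exactly one task."""
    return [data[i::num_partitions] for i in range(num_partitions)]


def partial_stats(partition: Sequence[float]) -> Partial:
    """Runs in one worker thread and reads only its own partition: no lock needed."""
    return len(partition), float(sum(partition))


def merge(a: Partial, b: Partial) -> Partial:
    """Associative, commutative combine step, applied after the tasks finish."""
    return a[0] + b[0], a[1] + b[1]


def mean(data: Sequence[float], num_partitions: int = 4) -> float:
    """Compute the mean from per-partition partial results."""
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(partial_stats, split(data, num_partitions)))
    total: Partial = (0, 0.0)
    for p in partials:
        total = merge(total, p)
    count, value_sum = total
    return value_sum / count if count else 0.0
```

For example, mean([1, 2, 3, 4, 5, 6, 7, 8]) splits the input into four
partitions, sums each independently, and merges the four (count, sum) pairs
in a single thread at the end.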
