Posted to user@spark.apache.org by Adrian Mocanu <am...@verticalscope.com> on 2014/02/27 18:18:59 UTC

is RDD failure transparent to stream consumer

Is RDD failure transparent to a Spark stream consumer, apart from the slowdown needed to recreate the RDD?
After reading the papers on RDDs and DStreams on the Spark homepage, I believe it is, but I'd like confirmation.

Thanks
-Adrian


RE: is RDD failure transparent to stream consumer

Posted by Adrian Mocanu <am...@verticalscope.com>.
Thanks so much, Matei!

From: Matei Zaharia [mailto:matei.zaharia@gmail.com]
Sent: February-28-14 10:59 AM
To: user@spark.apache.org
Subject: Re: is RDD failure transparent to stream consumer

For output operators like this, the operation may run more than once (for example, when a failed task is re-executed), so it needs to be idempotent. However, the built-in save operators (e.g., saveAsTextFile) are automatically idempotent: they create each output partition only once.

Matei

On Feb 28, 2014, at 10:10 AM, Adrian Mocanu <am...@verticalscope.com> wrote:


I would really like an answer to this. A `yes` or `no` would suffice.

I'm talking about RDD failure in this context:
myStream.foreachRDD(rdd => rdd.foreach(tuple => println(tuple)))

From: Adrian Mocanu [mailto:amocanu@verticalscope.com]
Sent: February-27-14 12:19 PM
To: user@spark.incubator.apache.org
Subject: is RDD failure transparent to stream consumer

Is RDD failure transparent to a Spark stream consumer, apart from the slowdown needed to recreate the RDD?
After reading the papers on RDDs and DStreams on the Spark homepage, I believe it is, but I'd like confirmation.

Thanks
-Adrian


Re: is RDD failure transparent to stream consumer

Posted by Matei Zaharia <ma...@gmail.com>.
For output operators like this, the operation may run more than once (for example, when a failed task is re-executed), so it needs to be idempotent. However, the built-in save operators (e.g., saveAsTextFile) are automatically idempotent: they create each output partition only once.

Matei
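The point about re-execution can be illustrated without Spark. This is a plain-Scala sketch, not Spark's API: `AppendSink`, `KeyedSink`, and `runTask` are hypothetical names, and an external system is modelled as an in-memory collection. Running the same output task twice, as a retry after a failure would, leaves duplicates in an appending sink (the `println` in the `foreachRDD` example behaves like this), whereas a sink that replaces one slot per (batch time, partition) ends up in the same state either way, which is the kind of idempotence described for `saveAsTextFile`.

```scala
import scala.collection.mutable

// Plain-Scala simulation (no Spark); all names here are hypothetical.
object IdempotentOutputDemo {

  // Non-idempotent sink: every write appends, much like println
  // or a plain database INSERT.
  final class AppendSink {
    private val buffer = mutable.ArrayBuffer.empty[String]
    def write(batchTime: Long, partitionId: Int, data: Seq[String]): Unit =
      buffer ++= data
    def records: Seq[String] = buffer.toList
  }

  // Idempotent sink: each (batchTime, partitionId) slot is replaced as a
  // whole, so writing the same data again changes nothing.
  final class KeyedSink {
    private val slots = mutable.Map.empty[(Long, Int), Seq[String]]
    def write(batchTime: Long, partitionId: Int, data: Seq[String]): Unit =
      slots((batchTime, partitionId)) = data
    // Concatenate slot contents in (batchTime, partitionId) order.
    def records: Seq[String] = slots.toList.sortBy(_._1).flatMap(_._2)
  }

  // Runs the same output task `attempts` times, as a re-execution after a
  // failure would (attempts = 2 means one original run plus one retry).
  def runTask(attempts: Int)(task: () => Unit): Unit =
    (1 to attempts).foreach(_ => task())
}
```

For example, calling `runTask(2)` with a write of `Seq("a", "b")` leaves `Seq("a", "b", "a", "b")` in an `AppendSink` but `Seq("a", "b")` in a `KeyedSink`. In a real job, `DStream.foreachRDD` also has an overload that passes the batch `Time` alongside the RDD, which is one possible source for the batch part of such a key.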



RE: is RDD failure transparent to stream consumer

Posted by Adrian Mocanu <am...@verticalscope.com>.
I would really like an answer to this. A `yes` or `no` would suffice.

I'm talking about RDD failure in this context:
myStream.foreachRDD(rdd => rdd.foreach(tuple => println(tuple)))
