Posted to user@spark.apache.org by Adrian Mocanu <am...@verticalscope.com> on 2014/02/27 18:18:59 UTC
is RDD failure transparent to stream consumer
Is an RDD failure transparent to a Spark Streaming consumer, except for the slowdown needed to recreate the RDD?
After reading the papers on RDDs and DStreams from the Spark homepage, I believe it is, but I'd like confirmation.
Thanks
-Adrian
RE: is RDD failure transparent to stream consumer
Posted by Adrian Mocanu <am...@verticalscope.com>.
Thanks so much Matei!
Re: is RDD failure transparent to stream consumer
Posted by Matei Zaharia <ma...@gmail.com>.
For output operators like this, the operator may run more than once, so it needs to be idempotent. However, the built-in save operators (e.g. saveAsTextFile) are automatically idempotent (they only create each output partition once).
Matei
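
To illustrate the idempotency point, here is a minimal sketch in plain Scala, not taken from the thread. It assumes the output goes to a store that can overwrite by key, and that a re-executed partition produces the same records as the first attempt. The names `OutputKey`, `InMemorySink` and `writePartition` are hypothetical, and the in-memory map only stands in for a real external store.

```scala
import scala.collection.concurrent.TrieMap

// Hypothetical names throughout: none of these are Spark APIs.
// A write is identified by the batch it belongs to and the partition index.
final case class OutputKey(batchTimeMs: Long, partitionId: Int)

object InMemorySink {
  // Stand-in for an external store that supports overwrite-by-key (upsert).
  val store: TrieMap[OutputKey, Vector[String]] =
    TrieMap.empty[OutputKey, Vector[String]]

  // Overwrites whatever an earlier attempt stored under the same key, so
  // running the same (batch, partition) twice leaves one copy of the output.
  def writePartition(batchTimeMs: Long, partitionId: Int, records: Iterator[String]): Unit =
    store.update(OutputKey(batchTimeMs, partitionId), records.toVector)
}

object IdempotentOutputDemo {
  def main(args: Array[String]): Unit = {
    val partition0 = Vector("(a,1)", "(b,2)")

    // First attempt, then a simulated re-execution after a failure.
    InMemorySink.writePartition(1000L, 0, partition0.iterator)
    InMemorySink.writePartition(1000L, 0, partition0.iterator)

    // Still exactly one entry, holding the records once.
    assert(InMemorySink.store.size == 1)
    assert(InMemorySink.store(OutputKey(1000L, 0)) == partition0)
    println(InMemorySink.store)
  }
}
```

In a Spark Streaming job, a function like this would typically be called from `myStream.foreachRDD((rdd, time) => rdd.foreachPartition(records => ...))`, passing `time.milliseconds` and `TaskContext.getPartitionId()` as the key and converting the records to strings. A `println` inside `rdd.foreach`, by contrast, has no such key, so a re-executed task simply prints its tuples again.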
RE: is RDD failure transparent to stream consumer
Posted by Adrian Mocanu <am...@verticalscope.com>.
Would really like an answer to this. A `yes` or `no` would suffice.
I'm talking about RDD failure in this context:
myStream.foreachRDD(rdd => rdd.foreach(tuple => println(tuple)))