Posted to user@spark.apache.org by Ni...@ril.com on 2017/12/12 06:51:08 UTC

How is fault tolerance achieved in Spark?

Hello Techies,

How is fault tolerance achieved in Spark when data is read from HDFS and held in memory as an RDD?

Regards
Nikhil
"Confidentiality Warning: This message and any attachments are intended only for the use of the intended recipient(s), are confidential and may be privileged. If you are not the intended recipient, you are hereby notified that any review, re-transmission, conversion to hard copy, copying, circulation or other use of this message and any attachments is strictly prohibited. If you are not the intended recipient, please notify the sender immediately by return email and delete this message and any attachments from your system.

Virus Warning: Although the company has taken reasonable precautions to ensure no viruses are present in this email, the company cannot accept responsibility for any loss or damage arising from the use of this email or attachment."

Re: How is fault tolerance achieved in Spark?

Posted by Naresh Dulam <na...@gmail.com>.
Hi Nikhil,


Fault tolerance means that data is not lost in case of failures, and it is achieved differently by different systems. HDFS achieves fault tolerance by replicating blocks across different nodes. Spark achieves it through the DAG of lineage information that every RDD carries. Let me put it in simple words: suppose you created RDD1 by reading data from HDFS, then applied a couple of transformations to derive two new RDDs:

RDD1 --> RDD2 --> RDD3

Let's assume you cached RDD3, and some time later RDD3 was evicted from the cache to make room for a newly created and cached RDD4.

Now suppose you want to access RDD3, which is no longer available in the cache. Spark will use the DAG to recompute RDD3 from its lineage. In this way the data in RDD3 is always recoverable.
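The recovery mechanism above can be sketched with a toy lineage model in plain Python. This is not real Spark code and not Spark's API; the class and method names (ToyRDD, evict, and so on) are purely illustrative, but the logic mirrors the idea: each RDD remembers its parent and the transformation that produced it, so a lost partition can always be recomputed.

```python
class ToyRDD:
    """Minimal stand-in for an RDD: remembers its parent and the
    transformation used to derive it (its lineage)."""
    def __init__(self, data=None, parent=None, transform=None):
        self.parent = parent        # upstream ToyRDD; None for a source
        self.transform = transform  # function applied to the parent's rows
        self._source = data         # backing data for a source RDD ("HDFS")
        self._cache = None          # simulated in-memory cache

    def map(self, fn):
        # Transformations are lazy: only lineage is recorded, no data moves.
        return ToyRDD(parent=self, transform=fn)

    def cache(self):
        self._cache = self.collect()
        return self

    def evict(self):
        # Simulate the cache manager dropping this RDD to free memory.
        self._cache = None

    def collect(self):
        if self._cache is not None:   # fast path: data still cached
            return self._cache
        if self.parent is None:       # source RDD: re-read the input
            return list(self._source)
        # Cache miss: recompute this RDD from its parent via the lineage.
        return [self.transform(x) for x in self.parent.collect()]

# RDD1 --> RDD2 --> RDD3
rdd1 = ToyRDD(data=[1, 2, 3])
rdd2 = rdd1.map(lambda x: x * 10)
rdd3 = rdd2.map(lambda x: x + 1).cache()

rdd3.evict()            # RDD3 dropped from cache, as in the scenario above
print(rdd3.collect())   # recomputed through the lineage: [11, 21, 31]
```

Note that nothing is checkpointed here: after eviction, the data reappears only because the chain of transformations back to the source was retained, which is exactly why a lost in-memory RDD partition is not a data-loss event.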


Hope this answers your question.

Thank you,
Naresh


On Tue, Dec 12, 2017 at 12:51 AM <Ni...@ril.com> wrote:

> Hello Techie’s,
>
>
>
> How fault tolerance is achieved in Spark when data is read from HDFS and
> is in form of RDD (Memory).
>
>
>
> Regards
>
> Nikhil