Posted to user@spark.apache.org by suman bharadwaj <su...@gmail.com> on 2014/01/23 22:10:16 UTC

Giraph Vs SPARK

Hi,

I might be wrong, but I need your help.

My understanding is that Giraph doesn't write intermediate data to disk
while sending messages between machines. But in Spark, I see that
intermediate map outputs get written to disk. Why does Spark write
intermediate data to disk?

What happens on the reducer side? Does Spark write the data again to disk?
How does it differ from Hadoop MR?

Can't Spark communicate everything in memory?

If my understanding is wrong, please do correct me.

Regards,
Suman Bharadwaj S

Re: Giraph Vs SPARK

Posted by Matei Zaharia <ma...@gmail.com>.
It’s not meant to tolerate a machine crashing and losing this data before syncing it to disk; in that case, we rerun the map tasks on other machines. In general, MapReduce-like systems don’t assume that they will get back a node after a failure. Nodes can go down for many reasons, including their network card being broken, so all these systems have a way of recomputing task results if a node becomes unavailable. But the reason to buffer the outputs like this is to decrease the amount of work recomputed. The original MapReduce design works the same way.

Matei
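The recovery scheme described above can be sketched as a toy simulation (illustrative Python only, not Spark's actual implementation): each map task buffers its output to a file, and when a node is lost, only the map tasks whose buffered output went with it are rerun, rather than the whole map stage.

```python
import os
import shutil
import tempfile

def run_map_task(task_id, data, out_dir):
    """Run one map task and buffer its output to a local file."""
    path = os.path.join(out_dir, f"map_{task_id}.txt")
    with open(path, "w") as f:
        for x in data:
            f.write(f"{x * 2}\n")  # stand-in "map" function: double each value
    return path

def recover(lost_task_ids, inputs, out_dir, recomputed):
    """Re-run only the map tasks whose buffered output was lost."""
    for tid in lost_task_ids:
        run_map_task(tid, inputs[tid], out_dir)
        recomputed.append(tid)

out_dir = tempfile.mkdtemp()
inputs = {0: [1, 2], 1: [3, 4], 2: [5, 6]}
for tid, data in inputs.items():
    run_map_task(tid, data, out_dir)

# Simulate losing the node that held map task 1's buffered output.
os.remove(os.path.join(out_dir, "map_1.txt"))

recomputed = []
lost = [tid for tid in inputs
        if not os.path.exists(os.path.join(out_dir, f"map_{tid}.txt"))]
recover(lost, inputs, out_dir, recomputed)

# Only task 1 was recomputed; tasks 0 and 2 kept their buffered output.
print(recomputed)  # [1]
```

This is the trade-off Matei describes: buffering costs a write per map task, but bounds the recomputation after a failure to the lost node's tasks.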

On Jan 23, 2014, at 3:27 PM, Linky <li...@gmail.com> wrote:

> On Thu, Jan 23, 2014 at 5:41 PM, Matei Zaharia <ma...@gmail.com> wrote:
> The data gets written to files for fault tolerance, in case we need to re-run a reduce task and re-fetch the files after. Otherwise, we’d have to re-run *all* the map tasks whenever one reduce task fails. However, these files usually remain in the OS buffer cache so they are written and read at memory speed.
> 
> Then, how is it fault tolerant? Does the checkpoint API have a sync if I am really paranoid about a specific step that has a high fan-in of mappers?
> 
> Thanks.


Re: Giraph Vs SPARK

Posted by Linky <li...@gmail.com>.
On Thu, Jan 23, 2014 at 5:41 PM, Matei Zaharia <ma...@gmail.com> wrote:

> The data gets written to files for fault tolerance, in case we need to
> re-run a reduce task and re-fetch the files after. Otherwise, we’d have to
> re-run *all* the map tasks whenever one reduce task fails. However, these
> files usually remain in the OS buffer cache so they are written and read at
> memory speed.
>

Then, how is it fault tolerant? Does the checkpoint API have a sync if I
am really paranoid about a specific step that has a high fan-in of mappers?

Thanks.




Re: Giraph Vs SPARK

Posted by suman bharadwaj <su...@gmail.com>.
Thanks guys. Really appreciate it! Things are clarified at memory speed
in this forum :)

Regards,
SB



Re: Giraph Vs SPARK

Posted by Matei Zaharia <ma...@gmail.com>.
The data gets written to files for fault tolerance, in case we need to re-run a reduce task and re-fetch the files after. Otherwise, we’d have to re-run *all* the map tasks whenever one reduce task fails. However, these files usually remain in the OS buffer cache so they are written and read at memory speed. In the future we might add a setting that skips this and uses Spark’s memory store for shuffle data instead.

On the reduce side there’s no use of disk except in Spark 0.9, where we added the option to spill to disk if the reduce’s inputs don’t fit in memory.

Matei
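The reduce-side spilling Matei mentions for Spark 0.9 can be sketched as a toy external merge (an illustrative simplification, not Spark's real implementation): the reducer keeps incoming records in memory, spills a sorted run to disk whenever a memory budget is exceeded, and merge-sorts the in-memory run with the spilled runs at the end.

```python
import heapq
import os
import tempfile

class SpillingReducer:
    """Toy reducer that spills sorted runs to disk past a memory budget."""

    def __init__(self, memory_budget, spill_dir):
        self.memory_budget = memory_budget  # max records held in memory
        self.spill_dir = spill_dir
        self.buffer = []
        self.spill_files = []

    def insert(self, value):
        self.buffer.append(value)
        if len(self.buffer) > self.memory_budget:
            self._spill()

    def _spill(self):
        # Write the current buffer to disk as one sorted run.
        path = os.path.join(self.spill_dir,
                            f"spill_{len(self.spill_files)}.txt")
        with open(path, "w") as f:
            for v in sorted(self.buffer):
                f.write(f"{v}\n")
        self.spill_files.append(path)
        self.buffer = []

    def sorted_values(self):
        # Merge the in-memory run with every spilled run.
        runs = [sorted(self.buffer)]
        for path in self.spill_files:
            with open(path) as f:
                runs.append([int(line) for line in f])
        return list(heapq.merge(*runs))

spill_dir = tempfile.mkdtemp()
reducer = SpillingReducer(memory_budget=3, spill_dir=spill_dir)
for v in [5, 1, 4, 2, 8, 7, 3, 6]:
    reducer.insert(v)

print(len(reducer.spill_files))  # 2 -- the inputs didn't fit in memory
print(reducer.sorted_values())   # [1, 2, 3, 4, 5, 6, 7, 8]
```

The point is the one Matei makes: disk is only touched on the reduce side when the inputs exceed what memory can hold.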



Re: Giraph Vs SPARK

Posted by suman bharadwaj <su...@gmail.com>.
Hi,

Sorry for the confusion.

So let me rephrase my question.

Why does Spark have to write the intermediate data to disk when there is a
shuffle dependency? Can't the communication happen directly, as in Giraph?
And does data get written on the reducer side as well?

Again, please feel free to correct me in case my understanding is incorrect.

Regards,
SB



Re: Giraph Vs SPARK

Posted by Jey Kottalam <je...@cs.berkeley.edu>.
Hi Suman,

Spark does indeed do in-memory computation, and does not require
spilling to disk after every map task. Could you explain where you
"see that intermediate map outputs get written to disk"? Perhaps
you're seeing some intermediate results during a shuffle phase? In
that case, you may want to look into the
"spark.shuffle.consolidateFiles" option:
https://spark.incubator.apache.org/docs/0.8.1/configuration.html

-Jey
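What consolidation changes can be shown with back-of-the-envelope arithmetic (the per-core pooling model below is a simplification of the behavior behind "spark.shuffle.consolidateFiles", and the task counts are made up; see the linked docs for the real semantics). Without consolidation each map task writes one shuffle file per reducer; with it, map tasks running on the same core append to a shared pool of files.

```python
def shuffle_file_count(num_map_tasks, num_reducers, cores, consolidate):
    """Rough count of shuffle files under the two schemes."""
    if consolidate:
        # One file per (core, reducer) pair, reused across map tasks.
        return cores * num_reducers
    # One file per (map task, reducer) pair.
    return num_map_tasks * num_reducers

plain = shuffle_file_count(num_map_tasks=1000, num_reducers=200,
                           cores=16, consolidate=False)
pooled = shuffle_file_count(num_map_tasks=1000, num_reducers=200,
                            cores=16, consolidate=True)
print(plain)   # 200000
print(pooled)  # 3200
```

With thousands of map tasks, the unconsolidated scheme can create enough small files to stress the filesystem, which is why the option exists.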
