Posted to user@spark.apache.org by "Annamalai, Sai IN BLR STS" <sa...@siemens.com> on 2014/01/29 04:07:10 UTC

Distributed Shared Access of Cached RDD's

RDDs are cached; say RDD1 is cached on Node 1. The RDD paper compares RDDs against distributed shared memory.
So if Node 2 has a free slot, can a worker on Node 2 directly access the in-memory copy of RDD1 on Node 1, or is a transfer over the network inevitable?


Regards,
Sai Prasanna.
Siemens Corporate Research Technology, Bangalore.


Re: Distributed Shared Access of Cached RDD's

Posted by Sai Prasanna <an...@gmail.com>.
Thank you, TD!


On Fri, Jan 31, 2014 at 11:20 AM, Tathagata Das <tathagata.das1565@gmail.com> wrote:

> That depends. By default, tasks are launched with a locality preference.
> So if no free slot is currently available on Node 1, Spark will wait for
> a free slot there. However, if you enable delay scheduling (see the config
> property spark.locality.wait), then once that wait expires it may launch
> tasks on other machines with free slots and pull the data over the network.


-- 
*Sai Prasanna. AN*
*II M.Tech (CS), SSSIHL*



Re: Distributed Shared Access of Cached RDD's

Posted by Tathagata Das <ta...@gmail.com>.
That depends. By default, tasks are launched with a locality preference.
So if no free slot is currently available on Node 1, Spark will wait for
a free slot there. However, if you enable delay scheduling (see the config
property spark.locality.wait), then once that wait expires it may launch
tasks on other machines with free slots and pull the data over the network.
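
To make the trade-off concrete, here is a toy model of the placement decision described above. It is only a sketch, not Spark's scheduler code: the names Node and place_task are hypothetical, and the 3000 ms wait is just an illustrative value standing in for whatever spark.locality.wait is set to.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_slots: int


def place_task(preferred, others, waited_ms, locality_wait_ms):
    """Return (node_name, read_mode) for a task, or None to keep waiting.

    Hypothetical helper: `preferred` is the node holding the cached
    partition, `others` are the remaining nodes in the cluster.
    """
    # A slot is free where the partition is cached: run there, read local memory.
    if preferred.free_slots > 0:
        return (preferred.name, "local-memory")
    # No local slot yet: keep waiting until the locality wait has expired.
    if waited_ms < locality_wait_ms:
        return None
    # Wait expired: take any node with a free slot; the partition is then
    # pulled over the network from the node that caches it.
    for node in others:
        if node.free_slots > 0:
            return (node.name, "network-fetch")
    return None


node1 = Node("node1", free_slots=0)  # caches RDD1, but all slots are busy
node2 = Node("node2", free_slots=1)  # idle

print(place_task(node1, [node2], waited_ms=1000, locality_wait_ms=3000))
# -> None (still waiting for a slot on node1)
print(place_task(node1, [node2], waited_ms=3000, locality_wait_ms=3000))
# -> ('node2', 'network-fetch')
```

In this model a task never reads another node's memory directly: it either runs where the partition is cached or fetches the data over the network.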


On Tue, Jan 28, 2014 at 7:07 PM, Annamalai, Sai IN BLR STS <
sai.annamalai@siemens.com> wrote:

> RDDs are cached; say RDD1 is cached on Node 1. The RDD paper compares
> RDDs against distributed shared memory.
>
> So if Node 2 has a free slot, can a worker on Node 2 directly access the
> in-memory copy of RDD1 on Node 1, or is a transfer over the network
> inevitable?
>
> Regards,
> Sai Prasanna.
> Siemens Corporate Research Technology, Bangalore.