Posted to user@spark.apache.org by Jakub Dubovsky <sp...@gmail.com> on 2016/09/01 11:47:46 UTC

Re: Does a driver jvm house some rdd partitions?

Hey Mich,

the question was not about one particular job but rather about the general
way Spark works.

If I call persist on an RDD, then the executor that computed a partition of
that RDD will try to save the partition in the memory it has reserved for
caching. So my question is: are any partitions ever computed (and therefore
cached, if requested) in the driver JVM?

thanks

On Wed, Aug 31, 2016 at 5:56 PM, Mich Talebzadeh <mi...@gmail.com>
wrote:

> Hi,
>
> Are you caching RDD into storage memory here?
>
> Example
>
> s.persist(org.apache.spark.storage.StorageLevel.MEMORY_ONLY)
>
> Do you have a snapshot of your storage tab?
>
> HTH
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> On 31 August 2016 at 14:53, Jakub Dubovsky <spark.dubovsky.jakub@gmail.com> wrote:
>
>> Hey all,
>>
>> I have a conceptual question which I have a hard time finding an answer for.
>>
>> Is the JVM where the Spark driver runs also used to run computations over
>> RDD partitions and to persist them? The answer is obviously yes in local
>> mode. But when Spark runs on YARN/Mesos/standalone with many executors, is
>> the answer no?
>>
>> *My motivation is the following:*
>> In the "Executors" tab of the Spark UI, the "Storage Memory" column of the
>> driver row shows, for example, "0.0 B / 14.2 GB". This suggests that 14 GB
>> of RAM are not available to computations done in the driver but are instead
>> reserved for RDD caching.
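>>
>> As an aside, the same numbers can also be read programmatically; a sketch,
>> assuming a spark-shell sc (the driver shows up in this map too):
>>
>> sc.getExecutorMemoryStatus.foreach { case (addr, (maxMem, free)) =>
>>   // maxMem = memory available for caching, free = what is left of it
>>   println(s"$addr: $free B free of $maxMem B")
>> }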
>>
>> But I have plenty of memory on the executors to cache RDDs there. I would
>> like to use the driver memory to be able to collect medium-sized data.
>> Since I assume collected data is stored outside the memory reserved for
>> caching, this means those 14 GB are not available for holding collected
>> data.
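>>
>> By "collect medium sized data" I mean something like this sketch (sizes
>> hypothetical); as I understand it, spark.driver.maxResultSize (default 1g)
>> additionally caps what a single action may return to the driver:
>>
>> val rdd = sc.parallelize(1 to 1000000)   // example data
>> // collect() deserializes all partitions into the driver heap, so the
>> // driver needs enough non-cache memory to hold the result.
>> val local: Array[Int] = rdd.collect()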
>>
>> It looks like Spark 2.0.0 does this cache vs. non-cache memory management
>> automatically (unified memory management), but I do not understand it yet.
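>>
>> For reference, these are the knobs I have found for the unified memory
>> manager; my understanding of the 2.0 defaults, not authoritative:
>>
>> val conf = new org.apache.spark.SparkConf()
>>   // fraction of (heap - 300 MB reserved) shared by execution and storage
>>   .set("spark.memory.fraction", "0.6")          // default 0.6
>>   // part of that region protected from eviction by execution
>>   .set("spark.memory.storageFraction", "0.5")   // default 0.5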
>>
>> Thanks for any insight on this
>>
>> Jakub D.
>>
>
>