Posted to user@spark.apache.org by Arya Ketan <ke...@gmail.com> on 2020/09/23 07:44:09 UTC

Is RDD.persist honoured if multiple actions are executed in parallel

Hi,
I have a Spark Streaming use case (Spark 2.2.1). My Spark job has multiple
actions, which I run in parallel by executing them in separate threads. I
call rdd.persist, after which the DAG forks into multiple actions.
However, I see that the RDD is not being cached, and the entire DAG is
executed twice (once for each action).

What am I missing?
Arya

Re: Is RDD.persist honoured if multiple actions are executed in parallel

Posted by Michael Mior <mi...@sharecanada.org>.
If you want to ensure the persisted RDD has been computed before the
parallel actions start, just run foreach with a dummy function first to
force evaluation.
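
A minimal sketch of that suggestion in Python, written against the
PySpark RDD methods persist() and foreach(). The helper name
materialize_then_fork and the shape of the `actions` argument are
assumptions made for illustration and do not come from this thread. It
also assumes the cached partitions fit in the configured storage level.

```python
from concurrent.futures import ThreadPoolExecutor


def materialize_then_fork(rdd, actions):
    """Persist `rdd`, force one full evaluation, then run `actions` in parallel.

    `rdd` is expected to behave like a PySpark RDD (persist() and foreach()).
    Each entry in `actions` is a callable that takes the RDD and triggers a job.
    Both names are hypothetical and used for illustration only.
    """
    rdd.persist()

    # Dummy action: does nothing per element, but visits every partition,
    # so the persisted data is computed before any thread starts.
    rdd.foreach(lambda _: None)

    with ThreadPoolExecutor(max_workers=max(1, len(actions))) as pool:
        futures = [pool.submit(action, rdd) for action in actions]
        # result() re-raises any exception thrown inside a worker thread.
        return [future.result() for future in futures]
```

For example, materialize_then_fork(rdd, [lambda r: r.count(), lambda r:
r.collect()]) would run both actions in separate threads only after the
dummy foreach has finished. The cost is one extra blocking job per batch.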

--
Michael Mior
michael.mior@gmail.com

On Thu, 24 Sep 2020 at 00:38, Arya Ketan <ke...@gmail.com> wrote:
>
> Thanks, we were able to validate the same behaviour.
>
> On Wed, 23 Sep 2020 at 18:05, Sean Owen <sr...@gmail.com> wrote:
>>
>> It is, but it happens asynchronously. If you access the same block twice in quick succession, the cached block may not be available yet the second time.
>>
> --
> Arya

Re: Is RDD.persist honoured if multiple actions are executed in parallel

Posted by Arya Ketan <ke...@gmail.com>.
Thanks, we were able to validate the same behaviour.

On Wed, 23 Sep 2020 at 18:05, Sean Owen <sr...@gmail.com> wrote:

> It is, but it happens asynchronously. If you access the same block twice
> in quick succession, the cached block may not be available yet the second
> time.
--
Arya

Re: Is RDD.persist honoured if multiple actions are executed in parallel

Posted by Sean Owen <sr...@gmail.com>.
It is, but it happens asynchronously. If you access the same block twice
in quick succession, the cached block may not be available yet the second
time.
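
As a rough illustration of that timing issue, here is a toy model in
plain Python. It is not Spark code and does not reproduce Spark's block
manager. ToyBlockCache, count_computations and the block id "block-0"
are made-up names. The model only shows the general pattern: a cache
that is filled after the computation finishes cannot help a second
caller that arrives before then.

```python
import threading


class ToyBlockCache:
    """Toy lazily filled cache; a stand-in, NOT Spark's implementation."""

    def __init__(self):
        self._blocks = {}
        self._lock = threading.Lock()
        self.compute_calls = 0

    def get_or_compute(self, block_id, compute, on_miss=None):
        with self._lock:
            if block_id in self._blocks:
                return self._blocks[block_id]
        # Cache miss. The block is stored only after compute() returns, so a
        # second caller arriving in the meantime also misses and recomputes.
        if on_miss is not None:
            on_miss()
        value = compute()
        with self._lock:
            self.compute_calls += 1
            self._blocks[block_id] = value
        return value


def count_computations(warm_up_first):
    """Run two concurrent 'jobs' on one block; return how often it was computed."""
    cache = ToyBlockCache()

    def compute():
        return sum(range(1000))

    on_miss = None
    if warm_up_first:
        # Fill the cache before the parallel jobs start.
        cache.get_or_compute("block-0", compute)
    else:
        # Make both jobs miss before either one stores its result.
        on_miss = threading.Barrier(2, timeout=5).wait

    jobs = [
        threading.Thread(target=cache.get_or_compute,
                         args=("block-0", compute, on_miss))
        for _ in range(2)
    ]
    for job in jobs:
        job.start()
    for job in jobs:
        job.join()
    return cache.compute_calls
```

In this model, count_computations(False) returns 2 because both jobs
miss the empty cache, while count_computations(True) returns 1 because
the block is already stored when the jobs start.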
