Posted to user@spark.apache.org by Victor Tso-Guillen <vt...@paxata.com> on 2014/09/19 05:55:33 UTC

diamond dependency tree

Is it possible to express a diamond DAG and have the leaf dependency
evaluate only once? So say data flows left to right (and the dependencies
are oriented right to left):

[image: Inline image 1]
Is it possible to run d.collect() and have "a" evaluate its iterator only
once?

Re: diamond dependency tree

Posted by Victor Tso-Guillen <vt...@paxata.com>.
Yes, sorry, I meant DAG. I fixed it in my message but not the subject. I know
the term "leaf" wasn't helpful, so hopefully my visual example was enough.
Anyway, I observed what you described in a local-mode test. I can try that in
a cluster, too. Thank you!

On Thu, Sep 18, 2014 at 10:28 PM, Tobias Pfeiffer <tg...@preferred.jp> wrote:

> Hi,
>
> On Thu, Sep 18, 2014 at 8:55 PM, Victor Tso-Guillen <vt...@paxata.com> wrote:
>
>> Is it possible to express a diamond DAG and have the leaf dependency
>> evaluate only once?
>
> Well, strictly speaking your graph is not a "tree", and also the meaning
> of "leaf" is not totally clear, I'd say.
>
>> So say data flows left to right (and the dependencies are oriented right
>> to left):
>>
>> [image: Inline image 1]
>> Is it possible to run d.collect() and have a evaluate its iterator only
>> once?
>
> If you say a.cache() (or a.persist()) then it will be evaluated only once
> and then the cached data will be used for later accesses.
>
> Tobias

Re: diamond dependency tree

Posted by Tobias Pfeiffer <tg...@preferred.jp>.
Hi,

On Thu, Sep 18, 2014 at 8:55 PM, Victor Tso-Guillen <vt...@paxata.com> wrote:

> Is it possible to express a diamond DAG and have the leaf dependency
> evaluate only once?

Well, strictly speaking your graph is not a "tree", and also the meaning of
"leaf" is not totally clear, I'd say.


> So say data flows left to right (and the dependencies are oriented right
> to left):
>
> [image: Inline image 1]
> Is it possible to run d.collect() and have a evaluate its iterator only
> once?

If you say a.cache() (or a.persist()) then it will be evaluated only once
and then the cached data will be used for later accesses.

Tobias
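
To make the effect of caching concrete, here is a minimal plain-Python toy
model of a diamond. It is not Spark code: Node, build_diamond and the
evaluations counter are hypothetical names invented for this sketch. The
shape (a feeds b and c, which both feed d) is assumed from the description,
since the inline image is not available. It only illustrates why a shared
upstream node is computed once per dependent path unless its result is kept.

```python
class Node:
    """Toy stand-in for a lazily evaluated dataset (hypothetical, not a Spark class)."""

    def __init__(self, name, compute, parents=()):
        self.name = name
        self.compute = compute        # function: list of parent results -> list
        self.parents = list(parents)
        self.evaluations = 0          # how many times compute() actually ran
        self._cache_enabled = False
        self._has_result = False
        self._result = None

    def cache(self):
        """Mark this node so its first result is kept and reused."""
        self._cache_enabled = True
        return self

    def collect(self):
        """Evaluate this node, pulling results from its parents first."""
        if self._has_result:
            return self._result
        inputs = [parent.collect() for parent in self.parents]
        self.evaluations += 1
        result = self.compute(inputs)
        if self._cache_enabled:
            self._result = result
            self._has_result = True
        return result


def build_diamond(cache_a):
    """Build the assumed diamond: a feeds b and c, which both feed d."""
    a = Node("a", lambda _inputs: [1, 2, 3])
    if cache_a:
        a.cache()
    b = Node("b", lambda inputs: [x * 2 for x in inputs[0]], [a])
    c = Node("c", lambda inputs: [x + 10 for x in inputs[0]], [a])
    d = Node("d", lambda inputs: inputs[0] + inputs[1], [b, c])
    return a, d


if __name__ == "__main__":
    for cache_a in (False, True):
        a, d = build_diamond(cache_a)
        d.collect()
        print(f"cache_a={cache_a}: a evaluated {a.evaluations} time(s)")
```

In this model, a is evaluated twice without cache() and once with it. In
Spark itself the corresponding step is the one described above: call
a.cache() (or a.persist()) before running d.collect().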

Re: diamond dependency tree

Posted by Victor Tso-Guillen <vt...@paxata.com>.
Caveat: all arrows are shuffle dependencies.

On Thu, Sep 18, 2014 at 8:55 PM, Victor Tso-Guillen <vt...@paxata.com> wrote:

> Is it possible to express a diamond DAG and have the leaf dependency
> evaluate only once? So say data flows left to right (and the dependencies
> are oriented right to left):
>
> [image: Inline image 1]
> Is it possible to run d.collect() and have a evaluate its iterator only
> once?
>