Posted to user@crunch.apache.org by Ben Juhn <be...@gmail.com> on 2016/07/29 18:33:49 UTC

Removing PCollection.cache call resulting in two MR jobs writing to same path

I removed a .cache call on a PCollection and am now seeing some troublesome behavior: two nodes in Crunch's execution graph write to the same output path. When I add the .cache call back, I end up with one node writing to Crunch tmp space and the other node writing to the output path.

Is this expected behavior?

Thanks,
Ben

Re: Removing PCollection.cache call resulting in two MR jobs writing to same path

Posted by Gabriel Reid <ga...@gmail.com>.
Hi Ben,

That doesn't sound like expected behavior to me, although there might
be some extra details that cause it to be executed in that way. Any
chance you could put together a small example test case that
demonstrates this?

- Gabriel

On Fri, Jul 29, 2016 at 8:33 PM, Ben Juhn <be...@gmail.com> wrote:
> I removed a .cache call and am seeing some troublesome behavior.  It results in two nodes in Crunch's execution graph writing to the same output path.  When I add the .cache call back I end up with one node writing to crunch tmp space, and the other node writing to the output path.
>
> Is this expected behavior?
>
> Thanks,
> Ben