Posted to user@crunch.apache.org by Ben Juhn <be...@gmail.com> on 2016/07/29 18:33:49 UTC
Removing PCollection.cache call resulting in two MR jobs writing to same path
I removed a .cache call and am now seeing some troublesome behavior: two nodes in Crunch's execution graph write to the same output path. When I add the .cache call back, one node writes to Crunch's tmp space and the other node writes to the output path.
Is this expected behavior?
Thanks,
Ben
Re: Removing PCollection.cache call resulting in two MR jobs writing to same path
Posted by Gabriel Reid <ga...@gmail.com>.
Hi Ben,
That doesn't sound like expected behavior to me, although there might
be some extra details that cause it to be executed in that way. Any
chance you could put together a small example test case that
demonstrates this?
- Gabriel
On Fri, Jul 29, 2016 at 8:33 PM, Ben Juhn <be...@gmail.com> wrote:
> I removed a .cache call and am seeing some troublesome behavior. It results in two nodes in Crunch's execution graph writing to the same output path. When I add the .cache call back I end up with one node writing to crunch tmp space, and the other node writing to the output path.
>
> Is this expected behavior?
>
> Thanks,
> Ben