Posted to user@crunch.apache.org by Jeremy Lewi <je...@lewi.us> on 2014/03/27 15:26:11 UTC

Exception with AvroPathPerKeyTarget

Hi

I'm hitting the exception pasted below when using AvroPathPerKeyTarget.
Interestingly, my code works just fine when I run on a small dataset using
the LocalJobTracker. However, when I run on a large dataset using a Hadoop
cluster, I hit the exception.

Here's a link to my code:
http://goo.gl/HTAa58

Any help would be greatly appreciated.

Thanks
Jeremy

java.io.IOException: java.lang.IllegalArgumentException: Reducer output name 'out0' cannot be parsed
    at org.apache.crunch.impl.mr.exec.CrunchJobHooks$CompletionHook.handleMultiPaths(CrunchJobHooks.java:92)
    at org.apache.crunch.impl.mr.exec.CrunchJobHooks$CompletionHook.run(CrunchJobHooks.java:79)
    at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.checkRunningState(CrunchControlledJob.java:258)
    at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.checkState(CrunchControlledJob.java:268)
    at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchJobControl.checkRunningJobs(CrunchJobControl.java:174)
    at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchJobControl.pollJobStatusAndStartNewOnes(CrunchJobControl.java:229)
    at org.apache.crunch.impl.mr.exec.MRExecutor.monitorLoop(MRExecutor.java:112)
    at org.apache.crunch.impl.mr.exec.MRExecutor.access$000(MRExecutor.java:55)
    at org.apache.crunch.impl.mr.exec.MRExecutor$1.run(MRExecutor.java:83)
    at java.lang.Thread.run(Thread.java:724)
Caused by: java.lang.IllegalArgumentException: Reducer output name 'out0' cannot be parsed
    at org.apache.crunch.io.impl.FileTargetImpl.extractPartitionNumber(FileTargetImpl.java:194)
    at org.apache.crunch.io.impl.FileTargetImpl.getDestFile(FileTargetImpl.java:175)
    at org.apache.crunch.io.avro.AvroPathPerKeyTarget.handleOutputs(AvroPathPerKeyTarget.java:103)
    at org.apache.crunch.impl.mr.exec.CrunchJobHooks$CompletionHook.handleMultiPaths(CrunchJobHooks.java:87)
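[For context: the "cannot be parsed" message comes from Crunch trying to pull a partition number out of a reducer output file name, and a bare name like 'out0' with no '-r-NNNNN' suffix has nothing to parse. A minimal stand-alone sketch of that kind of parsing; the regex here is an assumption for illustration, not the actual FileTargetImpl source.]

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReducerOutputName {
    // Hadoop reducer outputs are normally named like "out0-r-00000":
    // a user-assigned prefix, "-r-" (or "-m-" for map-only jobs), then
    // a zero-padded partition number.
    private static final Pattern PART = Pattern.compile(".*-[mr]-(\\d+)");

    public static int extractPartitionNumber(String fileName) {
        Matcher m = PART.matcher(fileName);
        if (!m.matches()) {
            // A bare name like "out0" (no partition suffix) lands here,
            // which is what surfaces as the IllegalArgumentException above.
            throw new IllegalArgumentException(
                "Reducer output name '" + fileName + "' cannot be parsed");
        }
        return Integer.parseInt(m.group(1));
    }

    public static void main(String[] args) {
        System.out.println(extractPartitionNumber("out0-r-00000")); // prints 0
        try {
            extractPartitionNumber("out0"); // no "-r-NNNNN" suffix
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```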

Re: Exception with AvroPathPerKeyTarget

Posted by Jeremy Lewi <je...@lewi.us>.
Gabriel,

Thanks for investigating. I'm fixing the issue that's causing me to produce
empty PCollections, so that should resolve the problem for me.

J


On Sat, Mar 29, 2014 at 5:13 AM, Gabriel Reid <ga...@gmail.com> wrote:

> Hi Jeremy,
>
> I just took some time to dig into this a bit deeper. It turns out that
> it is indeed an issue with handling an empty output PCollection in the
> AvroPathPerKeyTarget -- I've logged
> https://issues.apache.org/jira/browse/CRUNCH-371 to resolve it.
>
> The reason it was working on the local job tracker is a difference in
> the implementation of LocalFileSystem and DistributedFileSystem in
> hadoop-1. The good/bad news is that the current code will consistently
> crash with the same exception on hadoop-2 with both the local
> file system and HDFS. The short-term solution (other than patching
> your Crunch build with the patch in CRUNCH-371) would be just to
> ensure that the PCollection being output isn't empty.
>
> - Gabriel
>
>
> On Sat, Mar 29, 2014 at 2:27 AM, Jeremy Lewi <je...@lewi.us> wrote:
> > Thanks for the tip. I'll look into it and try to figure it out.
> >
> >
> > On Fri, Mar 28, 2014 at 11:11 AM, Gabriel Reid <ga...@gmail.com>
> > wrote:
> >>
> >> On Fri, Mar 28, 2014 at 6:13 PM, Jeremy Lewi <je...@lewi.us> wrote:
> >> > Unfortunately that didn't work. I still have a reduce only job.
> >> >
> >> > Here's a link to the console output in case that's helpful:
> >> >
> >> >
> https://drive.google.com/a/lewi.us/file/d/0B6ngy4MCihWwcy1sdE9DQ2hiYnc/edit?usp=sharing
> >> >
> >> >
> >> > I'm currently ungrouping my records before writing them (an earlier
> >> > attempt
> >> > to fix this issue). I'm trying without the ungroup now.
> >>
> >> Looking at the console output, I noticed that the second and third
> >> jobs are logging "Total input paths to process : 0", which makes me
> >> think that the first job being run doesn't have any output. Could you
> >> check the job counters there to see if it is indeed outputting
> >> anything? And was your local job running on the same data?
> >>
> >> The fact that there are no inputs would explain the reduce-only job,
> >> and I'm guessing/hoping that will be the reason the
> >> AvroPathPerKeyTarget is breaking.
> >>
> >> - Gabriel
> >>
> >>
> >> >
> >> > J
> >> >
> >> >
> >> > On Fri, Mar 28, 2014 at 10:08 AM, Jeremy Lewi <je...@lewi.us> wrote:
> >> >>
> >> >> Unfortunately that didn't work. I still have a reduce-only job. I'm
> >> >> attaching the console output from when I run my job in case that's
> >> >> helpful.
> >> >> I'm currently ungrouping my records before writing them (an earlier
> >> >> attempt to fix this). I'm trying to undo that.
> >> >>
> >> >> J
> >> >>
> >> >>
> >> >> On Fri, Mar 28, 2014 at 9:51 AM, Jeremy Lewi <je...@lewi.us> wrote:
> >> >>>
> >> >>> Thanks Gabriel I'll give that a try now. I was actually planning on
> >> >>> making that change once I realized that my current strategy was
> >> >>> forcing me
> >> >>> to materialize data early on.
> >> >>>
> >> >>>
> >> >>> On Fri, Mar 28, 2014 at 7:44 AM, Gabriel Reid <
> gabriel.reid@gmail.com>
> >> >>> wrote:
> >> >>>>
> >> >>>> On Fri, Mar 28, 2014 at 3:19 PM, Jeremy Lewi <je...@lewi.us>
> wrote:
> >> >>>> > No luck. I get the same error even when using a single reducer.
> I'm
> >> >>>> > attaching the job configuration as shown in the web ui.
> >> >>>> >
> >> >>>> > When I look at the job tracker for the job, it has no map tasks.
> Is
> >> >>>> > that
> >> >>>> > expected? I've never heard of a reduce only job.
> >> >>>> >
> >> >>>>
> >> >>>> Nope, a job with no map tasks doesn't sound right to me. I noticed
> >> >>>> that you're effectively doing a materialize at [1], and then
> >> >>>> using a BloomFilterJoinStrategy. While this should work fine, I'm
> >> >>>> thinking that it could also potentially lead to some issues such as
> >> >>>> the one you're having (i.e. a job with no map tasks).
> >> >>>>
> >> >>>> Could you try using the default join strategy there to see what
> >> >>>> happens? I'm thinking that the AvroPathPerKeyTarget issue could just
> >> >>>> be a consequence of something else going wrong earlier on.
> >> >>>>
> >> >>>> 1.
> >> >>>>
> >> >>>>
> https://code.google.com/p/contrail-bio/source/browse/src/main/java/contrail/scaffolding/FilterReads.java?name=dev_read_filtering#156
> >> >>>>
> >> >>>> >
> >> >>>> > On Fri, Mar 28, 2014 at 6:45 AM, Jeremy Lewi <je...@lewi.us>
> >> >>>> > wrote:
> >> >>>> >>
> >> >>>> >> This is my first time on a cluster; I'll try what Josh suggests
> >> >>>> >> now.
> >> >>>> >>
> >> >>>> >> J
> >> >>>> >>
> >> >>>> >>
> >> >>>> >> On Fri, Mar 28, 2014 at 3:41 AM, Josh Wills <
> josh.wills@gmail.com>
> >> >>>> >> wrote:
> >> >>>> >>>
> >> >>>> >>>
> >> >>>> >>> On Fri, Mar 28, 2014 at 1:22 AM, Gabriel Reid
> >> >>>> >>> <ga...@gmail.com>
> >> >>>> >>> wrote:
> >> >>>> >>>>
> >> >>>> >>>> Hi Jeremy,
> >> >>>> >>>>
> >> >>>> >>>> On Thu, Mar 27, 2014 at 3:26 PM, Jeremy Lewi <je...@lewi.us>
> >> >>>> >>>> wrote:
> >> >>>> >>>> > Hi
> >> >>>> >>>> >
> >> >>>> >>>> > I'm hitting the exception pasted below when using
> >> >>>> >>>> > AvroPathPerKeyTarget.
> >> >>>> >>>> > Interestingly, my code works just fine when I run on a small
> >> >>>> >>>> > dataset
> >> >>>> >>>> > using
> >> >>>> >>>> > the LocalJobTracker. However, when I run on a large dataset
> >> >>>> >>>> > using
> >> >>>> >>>> > a
> >> >>>> >>>> > hadoop
> >> >>>> >>>> > cluster I hit the exception.
> >> >>>> >>>> >
> >> >>>> >>>>
> >> >>>> >>>> Have you ever been able to successfully use the
> >> >>>> >>>> AvroPathPerKeyTarget
> >> >>>> >>>> on a real cluster, or is this the first try with it?
> >> >>>> >>>>
> >> >>>> >>>> I'm wondering if this could be a problem that's always been
> >> >>>> >>>> around
> >> >>>> >>>> (as
> >> >>>> >>>> the integration test for AvroPathPerKeyTarget also runs in the
> >> >>>> >>>> local
> >> >>>> >>>> jobtracker), or if this could be something new.
> >> >>>> >>>
> >> >>>> >>>
> >> >>>> >>> +1-- Jeremy, if you force the job to run w/a single reducer on
> >> >>>> >>> the
> >> >>>> >>> cluster (i.e., via groupByKey(1)), does it work?
> >> >>>> >>>
> >> >>>> >>>>
> >> >>>> >>>>
> >> >>>> >>>> - Gabriel
> >> >>>> >>>
> >> >>>> >>>
> >> >>>> >>
> >> >>>> >
> >> >>>
> >> >>>
> >> >>
> >> >
> >
> >
>

Re: Exception with AvroPathPerKeyTarget

Posted by Gabriel Reid <ga...@gmail.com>.
Hi Jeremy,

I just took some time to dig into this a bit deeper. It turns out that
it is indeed an issue with handling an empty output PCollection in the
AvroPathPerKeyTarget -- I've logged
https://issues.apache.org/jira/browse/CRUNCH-371 to resolve it.

The reason it was working on the local job tracker is a difference in
the implementation of LocalFileSystem and DistributedFileSystem in
hadoop-1. The good/bad news is that the current code will consistently
crash with the same exception on hadoop-2 with both the local
file system and HDFS. The short-term solution (other than patching
your Crunch build with the patch in CRUNCH-371) would be just to
ensure that the PCollection being output isn't empty.

- Gabriel
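
[For context: until CRUNCH-371 is available, the guard is simply "don't write an empty PCollection". A plain-Java sketch of that check; in Crunch you could apply the same test to the Iterable returned by materialize() before calling write() on an AvroPathPerKeyTarget. shouldWrite is an illustrative helper, not Crunch API.]

```java
import java.util.Arrays;
import java.util.Collections;

public class EmptyOutputGuard {
    // Stand-in for "only write the PCollection if it is non-empty".
    // An Iterable is empty exactly when its iterator has no first element.
    static <T> boolean shouldWrite(Iterable<T> materialized) {
        return materialized.iterator().hasNext();
    }

    public static void main(String[] args) {
        System.out.println(shouldWrite(Collections.emptyList()));      // false
        System.out.println(shouldWrite(Arrays.asList("key1/record"))); // true
    }
}
```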


On Sat, Mar 29, 2014 at 2:27 AM, Jeremy Lewi <je...@lewi.us> wrote:
> Thanks for the tip. I'll look into it and try to figure it out.
>
>
> On Fri, Mar 28, 2014 at 11:11 AM, Gabriel Reid <ga...@gmail.com>
> wrote:
>>
>> On Fri, Mar 28, 2014 at 6:13 PM, Jeremy Lewi <je...@lewi.us> wrote:
>> > Unfortunately that didn't work. I still have a reduce only job.
>> >
>> > Here's a link to the console output in case that's helpful:
>> >
>> > https://drive.google.com/a/lewi.us/file/d/0B6ngy4MCihWwcy1sdE9DQ2hiYnc/edit?usp=sharing
>> >
>> >
>> > I'm currently ungrouping my records before writing them (an earlier
>> > attempt
>> > to fix this issue). I'm trying without the ungroup now.
>>
>> Looking at the console output, I noticed that the second and third
>> jobs are logging "Total input paths to process : 0", which makes me
>> think that the first job being run doesn't have any output. Could you
>> check the job counters there to see if it is indeed outputting
>> anything? And was your local job running on the same data?
>>
>> The fact that there are no inputs would explain the reduce-only job,
>> and I'm guessing/hoping that will be the reason the
>> AvroPathPerKeyTarget is breaking.
>>
>> - Gabriel
>>
>>
>> >
>> > J
>> >
>> >
>> > On Fri, Mar 28, 2014 at 10:08 AM, Jeremy Lewi <je...@lewi.us> wrote:
>> >>
>> >> Unfortunately that didn't work. I still have a reduce-only job. I'm
>> >> attaching the console output from when I run my job in case that's
>> >> helpful.
>> >> I'm currently ungrouping my records before writing them (an earlier
>> >> attempt to fix this). I'm trying to undo that.
>> >>
>> >> J
>> >>
>> >>
>> >> On Fri, Mar 28, 2014 at 9:51 AM, Jeremy Lewi <je...@lewi.us> wrote:
>> >>>
>> >>> Thanks Gabriel I'll give that a try now. I was actually planning on
>> >>> making that change once I realized that my current strategy was
>> >>> forcing me
>> >>> to materialize data early on.
>> >>>
>> >>>
>> >>> On Fri, Mar 28, 2014 at 7:44 AM, Gabriel Reid <ga...@gmail.com>
>> >>> wrote:
>> >>>>
>> >>>> On Fri, Mar 28, 2014 at 3:19 PM, Jeremy Lewi <je...@lewi.us> wrote:
>> >>>> > No luck. I get the same error even when using a single reducer. I'm
>> >>>> > attaching the job configuration as shown in the web ui.
>> >>>> >
>> >>>> > When I look at the job tracker for the job, it has no map tasks. Is
>> >>>> > that
>> >>>> > expected? I've never heard of a reduce only job.
>> >>>> >
>> >>>>
>> >>>> Nope, a job with no map tasks doesn't sound right to me. I noticed
>> >>>> that you're effectively doing a materialize at [1], and then
>> >>>> using a BloomFilterJoinStrategy. While this should work fine, I'm
>> >>>> thinking that it could also potentially lead to some issues such as
>> >>>> the one you're having (i.e. a job with no map tasks).
>> >>>>
>> >>>> Could you try using the default join strategy there to see what
>> >>>> happens? I'm thinking that the AvroPathPerKeyTarget issue could just
>> >>>> be a consequence of something else going wrong earlier on.
>> >>>>
>> >>>> 1.
>> >>>>
>> >>>> https://code.google.com/p/contrail-bio/source/browse/src/main/java/contrail/scaffolding/FilterReads.java?name=dev_read_filtering#156
>> >>>>
>> >>>> >
>> >>>> > On Fri, Mar 28, 2014 at 6:45 AM, Jeremy Lewi <je...@lewi.us>
>> >>>> > wrote:
>> >>>> >>
>> >>>> >> This is my first time on a cluster; I'll try what Josh suggests
>> >>>> >> now.
>> >>>> >>
>> >>>> >> J
>> >>>> >>
>> >>>> >>
>> >>>> >> On Fri, Mar 28, 2014 at 3:41 AM, Josh Wills <jo...@gmail.com>
>> >>>> >> wrote:
>> >>>> >>>
>> >>>> >>>
>> >>>> >>> On Fri, Mar 28, 2014 at 1:22 AM, Gabriel Reid
>> >>>> >>> <ga...@gmail.com>
>> >>>> >>> wrote:
>> >>>> >>>>
>> >>>> >>>> Hi Jeremy,
>> >>>> >>>>
>> >>>> >>>> On Thu, Mar 27, 2014 at 3:26 PM, Jeremy Lewi <je...@lewi.us>
>> >>>> >>>> wrote:
>> >>>> >>>> > Hi
>> >>>> >>>> >
>> >>>> >>>> > I'm hitting the exception pasted below when using
>> >>>> >>>> > AvroPathPerKeyTarget.
>> >>>> >>>> > Interestingly, my code works just fine when I run on a small
>> >>>> >>>> > dataset
>> >>>> >>>> > using
>> >>>> >>>> > the LocalJobTracker. However, when I run on a large dataset
>> >>>> >>>> > using
>> >>>> >>>> > a
>> >>>> >>>> > hadoop
>> >>>> >>>> > cluster I hit the exception.
>> >>>> >>>> >
>> >>>> >>>>
>> >>>> >>>> Have you ever been able to successfully use the
>> >>>> >>>> AvroPathPerKeyTarget
>> >>>> >>>> on a real cluster, or is this the first try with it?
>> >>>> >>>>
>> >>>> >>>> I'm wondering if this could be a problem that's always been
>> >>>> >>>> around
>> >>>> >>>> (as
>> >>>> >>>> the integration test for AvroPathPerKeyTarget also runs in the
>> >>>> >>>> local
>> >>>> >>>> jobtracker), or if this could be something new.
>> >>>> >>>
>> >>>> >>>
>> >>>> >>> +1-- Jeremy, if you force the job to run w/a single reducer on
>> >>>> >>> the
>> >>>> >>> cluster (i.e., via groupByKey(1)), does it work?
>> >>>> >>>
>> >>>> >>>>
>> >>>> >>>>
>> >>>> >>>> - Gabriel
>> >>>> >>>
>> >>>> >>>
>> >>>> >>
>> >>>> >
>> >>>
>> >>>
>> >>
>> >
>
>

Re: Exception with AvroPathPerKeyTarget

Posted by Jeremy Lewi <je...@lewi.us>.
Thanks for the tip. I'll look into it and try to figure it out.


On Fri, Mar 28, 2014 at 11:11 AM, Gabriel Reid <ga...@gmail.com> wrote:

> On Fri, Mar 28, 2014 at 6:13 PM, Jeremy Lewi <je...@lewi.us> wrote:
> > Unfortunately that didn't work. I still have a reduce only job.
> >
> > Here's a link to the console output in case that's helpful:
> >
> https://drive.google.com/a/lewi.us/file/d/0B6ngy4MCihWwcy1sdE9DQ2hiYnc/edit?usp=sharing
> >
> >
> > I'm currently ungrouping my records before writing them (an earlier
> attempt
> > to fix this issue). I'm trying without the ungroup now.
>
> Looking at the console output, I noticed that the second and third
> jobs are logging "Total input paths to process : 0", which makes me
> think that the first job being run doesn't have any output. Could you
> check the job counters there to see if it is indeed outputting
> anything? And was your local job running on the same data?
>
> The fact that there are no inputs would explain the reduce-only job,
> and I'm guessing/hoping that will be the reason the
> AvroPathPerKeyTarget is breaking.
>
> - Gabriel
>
>
> >
> > J
> >
> >
> > On Fri, Mar 28, 2014 at 10:08 AM, Jeremy Lewi <je...@lewi.us> wrote:
> >>
> >> Unfortunately that didn't work. I still have a reduce-only job. I'm
> >> attaching the console output from when I run my job in case that's
> >> helpful.
> >> I'm currently ungrouping my records before writing them (an earlier
> >> attempt to fix this). I'm trying to undo that.
> >>
> >> J
> >>
> >>
> >> On Fri, Mar 28, 2014 at 9:51 AM, Jeremy Lewi <je...@lewi.us> wrote:
> >>>
> >>> Thanks Gabriel I'll give that a try now. I was actually planning on
> >>> making that change once I realized that my current strategy was
> forcing me
> >>> to materialize data early on.
> >>>
> >>>
> >>> On Fri, Mar 28, 2014 at 7:44 AM, Gabriel Reid <ga...@gmail.com>
> >>> wrote:
> >>>>
> >>>> On Fri, Mar 28, 2014 at 3:19 PM, Jeremy Lewi <je...@lewi.us> wrote:
> >>>> > No luck. I get the same error even when using a single reducer. I'm
> >>>> > attaching the job configuration as shown in the web ui.
> >>>> >
> >>>> > When I look at the job tracker for the job, it has no map tasks. Is
> >>>> > that
> >>>> > expected? I've never heard of a reduce only job.
> >>>> >
> >>>>
> >>>> Nope, a job with no map tasks doesn't sound right to me. I noticed
> >>>> that you're effectively doing a materialize at [1], and then
> >>>> using a BloomFilterJoinStrategy. While this should work fine, I'm
> >>>> thinking that it could also potentially lead to some issues such as
> >>>> the one you're having (i.e. a job with no map tasks).
> >>>>
> >>>> Could you try using the default join strategy there to see what
> >>>> happens? I'm thinking that the AvroPathPerKeyTarget issue could just be a
> >>>> consequence of something else going wrong earlier on.
> >>>>
> >>>> 1.
> >>>>
> https://code.google.com/p/contrail-bio/source/browse/src/main/java/contrail/scaffolding/FilterReads.java?name=dev_read_filtering#156
> >>>>
> >>>> >
> >>>> > On Fri, Mar 28, 2014 at 6:45 AM, Jeremy Lewi <je...@lewi.us>
> wrote:
> >>>> >>
> >>>> >> This is my first time on a cluster; I'll try what Josh suggests
> >>>> >> now.
> >>>> >>
> >>>> >> J
> >>>> >>
> >>>> >>
> >>>> >> On Fri, Mar 28, 2014 at 3:41 AM, Josh Wills <jo...@gmail.com>
> >>>> >> wrote:
> >>>> >>>
> >>>> >>>
> >>>> >>> On Fri, Mar 28, 2014 at 1:22 AM, Gabriel Reid
> >>>> >>> <ga...@gmail.com>
> >>>> >>> wrote:
> >>>> >>>>
> >>>> >>>> Hi Jeremy,
> >>>> >>>>
> >>>> >>>> On Thu, Mar 27, 2014 at 3:26 PM, Jeremy Lewi <je...@lewi.us>
> >>>> >>>> wrote:
> >>>> >>>> > Hi
> >>>> >>>> >
> >>>> >>>> > I'm hitting the exception pasted below when using
> >>>> >>>> > AvroPathPerKeyTarget.
> >>>> >>>> > Interestingly, my code works just fine when I run on a small
> >>>> >>>> > dataset
> >>>> >>>> > using
> >>>> >>>> > the LocalJobTracker. However, when I run on a large dataset
> using
> >>>> >>>> > a
> >>>> >>>> > hadoop
> >>>> >>>> > cluster I hit the exception.
> >>>> >>>> >
> >>>> >>>>
> >>>> >>>> Have you ever been able to successfully use the
> >>>> >>>> AvroPathPerKeyTarget
> >>>> >>>> on a real cluster, or is this the first try with it?
> >>>> >>>>
> >>>> >>>> I'm wondering if this could be a problem that's always been
> around
> >>>> >>>> (as
> >>>> >>>> the integration test for AvroPathPerKeyTarget also runs in the
> >>>> >>>> local
> >>>> >>>> jobtracker), or if this could be something new.
> >>>> >>>
> >>>> >>>
> >>>> >>> +1-- Jeremy, if you force the job to run w/a single reducer on the
> >>>> >>> cluster (i.e., via groupByKey(1)), does it work?
> >>>> >>>
> >>>> >>>>
> >>>> >>>>
> >>>> >>>> - Gabriel
> >>>> >>>
> >>>> >>>
> >>>> >>
> >>>> >
> >>>
> >>>
> >>
> >
>

Re: Exception with AvroPathPerKeyTarget

Posted by Gabriel Reid <ga...@gmail.com>.
On Fri, Mar 28, 2014 at 6:13 PM, Jeremy Lewi <je...@lewi.us> wrote:
> Unfortunately that didn't work. I still have a reduce only job.
>
> Here's a link to the console output in case that's helpful:
> https://drive.google.com/a/lewi.us/file/d/0B6ngy4MCihWwcy1sdE9DQ2hiYnc/edit?usp=sharing
>
>
> I'm currently ungrouping my records before writing them (an earlier attempt
> to fix this issue). I'm trying without the ungroup now.

Looking at the console output, I noticed that the second and third
jobs are logging "Total input paths to process : 0", which makes me
think that the first job being run doesn't have any output. Could you
check the job counters there to see if it is indeed outputting
anything? And was your local job running on the same data?

The fact that there are no inputs would explain the reduce-only job,
and I'm guessing/hoping that will be the reason the
AvroPathPerKeyTarget is breaking.

- Gabriel
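
[For context: the "Total input paths to process : 0" log line means the input format found no data files; entries starting with "_" or ".", such as _SUCCESS, are ignored. A rough plain-Java sketch of that count, useful for checking a job's output directory by hand; helper names are illustrative, and on a real cluster you'd use `hadoop fs -ls` or the FileSystem API instead.]

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class OutputCheck {
    // Counts the data files under a job output directory, skipping the
    // hidden/underscore entries (_SUCCESS, _logs) that Hadoop's input
    // formats also ignore. A result of 0 corresponds to the
    // "Total input paths to process : 0" log line.
    static long countInputPaths(Path outputDir) {
        try (Stream<Path> entries = Files.list(outputDir)) {
            return entries.filter(p -> {
                String name = p.getFileName().toString();
                return !name.startsWith("_") && !name.startsWith(".");
            }).count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo scaffolding: builds a temp "output directory" with the given files.
    static Path demoDir(String... names) {
        try {
            Path dir = Files.createTempDirectory("job-out");
            for (String n : names) {
                Files.createFile(dir.resolve(n));
            }
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A job that wrote only _SUCCESS produced no data.
        System.out.println(countInputPaths(demoDir("_SUCCESS")));
        System.out.println(countInputPaths(demoDir("_SUCCESS", "part-r-00000")));
    }
}
```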


>
> J
>
>
> On Fri, Mar 28, 2014 at 10:08 AM, Jeremy Lewi <je...@lewi.us> wrote:
>>
>> Unfortunately that didn't work. I still have a reduce-only job. I'm
>> attaching the console output from when I run my job in case that's helpful.
>> I'm currently ungrouping my records before writing them (an earlier
>> attempt to fix this). I'm trying to undo that.
>>
>> J
>>
>>
>> On Fri, Mar 28, 2014 at 9:51 AM, Jeremy Lewi <je...@lewi.us> wrote:
>>>
>>> Thanks Gabriel I'll give that a try now. I was actually planning on
>>> making that change once I realized that my current strategy was forcing me
>>> to materialize data early on.
>>>
>>>
>>> On Fri, Mar 28, 2014 at 7:44 AM, Gabriel Reid <ga...@gmail.com>
>>> wrote:
>>>>
>>>> On Fri, Mar 28, 2014 at 3:19 PM, Jeremy Lewi <je...@lewi.us> wrote:
>>>> > No luck. I get the same error even when using a single reducer. I'm
>>>> > attaching the job configuration as shown in the web ui.
>>>> >
>>>> > When I look at the job tracker for the job, it has no map tasks. Is
>>>> > that
>>>> > expected? I've never heard of a reduce only job.
>>>> >
>>>>
>>>> Nope, a job with no map tasks doesn't sound right to me. I noticed
>>>> that you're effectively doing a materialize at [1], and then
>>>> using a BloomFilterJoinStrategy. While this should work fine, I'm
>>>> thinking that it could also potentially lead to some issues such as
>>>> the one you're having (i.e. a job with no map tasks).
>>>>
>>>> Could you try using the default join strategy there to see what
>>>> happens? I'm thinking that the AvroPathPerKeyTarget issue could just be a
>>>> consequence of something else going wrong earlier on.
>>>>
>>>> 1.
>>>> https://code.google.com/p/contrail-bio/source/browse/src/main/java/contrail/scaffolding/FilterReads.java?name=dev_read_filtering#156
>>>>
>>>> >
>>>> > On Fri, Mar 28, 2014 at 6:45 AM, Jeremy Lewi <je...@lewi.us> wrote:
>>>> >>
>>>> >> This is my first time on a cluster; I'll try what Josh suggests now.
>>>> >>
>>>> >> J
>>>> >>
>>>> >>
>>>> >> On Fri, Mar 28, 2014 at 3:41 AM, Josh Wills <jo...@gmail.com>
>>>> >> wrote:
>>>> >>>
>>>> >>>
>>>> >>> On Fri, Mar 28, 2014 at 1:22 AM, Gabriel Reid
>>>> >>> <ga...@gmail.com>
>>>> >>> wrote:
>>>> >>>>
>>>> >>>> Hi Jeremy,
>>>> >>>>
>>>> >>>> On Thu, Mar 27, 2014 at 3:26 PM, Jeremy Lewi <je...@lewi.us>
>>>> >>>> wrote:
>>>> >>>> > Hi
>>>> >>>> >
>>>> >>>> > I'm hitting the exception pasted below when using
>>>> >>>> > AvroPathPerKeyTarget.
>>>> >>>> > Interestingly, my code works just fine when I run on a small
>>>> >>>> > dataset
>>>> >>>> > using
>>>> >>>> > the LocalJobTracker. However, when I run on a large dataset using
>>>> >>>> > a
>>>> >>>> > hadoop
>>>> >>>> > cluster I hit the exception.
>>>> >>>> >
>>>> >>>>
>>>> >>>> Have you ever been able to successfully use the
>>>> >>>> AvroPathPerKeyTarget
>>>> >>>> on a real cluster, or is this the first try with it?
>>>> >>>>
>>>> >>>> I'm wondering if this could be a problem that's always been around
>>>> >>>> (as
>>>> >>>> the integration test for AvroPathPerKeyTarget also runs in the
>>>> >>>> local
>>>> >>>> jobtracker), or if this could be something new.
>>>> >>>
>>>> >>>
>>>> >>> +1-- Jeremy, if you force the job to run w/a single reducer on the
>>>> >>> cluster (i.e., via groupByKey(1)), does it work?
>>>> >>>
>>>> >>>>
>>>> >>>>
>>>> >>>> - Gabriel
>>>> >>>
>>>> >>>
>>>> >>
>>>> >
>>>
>>>
>>
>

Re: Exception with AvroPathPerKeyTarget

Posted by Jeremy Lewi <je...@lewi.us>.
Unfortunately that didn't work. I still have a reduce-only job.

Here's a link to the console output in case that's helpful:
https://drive.google.com/a/lewi.us/file/d/0B6ngy4MCihWwcy1sdE9DQ2hiYnc/edit?usp=sharing


I'm currently ungrouping my records before writing them (an earlier attempt
to fix this issue). I'm trying without the ungroup now.

J


On Fri, Mar 28, 2014 at 10:08 AM, Jeremy Lewi <je...@lewi.us> wrote:

> Unfortunately that didn't work. I still have a reduce-only job. I'm
> attaching the console output from when I run my job in case that's helpful.
> I'm currently ungrouping my records before writing them (an earlier
> attempt to fix this). I'm trying to undo that.
>
> J
>
>
> On Fri, Mar 28, 2014 at 9:51 AM, Jeremy Lewi <je...@lewi.us> wrote:
>
>> Thanks Gabriel I'll give that a try now. I was actually planning on
>> making that change once I realized that my current strategy was forcing me
>> to materialize data early on.
>>
>>
>> On Fri, Mar 28, 2014 at 7:44 AM, Gabriel Reid <ga...@gmail.com> wrote:
>>
>>> On Fri, Mar 28, 2014 at 3:19 PM, Jeremy Lewi <je...@lewi.us> wrote:
>>> > No luck. I get the same error even when using a single reducer. I'm
>>> > attaching the job configuration as shown in the web ui.
>>> >
>>> > When I look at the job tracker for the job, it has no map tasks. Is
>>> that
>>> > expected? I've never heard of a reduce only job.
>>> >
>>>
>>> Nope, a job with no map tasks doesn't sound right to me. I noticed
>>> that you're effectively doing a materialize at [1], and then
>>> using a BloomFilterJoinStrategy. While this should work fine, I'm
>>> thinking that it could also potentially lead to some issues such as
>>> the one you're having (i.e. a job with no map tasks).
>>>
>>> Could you try using the default join strategy there to see what
>>> happens? I'm thinking that the AvroPathPerKeyTarget issue could just be a
>>> consequence of something else going wrong earlier on.
>>>
>>> 1.
>>> https://code.google.com/p/contrail-bio/source/browse/src/main/java/contrail/scaffolding/FilterReads.java?name=dev_read_filtering#156
>>>
>>> >
>>> > On Fri, Mar 28, 2014 at 6:45 AM, Jeremy Lewi <je...@lewi.us> wrote:
>>> >>
>>> >> This is my first time on a cluster; I'll try what Josh suggests now.
>>> >>
>>> >> J
>>> >>
>>> >>
>>> >> On Fri, Mar 28, 2014 at 3:41 AM, Josh Wills <jo...@gmail.com>
>>> wrote:
>>> >>>
>>> >>>
>>> >>> On Fri, Mar 28, 2014 at 1:22 AM, Gabriel Reid <
>>> gabriel.reid@gmail.com>
>>> >>> wrote:
>>> >>>>
>>> >>>> Hi Jeremy,
>>> >>>>
>>> >>>> On Thu, Mar 27, 2014 at 3:26 PM, Jeremy Lewi <je...@lewi.us>
>>> wrote:
>>> >>>> > Hi
>>> >>>> >
>>> >>>> > I'm hitting the exception pasted below when using
>>> >>>> > AvroPathPerKeyTarget.
>>> >>>> > Interestingly, my code works just fine when I run on a small
>>> dataset
>>> >>>> > using
>>> >>>> > the LocalJobTracker. However, when I run on a large dataset using
>>> a
>>> >>>> > hadoop
>>> >>>> > cluster I hit the exception.
>>> >>>> >
>>> >>>>
>>> >>>> Have you ever been able to successfully use the AvroPathPerKeyTarget
>>> >>>> on a real cluster, or is this the first try with it?
>>> >>>>
>>> >>>> I'm wondering if this could be a problem that's always been around
>>> (as
>>> >>>> the integration test for AvroPathPerKeyTarget also runs in the local
>>> >>>> jobtracker), or if this could be something new.
>>> >>>
>>> >>>
>>> >>> +1-- Jeremy, if you force the job to run w/a single reducer on the
>>> >>> cluster (i.e., via groupByKey(1)), does it work?
>>> >>>
>>> >>>>
>>> >>>>
>>> >>>> - Gabriel
>>> >>>
>>> >>>
>>> >>
>>> >
>>>
>>
>>
>

Re: Exception with AvroPathPerKeyTarget

Posted by Jeremy Lewi <je...@lewi.us>.
Thanks Gabriel, I'll give that a try now. I was actually planning on making
that change once I realized that my current strategy was forcing me to
materialize data early on.


On Fri, Mar 28, 2014 at 7:44 AM, Gabriel Reid <ga...@gmail.com> wrote:

> On Fri, Mar 28, 2014 at 3:19 PM, Jeremy Lewi <je...@lewi.us> wrote:
> > No luck. I get the same error even when using a single reducer. I'm
> > attaching the job configuration as shown in the web ui.
> >
> > When I look at the job tracker for the job, it has no map tasks. Is that
> > expected? I've never heard of a reduce only job.
> >
>
> Nope, a job with no map tasks doesn't sound right to me. I noticed
> that you're effectively doing a materialize at [1], and then
> using a BloomFilterJoinStrategy. While this should work fine, I'm
> thinking that it could also potentially lead to some issues such as
> the one you're having (i.e. a job with no map tasks).
>
> Could you try using the default join strategy there to see what
> happens? I'm thinking that the AvroPathPerKeyTarget issue could just be a
> consequence of something else going wrong earlier on.
>
> 1.
> https://code.google.com/p/contrail-bio/source/browse/src/main/java/contrail/scaffolding/FilterReads.java?name=dev_read_filtering#156
>
> >
> > On Fri, Mar 28, 2014 at 6:45 AM, Jeremy Lewi <je...@lewi.us> wrote:
> >>
> >> This is my first time on a cluster; I'll try what Josh suggests now.
> >>
> >> J
> >>
> >>
> >> On Fri, Mar 28, 2014 at 3:41 AM, Josh Wills <jo...@gmail.com>
> wrote:
> >>>
> >>>
> >>> On Fri, Mar 28, 2014 at 1:22 AM, Gabriel Reid <ga...@gmail.com>
> >>> wrote:
> >>>>
> >>>> Hi Jeremy,
> >>>>
> >>>> On Thu, Mar 27, 2014 at 3:26 PM, Jeremy Lewi <je...@lewi.us> wrote:
> >>>> > Hi
> >>>> >
> >>>> > I'm hitting the exception pasted below when using
> >>>> > AvroPathPerKeyTarget.
> >>>> > Interestingly, my code works just fine when I run on a small dataset
> >>>> > using
> >>>> > the LocalJobTracker. However, when I run on a large dataset using a
> >>>> > hadoop
> >>>> > cluster I hit the exception.
> >>>> >
> >>>>
> >>>> Have you ever been able to successfully use the AvroPathPerKeyTarget
> >>>> on a real cluster, or is this the first try with it?
> >>>>
> >>>> I'm wondering if this could be a problem that's always been around (as
> >>>> the integration test for AvroPathPerKeyTarget also runs in the local
> >>>> jobtracker), or if this could be something new.
> >>>
> >>>
> >>> +1-- Jeremy, if you force the job to run w/a single reducer on the
> >>> cluster (i.e., via groupByKey(1)), does it work?
> >>>
> >>>>
> >>>>
> >>>> - Gabriel
> >>>
> >>>
> >>
> >
>

Re: Exception with AvroPathPerKeyTarget

Posted by Gabriel Reid <ga...@gmail.com>.
On Fri, Mar 28, 2014 at 3:19 PM, Jeremy Lewi <je...@lewi.us> wrote:
> No luck. I get the same error even when using a single reducer. I'm
> attaching the job configuration as shown in the web ui.
>
> When I look at the job tracker for the job, it has no map tasks. Is that
> expected? I've never heard of a reduce only job.
>

Nope, a job with no map tasks doesn't sound right to me. I noticed
that you're effectively doing a materialize at [1], and then
using a BloomFilterJoinStrategy. While this should work fine, I'm
thinking that it could also potentially lead to some issues such as
the one you're having (i.e. a job with no map tasks).

Could you try using the default join strategy there to see what
happens? I'm thinking that the AvroPathPerKeyTarget issue could just be a
consequence of something else going wrong earlier on.

1. https://code.google.com/p/contrail-bio/source/browse/src/main/java/contrail/scaffolding/FilterReads.java?name=dev_read_filtering#156

>
> On Fri, Mar 28, 2014 at 6:45 AM, Jeremy Lewi <je...@lewi.us> wrote:
>>
>> This is my first time on a cluster; I'll try what Josh suggests now.
>>
>> J
>>
>>
>> On Fri, Mar 28, 2014 at 3:41 AM, Josh Wills <jo...@gmail.com> wrote:
>>>
>>>
>>> On Fri, Mar 28, 2014 at 1:22 AM, Gabriel Reid <ga...@gmail.com>
>>> wrote:
>>>>
>>>> Hi Jeremy,
>>>>
>>>> On Thu, Mar 27, 2014 at 3:26 PM, Jeremy Lewi <je...@lewi.us> wrote:
>>>> > Hi
>>>> >
>>>> > I'm hitting the exception pasted below when using
>>>> > AvroPathPerKeyTarget.
>>>> > Interestingly, my code works just fine when I run on a small dataset
>>>> > using
>>>> > the LocalJobTracker. However, when I run on a large dataset using a
>>>> > hadoop
>>>> > cluster I hit the exception.
>>>> >
>>>>
>>>> Have you ever been able to successfully use the AvroPathPerKeyTarget
>>>> on a real cluster, or is this the first try with it?
>>>>
>>>> I'm wondering if this could be a problem that's always been around (as
>>>> the integration test for AvroPathPerKeyTarget also runs in the local
>>>> jobtracker), or if this could be something new.
>>>
>>>
>>> +1-- Jeremy, if you force the job to run w/a single reducer on the
>>> cluster (i.e., via groupByKey(1)), does it work?
>>>
>>>>
>>>>
>>>> - Gabriel
>>>
>>>
>>
>
