Posted to dev@crunch.apache.org by Chao Shi <st...@live.com> on 2013/07/30 10:06:41 UTC

Sort with multiple reducers not working?

Hi devs,

Has anyone tried sorting with multiple reducers? I seem to hit a problem
with it when trying to implement the HFile bulk loader.

You can reproduce this as follows:
1. modify SortIT to run multiple reducers
2. run SortIT#testWritableSortDesc

I got this exception:
java.lang.IllegalArgumentException: Can't read partitions file
        at org.apache.crunch.lib.sort.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:81)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:677)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:214)
Caused by: java.io.IOException: Wrong number of partitions in keyset
        at org.apache.crunch.lib.sort.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:77)
        ... 6 more

It seems that TotalOrderPartitioner does not receive the correct number of
reducers. Any ideas?

Thanks,
Chao

Re: Sort with multiple reducers not working?

Posted by Gabriel Reid <ga...@gmail.com>.

I was just playing around with the HFile output format patch and ran into
this same issue (without realizing that this was the problem), and only
later made the connection with this thread.

One way we could test things like this is using a MiniMRCluster, which
is actually accessible via the HBaseTestingUtility. That way we could start
up a "real" cluster that doesn't run in local mode, and then we could test
things like multiple regions here, as well as the sorting code. The
drawback is that it slows down the test code, but seeing as we're already
starting up a mini HBase cluster for the HBase tests, I think that's
probably acceptable.

- Gabriel


On Wed, Jul 31, 2013 at 5:17 PM, Josh Wills <jo...@gmail.com> wrote:

> Not that I know of.

Re: Sort with multiple reducers not working?

Posted by Josh Wills <jo...@gmail.com>.

Not that I know of.


On Tue, Jul 30, 2013 at 11:54 PM, Chao Shi <st...@live.com> wrote:

> Got it. I had to test my patch on a real cluster manually, and it works. Is
> there any way to do it in a unit test?

Re: Sort with multiple reducers not working?

Posted by Chao Shi <st...@live.com>.

Got it. I had to test my patch on a real cluster manually, and it works. Is
there any way to do it in a unit test?


On Tue, Jul 30, 2013 at 11:32 PM, Josh Wills <jw...@cloudera.com> wrote:

> Hey Chao,
>
> It's just a problem w/the LocalJobRunner, which always uses a single
> reducer no matter what you set it to in the configuration.
>
> J

Re: Sort with multiple reducers not working?

Posted by Josh Wills <jw...@cloudera.com>.

Hey Chao,

It's just a problem with the LocalJobRunner, which always uses a single
reducer no matter what you set in the configuration.

J


-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
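
The mismatch described in this reply can be illustrated with a minimal,
self-contained Java sketch. It assumes the Crunch TotalOrderPartitioner
follows the same rule as Hadoop's: a partition file written for N reducers
holds N - 1 split points, and setup fails if that count does not match the
number of reducers the task actually sees. The names below
(PartitionCountCheck, validate, getPartition) are hypothetical and exist only
for this example; no Hadoop or Crunch classes are used.

```java
import java.io.IOException;
import java.util.Arrays;

// Hypothetical stand-in for the check a total-order partitioner does at setup.
class PartitionCountCheck {

    // N reducers need exactly N - 1 split points; anything else is rejected.
    static void validate(String[] splitPoints, int numReduceTasks) throws IOException {
        if (splitPoints.length != numReduceTasks - 1) {
            throw new IOException("Wrong number of partitions in keyset");
        }
    }

    // Maps a key to a partition index by binary search over sorted split points.
    static int getPartition(String key, String[] splitPoints) {
        int pos = Arrays.binarySearch(splitPoints, key);
        // binarySearch returns -(insertionPoint) - 1 when the key is absent.
        // A key equal to a split point goes to the partition after it.
        return pos >= 0 ? pos + 1 : -pos - 1;
    }

    public static void main(String[] args) throws IOException {
        // Partition file written for 3 requested reducers: 2 split points.
        String[] splitPoints = {"h", "p"};

        // On a real cluster the task sees 3 reducers, so the check passes.
        validate(splitPoints, 3);
        for (String key : new String[] {"a", "m", "z"}) {
            System.out.println(key + " -> partition " + getPartition(key, splitPoints));
        }

        // In local mode the task sees 1 reducer, so 0 split points are expected.
        try {
            validate(splitPoints, 1);
        } catch (IOException e) {
            System.out.println("local mode: " + e.getMessage());
        }
    }
}
```

With a partition file built for three reducers but a runner that forces one,
the check expects zero split points, finds two, and raises the same message
that appears as the cause in the stack trace.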