Posted to user@pig.apache.org by Suhas Satish <su...@gmail.com> on 2014/07/26 03:15:00 UTC

alternative to skew join to split work among reducers

I am hitting the exception shown below when running a Pig skewed join.

Is there a workaround that splits the work among many reducers without
using a skewed join?

This is the Pig script:

raw = LOAD '/user/root/pig_data.dat' USING PigStorage(' ') AS (user, last, number);

raw1 = LOAD '/user/root/pig_data-2.dat' USING PigStorage(' ') AS (user1, last1, number);

skewjoin = JOIN raw by user, raw1 by user1 USING 'skewed';

store skewjoin into '/user/root/skew_out';
---------------
java.lang.RuntimeException: java.lang.IllegalArgumentException: Buffer size <= 0
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.SkewedPartitioner.setConf(SkewedPartitioner.java:119)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:593)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:346)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:282)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1117)
    at org.apache.hadoop.mapred.Child.main(Child.java:271)
Caused by: java.lang.IllegalArgumentException: Buffer size <= 0
    at java.io.BufferedInputStream.<init>(BufferedInputStream.java:193)
    at org.apache.hadoop.fs.BufferedFSInputStream.<init>(BufferedFSInputStream.java:44)
    at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:179)
    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:127)
    at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:284)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:485)
    at org.apache.pig.impl.io.InterRecordReader.initialize(InterRecordReader.java:65)
    at org.apache.pig.impl.io.ReadToEndLoader.initializeReader(ReadToEndLoader.java:210)
    at org.apache.pig.impl.io.ReadToEndLoader.getNextHelper(ReadToEndLoader.java:245)
    at org.apache.pig.impl.io.ReadToEndLoader.getNext(ReadToEndLoader.java:226)
    at org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil.loadPartitionFileFromLocalCache(MapRedUtil.java:110)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.SkewedPartitioner.setConf(SkewedPartitioner.java:114)
    ... 10 more
-------------------------

Thanks,
Suhas.