Posted to user@pig.apache.org by Nicholas Kolegraff <ni...@gmail.com> on 2012/05/03 20:56:34 UTC

Can no longer do a join

Hi Everyone,
This doesn't seem to be a *pig* error, but I hit it when I try to do a join in
pig.  This used to work just fine, and the update I did on some other
packages left hadoop+pig alone .. is this a JVM thing?

This is my pig version:
Apache Pig version 0.8.1-cdh3u3 (rexported)

This is my hadoop version:
Hadoop 0.20.2-cdh3u1

(it used to work with both of these versions)

Here is what I am trying to do:
for i in {1..10000}; do echo $(expr $RANDOM % 1000) >> a.txt; done
for i in {1..10000}; do echo $(expr $RANDOM % 1000) >> b.txt; done

pig_join.pig:
A = load 'a.txt';
B = load 'b.txt';

C = JOIN A by $0, B by $0;
D = FOREACH C GENERATE $0;

DUMP D;
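(Editor's note: for intuition, the join this script asks for can be sketched
with plain coreutils. This is an illustration, not how Pig runs it; it
assumes the a.txt and b.txt produced by the loops above.)

```shell
# Editor's sketch: what the Pig script computes, using coreutils `join`.
# `join` requires its inputs to be sorted on the join field.
sort a.txt > a.sorted
sort b.txt > b.sorted
# One output line per matching (A, B) pair; keep only the join key,
# mirroring the FOREACH ... GENERATE $0 above.
join -j 1 a.sorted b.sorted | awk '{ print $1 }' > joined_keys.txt
```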


Here is what I get.  Any ideas?
2012-05-03 11:31:19,622 [main] INFO
org.apache.pig.tools.pigstats.ScriptState - Pig features used in the
script: HASH_JOIN
2012-05-03 11:31:19,622 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
pig.usenewlogicalplan is set to true. New logical plan will be used.
2012-05-03 11:31:19,744 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: D:
Store(file:/tmp/temp1192087216/tmp-1872578945:org.apache.pig.impl.io.InterStorage)
- scope-16 Operator Key: scope-16)
2012-05-03 11:31:19,753 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler -
File concatenation threshold: 100 optimistic? false
2012-05-03 11:31:19,774 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler$LastInputStreamingOptimizer
- Rewrite: POPackage->POForEach to POJoinPackage
2012-05-03 11:31:19,783 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size before optimization: 1
2012-05-03 11:31:19,783 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size after optimization: 1
2012-05-03 11:31:19,801 [main] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with
processName=JobTracker, sessionId=
2012-05-03 11:31:19,812 [main] INFO
org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added
to the job
2012-05-03 11:31:19,835 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-05-03 11:31:21,580 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- Setting up single store job
2012-05-03 11:31:21,590 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=113322
2012-05-03 11:31:21,590 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- Neither PARALLEL nor default parallelism is set for this job. Setting
number of reducers to 1
2012-05-03 11:31:21,659 [main] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics
with processName=JobTracker, sessionId= - already initialized
2012-05-03 11:31:21,659 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 1 map-reduce job(s) waiting for submission.
2012-05-03 11:31:21,694 [Thread-3] INFO
org.apache.hadoop.util.NativeCodeLoader - Loaded the native-hadoop library
2012-05-03 11:31:21,843 [Thread-3] INFO
org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths
to process : 1
2012-05-03 11:31:21,843 [Thread-3] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input
paths to process : 1
2012-05-03 11:31:21,852 [Thread-3] WARN
org.apache.hadoop.io.compress.snappy.LoadSnappy - Snappy native library is
available
2012-05-03 11:31:21,852 [Thread-3] INFO
org.apache.hadoop.io.compress.snappy.LoadSnappy - Snappy native library
loaded
2012-05-03 11:31:21,854 [Thread-3] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input
paths (combined) to process : 1
2012-05-03 11:31:21,860 [Thread-3] INFO
org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths
to process : 1
2012-05-03 11:31:21,860 [Thread-3] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input
paths to process : 1
2012-05-03 11:31:21,860 [Thread-3] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input
paths (combined) to process : 1
2012-05-03 11:31:22,160 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- HadoopJobId: job_local_0001
2012-05-03 11:31:22,160 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 0% complete
2012-05-03 11:31:22,168 [Thread-4] INFO  org.apache.hadoop.util.ProcessTree
- setsid exited with exit code 0
2012-05-03 11:31:22,171 [Thread-4] INFO  org.apache.hadoop.mapred.Task -
Using ResourceCalculatorPlugin :
org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1badd463
2012-05-03 11:31:22,195 [Thread-4] INFO  org.apache.hadoop.mapred.MapTask -
io.sort.mb = 100
2012-05-03 11:31:22,269 [Thread-4] INFO  org.apache.hadoop.mapred.MapTask -
data buffer = 79691776/99614720
2012-05-03 11:31:22,269 [Thread-4] INFO  org.apache.hadoop.mapred.MapTask -
record buffer = 262144/327680
2012-05-03 11:31:22,320 [Thread-4] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader
- Created input record counter: Input records from a.txt
2012-05-03 11:31:22,685 [Thread-4] INFO  org.apache.hadoop.mapred.MapTask -
Starting flush of map output
2012-05-03 11:31:22,701 [Thread-4] WARN
org.apache.hadoop.mapred.LocalJobRunner - job_local_0001
java.lang.NoClassDefFoundError:
org/apache/hadoop/thirdparty/guava/common/primitives/UnsignedBytes
    at
org.apache.hadoop.io.FastByteComparisons$LexicographicalComparerHolder$UnsafeComparer.compareTo(FastByteComparisons.java:226)
    at
org.apache.hadoop.io.FastByteComparisons$LexicographicalComparerHolder$UnsafeComparer.compareTo(FastByteComparisons.java:113)
    at
org.apache.hadoop.io.FastByteComparisons.compareTo(FastByteComparisons.java:42)
    at
org.apache.hadoop.io.WritableComparator.compareBytes(WritableComparator.java:150)
    at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler$PigWritableComparator.compare(JobControlCompiler.java:828)
    at
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.compare(MapTask.java:968)
    at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:95)
    at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:59)
    at
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1254)
    at
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1155)
    at
org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:582)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:649)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
    at
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210)
Caused by: java.lang.ClassNotFoundException:
org.apache.hadoop.thirdparty.guava.common.primitives.UnsignedBytes
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    ... 14 more

Re: Can no longer do a join

Posted by Scott Foster <sc...@gmail.com>.
I agree with Jagat: you either need to upgrade hadoop or downgrade
pig. Or you might try getting Pig 0.8.1, 0.9.2, or 0.10.0 from Apache.
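(Editor's note: a quick way to see the mismatch being described here is to
compare the CDH suffixes of the two version strings. A rough sketch, using
the versions reported earlier in this thread:)

```shell
# Editor's sketch: extract and compare the CDH update suffix from the
# two version strings quoted in this thread.
pig_ver="Apache Pig version 0.8.1-cdh3u3 (rexported)"
hadoop_ver="Hadoop 0.20.2-cdh3u1"
pig_cdh=$(printf '%s\n' "$pig_ver" | grep -o 'cdh[0-9u]*')
hadoop_cdh=$(printf '%s\n' "$hadoop_ver" | grep -o 'cdh[0-9u]*')
if [ "$pig_cdh" != "$hadoop_cdh" ]; then
  echo "mismatch: pig=$pig_cdh hadoop=$hadoop_cdh"
fi
# prints: mismatch: pig=cdh3u3 hadoop=cdh3u1
```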

On Fri, May 4, 2012 at 1:11 PM, Nicholas Kolegraff
<ni...@gmail.com> wrote:
> Scott,
> Thanks for the response!
>
> it does not
>

Re: Can no longer do a join

Posted by Nicholas Kolegraff <ni...@gmail.com>.
Scott,
Thanks for the response!

it does not

On Fri, May 4, 2012 at 9:19 AM, Scott Foster <sc...@gmail.com> wrote:

> Looks like a hadoop classpath problem since it can't find the guava
> jar file. Does this command return anything?
>
> pig -x local -secretDebugCmd | sed 's/:/\n/g' | grep guava
>
> scott.
>
> On Thu, May 3, 2012 at 11:56 AM, Nicholas Kolegraff
> <ni...@gmail.com> wrote:
> > [original message and log snipped; quoted in full at the top of this thread]
>

Re: Can no longer do a join

Posted by Jagat <ja...@gmail.com>.
Hello

A few days back I also got a similar kind of issue; the reason for that was
a mismatch between the Cloudera versions of hadoop and pig. For me it got
resolved when I matched the pig version with the hadoop version.

But I guess you want some modifications in pig, so you changed the version.

You can try checking the jar as suggested by Scott.



On Fri, May 4, 2012 at 9:49 PM, Scott Foster <sc...@gmail.com> wrote:

> Looks like a hadoop classpath problem since it can't find the guava
> jar file. Does this command return anything?
>
> pig -x local -secretDebugCmd | sed 's/:/\n/g' | grep guava
>
> scott.
>
> On Thu, May 3, 2012 at 11:56 AM, Nicholas Kolegraff
> <ni...@gmail.com> wrote:
> > [original message and log snipped; quoted in full at the top of this thread]
>

Re: Can no longer do a join

Posted by Scott Foster <sc...@gmail.com>.
Looks like a hadoop classpath problem since it can't find the guava
jar file. Does this command return anything?

pig -x local -secretDebugCmd | sed 's/:/\n/g' | grep guava

scott.
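(Editor's note: the same idea also works on a classpath string from any
other source, e.g. the output of the Hadoop CLI's `classpath` subcommand.
A sketch on a sample string; the jar names here are assumptions:)

```shell
# Editor's sketch: split a classpath on ':' and look for a guava jar.
# With Hadoop installed you would feed in `hadoop classpath` instead of
# this hypothetical sample string.
classpath="/usr/lib/hadoop/lib/guava-r09-jarjar.jar:/usr/lib/hadoop/hadoop-core.jar"
printf '%s\n' "$classpath" | tr ':' '\n' | grep -i guava
# prints: /usr/lib/hadoop/lib/guava-r09-jarjar.jar
```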

On Thu, May 3, 2012 at 11:56 AM, Nicholas Kolegraff
<ni...@gmail.com> wrote:
> [original message and log snipped; quoted in full at the top of this thread]