Posted to user@pig.apache.org by Chris Diehl <cp...@gmail.com> on 2012/06/08 21:35:23 UTC

Deserialization error when using Jython UDF in Pig 0.10 script

Hi All,

I recently downloaded and installed Pig 0.10 on our Hadoop cluster. After
configuring it for Jython UDFs the same way I have before, I'm seeing
deserialization errors when the map tasks start. I've verified that my test
code runs when I switch back to Pig 0.8, and Pig 0.10 with Jython works on my
MacBook Pro in local mode, so I'm rather flummoxed as to what's going on.

What I've done to set up Pig 0.10 for Jython UDFs:
1) Installed Jython 2.5.0
2) Set PIG_CLASSPATH="<jython path>/jython.jar"
3) Set JYTHON_HOME="<jython path>"
4) Added <jython path>/bin to the PATH
5) Put jython-2.5.0.jar into <pig path>/lib for good measure
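
A quick way to double-check steps 2 to 5 on the client machine is a small script like the one below. It is a hypothetical helper (check_jython_setup is not part of Pig or Jython), written as a sketch that assumes a POSIX-style PATH and takes the environment and the file names in <pig path>/lib as plain inputs so it can be run anywhere. It also flags more than one Jython jar, because the sys-package-mgr lines in the log below show both /opt/shared_storage/jython_2.5.0/jython.jar and /opt/shared_storage/pig-0.10.0/lib/jython-2.5.0.jar being scanned; whether that duplication has anything to do with the error is not established. It only inspects the machine it runs on, so it says nothing about the task nodes.

```python
import os


def check_jython_setup(env, pig_lib_files):
    """Hypothetical helper (not part of Pig or Jython).

    env:           mapping of environment variables, e.g. dict(os.environ)
    pig_lib_files: file names found in <pig path>/lib
    Returns a list of human-readable warnings; empty means nothing obvious.
    """
    problems = []

    # Step 3: JYTHON_HOME should point at the Jython install.
    jython_home = env.get("JYTHON_HOME")
    if not jython_home:
        problems.append("JYTHON_HOME is not set")

    # Step 2: PIG_CLASSPATH should contain a Jython jar.
    classpath = [p for p in env.get("PIG_CLASSPATH", "").split(os.pathsep) if p]
    cp_jython = [p for p in classpath
                 if os.path.basename(p).startswith("jython") and p.endswith(".jar")]
    if not cp_jython:
        problems.append("no Jython jar on PIG_CLASSPATH")

    # Step 4: <jython path>/bin should be on PATH (compared after normalizing).
    if jython_home:
        path_dirs = [os.path.normpath(d)
                     for d in env.get("PATH", "").split(os.pathsep) if d]
        if os.path.normpath(os.path.join(jython_home, "bin")) not in path_dirs:
            problems.append("<jython path>/bin is not on PATH")

    # Steps 2 and 5 together can put more than one Jython jar on the classpath.
    lib_jython = [f for f in pig_lib_files
                  if f.startswith("jython") and f.endswith(".jar")]
    found = cp_jython + lib_jython
    if len(found) > 1:
        problems.append("more than one Jython jar: " + ", ".join(found))

    return problems
```

On the client it could be called as check_jython_setup(dict(os.environ), os.listdir("/opt/shared_storage/pig-0.10.0/lib")).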

I'm not sure how to get around this issue. Does anyone have any suggestions?
In case it's illuminating, I've included below the output from a very simple
Pig job that loads data and attempts to pass it through a Jython UDF.
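
To separate a setup problem from a problem in the UDF itself, one option is to run the same load-and-UDF job with a trivial function. The sketch below is hypothetical: the file name passthrough_udf.py and the function passthrough are made up for illustration, and the real time_interval_separations.py is not shown in this post. Inside Pig the file would be registered with REGISTER 'passthrough_udf.py' USING jython AS py; and called as py.passthrough(...). In that case Pig's Jython script engine supplies the outputSchema decorator, so the try/except fallback only matters when the file is imported in a plain Python interpreter for testing.

```python
# passthrough_udf.py: hypothetical minimal Jython UDF, for illustration only.
# Under Pig, outputSchema is already defined when this file is loaded.
# The fallback below just records the schema string on the function so the
# file can also be imported and exercised outside Pig.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def decorator(func):
            func.outputSchema = schema
            return func
        return decorator


@outputSchema("line:chararray")
def passthrough(line):
    """Return the input unchanged; useful only for checking that a Jython
    UDF can be shipped to and instantiated on the cluster at all."""
    return line
```

If even a function this small fails with the same "could not instantiate" error on the cluster, that would point toward the environment rather than the script's own code.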

Chris

2012-06-08 19:25:16,031 [main] INFO  org.apache.pig.Main - Apache Pig version 0.10.0 (r1328203) compiled Apr 19 2012, 22:54:12
2012-06-08 19:25:16,032 [main] INFO  org.apache.pig.Main - Logging error messages to: /opt/shared_storage/log_analysis_pig_python_scripts/pig_1339183516027.log
2012-06-08 19:25:18,506 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://X
2012-06-08 19:25:19,222 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: X
2012-06-08 19:25:21,153 [main] INFO  org.apache.pig.scripting.jython.JythonScriptEngine - created tmp python.cachedir=/tmp/pig_jython_491028285262723042
*sys-package-mgr*: processing new jar, '/usr/java/jdk1.6.0_21/lib/tools.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/hadoop-core-0.20.2-cdh3u1.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/ant-contrib-1.0b3.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/aspectjrt-1.6.5.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/aspectjtools-1.6.5.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/commons-cli-1.2.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/commons-codec-1.4.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/commons-daemon-1.0.1.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/commons-el-1.0.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/commons-httpclient-3.0.1.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/commons-logging-1.0.4.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/commons-logging-api-1.0.4.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/commons-net-1.4.1.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/core-3.1.1.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/hadoop-fairscheduler-0.20.2-cdh3u1.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/hsqldb-1.8.0.10.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/jackson-core-asl-1.5.2.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/jackson-mapper-asl-1.5.2.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/jasper-compiler-5.5.12.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/jasper-runtime-5.5.12.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/jets3t-0.6.1.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/jetty-6.1.26.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/jetty-servlet-tester-6.1.26.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/jetty-util-6.1.26.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/jsch-0.1.42.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/junit-4.5.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/kfs-0.2.2.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/log4j-1.2.15.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/mockito-all-1.8.2.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/oro-2.0.8.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/servlet-api-2.5-20081211.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/servlet-api-2.5-6.1.14.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/slf4j-api-1.4.3.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/slf4j-log4j12-1.4.3.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/xmlenc-0.52.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/jsp-2.1/jsp-2.1.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/lib/jsp-2.1/jsp-api-2.1.jar'
*sys-package-mgr*: processing new jar, '/usr/lib/hadoop-0.20/contrib/fairscheduler/hadoop-fairscheduler-0.20.2-cdh3u1.jar'
*sys-package-mgr*: processing new jar, '/opt/shared_storage/jython_2.5.0/jython.jar'
*sys-package-mgr*: processing new jar, '/opt/shared_storage/pig-0.10.0/lib/automaton.jar'
*sys-package-mgr*: processing new jar, '/opt/shared_storage/pig-0.10.0/lib/jython-2.5.0.jar'
*sys-package-mgr*: processing new jar, '/opt/shared_storage/pig-0.10.0/pig-0.10.0-withouthadoop.jar'
*sys-package-mgr*: processing new jar, '/usr/java/jdk1.6.0_21/jre/lib/resources.jar'
*sys-package-mgr*: processing new jar, '/usr/java/jdk1.6.0_21/jre/lib/rt.jar'
*sys-package-mgr*: processing new jar, '/usr/java/jdk1.6.0_21/jre/lib/jsse.jar'
*sys-package-mgr*: processing new jar, '/usr/java/jdk1.6.0_21/jre/lib/jce.jar'
*sys-package-mgr*: processing new jar, '/usr/java/jdk1.6.0_21/jre/lib/charsets.jar'
*sys-package-mgr*: processing new jar, '/usr/java/jdk1.6.0_21/jre/lib/ext/sunjce_provider.jar'
*sys-package-mgr*: processing new jar, '/usr/java/jdk1.6.0_21/jre/lib/ext/localedata.jar'
*sys-package-mgr*: processing new jar, '/usr/java/jdk1.6.0_21/jre/lib/ext/dnsns.jar'
*sys-package-mgr*: processing new jar, '/usr/java/jdk1.6.0_21/jre/lib/ext/sunpkcs11.jar'
2012-06-08 19:25:33,082 [main] INFO  org.apache.pig.scripting.jython.JythonScriptEngine - Register scripting UDF: py.time_interval_separations
2012-06-08 19:25:33,597 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2012-06-08 19:25:33,654 [main] INFO  org.apache.pig.newplan.logical.rules.ColumnPruneVisitor - Columns pruned for raw_data: $0
2012-06-08 19:25:33,844 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2012-06-08 19:25:33,865 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2012-06-08 19:25:33,865 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2012-06-08 19:25:33,918 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2012-06-08 19:25:33,930 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-06-08 19:25:33,933 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job5821323232033529741.jar
2012-06-08 19:25:43,538 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job5821323232033529741.jar created
2012-06-08 19:25:43,562 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2012-06-08 19:25:43,598 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2012-06-08 19:25:44,054 [Thread-8] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2012-06-08 19:25:44,054 [Thread-8] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2012-06-08 19:25:44,063 [Thread-8] WARN  org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2012-06-08 19:25:44,063 [Thread-8] WARN  org.apache.hadoop.io.compress.snappy.LoadSnappy - Snappy native library not loaded
2012-06-08 19:25:44,066 [Thread-8] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2012-06-08 19:25:44,101 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2012-06-08 19:25:44,796 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201205170527_2026
2012-06-08 19:25:44,796 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://X/jobdetails.jsp?jobid=job_201205170527_2026
2012-06-08 19:26:34,736 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_201205170527_2026 has failed! Stop running all dependent jobs
2012-06-08 19:26:34,736 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2012-06-08 19:26:34,761 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backed error: java.io.IOException: Deserialization error: could not instantiate 'org.apache.pig.scripting.jython.JythonFunction' with arguments '[/opt/shared_storage/log_analysis_pig_python_scripts/time_interval_separations.py, time_interval_separations]'
	at org.apache.pig.impl.util.ObjectSerializer.deserialize(ObjectSerializer.java:55)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.setup(PigGenericMapBase.java:177)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
	at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: java.lang.RuntimeException: could not instantiate 'org.apache.p
2012-06-08 19:26:34,761 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2012-06-08 19:26:34,784 [main] INFO  org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:

HadoopVersion  PigVersion  UserId  StartedAt            FinishedAt           Features
0.20.2-cdh3u1  0.10.0      hdfs    2012-06-08 19:25:33  2012-06-08 19:26:34  UNKNOWN

Failed!

Failed Jobs:
JobId                  Alias             Feature   Message                          Outputs
job_201205170527_2026  raw_data,ti_tups  MAP_ONLY  Message: Job failed! Error - NA  /data/test,

Input(s):
Failed to read data from "/data/TimeIntervals/part-r-00000"

Output(s):
Failed to produce result in "/data/test"

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_201205170527_2026


2012-06-08 19:26:34,785 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2012-06-08 19:26:34,823 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2997: Unable to recreate exception from backed error: java.io.IOException: Deserialization error: could not instantiate 'org.apache.pig.scripting.jython.JythonFunction' with arguments '[/opt/shared_storage/log_analysis_pig_python_scripts/time_interval_separations.py, time_interval_separations]'
	at org.apache.pig.impl.util.ObjectSerializer.deserialize(ObjectSerializer.java:55)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.setup(PigGenericMapBase.java:177)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
	at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: java.lang.RuntimeException: could not instantiate 'org.apache.p
Details at logfile: /opt/shared_storage/log_analysis_pig_python_scripts/pig_1339183516027.log
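
When comparing the client setup with what the log actually reports, it may help to pull out two facts programmatically: which Jython jars the sys-package-mgr scanned, and which script path and function name the map task was asked to instantiate. This is a minimal sketch that assumes the log lines look like the ones above, including email-style line wrapping; jython_jars and failed_udf are hypothetical names, not Pig tooling.

```python
import re

# Matches "*sys-package-mgr*: processing new jar, '<path>'"; the \s* lets the
# quoted path sit on the next line, as it does when the log is email-wrapped.
JAR_RE = re.compile(r"\*sys-package-mgr\*: processing new jar,\s*'([^']+)'")

# Matches "could not instantiate '<class>' with arguments '[<arg>, <arg>]'",
# again tolerating line breaks between the pieces.
INSTANTIATE_RE = re.compile(
    r"could not instantiate\s+'([^'\s]+)'\s+with\s+arguments\s+'\[([^\]]*)\]'")


def jython_jars(log_text):
    """Return every scanned jar path whose file name starts with 'jython'."""
    jars = JAR_RE.findall(log_text)
    return [j for j in jars if j.rsplit("/", 1)[-1].startswith("jython")]


def failed_udf(log_text):
    """Return (class_name, [script_path, function_name]) from the first
    'could not instantiate' message, or None if the log has none."""
    m = INSTANTIATE_RE.search(log_text)
    if m is None:
        return None
    args = [a.strip() for a in m.group(2).split(",")]
    return m.group(1), args
```

Applied to the output above, failed_udf would report the absolute script path used at registration time. Since the exception is raised in PigGenericMapBase.setup on the task side, one thing worth comparing is whether that path and the Jython jars look the same on the task nodes as on the client.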