Posted to user@pig.apache.org by Chris Diehl <cp...@gmail.com> on 2012/06/08 21:35:23 UTC
Deserialization error when using Jython UDF in Pig 0.10 script
Hi All,
I recently downloaded and installed Pig 0.10 on our Hadoop cluster. After
configuring things as I've done before to use Jython UDFs, I'm seeing
deserialization errors. I've verified that my test code runs when I switch
back to Pig 0.8. I'm successfully using Pig 0.10 with Jython on my MacBook
Pro in local mode so I'm rather flummoxed as to what is up.
What I've done to set up Pig 0.10 for using Jython UDFs:
1) Installed Jython 2.5.0
2) set PIG_CLASSPATH="<jython path>/jython.jar"
3) set JYTHON_HOME="<jython path>"
4) added <jython path>/bin to the path
5) put jython-2.5.0.jar into <pig path>/lib for good measure
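For anyone reproducing this setup, steps 2 through 4 can be sanity-checked with a small script. This is only a sketch: the function name check_jython_setup and its arguments are made up for illustration, and it only inspects the environment on the machine where it runs (the client), not the environment of the task nodes.

```python
import os

def check_jython_setup(env, jython_home):
    """Return a list of problems found with steps 2-4 (hypothetical helper)."""
    problems = []
    # Step 2: jython.jar should be an entry on PIG_CLASSPATH.
    jar = os.path.join(jython_home, "jython.jar")
    if jar not in env.get("PIG_CLASSPATH", "").split(os.pathsep):
        problems.append("PIG_CLASSPATH does not contain " + jar)
    # Step 3: JYTHON_HOME should point at the Jython install directory.
    if env.get("JYTHON_HOME") != jython_home:
        problems.append("JYTHON_HOME is not set to " + jython_home)
    # Step 4: <jython path>/bin should be on the PATH.
    bin_dir = os.path.join(jython_home, "bin")
    if bin_dir not in env.get("PATH", "").split(os.pathsep):
        problems.append("PATH does not contain " + bin_dir)
    return problems
```

Calling check_jython_setup(os.environ, "<jython path>") with the real install directory returns an empty list when all three settings match.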
I'm not sure how to get around this issue. Does anyone have any suggestions? In
case it's illuminating, I've included below the output from a very simple Pig job
that loads data and attempts to pass it through a Jython UDF.
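The UDF script itself is not included in this message. For context, a Jython UDF file of this kind might look roughly like the sketch below. Only the function name time_interval_separations comes from the log; the schema string, the argument shape (a bag of (start, end) tuples), and the body are assumptions made purely for illustration. The fallback definition of outputSchema is there only so the file can also be run outside Pig, which normally supplies that decorator when it registers the script.

```python
# Pig provides `outputSchema` when it registers the script through Jython.
# This fallback is an assumption for standalone testing outside Pig.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def wrap(func):
            func.outputSchema = schema
            return func
        return wrap

# Hypothetical schema and body; the real UDF is not shown in the post.
@outputSchema("separations:bag{t:tuple(gap:long)}")
def time_interval_separations(intervals):
    """Return the gaps between consecutive (start, end) intervals."""
    ordered = sorted(intervals)
    return [(nxt[0] - cur[1],) for cur, nxt in zip(ordered, ordered[1:])]
```

In the Pig script, a file like this would typically be registered with a statement along the lines of REGISTER 'time_interval_separations.py' USING jython AS py; which matches the py.time_interval_separations name that appears in the log.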
Chris
2012-06-08 19:25:16,031 [main] INFO org.apache.pig.Main - Apache Pig
version 0.10.0 (r1328203) compiled Apr 19 2012, 22:54:12
2012-06-08 19:25:16,032 [main] INFO org.apache.pig.Main - Logging error
messages to:
/opt/shared_storage/log_analysis_pig_python_scripts/pig_1339183516027.log
2012-06-08 19:25:18,506 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
Connecting to hadoop file system at: hdfs://X
2012-06-08 19:25:19,222 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
Connecting to map-reduce job tracker at: X
2012-06-08 19:25:21,153 [main] INFO
org.apache.pig.scripting.jython.JythonScriptEngine - created tmp
python.cachedir=/tmp/pig_jython_491028285262723042
*sys-package-mgr*: processing new jar, '/usr/java/jdk1.6.0_21/lib/tools.jar'
*sys-package-mgr*: processing new jar,
'/usr/lib/hadoop-0.20/hadoop-core-0.20.2-cdh3u1.jar'
*sys-package-mgr*: processing new jar,
'/usr/lib/hadoop-0.20/lib/ant-contrib-1.0b3.jar'
*sys-package-mgr*: processing new jar,
'/usr/lib/hadoop-0.20/lib/aspectjrt-1.6.5.jar'
*sys-package-mgr*: processing new jar,
'/usr/lib/hadoop-0.20/lib/aspectjtools-1.6.5.jar'
*sys-package-mgr*: processing new jar,
'/usr/lib/hadoop-0.20/lib/commons-cli-1.2.jar'
*sys-package-mgr*: processing new jar,
'/usr/lib/hadoop-0.20/lib/commons-codec-1.4.jar'
*sys-package-mgr*: processing new jar,
'/usr/lib/hadoop-0.20/lib/commons-daemon-1.0.1.jar'
*sys-package-mgr*: processing new jar,
'/usr/lib/hadoop-0.20/lib/commons-el-1.0.jar'
*sys-package-mgr*: processing new jar,
'/usr/lib/hadoop-0.20/lib/commons-httpclient-3.0.1.jar'
*sys-package-mgr*: processing new jar,
'/usr/lib/hadoop-0.20/lib/commons-logging-1.0.4.jar'
*sys-package-mgr*: processing new jar,
'/usr/lib/hadoop-0.20/lib/commons-logging-api-1.0.4.jar'
*sys-package-mgr*: processing new jar,
'/usr/lib/hadoop-0.20/lib/commons-net-1.4.1.jar'
*sys-package-mgr*: processing new jar,
'/usr/lib/hadoop-0.20/lib/core-3.1.1.jar'
*sys-package-mgr*: processing new jar,
'/usr/lib/hadoop-0.20/lib/hadoop-fairscheduler-0.20.2-cdh3u1.jar'
*sys-package-mgr*: processing new jar,
'/usr/lib/hadoop-0.20/lib/hsqldb-1.8.0.10.jar'
*sys-package-mgr*: processing new jar,
'/usr/lib/hadoop-0.20/lib/jackson-core-asl-1.5.2.jar'
*sys-package-mgr*: processing new jar,
'/usr/lib/hadoop-0.20/lib/jackson-mapper-asl-1.5.2.jar'
*sys-package-mgr*: processing new jar,
'/usr/lib/hadoop-0.20/lib/jasper-compiler-5.5.12.jar'
*sys-package-mgr*: processing new jar,
'/usr/lib/hadoop-0.20/lib/jasper-runtime-5.5.12.jar'
*sys-package-mgr*: processing new jar,
'/usr/lib/hadoop-0.20/lib/jets3t-0.6.1.jar'
*sys-package-mgr*: processing new jar,
'/usr/lib/hadoop-0.20/lib/jetty-6.1.26.jar'
*sys-package-mgr*: processing new jar,
'/usr/lib/hadoop-0.20/lib/jetty-servlet-tester-6.1.26.jar'
*sys-package-mgr*: processing new jar,
'/usr/lib/hadoop-0.20/lib/jetty-util-6.1.26.jar'
*sys-package-mgr*: processing new jar,
'/usr/lib/hadoop-0.20/lib/jsch-0.1.42.jar'
*sys-package-mgr*: processing new jar,
'/usr/lib/hadoop-0.20/lib/junit-4.5.jar'
*sys-package-mgr*: processing new jar,
'/usr/lib/hadoop-0.20/lib/kfs-0.2.2.jar'
*sys-package-mgr*: processing new jar,
'/usr/lib/hadoop-0.20/lib/log4j-1.2.15.jar'
*sys-package-mgr*: processing new jar,
'/usr/lib/hadoop-0.20/lib/mockito-all-1.8.2.jar'
*sys-package-mgr*: processing new jar,
'/usr/lib/hadoop-0.20/lib/oro-2.0.8.jar'
*sys-package-mgr*: processing new jar,
'/usr/lib/hadoop-0.20/lib/servlet-api-2.5-20081211.jar'
*sys-package-mgr*: processing new jar,
'/usr/lib/hadoop-0.20/lib/servlet-api-2.5-6.1.14.jar'
*sys-package-mgr*: processing new jar,
'/usr/lib/hadoop-0.20/lib/slf4j-api-1.4.3.jar'
*sys-package-mgr*: processing new jar,
'/usr/lib/hadoop-0.20/lib/slf4j-log4j12-1.4.3.jar'
*sys-package-mgr*: processing new jar,
'/usr/lib/hadoop-0.20/lib/xmlenc-0.52.jar'
*sys-package-mgr*: processing new jar,
'/usr/lib/hadoop-0.20/lib/jsp-2.1/jsp-2.1.jar'
*sys-package-mgr*: processing new jar,
'/usr/lib/hadoop-0.20/lib/jsp-2.1/jsp-api-2.1.jar'
*sys-package-mgr*: processing new jar,
'/usr/lib/hadoop-0.20/contrib/fairscheduler/hadoop-fairscheduler-0.20.2-cdh3u1.jar'
*sys-package-mgr*: processing new jar,
'/opt/shared_storage/jython_2.5.0/jython.jar'
*sys-package-mgr*: processing new jar,
'/opt/shared_storage/pig-0.10.0/lib/automaton.jar'
*sys-package-mgr*: processing new jar,
'/opt/shared_storage/pig-0.10.0/lib/jython-2.5.0.jar'
*sys-package-mgr*: processing new jar,
'/opt/shared_storage/pig-0.10.0/pig-0.10.0-withouthadoop.jar'
*sys-package-mgr*: processing new jar,
'/usr/java/jdk1.6.0_21/jre/lib/resources.jar'
*sys-package-mgr*: processing new jar,
'/usr/java/jdk1.6.0_21/jre/lib/rt.jar'
*sys-package-mgr*: processing new jar,
'/usr/java/jdk1.6.0_21/jre/lib/jsse.jar'
*sys-package-mgr*: processing new jar,
'/usr/java/jdk1.6.0_21/jre/lib/jce.jar'
*sys-package-mgr*: processing new jar,
'/usr/java/jdk1.6.0_21/jre/lib/charsets.jar'
*sys-package-mgr*: processing new jar,
'/usr/java/jdk1.6.0_21/jre/lib/ext/sunjce_provider.jar'
*sys-package-mgr*: processing new jar,
'/usr/java/jdk1.6.0_21/jre/lib/ext/localedata.jar'
*sys-package-mgr*: processing new jar,
'/usr/java/jdk1.6.0_21/jre/lib/ext/dnsns.jar'
*sys-package-mgr*: processing new jar,
'/usr/java/jdk1.6.0_21/jre/lib/ext/sunpkcs11.jar'
2012-06-08 19:25:33,082 [main] INFO
org.apache.pig.scripting.jython.JythonScriptEngine - Register scripting
UDF: py.time_interval_separations
2012-06-08 19:25:33,597 [main] INFO
org.apache.pig.tools.pigstats.ScriptState - Pig features used in the
script: UNKNOWN
2012-06-08 19:25:33,654 [main] INFO
org.apache.pig.newplan.logical.rules.ColumnPruneVisitor - Columns pruned
for raw_data: $0
2012-06-08 19:25:33,844 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler -
File concatenation threshold: 100 optimistic? false
2012-06-08 19:25:33,865 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size before optimization: 1
2012-06-08 19:25:33,865 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size after optimization: 1
2012-06-08 19:25:33,918 [main] INFO
org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added
to the job
2012-06-08 19:25:33,930 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2012-06-08 19:25:33,933 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- creating jar file Job5821323232033529741.jar
2012-06-08 19:25:43,538 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- jar file Job5821323232033529741.jar created
2012-06-08 19:25:43,562 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- Setting up single store job
2012-06-08 19:25:43,598 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 1 map-reduce job(s) waiting for submission.
2012-06-08 19:25:44,054 [Thread-8] INFO
org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths
to process : 1
2012-06-08 19:25:44,054 [Thread-8] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total
input paths to process : 1
2012-06-08 19:25:44,063 [Thread-8] WARN
org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
2012-06-08 19:25:44,063 [Thread-8] WARN
org.apache.hadoop.io.compress.snappy.LoadSnappy - Snappy native library
not loaded
2012-06-08 19:25:44,066 [Thread-8] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total
input paths (combined) to process : 1
2012-06-08 19:25:44,101 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 0% complete
2012-06-08 19:25:44,796 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- HadoopJobId: job_201205170527_2026
2012-06-08 19:25:44,796 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- More information at: http://X/jobdetails.jsp?jobid=job_201205170527_2026
2012-06-08 19:26:34,736 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- job job_201205170527_2026 has failed! Stop running all dependent jobs
2012-06-08 19:26:34,736 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 100% complete
2012-06-08 19:26:34,761 [main] ERROR
org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to
recreate exception from backed error: java.io.IOException: Deserialization
error: could not instantiate
'org.apache.pig.scripting.jython.JythonFunction' with arguments
'[/opt/shared_storage/log_analysis_pig_python_scripts/time_interval_separations.py,
time_interval_separations]'
at
org.apache.pig.impl.util.ObjectSerializer.deserialize(ObjectSerializer.java:55)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.setup(PigGenericMapBase.java:177)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: java.lang.RuntimeException: could not instantiate 'org.apache.p
2012-06-08 19:26:34,761 [main] ERROR
org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2012-06-08 19:26:34,784 [main] INFO
org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:
HadoopVersion PigVersion UserId StartedAt FinishedAt Features
0.20.2-cdh3u1 0.10.0 hdfs 2012-06-08 19:25:33 2012-06-08 19:26:34 UNKNOWN
Failed!
Failed Jobs:
JobId Alias Feature Message Outputs
job_201205170527_2026 raw_data,ti_tups MAP_ONLY Message: Job failed! Error
- NA /data/test,
Input(s):
Failed to read data from "/data/TimeIntervals/part-r-00000"
Output(s):
Failed to produce result in "/data/test"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_201205170527_2026
2012-06-08 19:26:34,785 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Failed!
2012-06-08 19:26:34,823 [main] ERROR org.apache.pig.tools.grunt.GruntParser
- ERROR 2997: Unable to recreate exception from backed error:
java.io.IOException: Deserialization error: could not instantiate
'org.apache.pig.scripting.jython.JythonFunction' with arguments
'[/opt/shared_storage/log_analysis_pig_python_scripts/time_interval_separations.py,
time_interval_separations]'
at
org.apache.pig.impl.util.ObjectSerializer.deserialize(ObjectSerializer.java:55)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.setup(PigGenericMapBase.java:177)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: java.lang.RuntimeException: could not instantiate 'org.apache.p
Details at logfile:
/opt/shared_storage/log_analysis_pig_python_scripts/pig_1339183516027.log