Posted to dev@pig.apache.org by "Bill Graham (Commented) (JIRA)" <ji...@apache.org> on 2012/03/06 17:44:58 UTC

[jira] [Commented] (PIG-2495) Using merge JOIN from a HBaseStorage produces an error

    [ https://issues.apache.org/jira/browse/PIG-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13223404#comment-13223404 ] 

Bill Graham commented on PIG-2495:
----------------------------------

Thanks for the patch, Kevin! A few notes about Pig code style:

* Indentation should be 4 spaces; you have 2 in some spots.
* Curly brackets should go at the end of the class name or constructor/method signature, not on the line below.
* Please include the standard Apache license header above the package declaration in TableSplitComparable.
* I _think_ we favor brackets in if/else clauses, but I'll let someone else confirm.

And a few more notes on the code:

* I would think {{TableSplitComparable}} should implement {{WritableComparable<TableSplit>}} instead of {{WritableComparable<TableSplitComparable>}}, right?

* Your {{hashCode}} method seems like it could just be
{noformat}
return ((tsplit == null) ? 0 : tsplit.hashCode());
{noformat}

since it just delegates to {{tsplit}}.

* Also, the condition in equals could just be:

{noformat}
else {
    return tsplit.equals(other.tsplit);
}
{noformat}


* I don't think {{TableSplitComparable}} needs to implement {{Serializable}} or define a {{serialVersionUID}}.
* Should the wrapped TableSplit be initialized to an empty split? It seems like it should have to be explicitly set, right?
* In {{getSplitComparable}} you can just return {{new TableSplitComparable((TableSplit) split);}} after a {{!(split instanceof TableSplit)}} check that throws an exception.
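
To illustrate the delegation pattern from the notes above, here's a rough sketch (not the actual patch: {{FakeSplit}} is a hypothetical stand-in for HBase's {{TableSplit}}, and plain {{Comparable}} stands in for Hadoop's {{WritableComparable}}, so it compiles without Hadoop/HBase on the classpath):

```java
// Hypothetical stand-in for org.apache.hadoop.hbase.mapreduce.TableSplit,
// just so the sketch compiles without HBase.
class FakeSplit implements Comparable<FakeSplit> {
    private final String startRow;

    FakeSplit(String startRow) {
        this.startRow = startRow;
    }

    @Override
    public int compareTo(FakeSplit other) {
        return startRow.compareTo(other.startRow);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof FakeSplit && startRow.equals(((FakeSplit) o).startRow);
    }

    @Override
    public int hashCode() {
        return startRow.hashCode();
    }
}

// Sketch of the comparable wrapper: equals, hashCode, and compareTo all
// delegate to the wrapped split, which must be set explicitly (no default
// empty split).
class SplitComparable implements Comparable<SplitComparable> {
    private FakeSplit tsplit;

    SplitComparable() {
        // no-arg constructor, needed for deserialization
    }

    SplitComparable(FakeSplit tsplit) {
        this.tsplit = tsplit;
    }

    @Override
    public int compareTo(SplitComparable other) {
        return tsplit.compareTo(other.tsplit);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof SplitComparable)) {
            return false;
        }
        SplitComparable other = (SplitComparable) o;
        if (tsplit == null) {
            return other.tsplit == null;
        }
        return tsplit.equals(other.tsplit);
    }

    @Override
    public int hashCode() {
        return (tsplit == null) ? 0 : tsplit.hashCode();
    }
}
```

The real class would additionally implement {{readFields}}/{{write}} from {{WritableComparable<TableSplit>}}, again by delegating to the wrapped split.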
                
> Using merge JOIN from a HBaseStorage produces an error
> ------------------------------------------------------
>
>                 Key: PIG-2495
>                 URL: https://issues.apache.org/jira/browse/PIG-2495
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.9.1, 0.9.2
>         Environment: HBase 0.90.3, Hadoop 0.20-append
>            Reporter: Kevin Lion
>             Fix For: 0.9.2
>
>         Attachments: HBaseStorageMergeJoin.patch, HBaseStorageMergeJoin.patch
>
>
> To improve the performance of my computation, I would like to use a merge join between two tables, but it produces an error.
> Here is the script:
> {noformat}
> start_sessions = LOAD 'hbase://startSession.bea000000.dev.ubithere.com' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:infoid meta:imei meta:timestamp', '-loadKey') AS (sid:chararray, infoid:chararray, imei:chararray, start:long);
> end_sessions = LOAD 'hbase://endSession.bea000000.dev.ubithere.com' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:timestamp meta:locid', '-loadKey') AS (sid:chararray, end:long, locid:chararray);
> sessions = JOIN start_sessions BY sid, end_sessions BY sid USING 'merge';
> STORE sessions INTO 'sessionsTest' USING PigStorage ('*');
> {noformat} 
> Here is the result of this script:
> {noformat}
> 2012-01-30 16:12:43,920 [main] INFO  org.apache.pig.Main - Logging error messages to: /root/pig_1327939963919.log
> 2012-01-30 16:12:44,025 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://lxc233:9000
> 2012-01-30 16:12:44,102 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: lxc233:9001
> 2012-01-30 16:12:44,760 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: MERGE_JION
> 2012-01-30 16:12:44,923 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
> 2012-01-30 16:12:44,982 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 2
> 2012-01-30 16:12:44,982 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 2
> 2012-01-30 16:12:45,001 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
> 2012-01-30 16:12:45,006 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
> 2012-01-30 16:12:45,039 [main] INFO  org.apache.zookeeper.ZooKeeper - Client environment:zookeeper.version=3.3.2-1031432, built on 11/05/2010 05:32 GMT
> 2012-01-30 16:12:45,039 [main] INFO  org.apache.zookeeper.ZooKeeper - Client environment:host.name=lxc233.machine.com
> 2012-01-30 16:12:45,039 [main] INFO  org.apache.zookeeper.ZooKeeper - Client environment:java.version=1.6.0_22
> 2012-01-30 16:12:45,039 [main] INFO  org.apache.zookeeper.ZooKeeper - Client environment:java.vendor=Sun Microsystems Inc.
> 2012-01-30 16:12:45,039 [main] INFO  org.apache.zookeeper.ZooKeeper - Client environment:java.home=/usr/lib/jvm/java-6-sun-1.6.0.22/jre
> 2012-01-30 16:12:45,039 [main] INFO  org.apache.zookeeper.ZooKeeper - Client environment:java.class.path=/opt/hadoop/conf:/usr/lib/jvm/java-6-sun/jre/lib/tools.jar:/opt/hadoop:/opt/hadoop/hadoop-0.20-append-core.jar:/opt/hadoop/lib/commons-cli-1.2.jar:/opt/hadoop/lib/commons-codec-1.3.jar:/opt/hadoop/lib/commons-el-1.0.jar:/opt/hadoop/lib/commons-httpclient-3.0.1.jar:/opt/hadoop/lib/commons-logging-1.0.4.jar:/opt/hadoop/lib/commons-logging-api-1.0.4.jar:/opt/hadoop/lib/commons-net-1.4.1.jar:/opt/hadoop/lib/core-3.1.1.jar:/opt/hadoop/lib/hadoop-fairscheduler-0.20-append.jar:/opt/hadoop/lib/hadoop-gpl-compression-0.2.0-dev.jar:/opt/hadoop/lib/hadoop-lzo-0.4.14.jar:/opt/hadoop/lib/hsqldb-1.8.0.10.jar:/opt/hadoop/lib/jasper-compiler-5.5.12.jar:/opt/hadoop/lib/jasper-runtime-5.5.12.jar:/opt/hadoop/lib/jets3t-0.6.1.jar:/opt/hadoop/lib/jetty-6.1.14.jar:/opt/hadoop/lib/jetty-util-6.1.14.jar:/opt/hadoop/lib/junit-4.5.jar:/opt/hadoop/lib/kfs-0.2.2.jar:/opt/hadoop/lib/log4j-1.2.15.jar:/opt/hadoop/lib/mockito-all-1.8.2.jar:/opt/hadoop/lib/oro-2.0.8.jar:/opt/hadoop/lib/servlet-api-2.5-6.1.14.jar:/opt/hadoop/lib/slf4j-api-1.4.3.jar:/opt/hadoop/lib/slf4j-log4j12-1.4.3.jar:/opt/hadoop/lib/xmlenc-0.52.jar:/opt/hadoop/lib/jsp-2.1/jsp-2.1.jar:/opt/hadoop/lib/jsp-2.1/jsp-api-2.1.jar:/opt/pig/bin/../conf:/usr/lib/jvm/java-6-sun/jre/lib/tools.jar:/opt/hadoop/lib/commons-codec-1.3.jar:/opt/hbase/lib/guava-r06.jar:/opt/hbase/hbase-0.90.3.jar:/opt/hadoop/lib/log4j-1.2.15.jar:/opt/hadoop/lib/commons-cli-1.2.jar:/opt/hadoop/lib/commons-logging-1.0.4.jar:/opt/pig/pig-withouthadoop.jar:/opt/hadoop/conf_computation:/opt/hbase/conf:/opt/pig/bin/../lib/hadoop-0.20-append-core.jar:/opt/pig/bin/../lib/hadoop-gpl-compression-0.2.0-dev.jar:/opt/pig/bin/../lib/hbase-0.90.3.jar:/opt/pig/bin/../lib/pigudfs.jar:/opt/pig/bin/../lib/zookeeper-3.3.2.jar:/opt/pig/bin/../pig-withouthadoop.jar:
> 2012-01-30 16:12:45,039 [main] INFO  org.apache.zookeeper.ZooKeeper - Client environment:java.library.path=/opt/hadoop/lib/native/Linux-amd64-64
> 2012-01-30 16:12:45,039 [main] INFO  org.apache.zookeeper.ZooKeeper - Client environment:java.io.tmpdir=/tmp
> 2012-01-30 16:12:45,039 [main] INFO  org.apache.zookeeper.ZooKeeper - Client environment:java.compiler=<NA>
> 2012-01-30 16:12:45,039 [main] INFO  org.apache.zookeeper.ZooKeeper - Client environment:os.name=Linux
> 2012-01-30 16:12:45,039 [main] INFO  org.apache.zookeeper.ZooKeeper - Client environment:os.arch=amd64
> 2012-01-30 16:12:45,039 [main] INFO  org.apache.zookeeper.ZooKeeper - Client environment:os.version=2.6.32-5-amd64
> 2012-01-30 16:12:45,039 [main] INFO  org.apache.zookeeper.ZooKeeper - Client environment:user.name=root
> 2012-01-30 16:12:45,039 [main] INFO  org.apache.zookeeper.ZooKeeper - Client environment:user.home=/root
> 2012-01-30 16:12:45,039 [main] INFO  org.apache.zookeeper.ZooKeeper - Client environment:user.dir=/root
> 2012-01-30 16:12:45,039 [main] INFO  org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=lxc233.machine.com:2222,lxc231.machine.com:2222,lxc234.machine.com:2222 sessionTimeout=180000 watcher=hconnection
> 2012-01-30 16:12:45,048 [main-SendThread()] INFO  org.apache.zookeeper.ClientCnxn - Opening socket connection to server lxc231.machine.com/192.168.1.231:2222
> 2012-01-30 16:12:45,049 [main-SendThread(lxc231.machine.com:2222)] INFO  org.apache.zookeeper.ClientCnxn - Socket connection established to lxc231.machine.com/192.168.1.231:2222, initiating session
> 2012-01-30 16:12:45,081 [main-SendThread(lxc231.machine.com:2222)] INFO  org.apache.zookeeper.ClientCnxn - Session establishment complete on server lxc231.machine.com/192.168.1.231:2222, sessionid = 0x134c294771a073f, negotiated timeout = 180000
> 2012-01-30 16:12:46,569 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
> 2012-01-30 16:12:46,590 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
> 2012-01-30 16:12:46,870 [Thread-13] INFO  org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=lxc233.machine.com:2222,lxc231.machine.com:2222,lxc234.machine.com:2222 sessionTimeout=180000 watcher=hconnection
> 2012-01-30 16:12:46,871 [Thread-13-SendThread()] INFO  org.apache.zookeeper.ClientCnxn - Opening socket connection to server lxc233.machine.com/192.168.1.233:2222
> 2012-01-30 16:12:46,871 [Thread-13-SendThread(lxc233.machine.com:2222)] INFO  org.apache.zookeeper.ClientCnxn - Socket connection established to lxc233.machine.com/192.168.1.233:2222, initiating session
> 2012-01-30 16:12:46,872 [Thread-13-SendThread(lxc233.machine.com:2222)] INFO  org.apache.zookeeper.ClientCnxn - Session establishment complete on server lxc233.machine.com/192.168.1.233:2222, sessionid = 0x2343822449935e1, negotiated timeout = 180000
> 2012-01-30 16:12:46,880 [Thread-13] INFO  org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=lxc233.machine.com:2222,lxc231.machine.com:2222,lxc234.machine.com:2222 sessionTimeout=180000 watcher=hconnection
> 2012-01-30 16:12:46,880 [Thread-13-SendThread()] INFO  org.apache.zookeeper.ClientCnxn - Opening socket connection to server lxc233.machine.com/192.168.1.233:2222
> 2012-01-30 16:12:46,880 [Thread-13-SendThread(lxc233.machine.com:2222)] INFO  org.apache.zookeeper.ClientCnxn - Socket connection established to lxc233.machine.com/192.168.1.233:2222, initiating session
> 2012-01-30 16:12:46,882 [Thread-13-SendThread(lxc233.machine.com:2222)] INFO  org.apache.zookeeper.ClientCnxn - Session establishment complete on server lxc233.machine.com/192.168.1.233:2222, sessionid = 0x2343822449935e2, negotiated timeout = 180000
> 2012-01-30 16:12:47,091 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
> 2012-01-30 16:12:47,703 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201201201546_0890
> 2012-01-30 16:12:47,703 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://lxc233:50030/jobdetails.jsp?jobid=job_201201201546_0890
> 2012-01-30 16:12:55,723 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 25% complete
> 2012-01-30 16:13:49,312 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 33% complete
> 2012-01-30 16:13:55,322 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete
> 2012-01-30 16:13:57,327 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_201201201546_0890 has failed! Stop running all dependent jobs
> 2012-01-30 16:13:57,327 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
> 2012-01-30 16:13:57,337 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR: Could create instance of class org.apache.pig.backend.hadoop.hbase.HBaseStorage$1, while attempting to de-serialize it. (no default constructor ?)
> 2012-01-30 16:13:57,337 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
> 2012-01-30 16:13:57,338 [main] INFO  org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics: 
> HadoopVersion	PigVersion	UserId	StartedAt	FinishedAt	Features
> 0.20-append	0.9.2-SNAPSHOT	root	2012-01-30 16:12:44	2012-01-30 16:13:57	MERGE_JION
> Failed!
> Failed Jobs:
> JobId	Alias	Feature	Message	Outputs
> job_201201201546_0890	end_sessions	INDEXER	Message: Job failed!	
> Input(s):
> Failed to read data from "hbase://endSession.bea000000.dev.ubithere.com"
> Output(s):
> Counters:
> Total records written : 0
> Total bytes written : 0
> Spillable Memory Manager spill count : 0
> Total bags proactively spilled: 0
> Total records proactively spilled: 0
> Job DAG:
> job_201201201546_0890	->	null,
> null
> 2012-01-30 16:13:57,338 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
> 2012-01-30 16:13:57,339 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2997: Encountered IOException. Could create instance of class org.apache.pig.backend.hadoop.hbase.HBaseStorage$1, while attempting to de-serialize it. (no default constructor ?)
> Details at logfile: /root/pig_1327939963919.log
> 2012-01-30 16:13:57,339 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2244: Job failed, hadoop does not return any error message
> Details at logfile: /root/pig_1327939963919.log
> {noformat} 
> And here is the result in the log file:
> {noformat}
> Backend error message
> ---------------------
> java.io.IOException: Could create instance of class org.apache.pig.backend.hadoop.hbase.HBaseStorage$1, while attempting to de-serialize it. (no default constructor ?)
> 	at org.apache.pig.data.BinInterSedes.readWritable(BinInterSedes.java:235)
> 	at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:336)
> 	at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:251)
> 	at org.apache.pig.data.BinInterSedes.addColsToTuple(BinInterSedes.java:556)
> 	at org.apache.pig.data.BinSedesTuple.readFields(BinSedesTuple.java:64)
> 	at org.apache.pig.impl.io.PigNullableWritable.readFields(PigNullableWritable.java:114)
> 	at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
> 	at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
> 	at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:113)
> 	at org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
> 	at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
> 	at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
> 	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)
> Caused by: java.lang.InstantiationException: org.apache.pig.backend.hadoop.hbase.HBaseStorage$1
> 	at java.lang.Class.newInstance0(Class.java:340)
> 	at java.lang.Class.newInstance(Class.java:308)
> 	at org.apache.pig.data.BinInterSedes.readWritable(BinInterSedes.java:231)
> 	... 13 more
> Pig Stack Trace
> ---------------
> ERROR 2997: Encountered IOException. Could create instance of class org.apache.pig.backend.hadoop.hbase.HBaseStorage$1, while attempting to de-serialize it. (no default constructor ?)
> java.io.IOException: Could create instance of class org.apache.pig.backend.hadoop.hbase.HBaseStorage$1, while attempting to de-serialize it. (no default constructor ?)
> 	at org.apache.pig.data.BinInterSedes.readWritable(BinInterSedes.java:235)
> 	at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:336)
> 	at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:251)
> 	at org.apache.pig.data.BinInterSedes.addColsToTuple(BinInterSedes.java:556)
> 	at org.apache.pig.data.BinSedesTuple.readFields(BinSedesTuple.java:64)
> 	at org.apache.pig.impl.io.PigNullableWritable.readFields(PigNullableWritable.java:114)
> 	at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
> 	at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
> 	at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:113)
> 	at org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
> 	at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
> 	at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
> 	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)
> Caused by: java.lang.InstantiationException: org.apache.pig.backend.hadoop.hbase.HBaseStorage$1
> 	at java.lang.Class.newInstance0(Class.java:340)
> 	at java.lang.Class.newInstance(Class.java:308)
> 	at org.apache.pig.data.BinInterSedes.readWritable(BinInterSedes.java:231)
> ================================================================================
> Pig Stack Trace
> ---------------
> ERROR 2244: Job failed, hadoop does not return any error message
> org.apache.pig.backend.executionengine.ExecException: ERROR 2244: Job failed, hadoop does not return any error message
> 	at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:139)
> 	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:192)
> 	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
> 	at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
> 	at org.apache.pig.Main.run(Main.java:561)
> 	at org.apache.pig.Main.main(Main.java:111)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> ================================================================================
> {noformat}
> The same script without using merge works without any problem.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira