Posted to dev@pig.apache.org by "William Watson (JIRA)" <ji...@apache.org> on 2017/05/09 17:50:04 UTC
[jira] [Commented] (PIG-5208) Two HBase Loads Followed By a Merge Join Fails in Mapreduce or Tez Mode
[ https://issues.apache.org/jira/browse/PIG-5208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16003157#comment-16003157 ]
William Watson commented on PIG-5208:
-------------------------------------
Could this be related to the fact that TableSplitComparable implements: {code}
public int compareTo(org.apache.hadoop.hbase.mapreduce.TableSplit split)
{code}
but doesn't implement something like: {code}
public int compareTo(TableSplitComparable split)
{code}
?
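To illustrate the suspicion, here is a minimal, self-contained sketch. It does not use the real Pig or HBase classes: Split, SplitWrapper, SelfComparableWrapper and rawCompare are hypothetical stand-ins, and the raw Comparable call is only my assumption about how a generic comparison (such as the DataType.compare frame in the stack trace) reaches compareTo. A class that implements Comparable of the wrapped type gets a compiler-generated bridge method compareTo(Object) that casts its argument to the wrapped type, so comparing two wrappers fails with a ClassCastException. That would also be consistent with local mode working, if the comparison never runs there.

```java
// Hypothetical stand-ins only; these are NOT the real Pig or HBase classes.
class BridgeCastDemo {

    // Stand-in for org.apache.hadoop.hbase.mapreduce.TableSplit.
    static class Split {
        final String startRow;
        Split(String startRow) { this.startRow = startRow; }
    }

    // Mirrors the suspected shape: comparable to the wrapped type, not to itself.
    static class SplitWrapper implements Comparable<Split> {
        final Split split;
        SplitWrapper(Split split) { this.split = split; }

        @Override
        public int compareTo(Split other) {
            return split.startRow.compareTo(other.startRow);
        }
    }

    // One possible alternative: the wrapper is comparable to its own type.
    static class SelfComparableWrapper implements Comparable<SelfComparableWrapper> {
        final Split split;
        SelfComparableWrapper(Split split) { this.split = split; }

        @Override
        public int compareTo(SelfComparableWrapper other) {
            return split.startRow.compareTo(other.split.startRow);
        }
    }

    // Assumed call pattern: a generic comparator that only knows both values are Comparable.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static int rawCompare(Object a, Object b) {
        return ((Comparable) a).compareTo(b);
    }

    public static void main(String[] args) {
        Split a = new Split("row-a");
        Split b = new Split("row-b");

        // Wrapper vs. wrapper works when the wrapper compares against its own type.
        System.out.println(rawCompare(new SelfComparableWrapper(a), new SelfComparableWrapper(b)));

        try {
            // The bridge method compareTo(Object) casts its argument to Split,
            // so passing another SplitWrapper fails at run time.
            rawCompare(new SplitWrapper(a), new SplitWrapper(b));
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: " + e.getMessage());
        }
    }
}
```

If this is what is happening, the failing frame would be reported at the class declaration line of the wrapper (where the bridge method lives), which is how I read TableSplitComparable.java:26 in the trace, though I have not confirmed that against the source.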
> Two HBase Loads Followed By a Merge Join Fails in Mapreduce or Tez Mode
> -----------------------------------------------------------------------
>
> Key: PIG-5208
> URL: https://issues.apache.org/jira/browse/PIG-5208
> Project: Pig
> Issue Type: Bug
> Reporter: William Watson
>
> I posted this issue to the mailing list a while back and didn't get a response. Today I picked it back up, tried it on Tez instead of MapReduce, and got the same error. In local mode, the script works. I've been able to reproduce this consistently enough that I believe it is a real bug in Pig.
> Here's the original mailing list post, with all the details from when I first documented this error: https://www.mail-archive.com/user@pig.apache.org/msg10553.html
> Here's the stack trace from my Tez run today: {code}
> 2084439 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2998: Unhandled internal error. Vertex failed, vertexName=scope-1797, vertexId=vertex_1490968035192_0008_1_01, diagnostics=[Task failed, taskId=task_1490968035192_0008_1_01_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError: Error while doing final merge
> at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:318)
> at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:285)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ClassCastException: org.apache.pig.backend.hadoop.hbase.TableSplitComparable cannot be cast to org.apache.hadoop.hbase.mapreduce.TableSplit
> at org.apache.pig.backend.hadoop.hbase.TableSplitComparable.compareTo(TableSplitComparable.java:26)
> at org.apache.pig.data.DataType.compare(DataType.java:566)
> at org.apache.pig.data.DataType.compare(DataType.java:464)
> at org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compareDatum(BinInterSedes.java:1106)
> at org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compare(BinInterSedes.java:1082)
> at org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compareBinSedesTuple(BinInterSedes.java:787)
> at org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compare(BinInterSedes.java:728)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTupleSortComparator.compare(PigTupleSortComparator.java:100)
> at org.apache.tez.runtime.library.common.sort.impl.TezMerger$MergeQueue.lessThan(TezMerger.java:684)
> at org.apache.hadoop.util.PriorityQueue.upHeap(PriorityQueue.java:128)
> at org.apache.hadoop.util.PriorityQueue.put(PriorityQueue.java:55)
> at org.apache.tez.runtime.library.common.sort.impl.TezMerger$MergeQueue.merge(TezMerger.java:783)
> at org.apache.tez.runtime.library.common.sort.impl.TezMerger$MergeQueue.merge(TezMerger.java:694)
> at org.apache.tez.runtime.library.common.sort.impl.TezMerger.merge(TezMerger.java:150)
> at org.apache.tez.runtime.library.common.sort.impl.TezMerger.merge(TezMerger.java:132)
> at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.finalMerge(MergeManager.java:1124)
> at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.close(MergeManager.java:583)
> at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:314)
> ... 6 more
> {code}
> And here's the test script I was using with the names of tables and columns changed: {code}
> side_a = LOAD 'hbase://ads' USING
> org.apache.pig.backend.hadoop.hbase.HBaseStorage(
> 'cf1:user_id cf1:ad_id',
> '-minTimestamp=1470024000000 -maxTimestamp=1491019199000 -regex=\\\\|agds=(156)\\\\|'
> ) AS (user_id:chararray, ad_id:chararray);
> side_a = FILTER side_a BY ad_id == '440';
> side_b = LOAD 'hbase://ads' USING
> org.apache.pig.backend.hadoop.hbase.HBaseStorage(
> 'cf1:user_id cf1:ad_id',
> '-minTimestamp=1470024000000 -maxTimestamp=1491019199000 -regex=\\\\|agds=(156)\\\\|'
> ) AS (user_id:chararray, ad_id:chararray);
> side_b = FILTER side_b BY ad_id == '439';
> side_b = JOIN
> side_a BY user_id,
> side_b BY user_id
> USING 'merge';
> after_merge_join = FOREACH side_b GENERATE
> side_b::user_id;
> STORE after_merge_join
> INTO 'hbase://results'
> USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('', '');
> {code}