Posted to dev@pig.apache.org by "Mohit Sabharwal (JIRA)" <ji...@apache.org> on 2016/05/17 04:18:12 UTC
[jira] [Commented] (PIG-4886) Add PigSplit#getLocationInfo to fix the NPE found in log in spark mode
[ https://issues.apache.org/jira/browse/PIG-4886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15286003#comment-15286003 ]
Mohit Sabharwal commented on PIG-4886:
--------------------------------------
Thanks, [~kellyzly] - left a couple of comments on RB.
> Add PigSplit#getLocationInfo to fix the NPE found in log in spark mode
> ----------------------------------------------------------------------
>
> Key: PIG-4886
> URL: https://issues.apache.org/jira/browse/PIG-4886
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: liyunzhang_intel
> Assignee: liyunzhang_intel
> Fix For: spark-branch
>
> Attachments: PIG-4886.patch
>
>
> Use the branch code (119f313) to test the following Pig script in Spark mode:
> {code}
> A = load './SkewedJoinInput1.txt' as (id,name,n);
> B = load './SkewedJoinInput2.txt' as (id,name);
> D = join A by (id,name), B by (id,name);
> store D into './testFRJoin.out';
> {code}
> cat bin/SkewedJoinInput1.txt
> {noformat}
> 100 apple1 aaa
> 200 orange1 bbb
> 300 strawberry ccc
> {noformat}
> cat bin/SkewedJoinInput2.txt
> {noformat}
> 100 apple1
> 100 apple2
> 100 apple2
> 200 orange1
> 200 orange2
> 300 strawberry
> 400 pear
> {noformat}
> The following exception is found in the log:
> {noformat}
> [dag-scheduler-event-loop] 2016-05-05 14:21:01,046 DEBUG rdd.NewHadoopRDD (Logging.scala:logDebug(84)) - Failed to use InputSplit#getLocationInfo.
> java.lang.NullPointerException
> at scala.collection.mutable.ArrayOps$ofRef$.length$extension(ArrayOps.scala:114)
> at scala.collection.mutable.ArrayOps$ofRef.length(ArrayOps.scala:114)
> at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:32)
> at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
> at org.apache.spark.rdd.HadoopRDD$.convertSplitLocationInfo(HadoopRDD.scala:406)
> at org.apache.spark.rdd.NewHadoopRDD.getPreferredLocations(NewHadoopRDD.scala:202)
> at org.apache.spark.rdd.RDD$$anonfun$preferredLocations$2.apply(RDD.scala:231)
> at org.apache.spark.rdd.RDD$$anonfun$preferredLocations$2.apply(RDD.scala:231)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.preferredLocations(RDD.scala:230)
> at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal(DAGScheduler.scala:1387)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply$mcVI$sp(DAGScheduler.scala:1397)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply(DAGScheduler.scala:1396)
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply(DAGScheduler.scala:1396)
> {noformat}
> org.apache.spark.rdd.NewHadoopRDD.getPreferredLocations calls PigSplit#getLocationInfo, but PigSplit currently extends InputSplit and inherits InputSplit#getLocationInfo, which returns null:
> {code}
> @Evolving
> public SplitLocationInfo[] getLocationInfo() throws IOException {
>     return null;
> }
> {code}
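
For illustration only, here is a minimal, self-contained sketch of the failure mode described above and of one way a wrapping split could avoid it. It is not the attached PIG-4886.patch and does not use the real Hadoop, Spark, or Pig classes: LocationInfo, Split, FileLikeSplit, WrappingSplit, and SplitLocations are hypothetical stand-ins, and the assumption that the wrapper holds an array of underlying splits is made only for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a per-host location record (not Hadoop's SplitLocationInfo).
class LocationInfo {
    final String host;

    LocationInfo(String host) {
        this.host = host;
    }
}

// Hypothetical stand-in for a base split whose default mirrors the quoted
// snippet: no location info unless a subclass overrides the method.
abstract class Split {
    LocationInfo[] getLocationInfo() {
        return null;
    }
}

// A split that does know its locations.
class FileLikeSplit extends Split {
    private final LocationInfo[] infos;

    FileLikeSplit(String... hosts) {
        infos = new LocationInfo[hosts.length];
        for (int i = 0; i < hosts.length; i++) {
            infos[i] = new LocationInfo(hosts[i]);
        }
    }

    @Override
    LocationInfo[] getLocationInfo() {
        return infos;
    }
}

// Assumed shape of a wrapper split: it overrides getLocationInfo, collects
// whatever the wrapped splits report, and never returns null.
class WrappingSplit extends Split {
    private final Split[] wrapped;

    WrappingSplit(Split... wrapped) {
        this.wrapped = wrapped;
    }

    @Override
    LocationInfo[] getLocationInfo() {
        List<LocationInfo> all = new ArrayList<>();
        for (Split s : wrapped) {
            LocationInfo[] infos = s.getLocationInfo();
            if (infos != null) {
                for (LocationInfo info : infos) {
                    all.add(info);
                }
            }
        }
        return all.toArray(new LocationInfo[0]);
    }
}

// A caller that iterates the result without a null check, which is the kind
// of access that fails in the stack trace above.
class SplitLocations {
    static List<String> hosts(Split split) {
        List<String> out = new ArrayList<>();
        for (LocationInfo info : split.getLocationInfo()) { // NPE if the array is null
            out.add(info.host);
        }
        return out;
    }
}
```

With these stand-ins, calling SplitLocations.hosts on a split that keeps the null default throws a NullPointerException, while wrapping the same split in WrappingSplit yields an empty list. Returning an empty array rather than null is a design choice of this sketch; whether the real override should delegate, aggregate, or fall back differently is a question for the patch review.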
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)