Posted to user@kylin.apache.org by Jason Tang <ja...@striking.ly> on 2019/06/19 01:39:11 UTC

Kylin cube merge fails due to unordered data

We hit the error "Unordered data cannot be split into multi trees" during a cube merge. The Kylin version is 2.6.1. The full log is pasted below.

I found the issue https://issues.apache.org/jira/browse/KYLIN-2794, which was fixed in v2.3.0, but the problem seems to still exist in 2.6.1. However, if I refresh the related segments and then run the merge again, the merge succeeds. Any suggestions? Thanks!
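
My rough understanding of the error, judging only from the stack trace, is that the forest trie dictionary builder can only split its input into multiple trees when the values arrive already sorted, so concatenating per-segment dictionary values without a proper merge would trip the check. The sketch below is just my own illustration of that invariant; the class and method names are mine, not Kylin's actual implementation:

    import java.util.Arrays;
    import java.util.List;

    // Hypothetical, simplified illustration of the ordering check that seems
    // to fail during the dictionary merge. Names are mine, not Kylin's.
    public class OrderedDictMergeSketch {

        // A forest-style trie builder can only cut the value stream into
        // sub-trees if the stream is sorted: each value must compare >= the
        // previous one.
        static class ForestBuilderSketch {
            private String previous = null;

            void addValue(String value) {
                if (previous != null && value.compareTo(previous) < 0) {
                    // Mirrors the message seen in the stack trace.
                    throw new IllegalStateException(
                            "Invalid input data. Unordered data cannot be split into multi trees");
                }
                previous = value;
                // ... a real builder would append the value to the current trie here ...
            }
        }

        public static void main(String[] args) {
            // Values from two segment dictionaries, simply concatenated:
            // each list is sorted on its own, but the concatenation is not,
            // so the builder throws.
            List<String> seg1 = Arrays.asList("apple", "melon");
            List<String> seg2 = Arrays.asList("banana", "cherry");

            ForestBuilderSketch builder = new ForestBuilderSketch();
            for (String v : seg1) builder.addValue(v);
            for (String v : seg2) builder.addValue(v); // "banana" < "melon" -> exception
        }
    }

If that reading is roughly right, it might also explain why refreshing the related segments (which rebuilds their dictionaries) lets the subsequent merge succeed, but I'd appreciate confirmation.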

2019-06-18 17:07:25 INFO  Client:54 - Application report for application_1559319352909_9207 (state: RUNNING)
2019-06-18 17:07:26 INFO  Client:54 - Application report for application_1559319352909_9207 (state: RUNNING)
2019-06-18 17:07:27 INFO  Client:54 - Application report for application_1559319352909_9207 (state: RUNNING)
2019-06-18 17:07:28 INFO  Client:54 - Application report for application_1559319352909_9207 (state: RUNNING)
2019-06-18 17:07:29 INFO  Client:54 - Application report for application_1559319352909_9207 (state: FINISHED)
2019-06-18 17:07:29 INFO  Client:54 - 
	 client token: N/A
	 diagnostics: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkMergingDictionary. Root cause: Job aborted.
	at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42)
	at org.apache.kylin.common.util.SparkEntry.main(SparkEntry.java:44)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$4.run(ApplicationMaster.scala:721)
Caused by: org.apache.spark.SparkException: Job aborted.
	at org.apache.spark.internal.io.SparkHadoopWriter$.write(SparkHadoopWriter.scala:100)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1083)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1081)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1081)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
	at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:1081)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopFile$2.apply$mcV$sp(PairRDDFunctions.scala:1000)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopFile$2.apply(PairRDDFunctions.scala:991)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopFile$2.apply(PairRDDFunctions.scala:991)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
	at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:991)
	at org.apache.spark.api.java.JavaPairRDD.saveAsNewAPIHadoopFile(JavaPairRDD.scala:840)
	at org.apache.kylin.engine.spark.SparkMergingDictionary.execute(SparkMergingDictionary.java:157)
	at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:37)
	... 6 more
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, ip-10-201-1-61.cn-north-1.compute.internal, executor 2): org.apache.spark.SparkException: Task failed while writing rows
	at org.apache.spark.internal.io.SparkHadoopWriter$.org$apache$spark$internal$io$SparkHadoopWriter$$executeTask(SparkHadoopWriter.scala:155)
	at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:83)
	at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:78)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:109)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: Invalid input data. Unordered data cannot be split into multi trees
	at org.apache.kylin.dict.TrieDictionaryForestBuilder.addValue(TrieDictionaryForestBuilder.java:93)
	at org.apache.kylin.dict.TrieDictionaryForestBuilder.addValue(TrieDictionaryForestBuilder.java:79)
	at org.apache.kylin.dict.DictionaryGenerator$StringTrieDictForestBuilder.addValue(DictionaryGenerator.java:236)
	at org.apache.kylin.dict.DictionaryGenerator.buildDictionary(DictionaryGenerator.java:82)
	at org.apache.kylin.dict.DictionaryGenerator.buildDictionary(DictionaryGenerator.java:65)
	at org.apache.kylin.dict.DictionaryGenerator.mergeDictionaries(DictionaryGenerator.java:112)
	at org.apache.kylin.dict.DictionaryManager.mergeDictionary(DictionaryManager.java:263)
	at org.apache.kylin.engine.spark.SparkMergingDictionary$MergeDictAndStatsFunction.call(SparkMergingDictionary.java:222)
	at org.apache.kylin.engine.spark.SparkMergingDictionary$MergeDictAndStatsFunction.call(SparkMergingDictionary.java:162)
	at org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1043)
	at org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1043)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
	at scala.collection.Iterator$$anon$12.next(Iterator.scala:444)
	at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$4.apply(SparkHadoopWriter.scala:129)
	at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$4.apply(SparkHadoopWriter.scala:127)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1415)
	at org.apache.spark.internal.io.SparkHadoopWriter$.org$apache$spark$internal$io$SparkHadoopWriter$$executeTask(SparkHadoopWriter.scala:139)
	... 8 more

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1651)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1639)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1638)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1638)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
	at scala.Option.foreach(Option.scala:257)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:831)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1872)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1821)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1810)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:642)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2034)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2055)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2087)
	at org.apache.spark.internal.io.SparkHadoopWriter$.write(SparkHadoopWriter.scala:78)
	... 23 more
Caused by: org.apache.spark.SparkException: Task failed while writing rows
	at org.apache.spark.internal.io.SparkHadoopWriter$.org$apache$spark$internal$io$SparkHadoopWriter$$executeTask(SparkHadoopWriter.scala:155)
	at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:83)
	at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:78)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:109)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: Invalid input data. Unordered data cannot be split into multi trees
	at org.apache.kylin.dict.TrieDictionaryForestBuilder.addValue(TrieDictionaryForestBuilder.java:93)
	at org.apache.kylin.dict.TrieDictionaryForestBuilder.addValue(TrieDictionaryForestBuilder.java:79)
	at org.apache.kylin.dict.DictionaryGenerator$StringTrieDictForestBuilder.addValue(DictionaryGenerator.java:236)
	at org.apache.kylin.dict.DictionaryGenerator.buildDictionary(DictionaryGenerator.java:82)
	at org.apache.kylin.dict.DictionaryGenerator.buildDictionary(DictionaryGenerator.java:65)
	at org.apache.kylin.dict.DictionaryGenerator.mergeDictionaries(DictionaryGenerator.java:112)
	at org.apache.kylin.dict.DictionaryManager.mergeDictionary(DictionaryManager.java:263)
	at org.apache.kylin.engine.spark.SparkMergingDictionary$MergeDictAndStatsFunction.call(SparkMergingDictionary.java:222)
	at org.apache.kylin.engine.spark.SparkMergingDictionary$MergeDictAndStatsFunction.call(SparkMergingDictionary.java:162)
	at org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1043)
	at org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1043)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
	at scala.collection.Iterator$$anon$12.next(Iterator.scala:444)
	at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$4.apply(SparkHadoopWriter.scala:129)
	at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$4.apply(SparkHadoopWriter.scala:127)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1415)
	at org.apache.spark.internal.io.SparkHadoopWriter$.org$apache$spark$internal$io$SparkHadoopWriter$$executeTask(SparkHadoopWriter.scala:139)
	... 8 more

	 ApplicationMaster host: 10.201.1.218
	 ApplicationMaster RPC port: 0
	 queue: default
	 start time: 1560877570133
	 final status: FAILED
	 tracking URL: http://ip-10-201-1-15.cn-north-1.compute.internal:20888/proxy/application_1559319352909_9207/
	 user: hadoop
Exception in thread "main" org.apache.spark.SparkException: Application application_1559319352909_9207 finished with failed status
	at org.apache.spark.deploy.yarn.Client.run(Client.scala:1165)
	at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1520)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2019-06-18 17:07:29 INFO  ShutdownHookManager:54 - Shutdown hook called
2019-06-18 17:07:29 INFO  ShutdownHookManager:54 - Deleting directory /tmp/spark-45ed432f-96b9-408f-98fe-6261183b3462
2019-06-18 17:07:29 INFO  ShutdownHookManager:54 - Deleting directory /tmp/spark-bea08595-a825-44e6-9d79-40c33a4b597a
The command is: 

Best Regards
Jason Tang