You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@kylin.apache.org by Jason Tang <ja...@striking.ly> on 2019/06/19 01:39:11 UTC
Kylin cube merge fail due to unordered data
Below is the log, "Unordered data cannot be split into multi trees” And the Kylin version is 2.6.1.
I found this commit https://issues.apache.org/jira/browse/KYLIN-2794 <https://issues.apache.org/jira/browse/KYLIN-2794> which fixed this issue in v2.3.0. But it seems the problem still exists. But when I refresh the related segments, and then do merge again, the merge action will succeed. Any suggestions? Thanks!
2019-06-18 17:07:25 INFO Client:54 - Application report for application_1559319352909_9207 (state: RUNNING)
2019-06-18 17:07:26 INFO Client:54 - Application report for application_1559319352909_9207 (state: RUNNING)
2019-06-18 17:07:27 INFO Client:54 - Application report for application_1559319352909_9207 (state: RUNNING)
2019-06-18 17:07:28 INFO Client:54 - Application report for application_1559319352909_9207 (state: RUNNING)
2019-06-18 17:07:29 INFO Client:54 - Application report for application_1559319352909_9207 (state: FINISHED)
2019-06-18 17:07:29 INFO Client:54 -
client token: N/A
diagnostics: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkMergingDictionary. Root cause: Job aborted.
at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42)
at org.apache.kylin.common.util.SparkEntry.main(SparkEntry.java:44)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$4.run(ApplicationMaster.scala:721)
Caused by: org.apache.spark.SparkException: Job aborted.
at org.apache.spark.internal.io.SparkHadoopWriter$.write(SparkHadoopWriter.scala:100)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1083)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1081)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1081)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:1081)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopFile$2.apply$mcV$sp(PairRDDFunctions.scala:1000)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopFile$2.apply(PairRDDFunctions.scala:991)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopFile$2.apply(PairRDDFunctions.scala:991)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:991)
at org.apache.spark.api.java.JavaPairRDD.saveAsNewAPIHadoopFile(JavaPairRDD.scala:840)
at org.apache.kylin.engine.spark.SparkMergingDictionary.execute(SparkMergingDictionary.java:157)
at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:37)
... 6 more
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, ip-10-201-1-61.cn-north-1.compute.internal, executor 2): org.apache.spark.SparkException: Task failed while writing rows
at org.apache.spark.internal.io.SparkHadoopWriter$.org$apache$spark$internal$io$SparkHadoopWriter$$executeTask(SparkHadoopWriter.scala:155)
at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:83)
at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:78)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: Invalid input data. Unordered data cannot be split into multi trees
at org.apache.kylin.dict.TrieDictionaryForestBuilder.addValue(TrieDictionaryForestBuilder.java:93)
at org.apache.kylin.dict.TrieDictionaryForestBuilder.addValue(TrieDictionaryForestBuilder.java:79)
at org.apache.kylin.dict.DictionaryGenerator$StringTrieDictForestBuilder.addValue(DictionaryGenerator.java:236)
at org.apache.kylin.dict.DictionaryGenerator.buildDictionary(DictionaryGenerator.java:82)
at org.apache.kylin.dict.DictionaryGenerator.buildDictionary(DictionaryGenerator.java:65)
at org.apache.kylin.dict.DictionaryGenerator.mergeDictionaries(DictionaryGenerator.java:112)
at org.apache.kylin.dict.DictionaryManager.mergeDictionary(DictionaryManager.java:263)
at org.apache.kylin.engine.spark.SparkMergingDictionary$MergeDictAndStatsFunction.call(SparkMergingDictionary.java:222)
at org.apache.kylin.engine.spark.SparkMergingDictionary$MergeDictAndStatsFunction.call(SparkMergingDictionary.java:162)
at org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1043)
at org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1043)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$12.next(Iterator.scala:444)
at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$4.apply(SparkHadoopWriter.scala:129)
at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$4.apply(SparkHadoopWriter.scala:127)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1415)
at org.apache.spark.internal.io.SparkHadoopWriter$.org$apache$spark$internal$io$SparkHadoopWriter$$executeTask(SparkHadoopWriter.scala:139)
... 8 more
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1651)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1639)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1638)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1638)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:831)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1872)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1821)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1810)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:642)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2034)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2055)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2087)
at org.apache.spark.internal.io.SparkHadoopWriter$.write(SparkHadoopWriter.scala:78)
... 23 more
Caused by: org.apache.spark.SparkException: Task failed while writing rows
at org.apache.spark.internal.io.SparkHadoopWriter$.org$apache$spark$internal$io$SparkHadoopWriter$$executeTask(SparkHadoopWriter.scala:155)
at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:83)
at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:78)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: Invalid input data. Unordered data cannot be split into multi trees
at org.apache.kylin.dict.TrieDictionaryForestBuilder.addValue(TrieDictionaryForestBuilder.java:93)
at org.apache.kylin.dict.TrieDictionaryForestBuilder.addValue(TrieDictionaryForestBuilder.java:79)
at org.apache.kylin.dict.DictionaryGenerator$StringTrieDictForestBuilder.addValue(DictionaryGenerator.java:236)
at org.apache.kylin.dict.DictionaryGenerator.buildDictionary(DictionaryGenerator.java:82)
at org.apache.kylin.dict.DictionaryGenerator.buildDictionary(DictionaryGenerator.java:65)
at org.apache.kylin.dict.DictionaryGenerator.mergeDictionaries(DictionaryGenerator.java:112)
at org.apache.kylin.dict.DictionaryManager.mergeDictionary(DictionaryManager.java:263)
at org.apache.kylin.engine.spark.SparkMergingDictionary$MergeDictAndStatsFunction.call(SparkMergingDictionary.java:222)
at org.apache.kylin.engine.spark.SparkMergingDictionary$MergeDictAndStatsFunction.call(SparkMergingDictionary.java:162)
at org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1043)
at org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1043)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$12.next(Iterator.scala:444)
at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$4.apply(SparkHadoopWriter.scala:129)
at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$4.apply(SparkHadoopWriter.scala:127)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1415)
at org.apache.spark.internal.io.SparkHadoopWriter$.org$apache$spark$internal$io$SparkHadoopWriter$$executeTask(SparkHadoopWriter.scala:139)
... 8 more
ApplicationMaster host: 10.201.1.218
ApplicationMaster RPC port: 0
queue: default
start time: 1560877570133
final status: FAILED
tracking URL: http://ip-10-201-1-15.cn-north-1.compute.internal:20888/proxy/application_1559319352909_9207/
user: hadoop
Exception in thread "main" org.apache.spark.SparkException: Application application_1559319352909_9207 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1165)
at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1520)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2019-06-18 17:07:29 INFO ShutdownHookManager:54 - Shutdown hook called
2019-06-18 17:07:29 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-45ed432f-96b9-408f-98fe-6261183b3462
2019-06-18 17:07:29 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-bea08595-a825-44e6-9d79-40c33a4b597a
The command is:
Best Regards
Jason Tang