You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kylin.apache.org by "ZHANGHONGJIA (Jira)" <ji...@apache.org> on 2021/06/10 08:54:00 UTC
[jira] [Created] (KYLIN-5008) backend spark was failed, but
corresponding job status is shown as finished in WebUI
ZHANGHONGJIA created KYLIN-5008:
-----------------------------------
Summary: backend spark was failed, but corresponding job status is shown as finished in WebUI
Key: KYLIN-5008
URL: https://issues.apache.org/jira/browse/KYLIN-5008
Project: Kylin
Issue Type: Bug
Affects Versions: v4.0.0-beta
Reporter: ZHANGHONGJIA
Attachments: image-2021-06-10-16-46-35-919.png, merge-job.log
According to the log shown as below, the spark project was failed due to Container killed by YARN for exceeding memory limits , but in Kylin WebUI ,the status of the mergeJob is finished. Besides, the amount of data in the segment after merged is as three times as the amount of actual data . It seems that kylin didn't monitor the failure of this merge job.
Here is the merge job log :
===============================================================
at org.apache.kylin.engine.spark.job.BuildLayoutWithUpdate$1.call(BuildLayoutWithUpdate.java:43)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
... 3 more
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 244 in stage 1108.0 failed 4 times, most recent failure: Lost task 244.3 in stage 1108.0 (TID 78736, r4200h1-app.travelsky.com, executor 109): ExecutorLostFailure (executor 109 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 39.0 GB of 36 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead or disabling yarn.nodemanager.vmem-check-enabled because of YARN-4714.
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1891)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1879)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1878)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1878)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:927)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:927)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:927)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2112)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2061)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2050)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:738)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:167)
... 34 more
}
RetryInfo{
overrideConf : \{spark.executor.memory=36618MB, spark.executor.memoryOverhead=7323MB},
throwable : java.lang.RuntimeException: Error execute org.apache.kylin.engine.spark.job.CubeMergeJob
at org.apache.kylin.engine.spark.application.SparkApplication.execute(SparkApplication.java:92)
at org.apache.spark.application.JobWorker$$anon$2.run(JobWorker.scala:55)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: org.apache.spark.SparkException: Job aborted.
at org.apache.kylin.engine.spark.job.BuildLayoutWithUpdate.updateLayout(BuildLayoutWithUpdate.java:70)
at org.apache.kylin.engine.spark.job.CubeMergeJob.mergeSegments(CubeMergeJob.java:122)
at org.apache.kylin.engine.spark.job.CubeMergeJob.doExecute(CubeMergeJob.java:82)
at org.apache.kylin.engine.spark.application.SparkApplication.execute(SparkApplication.java:298)
at org.apache.kylin.engine.spark.application.SparkApplication.execute(SparkApplication.java:89)
... 4 more
Caused by: org.apache.spark.SparkException: Job aborted.
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:198)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:159)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:81)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:677)
at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:677)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:677)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:286)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:272)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:230)
at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:567)
at org.apache.kylin.engine.spark.storage.ParquetStorage.saveTo(ParquetStorage.scala:28)
at org.apache.kylin.engine.spark.job.CubeMergeJob.saveAndUpdateCuboid(CubeMergeJob.java:171)
at org.apache.kylin.engine.spark.job.CubeMergeJob.access$000(CubeMergeJob.java:59)
at org.apache.kylin.engine.spark.job.CubeMergeJob$1.build(CubeMergeJob.java:118)
at org.apache.kylin.engine.spark.job.BuildLayoutWithUpdate$1.call(BuildLayoutWithUpdate.java:51)
at org.apache.kylin.engine.spark.job.BuildLayoutWithUpdate$1.call(BuildLayoutWithUpdate.java:43)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
... 3 more
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 428 in stage 360.0 failed 4 times, most recent failure: Lost task 428.3 in stage 360.0 (TID 26130, umetrip40-hdp2.6-140.travelsky.com, executor 1): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 48.4 GB of 46 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead or disabling yarn.nodemanager.vmem-check-enabled because of YARN-4714.
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1891)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1879)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1878)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1878)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:927)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:927)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:927)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2112)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2061)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2050)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:738)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:167)
... 34 more
}
The WebUI monitor:
!image-2021-06-10-16-46-35-919.png!
--
This message was sent by Atlassian Jira
(v8.3.4#803005)