You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@beam.apache.org by "Ismaël Mejía (Jira)" <ji...@apache.org> on 2022/03/11 08:03:00 UTC

[jira] [Work started] (BEAM-13981) Job event log empty on Spark History Server

     [ https://issues.apache.org/jira/browse/BEAM-13981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on BEAM-13981 started by Ismaël Mejía.
-------------------------------------------
> Job event log empty on Spark History Server
> -------------------------------------------
>
>                 Key: BEAM-13981
>                 URL: https://issues.apache.org/jira/browse/BEAM-13981
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-spark
>    Affects Versions: 2.33.0
>            Reporter: Jozef Vilcek
>            Assignee: Ismaël Mejía
>            Priority: P2
>             Fix For: 2.38.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> After upgrade from Beam 2.24.0 -> 2.33.0 Spark jobs run on YARN after complete shows empty data on History server.
> The problem seems to be a race condition and 2 
> {noformat}
> 22/02/22 10:51:11 INFO EventLoggingListener: Logging events to hdfs:/user/spark/jobhistory/application_1553109013416_12079975_1
> ...
> 22/02/22 10:51:41 INFO EventLoggingListener: Logging events to hdfs:/user/spark/jobhistory/application_1553109013416_12079975_1{noformat}
> At the end failure:
> {noformat}
> 22/02/22 11:17:57 INFO SchedulerExtensionServices: Stopping SchedulerExtensionServices
> (serviceOption=None,
>  services=List(),
>  started=false)
> 22/02/22 11:17:57 ERROR Utils: Uncaught exception in thread Driver
> java.io.IOException: Target log file already exists (hdfs:/user/spark/jobhistory/application_1553109013416_12079975_1)
> 	at org.apache.spark.scheduler.EventLoggingListener.stop(EventLoggingListener.scala:255)
> 	at org.apache.spark.SparkContext.$anonfun$stop$13(SparkContext.scala:1960)
> 	at org.apache.spark.SparkContext.$anonfun$stop$13$adapted(SparkContext.scala:1960)
> 	at scala.Option.foreach(Option.scala:274)
> 	at org.apache.spark.SparkContext.$anonfun$stop$12(SparkContext.scala:1960)
> 	at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1340)
> 	at org.apache.spark.SparkContext.stop(SparkContext.scala:1960)
> 	at org.apache.spark.api.java.JavaSparkContext.stop(JavaSparkContext.scala:654)
> 	at org.apache.beam.runners.spark.translation.SparkContextFactory.stopSparkContext(SparkContextFactory.java:73)
> 	at org.apache.beam.runners.spark.SparkPipelineResult$BatchMode.stop(SparkPipelineResult.java:133)
> 	at org.apache.beam.runners.spark.SparkPipelineResult.offerNewState(SparkPipelineResult.java:234)
> 	at org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish(SparkPipelineResult.java:99)
> 	at org.apache.beam.runners.spark.SparkPipelineResult.waitUntilFinish(SparkPipelineResult.java:92)
> 	at com.sizmek.dp.dsp.pipeline.driver.PipelineDriver.$anonfun$main$1(PipelineDriver.scala:41)
> 	at scala.util.Try$.apply(Try.scala:213)
> 	at com.sizmek.dp.dsp.pipeline.driver.PipelineDriver.main(PipelineDriver.scala:34)
> 	at com.sizmek.dp.dsp.pipeline.driver.PipelineDriver.main$(PipelineDriver.scala:17)
> 	at com.zetaglobal.dp.dsp.jobs.dealstats.DealStatsDriver$.main(DealStatsDriver.scala:18)
> 	at com.zetaglobal.dp.dsp.jobs.dealstats.DealStatsDriver.main(DealStatsDriver.scala)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:497)
> 	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:684){noformat}
> This ends up with very empty file in HDFS and empty job details in history server.
>  
> Problem seems to by introduced by this change:
> [https://github.com/apache/beam/pull/14409]
> Why Beam runs concurrent event listener to what spark is doing internally? When I roll back change for SparkRunner, problem disappear for me.
> I am running native SparkRunner with Spark 2.4.4
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)