Posted to issues@spark.apache.org by "Jan Van den bosch (Jira)" <ji...@apache.org> on 2020/01/20 12:44:00 UTC

[jira] [Updated] (SPARK-30586) NPE in LiveRDDDistribution (AppStatusListener)

     [ https://issues.apache.org/jira/browse/SPARK-30586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Van den bosch updated SPARK-30586:
--------------------------------------
    Description: 
We've been noticing a large number of NullPointerExceptions in the driver logs of our long-running Spark jobs:
{noformat}
20/01/17 23:40:12 ERROR AsyncEventQueue: Listener AppStatusListener threw an exception
java.lang.NullPointerException
        at org.spark_project.guava.base.Preconditions.checkNotNull(Preconditions.java:191)
        at org.spark_project.guava.collect.MapMakerInternalMap.putIfAbsent(MapMakerInternalMap.java:3507)
        at org.spark_project.guava.collect.Interners$WeakInterner.intern(Interners.java:85)
        at org.apache.spark.status.LiveEntityHelpers$.weakIntern(LiveEntity.scala:603)
        at org.apache.spark.status.LiveRDDDistribution.toApi(LiveEntity.scala:486)
        at org.apache.spark.status.LiveRDD$$anonfun$2.apply(LiveEntity.scala:548)
        at org.apache.spark.status.LiveRDD$$anonfun$2.apply(LiveEntity.scala:548)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.mutable.HashMap$$anon$2$$anonfun$foreach$3.apply(HashMap.scala:139)
        at scala.collection.mutable.HashMap$$anon$2$$anonfun$foreach$3.apply(HashMap.scala:139)
        at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236)
        at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
        at scala.collection.mutable.HashMap$$anon$2.foreach(HashMap.scala:139)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
        at scala.collection.AbstractTraversable.map(Traversable.scala:104)
        at org.apache.spark.status.LiveRDD.doUpdate(LiveEntity.scala:548)
        at org.apache.spark.status.LiveEntity.write(LiveEntity.scala:49)
        at org.apache.spark.status.AppStatusListener.org$apache$spark$status$AppStatusListener$$update(AppStatusListener.scala:991)
        at org.apache.spark.status.AppStatusListener.org$apache$spark$status$AppStatusListener$$maybeUpdate(AppStatusListener.scala:997)
        at org.apache.spark.status.AppStatusListener$$anonfun$onExecutorMetricsUpdate$2.apply(AppStatusListener.scala:764)
        at org.apache.spark.status.AppStatusListener$$anonfun$onExecutorMetricsUpdate$2.apply(AppStatusListener.scala:764)
        at scala.collection.mutable.HashMap$$anon$2$$anonfun$foreach$3.apply(HashMap.scala:139)
        at scala.collection.mutable.HashMap$$anon$2$$anonfun$foreach$3.apply(HashMap.scala:139)
        at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236)
        at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
        at scala.collection.mutable.HashMap$$anon$2.foreach(HashMap.scala:139)
        at org.apache.spark.status.AppStatusListener.org$apache$spark$status$AppStatusListener$$flush(AppStatusListener.scala:788)
        at org.apache.spark.status.AppStatusListener.onExecutorMetricsUpdate(AppStatusListener.scala:764)
        at org.apache.spark.scheduler.SparkListenerBus$class.doPostEvent(SparkListenerBus.scala:59)
        at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
        at org.apache.spark.scheduler.AsyncEventQueue.doPostEvent(AsyncEventQueue.scala:37)
        at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:91)
        at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$super$postToAll(AsyncEventQueue.scala:92)
        at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply$mcJ$sp(AsyncEventQueue.scala:92)
        at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply(AsyncEventQueue.scala:87)
        at org.apache.spark.scheduler.AsyncEventQueue$$anonfun$org$apache$spark$scheduler$AsyncEventQueue$$dispatch$1.apply(AsyncEventQueue.scala:87)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
        at org.apache.spark.scheduler.AsyncEventQueue.org$apache$spark$scheduler$AsyncEventQueue$$dispatch(AsyncEventQueue.scala:87)
        at org.apache.spark.scheduler.AsyncEventQueue$$anon$1$$anonfun$run$1.apply$mcV$sp(AsyncEventQueue.scala:83)
        at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1302)
        at org.apache.spark.scheduler.AsyncEventQueue$$anon$1.run(AsyncEventQueue.scala:82)
{noformat}
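The top frames show Guava's Preconditions.checkNotNull failing inside Interners$WeakInterner.intern, which means a null String reaches LiveEntityHelpers.weakIntern while LiveRDDDistribution.toApi converts the live RDD distribution into its API form. A minimal sketch of that failure mode, assuming plain Guava instead of Spark's shaded org.spark_project copy and a stand-in weakIntern helper (not Spark's actual code):

{noformat}
import com.google.common.collect.Interners

object WeakInternNpeSketch {
  // Guava's weak interner rejects null via Preconditions.checkNotNull,
  // which is the very first frame in the stack trace above.
  private val stringInterner = Interners.newWeakInterner[String]()

  // Stand-in for org.apache.spark.status.LiveEntityHelpers.weakIntern.
  def weakIntern(s: String): String = stringInterner.intern(s)

  def main(args: Array[String]): Unit = {
    println(weakIntern("executor-1:7337")) // ok: returns a canonical instance
    println(weakIntern(null))              // throws java.lang.NullPointerException
  }
}
{noformat}

So the interner itself is behaving as documented; presumably one of the distribution's string fields is null by the time toApi runs, and every periodic flush of the live entities then logs the exception again.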
Symptoms of the affected Spark app that prompted us to investigate the logs in the first place include:
 * slower execution of submitted jobs
 * jobs remaining under "Active Jobs" in the Spark UI even though they should have completed days ago (this can also be checked outside the UI; see the sketch after this list)
 * these jobs could not be killed from the Spark UI (the page refreshed but the jobs remained there)
 * stages for these jobs could not be examined in the Spark UI because it returned an error instead.
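One way to confirm the stuck job state without the UI is Spark's monitoring REST API on the driver. A small sketch, with the driver host/port and application id below as placeholders:

{noformat}
import scala.io.Source

object StuckJobsCheck {
  def main(args: Array[String]): Unit = {
    // Placeholder driver UI address and app id; adjust for your deployment.
    val url = "http://driver-host:4040/api/v1/applications/<app-id>/jobs?status=running"
    // Jobs that should have finished long ago keep showing up here with
    // status "RUNNING" once the AppStatusListener stops updating them.
    println(Source.fromURL(url).mkString)
  }
}
{noformat}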

> NPE in LiveRDDDistribution (AppStatusListener)
> ----------------------------------------------
>
>                 Key: SPARK-30586
>                 URL: https://issues.apache.org/jira/browse/SPARK-30586
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.4
>         Environment: A Hadoop cluster consisting of Centos 7.4 machines.
>            Reporter: Jan Van den bosch
>            Priority: Major
>



