Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2017/10/08 15:53:00 UTC

[jira] [Commented] (SPARK-16882) Failures in JobGenerator Thread are Swallowed, Job Does Not Fail

    [ https://issues.apache.org/jira/browse/SPARK-16882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16196187#comment-16196187 ] 

Hyukjin Kwon commented on SPARK-16882:
--------------------------------------

Hi [~zsxwing], do you think this JIRA can be resolved now?

> Failures in JobGenerator Thread are Swallowed, Job Does Not Fail
> ----------------------------------------------------------------
>
>                 Key: SPARK-16882
>                 URL: https://issues.apache.org/jira/browse/SPARK-16882
>             Project: Spark
>          Issue Type: Bug
>          Components: DStreams, Scheduler
>    Affects Versions: 1.5.0
>         Environment: CDH 5.6.1, CentOS 6.7
>            Reporter: Brian Schrameck
>
> When the fileStream functionality is used to read a directory containing a large number of files over a long period of time, the JVM's garbage collection limits can be reached. In this case the JobGenerator thread threw the exception, but it was completely swallowed and did not cause the job to fail. There were no errors in the ApplicationMaster, and the job just silently sat there, not processing any further batches.
> We would expect any fatal exception, not just this OutOfMemoryError, to be handled appropriately and the job to be killed with the correct failure code (see the handler sketch after the stack trace below).
> We are running in YARN cluster mode on a CDH 5.6.1 cluster.
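> For context, a minimal sketch of the shape of such a job (the input path, batch interval, and action are illustrative, not taken from our actual application):
> {noformat}
> import org.apache.hadoop.io.{LongWritable, Text}
> import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
> import org.apache.spark.SparkConf
> import org.apache.spark.streaming.{Seconds, StreamingContext}
>
> val conf = new SparkConf().setAppName("FileStreamJob")
> val ssc = new StreamingContext(conf, Seconds(60))
>
> // Each batch, FileInputDStream.findNewFiles lists the monitored directory;
> // with a very large directory that listing is where the GC overhead limit
> // was exceeded in the trace below.
> val lines = ssc
>   .fileStream[LongWritable, Text, TextInputFormat]("hdfs:///data/incoming")
>   .map { case (_, text) => text.toString }
>
> lines.foreachRDD { rdd => rdd.count() }
>
> ssc.start()
> ssc.awaitTermination()
> {noformat}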
> {noformat}Exception in thread "JobGenerator" java.lang.OutOfMemoryError: GC overhead limit exceeded
> 	at java.lang.AbstractStringBuilder.<init>(AbstractStringBuilder.java:68)
> 	at java.lang.StringBuilder.<init>(StringBuilder.java:89)
> 	at org.apache.hadoop.fs.Path.<init>(Path.java:109)
> 	at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:430)
> 	at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1494)
> 	at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1534)
> 	at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:569)
> 	at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1494)
> 	at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1534)
> 	at org.apache.spark.streaming.dstream.FileInputDStream.findNewFiles(FileInputDStream.scala:195)
> 	at org.apache.spark.streaming.dstream.FileInputDStream.compute(FileInputDStream.scala:146)
> 	at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:350)
> 	at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:350)
> 	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
> 	at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:349)
> 	at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:349)
> 	at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:399)
> 	at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:344)
> 	at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:342)
> 	at scala.Option.orElse(Option.scala:257)
> 	at org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:339)
> 	at org.apache.spark.streaming.dstream.ForEachDStream.generateJob(ForEachDStream.scala:38)
> 	at org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:120)
> 	at org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:120)
> 	at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
> 	at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
> 	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> 	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> 	at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
> 	at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
> 	at org.apache.spark.streaming.DStreamGraph.generateJobs(DStreamGraph.scala:120)
> 	at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$2.apply(JobGenerator.scala:247){noformat}
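> A possible driver-side mitigation (a sketch only, not a fix in Spark itself) is to install a JVM-wide default uncaught exception handler before ssc.start(), so that a fatal error escaping a thread such as JobGenerator terminates the driver and lets YARN report the failure; the exit code 50 here is illustrative:
> {noformat}
> // halt() rather than System.exit() avoids running shutdown hooks that
> // may themselves be wedged under the OOM condition.
> Thread.setDefaultUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler {
>   override def uncaughtException(t: Thread, e: Throwable): Unit = {
>     System.err.println(s"Uncaught exception in thread ${t.getName}")
>     e.printStackTrace(System.err)
>     Runtime.getRuntime.halt(50)
>   }
> })
> {noformat}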


