Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2014/11/25 10:56:12 UTC
[jira] [Commented] (SPARK-1148) Suggestions for exception handling (avoid potential bugs)
[ https://issues.apache.org/jira/browse/SPARK-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224305#comment-14224305 ]
Sean Owen commented on SPARK-1148:
----------------------------------
[~d.yuan] Several problems like this have been fixed since. Can you open a PR to resolve any others that your tool has found, or is it all fixed now?
> Suggestions for exception handling (avoid potential bugs)
> ---------------------------------------------------------
>
> Key: SPARK-1148
> URL: https://issues.apache.org/jira/browse/SPARK-1148
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 0.9.0
> Reporter: Ding Yuan
>
> Hi Spark developers,
> We are a group of researchers working on software reliability. We recently did a study and found that the majority of the most severe failures in data-analytic systems are caused by bugs in exception handlers: it is hard to anticipate all the possible real-world error scenarios. We therefore built a simple checking tool that automatically detects bug patterns that have caused some very severe real-world failures. I am reporting a few cases here. Any feedback is much appreciated!
> Ding
> =========================
> Case 1:
> Line: 1249, File: "org/apache/spark/SparkContext.scala"
> {noformat}
> 1244: val scheduler = try {
> 1245: val clazz = Class.forName("org.apache.spark.scheduler.cluster.YarnClusterScheduler")
> 1246: val cons = clazz.getConstructor(classOf[SparkContext])
> 1247: cons.newInstance(sc).asInstanceOf[TaskSchedulerImpl]
> 1248: } catch {
> 1249: // TODO: Enumerate the exact reasons why it can fail
> 1250: // But irrespective of it, it means we cannot proceed !
> 1251: case th: Throwable => {
> 1252: throw new SparkException("YARN mode not available ?", th)
> 1253: }
> {noformat}
> The comment suggests the specific exceptions should be enumerated here.
> The try block could throw the following exceptions:
> ClassNotFoundException
> NegativeArraySizeException
> NoSuchMethodException
> SecurityException
> InstantiationException
> IllegalAccessException
> IllegalArgumentException
> InvocationTargetException
> ClassCastException
> ==========================================
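For Case 1, one way to act on the TODO is to enumerate the reflection failures explicitly instead of catching Throwable. The sketch below is hypothetical (it is not the Spark code; the class name and wrapper exception are illustrative only), but it shows how the checked exceptions listed above can be handled by name, with the constructor's own failure unwrapped separately:

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.InvocationTargetException;

public class ReflectiveLoad {
    // Instantiate a class reflectively, naming the failure modes instead of
    // catching Throwable (hypothetical sketch, not Spark's actual code).
    static Object newInstance(String className) {
        try {
            Class<?> clazz = Class.forName(className);
            Constructor<?> cons = clazz.getConstructor();
            return cons.newInstance();
        } catch (ClassNotFoundException | NoSuchMethodException
                | InstantiationException | IllegalAccessException e) {
            // Reflection plumbing failed: the class or a no-arg constructor
            // is absent or inaccessible.
            throw new IllegalStateException("class not available: " + className, e);
        } catch (InvocationTargetException e) {
            // The constructor body itself threw; surface the real cause
            // rather than the reflective wrapper.
            throw new IllegalStateException("constructor failed for " + className,
                    e.getCause());
        }
    }

    public static void main(String[] args) {
        Object sb = newInstance("java.lang.StringBuilder");
        System.out.println(sb.getClass().getName());
    }
}
```

Enumerating the cases this way also lets the two failure classes carry different messages: "the class is missing" (e.g. YARN not on the classpath) versus "the class exists but its constructor failed".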
> =========================
> Case 2:
> Line: 282, File: "org/apache/spark/executor/Executor.scala"
> {noformat}
> 265: case t: Throwable => {
> 266: val serviceTime = (System.currentTimeMillis() - taskStart).toInt
> 267: val metrics = attemptedTask.flatMap(t => t.metrics)
> 268: for (m <- metrics) {
> 269: m.executorRunTime = serviceTime
> 270: m.jvmGCTime = gcTime - startGCTime
> 271: }
> 272: val reason = ExceptionFailure(t.getClass.getName, t.toString, t.getStackTrace, metrics)
> 273: execBackend.statusUpdate(taskId, TaskState.FAILED, ser.serialize(reason))
> 274:
> 275: // TODO: Should we exit the whole executor here? On the one hand, the failed task may
> 276: // have left some weird state around depending on when the exception was thrown, but on
> 277: // the other hand, maybe we could detect that when future tasks fail and exit then.
> 278: logError("Exception in task ID " + taskId, t)
> 279: //System.exit(1)
> 280: }
> 281: } finally {
> 282: // TODO: Unregister shuffle memory only for ResultTask
> 283: val shuffleMemoryMap = env.shuffleMemoryMap
> 284: shuffleMemoryMap.synchronized {
> 285: shuffleMemoryMap.remove(Thread.currentThread().getId)
> 286: }
> 287: runningTasks.remove(taskId)
> 288: }
> {noformat}
> The TODO comment in this Throwable exception handler seems to suggest that the system should just exit here?
> ==========================================
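For Case 2, one resolution of that TODO is to split the catch: report ordinary Exceptions as task failures and keep the executor alive, but treat Errors (e.g. OutOfMemoryError) as fatal, since the JVM may be left in an inconsistent state. The following is a hypothetical sketch, not the Executor code; runTask and its return strings are illustrative:

```java
public class TaskRunner {
    // Run a task and classify the outcome: ordinary exceptions are task-level
    // failures, while Errors are fatal to the whole process
    // (hypothetical sketch, not Spark's Executor).
    static String runTask(Runnable task) {
        try {
            task.run();
            return "SUCCESS";
        } catch (Exception e) {
            // Recoverable: report this one task as failed; the executor
            // keeps running and can accept future tasks.
            return "FAILED: " + e.getClass().getSimpleName();
        } catch (Error e) {
            // Fatal: process state is suspect. A real executor would log
            // the error and call System.exit(1) here instead of returning.
            return "FATAL: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(runTask(() -> {}));
        System.out.println(runTask(() -> { throw new IllegalStateException(); }));
        System.out.println(runTask(() -> { throw new OutOfMemoryError(); }));
    }
}
```

This avoids both extremes the TODO worries about: the executor neither exits on every task exception nor silently survives a JVM-level Error.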
> =========================
> Case 3:
> Line: 70, File: "org/apache/spark/network/netty/FileServerHandler.java"
> {noformat}
> 66: try {
> 67: ctx.write(new DefaultFileRegion(new FileInputStream(file)
> 68: .getChannel(), fileSegment.offset(), fileSegment.length()));
> 69: } catch (Exception e) {
> 70: LOG.error("Exception: ", e);
> 71: }
> {noformat}
> "Exception" is too general. The try block only throws "FileNotFoundException".
> Although there is nothing wrong with this now, if the code evolves later,
> other exceptions might be silently swallowed here.
> ==========================================
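For Case 3, the fix is to catch the one exception the open can actually throw today, so that any failure mode introduced later either fails compilation (if checked) or propagates visibly instead of being eaten. A hypothetical standalone sketch (not the Netty handler; the class and method names are illustrative):

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.channels.FileChannel;

public class NarrowCatch {
    // Open a file channel, catching only the specific exception the open
    // can produce (hypothetical sketch, not Spark's FileServerHandler).
    static boolean tryOpen(File file) {
        try (FileInputStream in = new FileInputStream(file)) {
            FileChannel channel = in.getChannel();
            return channel.isOpen();
        } catch (FileNotFoundException e) {
            // The only failure the constructor throws today; handled by name
            // so future, different exceptions are not silently swallowed.
            return false;
        } catch (IOException e) {
            // try-with-resources close() can throw IOException; surface it
            // separately rather than folding it into the same log line.
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(tryOpen(new File("/no/such/file")));
    }
}
```

Note the catch order matters: FileNotFoundException is a subclass of IOException, so the narrower catch must come first.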
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org