Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2014/11/25 10:56:12 UTC

[jira] [Commented] (SPARK-1148) Suggestions for exception handling (avoid potential bugs)

    [ https://issues.apache.org/jira/browse/SPARK-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224305#comment-14224305 ] 

Sean Owen commented on SPARK-1148:
----------------------------------

[~d.yuan] Several problems like this have been fixed since then. Can you open a PR to resolve any others that your tool has found, or has everything been fixed now?

> Suggestions for exception handling (avoid potential bugs)
> ---------------------------------------------------------
>
>                 Key: SPARK-1148
>                 URL: https://issues.apache.org/jira/browse/SPARK-1148
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 0.9.0
>            Reporter: Ding Yuan
>
> Hi Spark developers,
> We are a group of researchers working on software reliability. We recently did a study and found that the majority of the most severe failures in data-analytics systems are caused by bugs in exception handlers; it is hard to anticipate all the possible real-world error scenarios. Therefore we built a simple checking tool that automatically detects bug patterns that have caused very severe real-world failures. I am reporting a few cases here. Any feedback is much appreciated!
> Ding
> =========================
> Case 1:
> Line: 1249, File: "org/apache/spark/SparkContext.scala"
> {noformat}
> 1244:         val scheduler = try {
> 1245:           val clazz = Class.forName("org.apache.spark.scheduler.cluster.YarnClusterScheduler")
> 1246:           val cons = clazz.getConstructor(classOf[SparkContext])
> 1247:           cons.newInstance(sc).asInstanceOf[TaskSchedulerImpl]
> 1248:         } catch {
> 1249:           // TODO: Enumerate the exact reasons why it can fail
> 1250:           // But irrespective of it, it means we cannot proceed !
> 1251:           case th: Throwable => {
> 1252:             throw new SparkException("YARN mode not available ?", th)
> 1253:           }
> {noformat}
> The comment suggests that the specific exceptions should be enumerated here.
> The try block can throw the following exceptions (a sketch of an enumerated handler follows the list):
> ClassNotFoundException
> NegativeArraySizeException
> NoSuchMethodException
> SecurityException
> InstantiationException
> IllegalAccessException
> IllegalArgumentException
> InvocationTargetException
> ClassCastException
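> A minimal sketch of what an enumerated handler could look like. This is illustrative only, not a proposed patch; the exception set is taken directly from the list above, and whether to keep any broader fallback is a design choice for the developers:
> {noformat}
> import java.lang.reflect.InvocationTargetException
>
> val scheduler = try {
>   val clazz = Class.forName("org.apache.spark.scheduler.cluster.YarnClusterScheduler")
>   val cons = clazz.getConstructor(classOf[SparkContext])
>   cons.newInstance(sc).asInstanceOf[TaskSchedulerImpl]
> } catch {
>   // Enumerate the reflection-related failures instead of catching Throwable,
>   // so that fatal VM errors (e.g. OutOfMemoryError) are not masked.
>   case e @ (_: ClassNotFoundException | _: NegativeArraySizeException |
>             _: NoSuchMethodException | _: SecurityException |
>             _: InstantiationException | _: IllegalAccessException |
>             _: IllegalArgumentException | _: InvocationTargetException |
>             _: ClassCastException) =>
>     throw new SparkException("YARN mode not available ?", e)
> }
> {noformat}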
> ==========================================
> =========================
> Case 2:
> Line: 282, File: "org/apache/spark/executor/Executor.scala"
> {noformat}
> 265:         case t: Throwable => {
> 266:           val serviceTime = (System.currentTimeMillis() - taskStart).toInt
> 267:           val metrics = attemptedTask.flatMap(t => t.metrics)
> 268:           for (m <- metrics) {
> 269:             m.executorRunTime = serviceTime
> 270:             m.jvmGCTime = gcTime - startGCTime
> 271:           }
> 272:           val reason = ExceptionFailure(t.getClass.getName, t.toString, t.getStackTrace, metrics)
> 273:           execBackend.statusUpdate(taskId, TaskState.FAILED, ser.serialize(reason))
> 274:
> 275:           // TODO: Should we exit the whole executor here? On the one hand, the failed task may
> 276:           // have left some weird state around depending on when the exception was thrown, but on
> 277:           // the other hand, maybe we could detect that when future tasks fail and exit then.
> 278:           logError("Exception in task ID " + taskId, t)
> 279:           //System.exit(1)
> 280:         }
> 281:       } finally {
> 282:         // TODO: Unregister shuffle memory only for ResultTask
> 283:         val shuffleMemoryMap = env.shuffleMemoryMap
> 284:         shuffleMemoryMap.synchronized {
> 285:           shuffleMemoryMap.remove(Thread.currentThread().getId)
> 286:         }
> 287:         runningTasks.remove(taskId)
> 288:       }
> {noformat}
> The comment in this Throwable exception handler seems to suggest that the whole executor should just exit here?
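> If exiting on every Throwable is too aggressive, one alternative is to report non-fatal failures to the driver but let fatal VM errors propagate and kill the executor. A sketch under stated assumptions (runTask and reason are hypothetical placeholders standing in for the surrounding task-execution code; this is not Spark's actual fix):
> {noformat}
> import scala.util.control.NonFatal
>
> try {
>   runTask()  // hypothetical stand-in for the task execution above
> } catch {
>   case NonFatal(t) =>
>     // An ordinary task failure: report it to the driver and keep the
>     // executor alive to run future tasks.
>     execBackend.statusUpdate(taskId, TaskState.FAILED, ser.serialize(reason(t)))
>   case t: Throwable =>
>     // A fatal error (OutOfMemoryError, LinkageError, ...) may leave the
>     // JVM in a broken state; log it and rethrow so the process dies.
>     logError("Fatal exception in task ID " + taskId, t)
>     throw t
> }
> {noformat}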
> ==========================================
> =========================
> Case 3:
> Line: 70, File: "org/apache/spark/network/netty/FileServerHandler.java"
> {noformat}
> 66:       try {
> 67:         ctx.write(new DefaultFileRegion(new FileInputStream(file)
> 68:           .getChannel(), fileSegment.offset(), fileSegment.length()));
> 69:       } catch (Exception e) {
> 70:           LOG.error("Exception: ", e);
> 71:       }
> {noformat}
> "Exception" is too general. The try block only throws "FileNotFoundException".
> Although there is nothing wrong with it now, but later if code evolves this
> might cause some other exceptions to be swallowed.
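> For example, narrowing the handler could look like the sketch below (shown in Scala for consistency with the other sketches here; the actual file is Java, so this is a transliteration for illustration only):
> {noformat}
> try {
>   ctx.write(new DefaultFileRegion(new FileInputStream(file).getChannel(),
>     fileSegment.offset(), fileSegment.length()))
> } catch {
>   // Catch only the exception this block can actually throw today; any new
>   // exception introduced by future changes will propagate rather than be
>   // silently logged and swallowed.
>   case e: FileNotFoundException =>
>     LOG.error("File not found: ", e)
> }
> {noformat}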
> ==========================================


