Posted to issues@spark.apache.org by "Yan Facai (颜发才 JIRA)" <ji...@apache.org> on 2017/05/26 09:03:04 UTC

[jira] [Comment Edited] (SPARK-19581) running NaiveBayes model with 0 features can crash the executor with DGEMV error

    [ https://issues.apache.org/jira/browse/SPARK-19581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16015345#comment-16015345 ] 

Yan Facai (颜发才) edited comment on SPARK-19581 at 5/26/17 9:02 AM:
------------------------------------------------------------------

[~barrybecker4] Hi, Becker.
I can't reproduce the bug on spark-2.1.1-bin-hadoop2.7.

1) For a feature vector of size 0, an exception is thrown, but it is harmless.

{code}
  val data = spark.read.format("libsvm").load("/user/facai/data/libsvm/sample_libsvm_data.txt").cache
  import org.apache.spark.ml.classification.NaiveBayes
  val model = new NaiveBayes().fit(data)
  import org.apache.spark.ml.linalg.{Vectors => SV}
  case class TestData(features: org.apache.spark.ml.linalg.Vector)
  val emptyVector = SV.sparse(0, Array.empty[Int], Array.empty[Double])
  val test = Seq(TestData(emptyVector)).toDF
scala>  test.show
+---------+
| features|
+---------+
|(0,[],[])|
+---------+

scala> model.transform(test).show
org.apache.spark.SparkException: Failed to execute user defined function($anonfun$1: (vector) => vector)
  at org.apache.spark.sql.catalyst.expressions.ScalaUDF.eval(ScalaUDF.scala:1072)
  ... 48 elided
Caused by: java.lang.IllegalArgumentException: requirement failed: The columns of A don't match the number of elements of x. A: 692, x: 0
  at scala.Predef$.require(Predef.scala:224)
  ... 99 more
{code}

2) For an empty (all-zero) sparse feature vector of size 692, it works fine.

{code}
scala> val emptyVector = SV.sparse(692, Array.empty[Int], Array.empty[Double])
emptyVector: org.apache.spark.ml.linalg.Vector = (692,[],[])

scala> val test = Seq(TestData(emptyVector)).toDF
test: org.apache.spark.sql.DataFrame = [features: vector]

scala> test.show
+-----------+
|   features|
+-----------+
|(692,[],[])|
+-----------+

scala> model.transform(test).show
+-----------+--------------------+--------------------+----------+
|   features|       rawPrediction|         probability|prediction|
+-----------+--------------------+--------------------+----------+
|(692,[],[])|[-0.8407831793660...|[0.43137254901960...|       1.0|
+-----------+--------------------+--------------------+----------+
{code}
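
The difference between the two cases can be illustrated without Spark. The following is only a minimal sketch, assuming a plain dense matrix-vector product; the `gemv` function and the `theta` matrix are hypothetical stand-ins written for illustration and are not Spark's BLAS code. It mirrors the dimension check whose message appears in the first stack trace: a size-0 input fails the `require`, while an all-zero input of size 692 passes and contributes nothing to the product.

```scala
// Simplified stand-in for a dense matrix-vector product y = A * x.
// This is NOT Spark's implementation; it only mirrors the dimension
// check reported in the stack trace of case 1.
def gemv(a: Array[Array[Double]], x: Array[Double]): Array[Double] = {
  val numCols = if (a.isEmpty) 0 else a.head.length
  require(numCols == x.length,
    s"The columns of A don't match the number of elements of x. A: $numCols, x: ${x.length}")
  a.map(row => row.zip(x).map { case (w, v) => w * v }.sum)
}

// Hypothetical model matrix: 2 classes x 692 features (values are made up).
val theta = Array.fill(2, 692)(-1.0)

// Case 1: a size-0 input fails the check with an IllegalArgumentException.
val case1 = scala.util.Try(gemv(theta, Array.empty[Double]))

// Case 2: an all-zero input of the expected size passes the check,
// and every entry of the product is zero.
val case2 = gemv(theta, Array.fill(692)(0.0))
```

If the product is zero, only the class priors would remain in the raw prediction. That seems consistent with the output of case 2, where exp(-0.8407831793660...) is approximately 0.43137, the first probability shown.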



> running NaiveBayes model with 0 features can crash the executor with DGEMV error
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-19581
>                 URL: https://issues.apache.org/jira/browse/SPARK-19581
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.1.0
>         Environment: spark development or standalone mode on windows or linux.
>            Reporter: Barry Becker
>            Priority: Minor
>
> The severity of this bug is high (because nothing should cause spark to crash like this) but the priority may be low (because there is an easy workaround).
> In our application, a user can select features and a target to run the NaiveBayes inducer. Columns that have too many values, or only a single value, are removed before we call the inducer to create the model. As a result, there are some cases where all the features get removed. When this happens, executors crash and are restarted (if on a cluster), or Spark crashes and needs to be restarted manually (if in development mode).
> It looks like NaiveBayes uses BLAS, and BLAS does not handle this case well when it is encountered. It emits this vague error:
> ** On entry to DGEMV  parameter number  6 had an illegal value
> and terminates.
> My code looks like this:
> {code}
>     val predictions = model.transform(testData)  // Make predictions
>     // figure out how many were correctly predicted
>     val numCorrect = predictions.filter(new Column(actualTarget) === new Column(PREDICTION_LABEL_COLUMN)).count()
>     val numIncorrect = testRowCount - numCorrect
> {code}
> The failure is at the line that does the count, but the count itself is not what causes the problem; it is the model.transform step (where the model contains the NaiveBayes classifier).
> Here is the stack trace (in development mode):
> {code}
> [2017-02-13 06:28:39,946] TRACE evidence.EvidenceVizModel$ [] [akka://JobServer/user/context-supervisor/sql-context] -      done making predictions in 232
>  ** On entry to DGEMV  parameter number  6 had an illegal value
>  ** On entry to DGEMV  parameter number  6 had an illegal value
>  ** On entry to DGEMV  parameter number  6 had an illegal value
> [2017-02-13 06:28:40,506] ERROR .scheduler.LiveListenerBus [] [akka://JobServer/user/context-supervisor/sql-context] - SparkListenerBus has already stopped! Dropping event SparkListenerSQLExecutionEnd(9,1486996120505)
> [2017-02-13 06:28:40,506] ERROR .scheduler.LiveListenerBus [] [akka://JobServer/user/context-supervisor/sql-context] - SparkListenerBus has already stopped! Dropping event SparkListenerStageCompleted(org.apache.spark.scheduler.StageInfo@1f6c4a29)
> [2017-02-13 06:28:40,508] ERROR .scheduler.LiveListenerBus [] [akka://JobServer/user/context-supervisor/sql-context] - SparkListenerBus has already stopped! Dropping event SparkListenerJobEnd(12,1486996120507,JobFailed(org.apache.spark.SparkException: Job 12 cancelled because SparkContext was shut down))
> [2017-02-13 06:28:40,509] ERROR .jobserver.JobManagerActor [] [akka://JobServer/user/context-supervisor/sql-context] - Got Throwable
> org.apache.spark.SparkException: Job 12 cancelled because SparkContext was shut down
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:808)
>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$cleanUpAfterSchedulerStop$1.apply(DAGScheduler.scala:806)
>         at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
>         at org.apache.spark.scheduler.DAGScheduler.cleanUpAfterSchedulerStop(DAGScheduler.scala:806)
>         at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onStop(DAGScheduler.scala:1668)
>         at org.apache.spark.util.EventLoop.stop(EventLoop.scala:83)
>         at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1587)
>         at org.apache.spark.SparkContext$$anonfun$stop$8.apply$mcV$sp(SparkContext.scala:1826)
>         at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1283)
>         at org.apache.spark.SparkContext.stop(SparkContext.scala:1825)
>         at org.apache.spark.SparkContext$$anonfun$2.apply$mcV$sp(SparkContext.scala:581)
>         at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
>         at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
>         at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
>         at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
> {code}
> and here it is when running in standalone mode:
> {code}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 7134.0 failed 4 times, most recent failure: Lost task 0.3 in stage 7134.0 (TID 13671, 192.168.124.23, executor 8): ExecutorLostFailure (executor 8 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages. Driver stacktrace:org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435) org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423) org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422) scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422) org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802) org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802) scala.Option.foreach(Option.scala:257) org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:802) org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1650) org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605) org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594) org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628) org.apache.spark.SparkContext.runJob(SparkContext.scala:1918) org.apache.spark.SparkContext.runJob(SparkContext.scala:1931) org.apache.spark.SparkContext.runJob(SparkContext.scala:1944) org.apache.spark.SparkContext.runJob(SparkContext.scala:1958) 
org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:935) org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) org.apache.spark.rdd.RDD.withScope(RDD.scala:362) org.apache.spark.rdd.RDD.collect(RDD.scala:934) org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:275) org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2371) org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57) org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2765) org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2370) org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2377) org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2405) org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2404) org.apache.spark.sql.Dataset.withCallback(Dataset.scala:2778) org.apache.spark.sql.Dataset.count(Dataset.scala:2404) com.mineset.spark.ml.evidence.EvidenceVizModel.getModelValidationInfo(EvidenceVizModel.scala:338) com.mineset.spark.ml.evidence.EvidenceVizModel.getJsonObject(EvidenceVizModel.scala:97) com.mineset.spark.ml.evidence.EvidenceInducer.execute(EvidenceInducer.scala:129) com.mineset.spark.ml.evidence.EvidenceInducer.execute(EvidenceInducer.scala:83) com.mineset.spark.common.util.CommandProcessor.process(CommandProcessor.scala:39) com.mineset.spark.ml.MinesetMachineLearning.processCommands(MinesetMachineLearning.scala:79) com.mineset.spark.ml.MachineLearningJob$.runJob(MachineLearningJob.scala:53) com.mineset.spark.ml.MachineLearningJob$.runJob(MachineLearningJob.scala:39) spark.jobserver.SparkJobBase$class.runJob(SparkJob.scala:31) com.mineset.spark.ml.MachineLearningJob$.runJob(MachineLearningJob.scala:39) com.mineset.spark.ml.MachineLearningJob$.runJob(MachineLearningJob.scala:39) 
spark.jobserver.JobManagerActor$$anonfun$getJobFuture$4.apply(JobManagerActor.scala:292) scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) java.lang.Thread.run(Thread.java:745)
> {code}
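
The description mentions an easy workaround without spelling it out. One possible guard, given here only as a sketch under assumptions, is to drop rows whose feature vector size differs from the size the model was trained with before calling `transform`. The helper name `keepMatchingRows` is hypothetical and not part of Spark; it relies on `numFeatures` and `getFeaturesCol` on `NaiveBayesModel` and needs Spark ML on the classpath.

```scala
import org.apache.spark.ml.classification.NaiveBayesModel
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, udf}

// Hypothetical helper (not part of Spark): keep only the rows whose
// feature vector has the dimension the model was trained with.
def keepMatchingRows(model: NaiveBayesModel, df: DataFrame): DataFrame = {
  val expected = model.numFeatures
  // UDF that returns the declared size of each feature vector.
  val vectorSize = udf((v: Vector) => v.size)
  df.filter(vectorSize(col(model.getFeaturesCol)) === expected)
}
```

Calling `model.transform(keepMatchingRows(model, testData))` would then skip the mismatched rows. Whether to drop such rows silently or to fail fast with a clear message is a design choice that depends on the application.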


