Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2015/05/05 11:39:03 UTC

[jira] [Resolved] (SPARK-7362) Spark MLlib libsvm issues with data

     [ https://issues.apache.org/jira/browse/SPARK-7362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-7362.
------------------------------
    Resolution: Invalid

Hi [~doye], could you start by asking on the mailing list and/or searching JIRA? There's not enough information here to reproduce this or to understand what the problem might be. See https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
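
For reference, an ArrayIndexOutOfBoundsException: -1 thrown from BLAS.dot during training is most often a data problem rather than an MLlib bug: MLUtils.loadLibSVMFile expects one-based feature indices and converts them to zero-based by subtracting one, so a file that uses zero-based indices (i.e. contains an index of 0) produces a -1 that only surfaces later, inside the dot product. A minimal pre-flight check along those lines (the helper name and the path are illustrative, not part of MLlib):

    import org.apache.spark.SparkContext

    // Smallest feature index that appears in a libsvm-format file.
    // loadLibSVMFile assumes indices start at 1 and subtracts one while
    // parsing, so a minimum of 0 here becomes the -1 seen in BLAS.dot.
    def minFeatureIndex(sc: SparkContext, path: String): Int =
      sc.textFile(path)
        .filter(_.trim.nonEmpty)                    // skip blank lines
        .flatMap(_.trim.split("\\s+").tail          // drop the label column
          .map(_.split(':')(0).toInt))              // keep index of index:value
        .min()

If this returns 0, shifting every index up by one (or re-exporting the data as one-based) should make the linear-methods example run.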

> Spark MLlib libsvm issues with data
> -----------------------------------
>
>                 Key: SPARK-7362
>                 URL: https://issues.apache.org/jira/browse/SPARK-7362
>             Project: Spark
>          Issue Type: Question
>          Components: MLlib
>    Affects Versions: 1.3.1
>         Environment: Linux version 3.13.0-45-generic (buildd@phianna) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) )
>            Reporter: doyexie
>
> Hi, I'm trying the linear-methods example (Scala version) from http://spark.apache.org/docs/1.2.1/mllib-linear-methods.html. Running the demo with the bundled data works fine, but when I substitute my own data, the training step fails with:
> 15/05/05 16:32:02 INFO TaskSetManager: Starting task 0.0 in stage 12.0 (TID 21, localhost, PROCESS_LOCAL, 1447 bytes)
> 15/05/05 16:32:02 INFO TaskSetManager: Starting task 1.0 in stage 12.0 (TID 22, localhost, PROCESS_LOCAL, 1447 bytes)
> 15/05/05 16:32:02 INFO Executor: Running task 0.0 in stage 12.0 (TID 21)
> 15/05/05 16:32:02 INFO Executor: Running task 1.0 in stage 12.0 (TID 22)
> 15/05/05 16:32:02 INFO BlockManager: Found block rdd_7_1 locally
> 15/05/05 16:32:02 ERROR Executor: Exception in task 1.0 in stage 12.0 (TID 22)
> java.lang.ArrayIndexOutOfBoundsException: -1
>     at org.apache.spark.mllib.linalg.BLAS$.dot(BLAS.scala:136)
>     at org.apache.spark.mllib.linalg.BLAS$.dot(BLAS.scala:106)
>     at org.apache.spark.mllib.optimization.HingeGradient.compute(Gradient.scala:313)
>     at org.apache.spark.mllib.optimization.GradientDescent$$anonfun$runMiniBatchSGD$1$$anonfun$1.apply(GradientDescent.scala:192)
>     at org.apache.spark.mllib.optimization.GradientDescent$$anonfun$runMiniBatchSGD$1$$anonfun$1.apply(GradientDescent.scala:190)
>     at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:144)
>     at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:144)
>     at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>     at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>     at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:144)
>     at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1157)
>     at scala.collection.TraversableOnce$class.aggregate(TraversableOnce.scala:201)
>     at scala.collection.AbstractIterator.aggregate(Iterator.scala:1157)
>     at org.apache.spark.rdd.RDD$$anonfun$28.apply(RDD.scala:988)
>     at org.apache.spark.rdd.RDD$$anonfun$28.apply(RDD.scala:988)
>     at org.apache.spark.rdd.RDD$$anonfun$29.apply(RDD.scala:989)
>     at org.apache.spark.rdd.RDD$$anonfun$29.apply(RDD.scala:989)
>     at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
>     at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>     at org.apache.spark.scheduler.Task.run(Task.scala:64)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> 15/05/05 16:32:02 INFO BlockManager: Found block rdd_7_0 locally
> 15/05/05 16:32:02 ERROR Executor: Exception in task 0.0 in stage 12.0 (TID 21)
> java.lang.ArrayIndexOutOfBoundsException: -1
>     [same stack trace as TID 22 above]
> 15/05/05 16:32:02 WARN TaskSetManager: Lost task 1.0 in stage 12.0 (TID 22, localhost): java.lang.ArrayIndexOutOfBoundsException: -1
>     [same stack trace as above]
> 15/05/05 16:32:02 ERROR TaskSetManager: Task 1 in stage 12.0 failed 1 times; aborting job
> 15/05/05 16:32:02 INFO TaskSchedulerImpl: Removed TaskSet 12.0, whose tasks have all completed, from pool 
> 15/05/05 16:32:02 INFO TaskSetManager: Lost task 0.0 in stage 12.0 (TID 21) on executor localhost: java.lang.ArrayIndexOutOfBoundsException (-1) [duplicate 1]
> 15/05/05 16:32:02 INFO TaskSchedulerImpl: Removed TaskSet 12.0, whose tasks have all completed, from pool 
> 15/05/05 16:32:02 INFO TaskSchedulerImpl: Cancelling stage 12
> 15/05/05 16:32:02 INFO DAGScheduler: Job 12 failed: treeAggregate at GradientDescent.scala:189, took 0.032101 s
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 12.0 failed 1 times, most recent failure: Lost task 1.0 in stage 12.0 (TID 22, localhost): java.lang.ArrayIndexOutOfBoundsException: -1
>     [same stack trace as above]
> Driver stacktrace:
>     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1191)
>     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>     at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1191)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
>     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
>     at scala.Option.foreach(Option.scala:236)
>     at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
>     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
>     at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
>     at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> Here is my test data file: https://github.com/hermitD/temp. I have used it to train with the libsvm tools under Linux and it works, and checking the format with the libsvm Python tool shows it is OK. I just don't know why it errors here. Please tell me how to fix this, or how to track down the problem so I can fix it :(
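
If the file does turn out to use zero-based indices (the libsvm command-line tools can be more tolerant of an index of 0 than MLlib's loader is, which could explain why they accept it), a one-off rewrite to one-based indices should let loadLibSVMFile parse it as the example expects. A rough sketch for the spark-shell, with placeholder paths:

    // Shift every feature index up by one, turning a zero-based libsvm
    // file into the one-based form that MLUtils.loadLibSVMFile expects.
    val fixed = sc.textFile("mydata.txt").map { line =>
      val parts = line.trim.split("\\s+")
      val shifted = parts.tail.map { item =>
        val Array(idx, value) = item.split(':')
        s"${idx.toInt + 1}:$value"   // 0 -> 1, 1 -> 2, ...
      }
      (parts.head +: shifted).mkString(" ")
    }
    fixed.saveAsTextFile("mydata-one-based")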



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
