Posted to user@spark.apache.org by Ian Ferreira <ia...@hotmail.com> on 2014/04/24 00:19:50 UTC

Failed to run count?

I am getting this cryptic error when running LinearRegressionWithSGD.

Data sample
LabeledPoint(39.0, [144.0, 1521.0, 20736.0, 59319.0, 2985984.0])

14/04/23 15:15:34 INFO SparkContext: Starting job: first at
GeneralizedLinearAlgorithm.scala:121
14/04/23 15:15:34 INFO DAGScheduler: Got job 2 (first at
GeneralizedLinearAlgorithm.scala:121) with 1 output partitions
(allowLocal=true)
14/04/23 15:15:34 INFO DAGScheduler: Final stage: Stage 2 (first at
GeneralizedLinearAlgorithm.scala:121)
14/04/23 15:15:34 INFO DAGScheduler: Parents of final stage: List()
14/04/23 15:15:34 INFO DAGScheduler: Missing parents: List()
14/04/23 15:15:34 INFO DAGScheduler: Computing the requested partition
locally
14/04/23 15:15:34 INFO HadoopRDD: Input split:
file:/Users/iferreira/data/test.csv:0+104
14/04/23 15:15:34 INFO SparkContext: Job finished: first at
GeneralizedLinearAlgorithm.scala:121, took 0.030158 s
14/04/23 15:15:34 INFO SparkContext: Starting job: count at
GradientDescent.scala:137
14/04/23 15:15:34 INFO DAGScheduler: Got job 3 (count at
GradientDescent.scala:137) with 2 output partitions (allowLocal=false)
14/04/23 15:15:34 INFO DAGScheduler: Final stage: Stage 3 (count at
GradientDescent.scala:137)
14/04/23 15:15:34 INFO DAGScheduler: Parents of final stage: List()
14/04/23 15:15:34 INFO DAGScheduler: Missing parents: List()
14/04/23 15:15:34 INFO DAGScheduler: Submitting Stage 3 (MappedRDD[7] at map
at GeneralizedLinearAlgorithm.scala:139), which has no missing parents
14/04/23 15:15:35 INFO DAGScheduler: Failed to run count at
GradientDescent.scala:137

Any clues as to what may trigger this error? Could it be an overflow?





Re: Failed to run count?

Posted by Xiangrui Meng <me...@gmail.com>.
Which Spark version are you using? Could you also include the worker
logs? -Xiangrui
