Posted to dev@mahout.apache.org by Pat Ferrel <pa...@occamsmachete.com> on 2015/04/24 18:43:41 UTC

AtA error

Running on Yarn, getting an error with AtA. A user is running on those 1887 small (~4 KB) Spark Streaming files. The DRMs seem to be created properly. There may be empty rows in A, so I'm having the user try with only AtA, no AtB, and therefore no empty rows.

Any ideas? This is only 7.5 MB of data. I've tried a similar calculation with the two larger files from epinions, and it works fine.

The task dies with
Job aborted due to stage failure: Exception while getting task result: java.util.NoSuchElementException: key not found: 20070
The stack trace is:

org.apache.spark.rdd.RDD.collect(RDD.scala:774)
org.apache.mahout.sparkbindings.blas.AtA$.at_a_slim(AtA.scala:121)
org.apache.mahout.sparkbindings.blas.AtA$.at_a(AtA.scala:50)
org.apache.mahout.sparkbindings.SparkEngine$.tr2phys(SparkEngine.scala:231)
org.apache.mahout.sparkbindings.SparkEngine$.tr2phys(SparkEngine.scala:242)
org.apache.mahout.sparkbindings.SparkEngine$.toPhysical(SparkEngine.scala:108)
org.apache.mahout.math.drm.logical.CheckpointAction.checkpoint(CheckpointAction.scala:40)
org.apache.mahout.math.drm.package$.drm2Checkpointed(package.scala:90)
org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$3.apply(SimilarityAnalysis.scala:129)
org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$3.apply(SimilarityAnalysis.scala:127)
scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
scala.collection.Iterator$class.foreach(Iterator.scala:727)
scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:176)
scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:45)
scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
scala.collection.AbstractIterator.to(Iterator.scala:1157)
scala.collection.TraversableOnce$class.toList(TraversableOnce.scala:257)
scala.collection.AbstractIterator.toList(Iterator.scala:1157)
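
For what it's worth, a "key not found" on an index like 20070 suggests some row carries a column index outside the cardinality the rest of the matrix agrees on, so a pre-check of the inputs before calling AtA might narrow it down. The sketch below is plain Scala using a hypothetical `SparseRow` stand-in (not Mahout's actual Vector API); it flags rows whose declared cardinalities disagree or whose column indices fall outside their declared range:

```scala
// Hypothetical stand-in for a Mahout row vector: a declared cardinality
// plus the sparse (columnIndex, value) pairs actually present.
case class SparseRow(cardinality: Int, nonZeros: Seq[(Int, Double)])

// Returns human-readable problems; an empty result means the rows agree.
def validateRows(rows: Seq[SparseRow]): Seq[String] = {
  val cards = rows.map(_.cardinality).distinct
  val cardinalityErrors =
    if (cards.size > 1) Seq(s"inconsistent cardinalities: ${cards.mkString(", ")}")
    else Seq.empty[String]
  val indexErrors = rows.zipWithIndex.flatMap { case (row, i) =>
    row.nonZeros.collect {
      case (col, _) if col < 0 || col >= row.cardinality =>
        s"row $i: column index $col is outside [0, ${row.cardinality})"
    }
  }
  cardinalityErrors ++ indexErrors
}
```

With 1887 separately-read files, each reader could plausibly infer a different width, which a check like this would surface immediately.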


Re: AtA error

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
Yes, I thought that's what I said.

On Fri, Apr 24, 2015 at 12:54 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:

> When I concatenate the input into a single file per A, B etc it runs fine.
>
> Do you think I’m reading incorrectly somehow messing up vector sizes?
> Should I go through the input matrix and force vector (row?) sizes to be
> correct?
>
>
> On Apr 24, 2015, at 10:46 AM, Dmitriy Lyubimov <dl...@gmail.com> wrote:
>
> in slim, it is almost certainly has to do with incorrect vector length
> coming in.
>
> i have written validate procedure for these things.
>
> On Fri, Apr 24, 2015 at 9:43 AM, Pat Ferrel <pa...@occamsmachete.com> wrote:
>
> > Running on Yarn Getting an error with AtA. A user is running on those
> 1887
> > small ~4k Spark streaming files. The drms seem to be created properly.
> > There may be empty rows in A—I’m having the user try with only AtA, no
> AtB
> > and so no empty rows.
> >
> > Any ideas? This is only 7.5M of data.  I’ve tried a similar calc with the
> > two larger files from epinions, and it works fine
> >
> > The task dies with
> > Job aborted due to stage failure: Exception while getting task result:
> > java.util.NoSuchElementException: key not found: 20070
> > The stack trace is:
> >
> > org.apache.spark.rdd.RDD.collect(RDD.scala:774)
> > org.apache.mahout.sparkbindings.blas.AtA$.at_a_slim(AtA.scala:121)
> > org.apache.mahout.sparkbindings.blas.AtA$.at_a(AtA.scala:50)
> > org.apache.mahout.sparkbindings.SparkEngine$.tr2phys(SparkEngine.scala:231)
> > org.apache.mahout.sparkbindings.SparkEngine$.tr2phys(SparkEngine.scala:242)
> > org.apache.mahout.sparkbindings.SparkEngine$.toPhysical(SparkEngine.scala:108)
> > org.apache.mahout.math.drm.logical.CheckpointAction.checkpoint(CheckpointAction.scala:40)
> > org.apache.mahout.math.drm.package$.drm2Checkpointed(package.scala:90)
> > org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$3.apply(SimilarityAnalysis.scala:129)
> > org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$3.apply(SimilarityAnalysis.scala:127)
> > scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> > scala.collection.Iterator$class.foreach(Iterator.scala:727)
> > scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> > scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
> > scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:176)
> > scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:45)
> > scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
> > scala.collection.AbstractIterator.to(Iterator.scala:1157)
> > scala.collection.TraversableOnce$class.toList(TraversableOnce.scala:257)
> > scala.collection.AbstractIterator.toList(Iterator.scala:1157)
> >
> >
>
>

Re: AtA error

Posted by Pat Ferrel <pa...@occamsmachete.com>.
When I concatenate the input into a single file per A, B, etc., it runs fine.

Do you think I'm reading the input incorrectly and somehow messing up the vector sizes? Should I go through the input matrix and force the vector (row?) sizes to be correct?
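
If each small file's reader infers its own column count, one way to "force" the sizes would be a normalization pass that computes a single global cardinality and rebuilds every row with it. A minimal sketch under a hypothetical `SparseRow` model (not Mahout's actual Vector type):

```scala
// Hypothetical sparse row: declared cardinality plus (columnIndex, value) pairs.
case class SparseRow(cardinality: Int, nonZeros: Seq[(Int, Double)])

// Recompute one global width (max column index + 1, or the widest declared
// cardinality, whichever is larger) and rebuild every row with it.
def normalizeCardinality(rows: Seq[SparseRow]): Seq[SparseRow] = {
  val maxIndexPlusOne = rows.flatMap(_.nonZeros.map(_._1 + 1)).foldLeft(0)(math.max)
  val global = rows.map(_.cardinality).foldLeft(maxIndexPlusOne)(math.max)
  rows.map(_.copy(cardinality = global))
}
```

The non-zero entries are untouched; only the declared width changes, which is what a downstream A'A would key on.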


On Apr 24, 2015, at 10:46 AM, Dmitriy Lyubimov <dl...@gmail.com> wrote:

in slim, it is almost certainly has to do with incorrect vector length
coming in.

i have written validate procedure for these things.

On Fri, Apr 24, 2015 at 9:43 AM, Pat Ferrel <pa...@occamsmachete.com> wrote:

> Running on Yarn Getting an error with AtA. A user is running on those 1887
> small ~4k Spark streaming files. The drms seem to be created properly.
> There may be empty rows in A—I’m having the user try with only AtA, no AtB
> and so no empty rows.
> 
> Any ideas? This is only 7.5M of data.  I’ve tried a similar calc with the
> two larger files from epinions, and it works fine
> 
> The task dies with
> Job aborted due to stage failure: Exception while getting task result:
> java.util.NoSuchElementException: key not found: 20070
> The stack trace is:
> 
> org.apache.spark.rdd.RDD.collect(RDD.scala:774)
> org.apache.mahout.sparkbindings.blas.AtA$.at_a_slim(AtA.scala:121)
> org.apache.mahout.sparkbindings.blas.AtA$.at_a(AtA.scala:50)
> org.apache.mahout.sparkbindings.SparkEngine$.tr2phys(SparkEngine.scala:231)
> org.apache.mahout.sparkbindings.SparkEngine$.tr2phys(SparkEngine.scala:242)
> org.apache.mahout.sparkbindings.SparkEngine$.toPhysical(SparkEngine.scala:108)
> org.apache.mahout.math.drm.logical.CheckpointAction.checkpoint(CheckpointAction.scala:40)
> org.apache.mahout.math.drm.package$.drm2Checkpointed(package.scala:90)
> org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$3.apply(SimilarityAnalysis.scala:129)
> org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$3.apply(SimilarityAnalysis.scala:127)
> scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> scala.collection.Iterator$class.foreach(Iterator.scala:727)
> scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
> scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:176)
> scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:45)
> scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
> scala.collection.AbstractIterator.to(Iterator.scala:1157)
> scala.collection.TraversableOnce$class.toList(TraversableOnce.scala:257)
> scala.collection.AbstractIterator.toList(Iterator.scala:1157)
> 
> 


Re: AtA error

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
In the slim path, it almost certainly has to do with an incorrect vector length coming in.

I have written a validate procedure for these things.

On Fri, Apr 24, 2015 at 9:43 AM, Pat Ferrel <pa...@occamsmachete.com> wrote:

> Running on Yarn Getting an error with AtA. A user is running on those 1887
> small ~4k Spark streaming files. The drms seem to be created properly.
> There may be empty rows in A—I’m having the user try with only AtA, no AtB
> and so no empty rows.
>
> Any ideas? This is only 7.5M of data.  I’ve tried a similar calc with the
> two larger files from epinions, and it works fine
>
> The task dies with
> Job aborted due to stage failure: Exception while getting task result:
> java.util.NoSuchElementException: key not found: 20070
> The stack trace is:
>
> org.apache.spark.rdd.RDD.collect(RDD.scala:774)
> org.apache.mahout.sparkbindings.blas.AtA$.at_a_slim(AtA.scala:121)
> org.apache.mahout.sparkbindings.blas.AtA$.at_a(AtA.scala:50)
> org.apache.mahout.sparkbindings.SparkEngine$.tr2phys(SparkEngine.scala:231)
> org.apache.mahout.sparkbindings.SparkEngine$.tr2phys(SparkEngine.scala:242)
> org.apache.mahout.sparkbindings.SparkEngine$.toPhysical(SparkEngine.scala:108)
> org.apache.mahout.math.drm.logical.CheckpointAction.checkpoint(CheckpointAction.scala:40)
> org.apache.mahout.math.drm.package$.drm2Checkpointed(package.scala:90)
> org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$3.apply(SimilarityAnalysis.scala:129)
> org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$3.apply(SimilarityAnalysis.scala:127)
> scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> scala.collection.Iterator$class.foreach(Iterator.scala:727)
> scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
> scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:176)
> scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:45)
> scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
> scala.collection.AbstractIterator.to(Iterator.scala:1157)
> scala.collection.TraversableOnce$class.toList(TraversableOnce.scala:257)
> scala.collection.AbstractIterator.toList(Iterator.scala:1157)
>
>