Posted to dev@mahout.apache.org by Pat Ferrel <pa...@occamsmachete.com> on 2014/11/18 00:46:52 UTC

AtA error

For a matrix with about 4600 rows and somewhere around 27790 columns (not sure of the exact dimensions), when executing the following line from AtA

     /** The version of A'A that does not use GraphX */
     def at_a_nongraph(op: OpAtA[_], srcRdd: DrmRdd[_]): DrmRdd[Int] = {

a vector is created whose size causes the error below. How could I have constructed a drm that would cause this error? If the column IDs were non-contiguous, would that yield this error?

==================

14/11/12 17:56:03 ERROR executor.Executor: Exception in task 5.0 in stage 18.0 (TID 66169)
org.apache.mahout.math.IndexException: Index 27792 is outside allowable range of [0,27789)
	at org.apache.mahout.math.AbstractVector.viewPart(AbstractVector.java:147)
	at org.apache.mahout.math.scalabindings.VectorOps.apply(VectorOps.scala:37)
	at org.apache.mahout.sparkbindings.blas.AtA$$anonfun$5$$anonfun$apply$6.apply(AtA.scala:152)
	at org.apache.mahout.sparkbindings.blas.AtA$$anonfun$5$$anonfun$apply$6.apply(AtA.scala:149)
	at scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:376)
	at scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:376)
	at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1085)
	at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1077)
	at scala.collection.immutable.StreamIterator$$anonfun$next$1.apply(Stream.scala:980)
	at scala.collection.immutable.StreamIterator$$anonfun$next$1.apply(Stream.scala:980)
	at scala.collection.immutable.StreamIterator$LazyCell.v$lzycompute(Stream.scala:969)
	at scala.collection.immutable.StreamIterator$LazyCell.v(Stream.scala:969)
	at scala.collection.immutable.StreamIterator.hasNext(Stream.scala:974)
	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
	at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:137)
	at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:58)
	at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:55)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
	at org.apache.spark.scheduler.Task.run(Task.scala:54)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
	at java.lang.Thread.run(Thread.java:695)


Re: AtA error

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
Also, technically, all vectors should be (or are expected to be) of the same
length in a valid matrix (that doesn't mean they actually have to have all
elements present -- or even all vectors, of course). So if needed, just run a
simple validation map before drmWrap to validate or to clean this up,
whichever is suitable.
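
A minimal sketch of such a validation map, untested, assuming an Int-keyed
row RDD named rowRdd and the intended column count ncol are in scope (the
helper name is made up):

    import org.apache.mahout.sparkbindings._
    import scala.collection.JavaConversions._

    // Fail fast on any element whose column index falls outside [0, ncol)
    // before the RDD gets wrapped into a DRM.
    def validateRows(rowRdd: DrmRdd[Int], ncol: Int): DrmRdd[Int] =
      rowRdd.map { case (key, v) =>
        v.nonZeroes().foreach { el =>
          require(el.index >= 0 && el.index < ncol,
            s"row $key has element index ${el.index} outside [0, $ncol)")
        }
        key -> v
      }

    // val drmA = drmWrap(validateRows(rowRdd, ncol), ncol = ncol)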




Re: AtA error

Posted by Pat Ferrel <pa...@occamsmachete.com>.
I do use drmWrap, so I'll check there. Thanks.



Re: AtA error

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
On Mon, Nov 17, 2014 at 5:16 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:

> It’s in spark-itemsimilarity. This job reads elements and assigns them to
> one of two RDD backed drms.
>
> I assumed it was a badly formed drm but it’s a 140MB dataset and a bit
> hard to nail down—just looking for a clue. I read this to say that an ID
> for an element in a row vector was larger than drm.ncol, correct?
>

yes.

and then it again comes back to the question of how the matrix was
constructed. Computation of the dimensions (ncol, nrow) is automatic and
lazy, meaning if you have not specified dimensions anywhere explicitly, they
will be lazily computed for you. But if you did volunteer them anywhere (such
as in a drmWrap() call), they have to be correct. Otherwise you see things
like this.
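
A minimal sketch of the two cases, untested; rowRdd is a placeholder for a
hand-built row RDD, and the optional nrow/ncol parameter names of drmWrap()
are assumptions worth double-checking against the sparkbindings source:

    import org.apache.mahout.sparkbindings._

    // Lazy dimensions: nrow/ncol get computed from the data on first use.
    val drmLazy = drmWrap(rowRdd)

    // Explicit dimensions: saves the extra pass over the data, but the
    // volunteered values must really bound the data; an element index >= ncol
    // only surfaces later as an IndexException inside an operator such as A'A.
    val drmExplicit = drmWrap(rowRdd, nrow = 4600, ncol = 27790)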


Re: AtA error

Posted by Pat Ferrel <pa...@occamsmachete.com>.
It’s in spark-itemsimilarity. This job reads elements and assigns them to one of two RDD-backed drms.

I assumed it was a badly formed drm, but it’s a 140MB dataset and a bit hard to nail down -- just looking for a clue. I read this to say that an ID for an element in a row vector was larger than drm.ncol, correct?
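
One quick way to get a clue, sketched below and untested: compare the largest
column index actually present in the row RDD with the drm's declared ncol;
rowRdd here stands for whichever RDD backs the suspect drm:

    import org.apache.mahout.sparkbindings._
    import scala.collection.JavaConversions._

    // Largest column index actually used by any row vector in the RDD.
    def maxColumnIndex(rowRdd: DrmRdd[Int]): Int =
      rowRdd.map { case (_, v) =>
        v.nonZeroes().foldLeft(-1)((m, el) => m max el.index)
      }.fold(-1)(_ max _)

    // A value >= drmA.ncol means the drm was built inconsistently.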




Re: AtA error

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
So this is not a problem of A'A computation -- the input is obviously
invalid.

Question is what you did before you got an A handle -- read it from a file?
parallelized it from an in-core matrix (drmParallelize)? as a result of
another computation (if yes, then what)? wrapped around a manually crafted
RDD (drmWrap)?

I don't understand the question about non-contiguous ids. You are referring
to some context of your computation, assuming I am in context (but I am
unfortunately not).
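
For orientation, a rough sketch of those construction routes in the Mahout
Scala DSL, untested; it assumes an implicit DistributedContext is already in
scope and uses a made-up rowRdd for the drmWrap case:

    import org.apache.mahout.math.scalabindings._
    import org.apache.mahout.math.drm._
    import org.apache.mahout.math.drm.RLikeDrmOps._
    import org.apache.mahout.sparkbindings._

    // (assumes an implicit DistributedContext, e.g. a Spark-backed one, is defined)

    // 1) parallelize a small in-core matrix
    val inCoreA = dense((1.0, 2.0, 0.0), (0.0, 3.0, 4.0))
    val drmA1 = drmParallelize(inCoreA, numPartitions = 2)

    // 2) obtain A as the result of another distributed computation
    val drmA2 = drmA1.t %*% drmA1

    // 3) wrap a manually crafted RDD of (Int, Vector) rows -- the route where
    //    any volunteered dimensions have to agree with the data
    val drmA3 = drmWrap(rowRdd)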


Re: AtA error

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
On Mon, Nov 17, 2014 at 3:46 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:

> A matrix with about 4600 rows and somewhere around 27790 columns when
> executing the following line from AtA (not sure of the exact dimensions)
>
>      /** The version of A'A that does not use GraphX */
>      def at_a_nongraph(op: OpAtA[_], srcRdd: DrmRdd[_]): DrmRdd[Int] = {
>
> a vector is created whose size causes the error. How could I have
> constructed a drm that would cause this error? If the column IDs were
> non-contiguous would that yield this error?
>

what did you do specifically to build matrix A?

