Posted to user@mahout.apache.org by Yang <te...@gmail.com> on 2014/10/31 00:37:17 UTC

NaN produced by SSVD ?

We are running SSVD on a dataset (a relatively small one: 8000 rows,
64 columns). We ran it with rank = 58, since the oversampling parameter
is p = 5.

The result had NaNs in multiple columns.

Why would this happen?

I am now running with a lower rank (20) to see if it goes away.


Thanks
Yang

Re: NaN produced by SSVD ?

Posted by Yang <te...@gmail.com>.
Oh yes, I just checked -q: we have always been using -q 1.


thanks
yang

On Mon, Nov 3, 2014 at 2:18 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:

> OK, so that's what I suspected.
>
> The method generally is not intended to run on inputs whose rank is
> smaller than k + p. The MR version doesn't even check for it.
>
> However, as I mentioned in the manual, I did run tests with -q=0, in which
> case the corresponding right singular vectors should be reset to 0.0, not
> NaNs. It is possible that with -q=1 the power iterations do something
> inadmissible in that situation.
>
> Just for the record, what -q setting have you used?

Re: NaN produced by SSVD ?

Posted by Yang <te...@gmail.com>.
Let me check in the morning.

BTW Dmitriy, we are now trying to use the new Spark version of SSVD
(from git). I see that you are still the author, so I'll be coming here
again with more questions :)

We are also exploring using pLSA directly instead of matrix
factorization, which could possibly be faster. Again, some new
implementations are available on JIRA.
On Nov 3, 2014 2:20 PM, "Dmitriy Lyubimov" <dl...@gmail.com> wrote:

> OK, so that's what I suspected.
>
> The method generally is not intended to run on inputs whose rank is
> smaller than k + p. The MR version doesn't even check for it.
>
> However, as I mentioned in the manual, I did run tests with -q=0, in which
> case the corresponding right singular vectors should be reset to 0.0, not
> NaNs. It is possible that with -q=1 the power iterations do something
> inadmissible in that situation.
>
> Just for the record, what -q setting have you used?

Re: NaN produced by SSVD ?

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
OK, so that's what I suspected.

The method generally is not intended to run on inputs whose rank is smaller
than k + p. The MR version doesn't even check for it.

However, as I mentioned in the manual, I did run tests with -q=0, in which
case the corresponding right singular vectors should be reset to 0.0, not
NaNs. It is possible that with -q=1 the power iterations do something
inadmissible in that situation.

Just for the record, what -q setting have you used?
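One plausible way a NaN can appear, sketched under assumptions: stochastic SVD first forms Y = A * Omega with k + p random columns and then orthonormalizes the columns of Y. If rank(A) < k + p, some column of Y is a linear combination of earlier ones, its residual after projection is zero, and normalizing it computes 0/0. The Scala below uses plain modified Gram-Schmidt, which is not the QR routine Mahout's MR job actually uses; `RankDeficientQr`, `multiply` and `gramSchmidtColumns` are hypothetical names, and A is a 3x2 rank-1 toy matrix with k + p = 2.

```scala
// Hypothetical illustration, not Mahout's MR code: orthonormalize Y = A * Omega
// with modified Gram-Schmidt and show how a rank-deficient A yields NaN.
object RankDeficientQr {
  type Mat = Array[Array[Double]]

  // Dense row-major product C = A * B.
  def multiply(a: Mat, b: Mat): Mat = {
    val n = a.length
    val m = b.length
    val cols = b(0).length
    Array.tabulate(n, cols) { (i, j) =>
      var s = 0.0
      for (t <- 0 until m) s += a(i)(t) * b(t)(j)
      s
    }
  }

  // Modified Gram-Schmidt on the columns of y; q(j) holds the j-th column of Q.
  // Deliberately unguarded: a zero residual norm is divided through (0/0 = NaN).
  def gramSchmidtColumns(y: Mat): Mat = {
    val rows = y.length
    val cols = y(0).length
    val q = Array.ofDim[Double](cols, rows)
    for (j <- 0 until cols) {
      val v = Array.tabulate(rows)(i => y(i)(j))
      for (prev <- 0 until j) {
        val dot = (0 until rows).map(i => q(prev)(i) * v(i)).sum
        for (i <- 0 until rows) v(i) -= dot * q(prev)(i)
      }
      val norm = math.sqrt(v.map(x => x * x).sum)
      for (i <- 0 until rows) q(j)(i) = v(i) / norm
    }
    q
  }

  def main(args: Array[String]): Unit = {
    val a: Mat = Array(Array(1.0, 1.0), Array(0.0, 0.0), Array(0.0, 0.0)) // rank 1
    val omega: Mat = Array(Array(1.0, 1.0), Array(0.0, 1.0))             // k + p = 2 columns
    val q = gramSchmidtColumns(multiply(a, omega))
    q.zipWithIndex.foreach { case (col, j) => println(s"q$j = ${col.mkString(", ")}") }
  }
}
```

Here the second column of Y is exactly twice the first, so the second column of Q comes out as all NaN. Any later product built from Q (for example B = Q^T A and its small SVD) would carry those NaNs forward; a guarded implementation would check the residual norm before dividing.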

On Mon, Nov 3, 2014 at 2:00 PM, Yang <te...@gmail.com> wrote:

> It does have something to do with k. Previously I used a formula to
> determine the rank to use:
>
> rank = N - p - 1 = 64 - 5 - 1 = 58, where N is the number of columns of
> the original matrix.
>
> Then I tried rank = 50, and it worked.
>
> Well... as I write this email, I realize that the reason might be that
> the actual rank R of the original matrix may be much smaller than N. But
> it is a bit difficult to figure out R beforehand.
>
>
> thanks
> Yang

Re: NaN produced by SSVD ?

Posted by Yang <te...@gmail.com>.
It does have something to do with k. Previously I used a formula to
determine the rank to use:

rank = N - p - 1 = 64 - 5 - 1 = 58, where N is the number of columns of
the original matrix.

Then I tried rank = 50, and it worked.

Well... as I write this email, I realize that the reason might be that
the actual rank R of the original matrix may be much smaller than N. But
it is a bit difficult to figure out R beforehand.
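The constraint discussed in this thread is k + p <= R, where R is the actual rank of the input. A minimal sketch for choosing k under that constraint: estimate the numerical rank R on the data (or on a sample that fits in memory) and cap k at min(N - p - 1, R - p), which keeps the original N - p - 1 cap as well. Gaussian elimination with partial pivoting and a relative tolerance is used here only for brevity; it is less reliable than an SVD-based rank estimate on ill-conditioned data. `RankBudget`, `numericalRank`, `maxK` and the 1e-10 tolerance are hypothetical choices, not Mahout APIs.

```scala
// Hypothetical helper, not part of Mahout: estimate numerical rank and cap k.
object RankBudget {
  // Numerical rank via Gaussian elimination with partial pivoting.
  // A pivot counts only if it exceeds relTol times the largest |entry|.
  def numericalRank(input: Array[Array[Double]], relTol: Double = 1e-10): Int = {
    val a = input.map(_.clone()) // work on a copy, leave the caller's data intact
    val rows = a.length
    val cols = if (rows == 0) 0 else a(0).length
    val scale = a.iterator.flatMap(_.iterator).foldLeft(0.0)((m, x) => math.max(m, math.abs(x)))
    val tol = relTol * scale
    var rank = 0
    for (c <- 0 until cols) {
      if (rank < rows) {
        // Row at or below `rank` with the largest magnitude in column c.
        val pivot = (rank until rows).maxBy(r => math.abs(a(r)(c)))
        if (math.abs(a(pivot)(c)) > tol) {
          val tmp = a(rank); a(rank) = a(pivot); a(pivot) = tmp
          // Eliminate column c from every row below the pivot row.
          for (r <- rank + 1 until rows) {
            val f = a(r)(c) / a(rank)(c)
            for (j <- c until cols) a(r)(j) -= f * a(rank)(j)
          }
          rank += 1
        }
      }
    }
    rank
  }

  // Largest k that keeps the N - p - 1 cap and also satisfies k + p <= rank.
  def maxK(cols: Int, rank: Int, p: Int): Int = {
    val k = math.min(cols - p - 1, rank - p)
    require(k >= 1, s"rank $rank is too small for oversampling p = $p")
    k
  }

  def main(args: Array[String]): Unit = {
    // Toy 4x3 matrix whose third column is the sum of the first two (rank 2).
    val m = Array(
      Array(1.0, 0.0, 1.0),
      Array(0.0, 1.0, 1.0),
      Array(1.0, 1.0, 2.0),
      Array(2.0, 0.0, 2.0))
    val r = numericalRank(m)
    println(s"numerical rank = $r, max k with p = 1: ${maxK(m(0).length, r, 1)}")
  }
}
```

For instance, if the 64-column matrix had a numerical rank of 40 (a made-up value), `maxK(64, 40, 5)` would return 35 instead of 58.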


thanks
Yang

On Fri, Oct 31, 2014 at 5:01 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:

> Is the matrix by any chance constructed so that it may have rank < k? I
> think the MR code is not checking for that.
>
> In the Spark shell I have:
>
> mahout> val a = dense( (0,0),(0,0) )
> a: org.apache.mahout.math.DenseMatrix =
> {
>   0  => {}
>   1  => {}
> }
> mahout> svd(a)
> res0: (org.apache.mahout.math.Matrix, org.apache.mahout.math.Matrix,
> org.apache.mahout.math.DenseVector) =
> ({
>   0  => {0:1.0}
>   1  => {1:1.0}
> },{
>   0  => {0:-1.0}
>   1  => {1:-1.0}
> },{})
>
> But:
>
> mahout> ssvd(a,2,0)
>
> java.lang.AssertionError: assertion failed: Rank-deficiency detected during s-SVD
>
> or:
> mahout> val drmA = drmParallelize(a,2)
> mahout> dssvd(drmA, k=2)
> java.lang.IllegalArgumentException: R is rank-deficient.
>
>
> The MR version doesn't check for these effects, and it may create some
> degenerate results, although I thought those should be 0s, at least with
> -q=0. I am not sure about -q=1, 2, ...

Re: NaN produced by SSVD ?

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
Is the matrix by any chance constructed so that it may have rank < k? I
think the MR code is not checking for that.

In the Spark shell I have:

mahout> val a = dense( (0,0),(0,0) )
a: org.apache.mahout.math.DenseMatrix =
{
  0  => {}
  1  => {}
}
mahout> svd(a)
res0: (org.apache.mahout.math.Matrix, org.apache.mahout.math.Matrix,
org.apache.mahout.math.DenseVector) =
({
  0  => {0:1.0}
  1  => {1:1.0}
},{
  0  => {0:-1.0}
  1  => {1:-1.0}
},{})

But:

mahout> ssvd(a,2,0)

java.lang.AssertionError: assertion failed: Rank-deficiency detected during s-SVD

or:
mahout> val drmA = drmParallelize(a,2)
mahout> dssvd(drmA, k=2)
java.lang.IllegalArgumentException: R is rank-deficient.


The MR version doesn't check for these effects, and it may create some
degenerate results, although I thought those should be 0s, at least with
-q=0. I am not sure about -q=1, 2, ...
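Because the MR version does not check for rank deficiency, one option is to validate its output yourself. A minimal sketch, assuming the U or V result has already been loaded into memory as equal-length rows of doubles (reading Mahout's sequence-file output is not shown): it reports columns that contain any NaN, and columns that are entirely zero, which is the -q=0 behavior expected above for rank-deficient input. `OutputCheck`, `Report` and `degenerateColumns` are hypothetical names.

```scala
// Hypothetical post-run sanity check, not part of Mahout.
object OutputCheck {
  // NaN columns signal the failure reported in this thread; all-zero columns
  // are the degenerate-but-finite outcome expected with -q=0.
  final case class Report(nanColumns: Seq[Int], zeroColumns: Seq[Int])

  // Assumes every row has the same length as the first one.
  def degenerateColumns(rows: Seq[Array[Double]]): Report = {
    val cols = rows.headOption.map(_.length).getOrElse(0)
    val nan = (0 until cols).filter(j => rows.exists(r => r(j).isNaN))
    val zero = (0 until cols).filter(j => rows.forall(r => r(j) == 0.0))
    Report(nan, zero)
  }

  def main(args: Array[String]): Unit = {
    // Toy 2x3 "V" block: column 1 is NaN, column 2 was reset to zero.
    val v = Seq(
      Array(0.6, Double.NaN, 0.0),
      Array(0.8, Double.NaN, 0.0))
    println(degenerateColumns(v))
  }
}
```

A non-empty `nanColumns` (or an unexpectedly long `zeroColumns`) would suggest that k + p exceeded the effective rank of the input and that k should be lowered.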




On Thu, Oct 30, 2014 at 10:35 PM, Yang <te...@gmail.com> wrote:

> I am talking about the MR one.
>
> thanks
> yang

Re: NaN produced by SSVD ?

Posted by Yang <te...@gmail.com>.
I am talking about the MR one.

thanks
yang
On Oct 30, 2014 8:16 PM, "Dmitriy Lyubimov" <dl...@gmail.com> wrote:

> This is not a known problem...
>
> There are a few SSVD implementations here: sequential, MR and Spark. For
> the record, which one are you running?

Re: NaN produced by SSVD ?

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
This is not a known problem...

There are a few SSVD implementations here: sequential, MR and Spark. For
the record, which one are you running?



On Thu, Oct 30, 2014 at 4:37 PM, Yang <te...@gmail.com> wrote:

> We are running SSVD on a dataset (a relatively small one: 8000 rows,
> 64 columns). We ran it with rank = 58, since the oversampling parameter
> is p = 5.
>
> The result had NaNs in multiple columns.
>
> Why would this happen?
>
> I am now running with a lower rank (20) to see if it goes away.
>
>
> Thanks
> Yang
>