Posted to user@mahout.apache.org by Yang <te...@gmail.com> on 2014/10/31 00:37:17 UTC
NaN produced by SSVD ?
We are running SSVD on a dataset (this one is relatively small: 8000 rows
and 64 columns). We ran it with rank = 58, since the oversampling parameter
is p=5.
The result had NaN in multiple columns.
Why would this happen?
I am now running with a lower rank (20) to see if it goes away.
Thanks
Yang
Re: NaN produced by SSVD ?
Posted by Yang <te...@gmail.com>.
Oh yes, I just checked the -q setting: we have always been using -q 1.
thanks
yang
On Mon, Nov 3, 2014 at 2:18 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:
> Ok. so that's what i suspected.
>
> The method generally is not intended to run on inputs whose rank is smaller
> than k+p. The MR version doesn't even check for it.
>
> However, as I mentioned in the manual, I did run tests with -q=0, in which
> case the corresponding right singular vectors should be reset to 0.0, not
> NaNs. It is possible that with -q=1 the power iterations do something
> inadmissible in that situation.
>
> just for the record, what -q setting have you used?
Re: NaN produced by SSVD ?
Posted by Yang <te...@gmail.com>.
let me check in the morning......
BTW Dmitriy, we are now trying to use the new Spark version of SSVD
(from git). I see that you are still the author, so I'll be coming here
again with more questions :)
We are also exploring using pLSA directly instead of matrix
factorization, which could possibly be faster. Again, some new
implementations are available in JIRA.
On Nov 3, 2014 2:20 PM, "Dmitriy Lyubimov" <dl...@gmail.com> wrote:
> Ok. so that's what i suspected.
>
> The method generally is not intended to run on inputs whose rank is smaller
> than k+p. The MR version doesn't even check for it.
>
> However, as I mentioned in the manual, I did run tests with -q=0, in which
> case the corresponding right singular vectors should be reset to 0.0, not
> NaNs. It is possible that with -q=1 the power iterations do something
> inadmissible in that situation.
>
> just for the record, what -q setting have you used?
Re: NaN produced by SSVD ?
Posted by Dmitriy Lyubimov <dl...@gmail.com>.
OK, so that's what I suspected.
The method generally is not intended to run on inputs whose rank is smaller
than k+p. The MR version doesn't even check for it.
However, as I mentioned in the manual, I did run tests with -q=0, in which
case the corresponding right singular vectors should be reset to 0.0, not
NaNs. It is possible that with -q=1 the power iterations do something
inadmissible in that situation.
Just for the record, what -q setting have you used?
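Since the MR version does not check the k+p condition itself, a caller can check it before launching the job. The following is a minimal sketch in plain Python, not Mahout code: the function name `check_ssvd_params` and the `est_rank` argument are hypothetical, and the rank estimate of 55 used below is made up for illustration.

```python
def check_ssvd_params(n_rows, n_cols, k, p, est_rank=None):
    """Return a list of warnings for an SSVD run with decomposition rank k
    and oversampling p.

    est_rank is an optional, caller-supplied estimate of the input's
    numerical rank (an assumption of this sketch; Mahout does not report it).
    """
    problems = []
    max_rank = min(n_rows, n_cols)  # the rank can never exceed this bound
    if k + p > max_rank:
        problems.append(f"k+p={k + p} exceeds min(rows, cols)={max_rank}")
    if est_rank is not None and k + p > est_rank:
        problems.append(f"k+p={k + p} exceeds estimated rank {est_rank}")
    return problems


# 8000 x 64 input with k=58, p=5: passes the dimension bound (63 <= 64) ...
print(check_ssvd_params(8000, 64, k=58, p=5))
# ... but would be flagged if the true rank were, say, 55 (made-up value).
print(check_ssvd_params(8000, 64, k=58, p=5, est_rank=55))
```

The dimension bound alone does not catch the case discussed in this thread, which is why the sketch also accepts a rank estimate when one is available.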
On Mon, Nov 3, 2014 at 2:00 PM, Yang <te...@gmail.com> wrote:
> It does have something to do with k. Previously I used a formula to
> determine the rank to use:
>
> rank = N - p - 1 = 64 - 5 -1 = 58 , where N is the number of columns of
> the original matrix.
>
> then I tried using rank = 50, it worked.
>
> well.... as I write this email, I realized that the reason might be that
> the actual rank R of the original matrix may be much smaller than N, that
> could be the reason. but it is a bit difficult to figure out that R
> beforehand.
>
>
> thanks
> Yang
Re: NaN produced by SSVD ?
Posted by Yang <te...@gmail.com>.
It does have something to do with k. Previously I used a formula to
determine the rank to use:
rank = N - p - 1 = 64 - 5 - 1 = 58, where N is the number of columns of
the original matrix.
Then I tried rank = 50, and it worked.
Well... as I write this email, I realize that the reason might be that
the actual rank R of the original matrix is much smaller than N. But it is
a bit difficult to figure out R beforehand.
thanks
Yang
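For a matrix with only 64 columns, a rough estimate of R can be computed directly. The sketch below is plain Python, not part of Mahout: the function `numerical_rank` and its tolerance `tol` are assumptions of this example. It counts the pivots that survive Gaussian elimination with partial pivoting; inspecting the singular values from a full SVD in a numerical library would be a more reliable estimate.

```python
def numerical_rank(rows, tol=1e-9):
    """Estimate the rank of a dense matrix given as a list of rows.

    Uses Gaussian elimination with partial pivoting. A pivot whose absolute
    value is <= tol * (largest absolute entry of the input) counts as zero.
    """
    a = [[float(x) for x in r] for r in rows]  # work on a copy
    if not a or not a[0]:
        return 0
    n_rows, n_cols = len(a), len(a[0])
    scale = max(abs(x) for r in a for x in r)
    if scale == 0.0:
        return 0  # all-zero matrix
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # pick the largest remaining entry in this column as the pivot
        pivot = max(range(rank, n_rows), key=lambda i: abs(a[i][col]))
        if abs(a[pivot][col]) <= tol * scale:
            continue  # column is numerically dependent on earlier ones
        a[rank], a[pivot] = a[pivot], a[rank]
        for i in range(rank + 1, n_rows):
            f = a[i][col] / a[rank][col]
            for j in range(col, n_cols):
                a[i][j] -= f * a[rank][j]
        rank += 1
    return rank


# The second row is twice the first, so only two rows are independent.
print(numerical_rank([[1, 2, 3], [2, 4, 6], [1, 0, 1]]))  # prints 2
```

With such an estimate in hand, k can be chosen so that k+p stays at or below R rather than being derived from the column count N.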
On Fri, Oct 31, 2014 at 5:01 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:
> is the matrix by any chance constructed so that it may have rank < k? I
> think MR code is not checking for that.
>
> In spark shell i have :
>
> mahout> val a = dense( (0,0),(0,0) )
> a: org.apache.mahout.math.DenseMatrix =
> {
> 0 => {}
> 1 => {}
> }
> mahout> svd(a)
> res0: (org.apache.mahout.math.Matrix, org.apache.mahout.math.Matrix,
> org.apache.mahout.math.DenseVector) =
> ({
> 0 => {0:1.0}
> 1 => {1:1.0}
> },{
> 0 => {0:-1.0}
> 1 => {1:-1.0}
> },{})
>
> But :
>
> mahout> ssvd(a,2,0)
>
> java.lang.AssertionError: assertion failed: Rank-deficiency detected during s-SVD
>
> or
> mahout> val drmA = drmParallelize(a,2)
> mahout> dssvd(drmA, k=2)
> java.lang.IllegalArgumentException: R is rank-deficient.
>
>
> the MR version doesn't check for these effects and it may create some
> degenerate results, although i thought those should be 0s, at least when
> -q=0. I am not sure for -q=1,2...
Re: NaN produced by SSVD ?
Posted by Dmitriy Lyubimov <dl...@gmail.com>.
is the matrix by any chance constructed so that it may have rank < k? I
think MR code is not checking for that.
In spark shell i have :
mahout> val a = dense( (0,0),(0,0) )
a: org.apache.mahout.math.DenseMatrix =
{
0 => {}
1 => {}
}
mahout> svd(a)
res0: (org.apache.mahout.math.Matrix, org.apache.mahout.math.Matrix,
org.apache.mahout.math.DenseVector) =
({
0 => {0:1.0}
1 => {1:1.0}
},{
0 => {0:-1.0}
1 => {1:-1.0}
},{})
But :
mahout> ssvd(a,2,0)
java.lang.AssertionError: assertion failed: Rank-deficiency detected during s-SVD
or
mahout> val drmA = drmParallelize(a,2)
mahout> dssvd(drmA, k=2)
java.lang.IllegalArgumentException: R is rank-deficient.
the MR version doesn't check for these effects and it may create some
degenerate results, although i thought those should be 0s, at least when
-q=0. I am not sure for -q=1,2...
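The fail-fast behaviour of the two checks above can be illustrated with a generic orthonormalization step. This is a sketch in plain Python and not Mahout's implementation: the function `orthonormalize` and its tolerance are assumptions of this example. It runs modified Gram-Schmidt and raises an error as soon as a column has (near) zero norm. Without such a guard, code on the JVM that divides a zero vector by its zero norm gets 0.0/0.0 = NaN under IEEE 754 arithmetic; whether that is what actually happens in the MR code is not established in this thread.

```python
import math


def orthonormalize(columns, tol=1e-12):
    """Modified Gram-Schmidt over a list of column vectors.

    Raises ValueError when a column has (near) zero norm after the
    projections onto the earlier basis vectors are removed, i.e. when the
    input is rank-deficient.
    """
    basis = []
    for idx, col in enumerate(columns):
        v = [float(x) for x in col]
        for q in basis:
            dot = sum(vi * qi for vi, qi in zip(v, q))
            v = [vi - dot * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm <= tol:
            raise ValueError(
                f"column {idx} is linearly dependent; input is rank-deficient"
            )
        basis.append([vi / norm for vi in v])
    return basis


print(orthonormalize([[1, 0], [1, 1]]))  # prints [[1.0, 0.0], [0.0, 1.0]]

try:
    orthonormalize([[0, 0], [0, 0]])  # the all-zero matrix from the shell session
except ValueError as err:
    print("rejected:", err)
```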
On Thu, Oct 30, 2014 at 10:35 PM, Yang <te...@gmail.com> wrote:
> i am talking about the MR one.
>
> thanks
> yang
Re: NaN produced by SSVD ?
Posted by Yang <te...@gmail.com>.
i am talking about the MR one.
thanks
yang
On Oct 30, 2014 8:16 PM, "Dmitriy Lyubimov" <dl...@gmail.com> wrote:
> This is not a known problem...
>
> There are a few SSVD implementations here: sequential, MR, and Spark. For
> the record, which one are you running?
Re: NaN produced by SSVD ?
Posted by Dmitriy Lyubimov <dl...@gmail.com>.
This is not a known problem...
There are a few SSVD implementations here: sequential, MR, and Spark. For
the record, which one are you running?
On Thu, Oct 30, 2014 at 4:37 PM, Yang <te...@gmail.com> wrote:
> we are running ssvd on a dataset (this one is relatively small, with 8000
> rows, number of columns is 64 ), we ran it with rank = 58, since sampling
> p=5.
>
> the result had NaN on multiple columns.
>
> why would this appear ?
>
> I am now running with lower rank=20 , to see if it goes away.
>
>
> Thanks
> Yang
>