Posted to dev@mahout.apache.org by Pat Ferrel <pa...@gmail.com> on 2014/07/20 06:06:43 UTC

A + 1

Using methods instead of symbolic ops returns different types, so methods work and ops don’t. If math is math, they should do the same thing, so I’d like to know what it is supposed to do; can you please allow me to ask specific people the question?

  test("plus one"){
    val a = dense(
      (1, 1),
      (0, 0))

    val drmA1 = drmParallelize(m = a, numPartitions = 2)

    // modified to return a new CheckpointedDrm so maintains immutability but still only increases the row cardinality
    // by returning new CheckpointedDrmSpark[K](rdd, nrow + n, ncol, _cacheStorageLevel ) Hack for now.
    val drmABigger1 = drmA1.addToRowCardinality(1)

    val drmABiggerPlusOne1 = drmABigger1.plus(1.0)  // drmABigger has no row 2 in the rdd but an empty row 1
    // drmABiggerPlusOne1 is a dense matrix
    println(drmABiggerPlusOne1)

    val drmA2 = drmParallelize(m = a, numPartitions = 2)
    val drmABigger2 = drmA2.addToRowCardinality(1)
    val drmABiggerPlusOne2 = drmABigger2 + 1.0
    drmABiggerPlusOne2.writeDRM("tmp/plus-one/drma-bigger-plus-one-ops/")


    val bp = 0
  }

Method #1 works, #2 doesn’t, even when I create a new CheckpointedDrmSpark with a larger _nrow than is in the data—even without the addToRowCardinality.

I agree that in some cases it doesn’t. Can you please allow me to ask if it _should_?



On Jul 19, 2014, at 8:56 PM, Anand Avati <av...@gluster.org> wrote:




On Sat, Jul 19, 2014 at 6:50 PM, Pat Ferrel <pa...@gmail.com> wrote:

On another thread I’ll send you code that shows A + 1 works with blank rows in A.

I don't see how that worked for you. See this:

  test("DRM addToRowCardinality - will fail") {
    val inCoreA = sparse(
      0 -> 1 :: 1 -> 2 :: Nil,
      0 -> 3 :: 1 -> 4 :: Nil,
      0 -> 2 :: 1 -> 0.0 :: Nil
    )

    val inCoreBControl = sparse(
      0 -> 2 :: 1 -> 3 :: Nil,
      0 -> 4 :: 1 -> 5 :: Nil,
      0 -> 3 :: 1 -> 1 :: Nil,
      0 -> 1 :: 1 -> 1 :: Nil,
      0 -> 1 :: 1 -> 1 :: Nil
    )

    val drmA = drmParallelize(inCoreA)
    drmA.addToRowCardinality(2)
    val drmB = (drmA + 1.0).checkpoint()

    (drmB.collect - inCoreBControl).norm should be < 1e-3
  }
  test("DRM addToRowCardinality - wont fail") {
    val inCoreA = sparse(
      0 -> 1 :: 1 -> 2 :: Nil,
      0 -> 3 :: 1 -> 4 :: Nil,
      0 -> 2 :: 1 -> 0.0 :: Nil
    )

    val inCoreBWrong = sparse(
      0 -> 2 :: 1 -> 3 :: Nil,
      0 -> 4 :: 1 -> 5 :: Nil,
      0 -> 3 :: 1 -> 1 :: Nil,
      0 -> 0 :: 1 -> 0 :: Nil,
      0 -> 0 :: 1 -> 0 :: Nil
    )

    val drmA = drmParallelize(inCoreA)
    drmA.addToRowCardinality(2)
    val drmB = (drmA + 1.0).checkpoint()

    (drmB.collect - inCoreBWrong).norm should be < 1e-3
  }

And sure enough, inCoreBControl fails, and inCoreBWrong succeeds:

- DRM addToRowCardinality - will fail *** FAILED ***
  2.0 was not less than 0.001 (DrmLikeSuiteBase.scala:116)
- DRM addToRowCardinality - wont fail

BTW this implies rbind will not solve the problem; it is firmly in data prep. But until I know the rules I won’t know how to do the right thing.

Rbind expects both A and B to have their Int row keys filled from 0 to nrow-1, which is how they should be ideally.
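To make that expectation concrete, here is a minimal sketch using toy stand-ins (not Mahout's actual rbind): B's keys are simply shifted by A's row count, so the result has contiguous keys only if both inputs already have every Int key filled from 0 to nrow-1.

```scala
// Toy sketch (not Mahout's actual rbind) of stacking two Int-keyed row maps:
// B's keys are shifted by A's row count, so the result is contiguous only if
// both inputs already have keys filled 0..nrow-1.
object RbindSketch {
  def rbind(a: Map[Int, Vector[Double]], aNrow: Int,
            b: Map[Int, Vector[Double]]): Map[Int, Vector[Double]] =
    a ++ b.map { case (k, row) => (k + aNrow) -> row }

  def main(args: Array[String]): Unit = {
    val a = Map(0 -> Vector(1.0), 1 -> Vector(2.0))  // keys 0..1, nrow = 2
    val b = Map(0 -> Vector(3.0))                    // keys 0..0
    println(rbind(a, 2, b))                          // contiguous keys 0, 1, 2
    // But if a claimed nrow = 2 while holding only key 0, the hole would
    // carry straight through into the stacked result.
  }
}
```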



Re: A + 1

Posted by Anand Avati <av...@gluster.org>.
On Sun, Jul 20, 2014 at 8:16 AM, Pat Ferrel <pa...@gmail.com> wrote:

> Oh, it’s working. The conversion to dense is saving the day in #1 I think.
> Where it is using nrow to create enough rows rather than looking in the rdd
> for row keys—just a guess.
>

The conversion to in-core Matrix (dense or sparse does not matter) is
saving the day.


>
> I think it was you that said dense or sparse the math should produce the
> same result. One or the other is a bug, right?
>
>
Yes, sparse or dense the math should produce the same result. And
similarly, with in-core or DRM. #1 is using the syntactic sugar to convert
DRM to in-core and is testing on in-core. The variable drmABigger1 is not
even a drm. Note that there isn't even a plus() method defined on DRM.
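For what it's worth, the mechanics can be sketched with toy stand-ins (these are not Mahout's real classes): a "distributed" type with no plus() method still compiles against drm.plus(1.0), because the compiler applies an implicit converter that collects everything in-core, and the collect step is what fills in the missing rows from nrow.

```scala
import scala.language.implicitConversions

// Toy in-core matrix: every row 0..nrow-1 is physically present.
case class InCore(rows: Vector[Vector[Double]]) {
  def plus(x: Double): InCore = InCore(rows.map(_.map(_ + x)))
}

// Toy "distributed" matrix: only the rows present in the data, plus a claimed
// nrow. Note there is deliberately no plus() method here.
case class Drm(data: Map[Int, Vector[Double]], nrow: Int, ncol: Int)

object Demo {
  // Stand-in for Mahout's drm2InCore implicit: collecting fills in
  // every row 0..nrow-1, so "fudged" rows materialize as zeros.
  implicit def drm2InCore(drm: Drm): InCore =
    InCore(Vector.tabulate(drm.nrow)(i => drm.data.getOrElse(i, Vector.fill(drm.ncol)(0.0))))

  def main(args: Array[String]): Unit = {
    // nrow claims 3 rows, but only rows 0 and 1 exist in the data.
    val drm = Drm(Map(0 -> Vector(1.0, 1.0), 1 -> Vector(0.0, 0.0)), nrow = 3, ncol = 2)
    // Compiles only because the compiler rewrites it as drm2InCore(drm).plus(1.0).
    val plusOne = drm.plus(1.0)
    println(plusOne.rows)  // three rows: (2,2), (1,1), (1,1) - the missing row got its +1
  }
}
```

So the result looks right, but only because the operation ran on the collected in-core copy, not on the distributed representation.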



> I am going to add some matrix multiply tests that work on rdd backed
> objects. The current tests could use some additions. I suspect that
> multiply and transpose will work correctly with non-existent rows/columns
> but still need the designers to come out on one side or the other.
>


A failing test case trumps any number of working test cases. If you want
nrow fudging to work, you will have to first make sure it does not have any
surprise side effects for future users of that operator/method (especially
correctness, not even performance), no matter what the intention of the
designers was.


> If row keys need to be sequential and unbroken, this is a big deal and new
> to me.


Int row keys are sequential and unbroken. There is code which depends on
this:

- The AewScalar operator never even gets hold of rows whose key/vector is
non-existent, resulting in bugs like the one I have shown
- The AewB operator (element-wise +,-,*,/ with B) will fail for the same
reason, but with "java.lang.UnsupportedOperationException: empty.reduceLeft"

and some which hint at the intention:

- The rowRange() operator rewrites/modifies Int keys to be sequential/unbroken
in the resulting DRM.
- Even in-core sparse matrices have all rows (even when a given row is
empty).

There are probably more issues if you dig deeper. Even if the intention of
the designers was otherwise (which I doubt), the reality is that you have
plenty of code and operators to fix (in non-trivial ways) to make nrow
fudging a safe operation.
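A minimal sketch of the AewScalar failure mode, using toy stand-ins rather than Mahout's real classes: a per-row operator can only map over the rows physically present in the backing data, so rows that exist only in the claimed nrow never receive the scalar.

```scala
// Toy stand-ins (not Mahout's real classes) for how a per-row operator
// silently skips rows that are absent from the backing data.
case class Drm(data: Map[Int, Vector[Double]], nrow: Int, ncol: Int)

object AewSketch {
  // Stand-in for AewScalar: it can only map over rows that physically exist.
  def aewScalar(drm: Drm, x: Double): Drm =
    drm.copy(data = drm.data.map { case (k, row) => k -> row.map(_ + x) })

  // Materialize the full nrow x ncol result; missing rows come out as zeros.
  def collect(drm: Drm): Vector[Vector[Double]] =
    Vector.tabulate(drm.nrow)(i => drm.data.getOrElse(i, Vector.fill(drm.ncol)(0.0)))

  def main(args: Array[String]): Unit = {
    // nrow claims 4 rows, but only rows 0 and 1 exist.
    val a = Drm(Map(0 -> Vector(1.0, 2.0), 1 -> Vector(3.0, 4.0)), nrow = 4, ncol = 2)
    val b = collect(aewScalar(a, 1.0))
    println(b)  // rows 0 and 1 got +1; rows 2 and 3 stayed all-zero instead of all-one
  }
}
```

This is exactly the shape of the failing "DRM addToRowCardinality" test above: the fudged rows come back as 0.0 where the control matrix expects 1.0.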

Thanks


> On Jul 19, 2014, at 10:09 PM, Anand Avati <av...@gluster.org> wrote:
>
> On Sat, Jul 19, 2014 at 9:06 PM, Pat Ferrel <pa...@gmail.com> wrote:
>
> > Using methods instead of symbolic ops returns different types so methods
> > work, ops don’t. If math is math, they should do the same thing so I’d
> like
> > to know what it is supposed to do, can you please allow me to ask
> specific
> > people the question?
> >
>
>
> I don't see how I'm coming in the way of anybody else from replying. By
> posting to a public mailing list, you by definition allow anybody to review
> and comment.
>
>
>  test("plus one"){
> >    val a = dense(
> >      (1, 1),
> >      (0, 0))
> >
> >    val drmA1 = drmParallelize(m = a, numPartitions = 2)
> >
> >    // modified to return a new CheckpointedDrm so maintains immutability
> > but still only increases the row cardinality
> >    // by returning new CheckpointedDrmSpark[K](rdd, nrow + n, ncol,
> > _cacheStorageLevel ) Hack for now.
> >    val drmABigger1 = drmA1.addToRowCardinality(1)
> >
> >    val drmABiggerPlusOne1 = drmABigger1.plus(1.0)  // drmABigger has no
> > row 2 in the rdd but an empty row 1
> >    // drmABiggerPlusOne1 is a dense matrix
> >    println(drmABiggerPlusOne1)
> >
> >    val drmA2 = drmParallelize(m = a, numPartitions = 2)
> >    val drmABigger2 = drmA2.addToRowCardinality(1)
> >    val drmABiggerPlusOne2 = drmABigger2 + 1.0
> >    drmABiggerPlusOne2.writeDRM("tmp/plus-one/drma-bigger-plus-one-ops/")
> >
> >
> >    val bp = 0
> >  }
> >
> > method #1 works, #2 doesn’t. Even when I create a new CheckpointedDrmSpark
> > with larger _nrow than is in the data—even without the
> addToRowCardinality.
> >
>
> In Method #1, plus() is not even operating on the DRM. The plus() is
> operating on in core Matrix which is implicitly collect()ed because of the
> drm2InCore implicit type converter. So drmABiggerPlusOne1 is neither a
> DrmLike nor CheckpointedDrm, but is actually just an incore Matrix. You
> will have to drmParallelize() it again in order to do any distributed
> operations.
>
> I agree that in some cases it doesn’t. Can you please allow me to ask if it
> > _should_?
> >
>
> I don't think it is working in any case. The implicit converter is making
> it feel like it is working.
>
> Thanks
>
>
>
>
> > On Jul 19, 2014, at 8:56 PM, Anand Avati <av...@gluster.org> wrote:
> >
> >
> >
> >
> > On Sat, Jul 19, 2014 at 6:50 PM, Pat Ferrel <pa...@gmail.com>
> wrote:
> >
> >>
> >> On another thread I’ll send you code that shows A + 1 works with blank
> >> rows in A.
> >>
> >
> > I don't see how that worked for you. See this:
> >
> >  test("DRM addToRowCardinality - will fail") {
> >    val inCoreA = sparse(
> >      0 -> 1 :: 1 -> 2 :: Nil,
> >      0 -> 3 :: 1 -> 4 :: Nil,
> >      0 -> 2 :: 1 -> 0.0 :: Nil
> >    )
> >
> >    val inCoreBControl = sparse(
> >      0 -> 2 :: 1 -> 3 :: Nil,
> >      0 -> 4 :: 1 -> 5 :: Nil,
> >      0 -> 3 :: 1 -> 1 :: Nil,
> >      0 -> 1 :: 1 -> 1 :: Nil,
> >      0 -> 1 :: 1 -> 1 :: Nil
> >    )
> >
> >    val drmA = drmParallelize(inCoreA)
> >    drmA.addToRowCardinality(2)
> >    val drmB = (drmA + 1.0).checkpoint()
> >
> >    (drmB.collect - inCoreBControl).norm should be < 1e-3
> >
> >  }
> >
> >  test("DRM addToRowCardinality - wont fail") {
> >    val inCoreA = sparse(
> >      0 -> 1 :: 1 -> 2 :: Nil,
> >      0 -> 3 :: 1 -> 4 :: Nil,
> >      0 -> 2 :: 1 -> 0.0 :: Nil
> >    )
> >
> >    val inCoreBWrong = sparse(
> >      0 -> 2 :: 1 -> 3 :: Nil,
> >      0 -> 4 :: 1 -> 5 :: Nil,
> >      0 -> 3 :: 1 -> 1 :: Nil,
> >      0 -> 0 :: 1 -> 0 :: Nil,
> >      0 -> 0 :: 1 -> 0 :: Nil
> >    )
> >
> >    val drmA = drmParallelize(inCoreA)
> >    drmA.addToRowCardinality(2)
> >    val drmB = (drmA + 1.0).checkpoint()
> >
> >    (drmB.collect - inCoreBWrong).norm should be < 1e-3
> >  }
> >
> >
> > And sure enough, inCoreBControl fails, and inCoreBWrong succeeds:
> >
> > - DRM addToRowCardinality - will fail *** FAILED ***
> >
> >  2.0 was not less than 0.001 (DrmLikeSuiteBase.scala:116)
> > - DRM addToRowCardinality - wont fail
> >
> >
> > BTW this implies rbind will not solve the problem, it is firmly in data
> >> prep. But until I know the rules I won’t know how to do the right thing.
> >>
> >
> > Rbind expects both A and B to have their Int row keys filled from 0 to
> > nrow-1, which is how they should be ideally.
> >
> >
> >
>
>

Re: A + 1

Posted by Pat Ferrel <pa...@gmail.com>.
Sorry, but my questions keep getting lost in the back and forth; no one reads that much email.

Oh, it’s working. The conversion to dense is saving the day in #1, I think, where it is using nrow to create enough rows rather than looking in the RDD for row keys—just a guess.

I think it was you that said dense or sparse the math should produce the same result. One or the other is a bug, right?

I am going to add some matrix multiply tests that work on RDD-backed objects. The current tests could use some additions. I suspect that multiply and transpose will work correctly with non-existent rows/columns, but I still need the designers to come out on one side or the other.

If row keys need to be sequential and unbroken, this is a big deal and new to me.


On Jul 19, 2014, at 10:09 PM, Anand Avati <av...@gluster.org> wrote:

On Sat, Jul 19, 2014 at 9:06 PM, Pat Ferrel <pa...@gmail.com> wrote:

> Using methods instead of symbolic ops returns different types so methods
> work, ops don’t. If math is math, they should do the same thing so I’d like
> to know what it is supposed to do, can you please allow me to ask specific
> people the question?
> 


I don't see how I'm coming in the way of anybody else from replying. By
posting to a public mailing list, you by definition allow anybody to review
and comment.


 test("plus one"){
>    val a = dense(
>      (1, 1),
>      (0, 0))
> 
>    val drmA1 = drmParallelize(m = a, numPartitions = 2)
> 
>    // modified to return a new CheckpointedDrm so maintains immutability
> but still only increases the row cardinality
>    // by returning new CheckpointedDrmSpark[K](rdd, nrow + n, ncol,
> _cacheStorageLevel ) Hack for now.
>    val drmABigger1 = drmA1.addToRowCardinality(1)
> 
>    val drmABiggerPlusOne1 = drmABigger1.plus(1.0)  // drmABigger has no
> row 2 in the rdd but an empty row 1
>    // drmABiggerPlusOne1 is a dense matrix
>    println(drmABiggerPlusOne1)
> 
>    val drmA2 = drmParallelize(m = a, numPartitions = 2)
>    val drmABigger2 = drmA2.addToRowCardinality(1)
>    val drmABiggerPlusOne2 = drmABigger2 + 1.0
>    drmABiggerPlusOne2.writeDRM("tmp/plus-one/drma-bigger-plus-one-ops/")
> 
> 
>    val bp = 0
>  }
> 
> method #1 works, #2 doesn’t. Even when I create a new CheckpointedDrmSpark
> with larger _nrow than is in the data—even without the addToRowCardinality.
> 

In Method #1, plus() is not even operating on the DRM. The plus() is
operating on in core Matrix which is implicitly collect()ed because of the
drm2InCore implicit type converter. So drmABiggerPlusOne1 is neither a
DrmLike nor CheckpointedDrm, but is actually just an incore Matrix. You
will have to drmParallelize() it again in order to do any distributed
operations.

I agree that in some cases it doesn’t. Can you please allow me to ask if it
> _should_?
> 

I don't think it is working in any case. The implicit converter is making
it feel like it is working.

Thanks




> On Jul 19, 2014, at 8:56 PM, Anand Avati <av...@gluster.org> wrote:
> 
> 
> 
> 
> On Sat, Jul 19, 2014 at 6:50 PM, Pat Ferrel <pa...@gmail.com> wrote:
> 
>> 
>> On another thread I’ll send you code that shows A + 1 works with blank
>> rows in A.
>> 
> 
> I don't see how that worked for you. See this:
> 
>  test("DRM addToRowCardinality - will fail") {
>    val inCoreA = sparse(
>      0 -> 1 :: 1 -> 2 :: Nil,
>      0 -> 3 :: 1 -> 4 :: Nil,
>      0 -> 2 :: 1 -> 0.0 :: Nil
>    )
> 
>    val inCoreBControl = sparse(
>      0 -> 2 :: 1 -> 3 :: Nil,
>      0 -> 4 :: 1 -> 5 :: Nil,
>      0 -> 3 :: 1 -> 1 :: Nil,
>      0 -> 1 :: 1 -> 1 :: Nil,
>      0 -> 1 :: 1 -> 1 :: Nil
>    )
> 
>    val drmA = drmParallelize(inCoreA)
>    drmA.addToRowCardinality(2)
>    val drmB = (drmA + 1.0).checkpoint()
> 
>    (drmB.collect - inCoreBControl).norm should be < 1e-3
> 
>  }
> 
>  test("DRM addToRowCardinality - wont fail") {
>    val inCoreA = sparse(
>      0 -> 1 :: 1 -> 2 :: Nil,
>      0 -> 3 :: 1 -> 4 :: Nil,
>      0 -> 2 :: 1 -> 0.0 :: Nil
>    )
> 
>    val inCoreBWrong = sparse(
>      0 -> 2 :: 1 -> 3 :: Nil,
>      0 -> 4 :: 1 -> 5 :: Nil,
>      0 -> 3 :: 1 -> 1 :: Nil,
>      0 -> 0 :: 1 -> 0 :: Nil,
>      0 -> 0 :: 1 -> 0 :: Nil
>    )
> 
>    val drmA = drmParallelize(inCoreA)
>    drmA.addToRowCardinality(2)
>    val drmB = (drmA + 1.0).checkpoint()
> 
>    (drmB.collect - inCoreBWrong).norm should be < 1e-3
>  }
> 
> 
> And sure enough, inCoreBControl fails, and inCoreBWrong succeeds:
> 
> - DRM addToRowCardinality - will fail *** FAILED ***
> 
>  2.0 was not less than 0.001 (DrmLikeSuiteBase.scala:116)
> - DRM addToRowCardinality - wont fail
> 
> 
> BTW this implies rbind will not solve the problem, it is firmly in data
>> prep. But until I know the rules I won’t know how to do the right thing.
>> 
> 
> Rbind expects both A and B to have their Int row keys filled from 0 to
> nrow-1, which is how they should be ideally.
> 
> 
> 


Re: A + 1

Posted by Anand Avati <av...@gluster.org>.
On Sat, Jul 19, 2014 at 9:06 PM, Pat Ferrel <pa...@gmail.com> wrote:

> Using methods instead of symbolic ops returns different types so methods
> work, ops don’t. If math is math, they should do the same thing so I’d like
> to know what it is supposed to do, can you please allow me to ask specific
> people the question?
>


I don't see how I'm coming in the way of anybody else from replying. By
posting to a public mailing list, you by definition allow anybody to review
and comment.


  test("plus one"){
>     val a = dense(
>       (1, 1),
>       (0, 0))
>
>     val drmA1 = drmParallelize(m = a, numPartitions = 2)
>
>     // modified to return a new CheckpointedDrm so maintains immutability
> but still only increases the row cardinality
>     // by returning new CheckpointedDrmSpark[K](rdd, nrow + n, ncol,
> _cacheStorageLevel ) Hack for now.
>     val drmABigger1 = drmA1.addToRowCardinality(1)
>
>     val drmABiggerPlusOne1 = drmABigger1.plus(1.0)  // drmABigger has no
> row 2 in the rdd but an empty row 1
>     // drmABiggerPlusOne1 is a dense matrix
>     println(drmABiggerPlusOne1)
>
>     val drmA2 = drmParallelize(m = a, numPartitions = 2)
>     val drmABigger2 = drmA2.addToRowCardinality(1)
>     val drmABiggerPlusOne2 = drmABigger2 + 1.0
>     drmABiggerPlusOne2.writeDRM("tmp/plus-one/drma-bigger-plus-one-ops/")
>
>
>     val bp = 0
>   }
>
> method #1 works, #2 doesn’t. Even when I create a new CheckpointedDrmSpark
> with larger _nrow than is in the data—even without the addToRowCardinality.
>

In Method #1, plus() is not even operating on the DRM. The plus() is
operating on in core Matrix which is implicitly collect()ed because of the
drm2InCore implicit type converter. So drmABiggerPlusOne1 is neither a
DrmLike nor CheckpointedDrm, but is actually just an incore Matrix. You
will have to drmParallelize() it again in order to do any distributed
operations.

I agree that in some cases it doesn’t. Can you please allow me to ask if it
> _should_?
>

I don't think it is working in any case. The implicit converter is making
it feel like it is working.

Thanks




> On Jul 19, 2014, at 8:56 PM, Anand Avati <av...@gluster.org> wrote:
>
>
>
>
> On Sat, Jul 19, 2014 at 6:50 PM, Pat Ferrel <pa...@gmail.com> wrote:
>
>>
>> On another thread I’ll send you code that shows A + 1 works with blank
>> rows in A.
>>
>
> I don't see how that worked for you. See this:
>
>   test("DRM addToRowCardinality - will fail") {
>     val inCoreA = sparse(
>       0 -> 1 :: 1 -> 2 :: Nil,
>       0 -> 3 :: 1 -> 4 :: Nil,
>       0 -> 2 :: 1 -> 0.0 :: Nil
>     )
>
>     val inCoreBControl = sparse(
>       0 -> 2 :: 1 -> 3 :: Nil,
>       0 -> 4 :: 1 -> 5 :: Nil,
>       0 -> 3 :: 1 -> 1 :: Nil,
>       0 -> 1 :: 1 -> 1 :: Nil,
>       0 -> 1 :: 1 -> 1 :: Nil
>     )
>
>     val drmA = drmParallelize(inCoreA)
>     drmA.addToRowCardinality(2)
>     val drmB = (drmA + 1.0).checkpoint()
>
>     (drmB.collect - inCoreBControl).norm should be < 1e-3
>
>   }
>
>   test("DRM addToRowCardinality - wont fail") {
>     val inCoreA = sparse(
>       0 -> 1 :: 1 -> 2 :: Nil,
>       0 -> 3 :: 1 -> 4 :: Nil,
>       0 -> 2 :: 1 -> 0.0 :: Nil
>     )
>
>     val inCoreBWrong = sparse(
>       0 -> 2 :: 1 -> 3 :: Nil,
>       0 -> 4 :: 1 -> 5 :: Nil,
>       0 -> 3 :: 1 -> 1 :: Nil,
>       0 -> 0 :: 1 -> 0 :: Nil,
>       0 -> 0 :: 1 -> 0 :: Nil
>     )
>
>     val drmA = drmParallelize(inCoreA)
>     drmA.addToRowCardinality(2)
>     val drmB = (drmA + 1.0).checkpoint()
>
>     (drmB.collect - inCoreBWrong).norm should be < 1e-3
>   }
>
>
> And sure enough, inCoreBControl fails, and inCoreBWrong succeeds:
>
> - DRM addToRowCardinality - will fail *** FAILED ***
>
>   2.0 was not less than 0.001 (DrmLikeSuiteBase.scala:116)
> - DRM addToRowCardinality - wont fail
>
>
> BTW this implies rbind will not solve the problem, it is firmly in data
>> prep. But until I know the rules I won’t know how to do the right thing.
>>
>
> Rbind expects both A and B to have their Int row keys filled from 0 to
> nrow-1, which is how they should be ideally.
>
>
>