Posted to user@mahout.apache.org by "Baeriswyl Kuno SBB CFF FFS (Extern)" <ku...@sbb.ch> on 2020/06/10 09:53:14 UTC

How to do logical subsetting in Mahout

Hi all,

I came across Mahout because I need to migrate an R script that involves matrix algebra to a Spark cluster.

Mahout's Scala/Spark bindings provide all of the operations I need, except for logical subsetting.

Example:

x1 = c(1.0,4.0,2.0,5.0)
x2 = c(0,0,0,0)
x2[x1 > 1] = 2

This sets the value 2 in rows 2, 3 and 4.

Is there an equivalent function in Mahout?


Thanks.

Kuno


Re: How to do logical subsetting in Mahout

Posted by Trevor Grant <tr...@gmail.com>.
Very nice... How would you feel about writing some docs on this?

tg


On Tue, Jul 21, 2020 at 1:54 AM Baeriswyl Kuno SBB CFF FFS (Extern) <
kuno.baeriswyl@sbb.ch> wrote:

> Hello Andrew,
> thanks for your hint.
>
> Yes, that's the approach I found too.

Re: How to do logical subsetting in Mahout

Posted by "Baeriswyl Kuno SBB CFF FFS (Extern)" <ku...@sbb.ch>.
Hello Andrew,
thanks for your hint.

Yes, that's the approach I found too.

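import org.apache.mahout.math.drm.CheckpointedDrm
import org.apache.mahout.sparkbindings._
import org.apache.spark.rdd.RDD

// Map the key of every row that passes the filter to a new, consecutive key.
def createIndexMap(x: CheckpointedDrm[Int]): RDD[(Int, Int)] = {
  // Keys of the rows whose first element is greater than 0.
  val xIndexFiltered = x.rdd
    .filter { case (_, row) => row.get(0) > 0 }
    .map { case (key, _) => key }

  // zipWithIndex yields Long positions; the DRM is keyed by Int.
  xIndexFiltered.zipWithIndex
    .map { case (oldKey, newKey) => (oldKey, newKey.toInt) }
}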

First, I filter the DRM and build a map from the old row indices to the new ones, as you suggested.
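
To illustrate the index map without a cluster, here is a minimal plain-Scala sketch of the same filter-then-renumber step. It uses ordinary collections instead of Spark or Mahout; the `Row` alias, the `IndexMapSketch` object and the sample data are assumptions made up for this example and are not part of the original code.

```scala
object IndexMapSketch {
  // Hypothetical stand-in for DRM rows: (row key, row values).
  type Row = (Int, Vector[Double])

  // Keep the keys of rows whose first element is > 0,
  // then pair each surviving key with its new position 0, 1, 2, ...
  def createIndexMap(rows: Seq[Row]): Seq[(Int, Int)] =
    rows
      .filter { case (_, values) => values(0) > 0 }
      .map { case (key, _) => key }
      .zipWithIndex

  def main(args: Array[String]): Unit = {
    val rows: Seq[Row] = Seq(
      0 -> Vector(0.0, 1.0),
      1 -> Vector(3.0, 1.0),
      2 -> Vector(0.0, 7.0),
      3 -> Vector(5.0, 2.0)
    )
    // Rows 1 and 3 pass the filter, so the pairs are (1,0) and (3,1).
    println(createIndexMap(rows))
  }
}
```

On a Scala collection `zipWithIndex` already returns `Int` positions; on a Spark RDD it returns `Long`, which is why the function above converts with `toInt`.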

By joining against this index map, I can reduce the rows in my DRM according to a given condition, do some more calculations on the reduced DRM, and map the newly calculated values back to the original DRM.

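For example:

import org.apache.mahout.math.drm.CheckpointedDrm
import org.apache.mahout.sparkbindings._
import org.apache.spark.rdd.RDD

// Rows with a recomputed counterpart in drmFiltriert are replaced; all others are kept.
def mergeDrm(drmOrig: CheckpointedDrm[Int],
             drmFiltriert: CheckpointedDrm[Int],
             indexMapping: RDD[(Int, Int)]): CheckpointedDrm[Int] = {
  drmWrap(
    drmOrig.rdd
      // Attach the new key, if any: (oldKey, (row, Option[newKey]))
      .leftOuterJoin(indexMapping)
      // Re-key by the optional new key: (Option[newKey], (oldKey, row))
      .map { case (oldKey, (row, newKey)) => (newKey, (oldKey, row)) }
      // Attach the recomputed row, if any.
      .leftOuterJoin(drmFiltriert.rdd.map { case (newKey, row) => (Option(newKey), row) })
      // Restore the old key and prefer the recomputed row over the original one.
      .map { case (_, ((oldKey, row), newRow)) => (oldKey, newRow.getOrElse(row)) }
  )
}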
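
The effect of the two left outer joins can be mimicked with ordinary `Map` lookups. The following is only a plain-Scala sketch of that merge logic, not the Spark implementation; the `Row` alias, the `MergeSketch` object, the `mergeRows` name and the sample values are assumptions made up for illustration.

```scala
object MergeSketch {
  // Hypothetical stand-in for DRM rows: (row key, row values).
  type Row = (Int, Vector[Double])

  // For each original row, look up its new key (if any) and then the
  // recomputed row stored under that new key (if any). Rows without a
  // recomputed counterpart keep their original values.
  def mergeRows(orig: Seq[Row], filtered: Seq[Row], indexMap: Seq[(Int, Int)]): Seq[Row] = {
    val oldToNew = indexMap.toMap   // old key -> new key
    val recomputed = filtered.toMap // new key -> recomputed values
    orig.map { case (key, values) =>
      val replacement = oldToNew.get(key).flatMap(newKey => recomputed.get(newKey))
      (key, replacement.getOrElse(values))
    }
  }

  def main(args: Array[String]): Unit = {
    val orig: Seq[Row] = Seq(0 -> Vector(0.0), 1 -> Vector(3.0), 2 -> Vector(0.0), 3 -> Vector(5.0))
    val indexMap = Seq((1, 0), (3, 1))                          // old keys 1 and 3 became 0 and 1
    val filtered: Seq[Row] = Seq(0 -> Vector(30.0), 1 -> Vector(50.0))
    // Rows 1 and 3 receive the recomputed values; rows 0 and 2 are unchanged.
    println(mergeRows(orig, filtered, indexMap))
  }
}
```

The in-memory maps only work for small data; the join-based version keeps everything distributed, which is presumably why the original code uses joins.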

Regards

Kuno



-----Original Message-----
From: Andrew Musselman <an...@gmail.com>
Sent: Tuesday, July 7, 2020 23:16
To: user@mahout.apache.org
Subject: Re: How to do logical subsetting in Mahout

Kuno, thanks for your note. I don't know of an equivalent function out of the box, but if you want to get the indices where a condition is true you could try something in Scala like:

myList.zipWithIndex.collect { case (item, index) if item > 1 => index }

Hope this is helpful.


Re: How to do logical subsetting in Mahout

Posted by Andrew Musselman <an...@gmail.com>.
Kuno, thanks for your note. I don't know of an equivalent function out of
the box, but if you want to get the indices where a condition is true you
could try something in Scala like:

myList.zipWithIndex.collect { case (item, index) if item > 1 => index }
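
As a self-contained illustration of this idea applied to the R example from the question, the sketch below collects the matching indices and then writes the value 2 at those positions. It uses plain Scala collections only; the helper names `whichIndices` and `assignAt` are made up for this example. Note that Scala indices are 0-based while R's are 1-based, so R's rows 2, 3 and 4 show up as indices 1, 2 and 3.

```scala
object LogicalSubsetSketch {
  // Hypothetical helper: 0-based indices of the elements that satisfy `pred`.
  def whichIndices(xs: Seq[Double])(pred: Double => Boolean): Seq[Int] =
    xs.zipWithIndex.collect { case (item, index) if pred(item) => index }

  // Hypothetical helper: copy of `target` with `value` written at each index.
  def assignAt(target: Seq[Double], indices: Seq[Int], value: Double): Seq[Double] =
    indices.foldLeft(target)((acc, i) => acc.updated(i, value))

  def main(args: Array[String]): Unit = {
    val x1 = Seq(1.0, 4.0, 2.0, 5.0)
    val x2 = Seq(0.0, 0.0, 0.0, 0.0)

    // Counterpart of R's `x2[x1 > 1] = 2`.
    val hits = whichIndices(x1)(_ > 1)
    println(hits)                     // indices 1, 2, 3
    println(assignAt(x2, hits, 2.0))  // values 0.0, 2.0, 2.0, 2.0
  }
}
```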

Hope this is helpful.
