Posted to dev@systemml.apache.org by Ethan Xu <et...@gmail.com> on 2016/03/31 17:47:33 UTC

Logical indexing?

Does SystemML support logical indexing?

For example, suppose X is a numerical matrix with 2 columns and n rows (in my
case n ~ 35 million). I'd like to split the matrix row-wise according to
the values of the first column. This is useful when I need to find the
distributions of subgroups of a population. In R I can do

Y = X[ X[ ,1] > 10, ]

OR

ind = which(X[ ,1] > 10)
Y = X[ind, ]

It seems neither syntax works in SystemML.

I noticed there's an aggregate() function in SystemML, but it only supports
coded categorical variables.

Perhaps one way to do that is creating an indicator n by 1 matrix Z that
takes values 1 and 2 where 1 corresponds to X[, 1] <= 10 and 2 corresponds
to X[,1] > 10. Then aggregate() X[,2] with respect to Z.

It seems transform() with the 'bin' option is one obvious way to create such a
Z; however, the 'bin' method currently supports only 'equi-width'.

Is looping through X[,1] the best option? Maybe I missed some other
convenient functions.

Any suggestions are greatly appreciated!

Best,

Ethan

Re: Logical indexing?

Posted by Ethan Xu <et...@gmail.com>.
Thanks a lot Matthias. Following up with your suggestions, I tried option 2:

# option 2: via removeEmpty
Ind = (X[,1]>10);
Y = removeEmpty(target=X, select=Ind);

SystemML throws a complaint (maybe I'm not using the correct version?):

Named parameter 'margin' missing. Please specify 'rows' or 'cols'.

It works correctly after adding the 'margin' argument:

# option 2: via removeEmpty
Ind = (X[,1]>10);
Y = removeEmpty(target=X, margin = "rows", select=Ind);
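To make the behaviour above concrete, here is a minimal plain-Python sketch of what the row selection appears to do, based only on this thread: rows of X are kept where the 0/1 indicator is nonzero. It is an emulation on a tiny list-of-lists matrix, not SystemML code; the helper name select_rows and the sample data are hypothetical.

```python
# Plain-Python emulation (not SystemML/DML) of selecting rows with a 0/1 indicator.
# select_rows is a hypothetical helper; the sample matrix is made up for illustration.

def select_rows(X, ind):
    """Keep row i of X when ind[i] is nonzero."""
    if len(X) != len(ind):
        raise ValueError("indicator must have one entry per row of X")
    return [row for row, keep in zip(X, ind) if keep]

X = [[5.0, 1.0], [12.0, 2.0], [10.0, 3.0], [37.0, 4.0]]

# Analogue of: Ind = (X[,1] > 10);
ind = [1 if row[0] > 10 else 0 for row in X]

# Analogue of: Y = removeEmpty(target=X, margin="rows", select=Ind);
Y = select_rows(X, ind)
print(Y)
```

The indicator has one entry per row, which matches the margin = "rows" setting in the snippet above.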

I know the documentation is being updated continuously; I just want to point
out that the help text for 'removeEmpty()' (link below) does not yet contain
an explanation of the 'select' argument :)
http://apache.github.io/incubator-systemml/dml-language-reference.html#matrix-construction-manipulation-and-aggregation-built-in-functions

Thanks again for your help

Best,

Ethan


On Sun, Apr 3, 2016 at 12:45 AM, Matthias Boehm <mb...@us.ibm.com> wrote:

> absolutely, if you want to count or aggregate values of the two groups,
> you should definitely go with the aggregate() call instead. The snippets I
> provided are just for the case where you want to run some other analysis
> over the subsets (e.g., running an algorithm over a sample or fold).
>
> Regards,
> Matthias

Re: Logical indexing?

Posted by Matthias Boehm <mb...@us.ibm.com>.
absolutely, if you want to count or aggregate values of the two groups,
you should definitely go with the aggregate() call instead. The snippets I
provided are just for the case where you want to run some other analysis
over the subsets (e.g., running an algorithm over a sample or fold).

Regards,
Matthias




From:	Ethan Xu <et...@gmail.com>
To:	dev@systemml.incubator.apache.org
Date:	03/31/2016 11:31 AM
Subject:	Re: Logical indexing?



Ah, I missed the 'removeEmpty()' function. That's a smart way to trim a
matrix. Thanks Matthias!

Also from your answer I realized 'ind = (X[,1] > 10);' is acceptable, so
aggregation would work with

ind = (X[,1] > 10) + 1;
F = aggregate(target = X[,2], groups = ind, fn = "sum");

Ethan




Re: Logical indexing?

Posted by Ethan Xu <et...@gmail.com>.
Ah, I missed the 'removeEmpty()' function. That's a smart way to trim a
matrix. Thanks Matthias!

Also from your answer I realized 'ind = (X[,1] > 10);' is acceptable, so
aggregation would work with

ind = (X[,1] > 10) + 1;
F = aggregate(target = X[,2], groups = ind, fn = "sum");
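As a rough illustration of the grouping idea in the two lines above, the following plain-Python sketch builds the same 1/2 group ids from the comparison and sums the second column per group. It emulates the intent described in this thread and is not SystemML code; grouped_sum and the sample data are hypothetical.

```python
# Plain-Python emulation (not SystemML/DML) of a grouped sum over 1-based group ids.
# grouped_sum is a hypothetical helper; the sample matrix is made up for illustration.

def grouped_sum(values, groups):
    """Return a list where entry g-1 is the sum of values whose group id is g."""
    if len(values) != len(groups):
        raise ValueError("values and groups must have the same length")
    if not groups:
        return []
    sums = [0.0] * max(groups)
    for value, group in zip(values, groups):
        sums[group - 1] += value
    return sums

X = [[5.0, 1.0], [12.0, 2.0], [10.0, 3.0], [37.0, 4.0]]

# Analogue of: ind = (X[,1] > 10) + 1;   group 1: X[,1] <= 10, group 2: X[,1] > 10
groups = [int(row[0] > 10) + 1 for row in X]

# Analogue of: F = aggregate(target = X[,2], groups = ind, fn = "sum");
F = grouped_sum([row[1] for row in X], groups)
print(F)
```

Adding 1 to the 0/1 comparison result is what turns it into positive group ids, so group 1 collects the rows with X[,1] <= 10 and group 2 the rows with X[,1] > 10.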

Ethan


On Thu, Mar 31, 2016 at 1:22 PM, Matthias Boehm <mb...@us.ibm.com> wrote:

> just a quick correction of option 2:
>
> Ind = (X[,1]>10);
> Y = removeEmpty(target=X, select=Ind);
>
> Regards,
> Matthias

Re: Logical indexing?

Posted by Matthias Boehm <mb...@us.ibm.com>.
just a quick correction of option 2:

Ind = (X[,1]>10);
Y = removeEmpty(target=X, select=Ind);

Regards,
Matthias



From:	Matthias Boehm/Almaden/IBM@IBMUS
To:	dev@systemml.incubator.apache.org
Date:	03/31/2016 10:14 AM
Subject:	Re: Logical indexing?



That's a good question. No, SystemML does not support set indexing yet, but
you can emulate it via permutation matrices or similar transformations.
Here are some examples:

# option 1: via permutation (aka selection) matrices
P = removeEmpty(target=diag(X[,1]>10), margin="rows");
Y = P %*% X;

# option 2: via removeEmpty
Ind = diag(X[,1]>10);
Y = removeEmpty(target=X, select=Ind);


Regards,
Matthias




Re: Logical indexing?

Posted by Matthias Boehm <mb...@us.ibm.com>.
That's a good question. No, SystemML does not support set indexing yet, but
you can emulate it via permutation matrices or similar transformations.
Here are some examples:

# option 1: via permutation (aka selection) matrices
P = removeEmpty(target=diag(X[,1]>10), margin="rows");
Y = P %*% X;

# option 2: via removeEmpty
Ind = diag(X[,1]>10);
Y = removeEmpty(target=X, select=Ind);
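The selection-matrix idea in option 1 can be sketched in plain Python on a tiny example: put the 0/1 indicator on a diagonal, drop the all-zero rows, and multiply the result with X. This is an emulation of the technique as described here, not SystemML code; the helpers diag, remove_empty_rows, and matmul are hypothetical stand-ins written for this sketch.

```python
# Plain-Python emulation (not SystemML/DML) of row selection via a selection matrix.
# All helpers are hypothetical stand-ins; the sample matrix is made up for illustration.

def diag(v):
    """Square matrix with the entries of v on the diagonal and zeros elsewhere."""
    n = len(v)
    return [[v[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

def remove_empty_rows(M):
    """Drop rows that contain only zeros."""
    return [row for row in M if any(row)]

def matmul(A, B):
    """Dense matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

X = [[5.0, 1.0], [12.0, 2.0], [10.0, 3.0], [37.0, 4.0]]
ind = [1.0 if row[0] > 10 else 0.0 for row in X]

# Analogue of: P = removeEmpty(target=diag(X[,1]>10), margin="rows");
P = remove_empty_rows(diag(ind))

# Analogue of: Y = P %*% X;
Y = matmul(P, X)
print(P)
print(Y)
```

Each row of P contains a single 1 in the column of a selected row, so the product copies exactly those rows of X in their original order. The dense lists here are only for illustration; with n in the millions, an n-by-n diagonal would only be workable if stored sparsely.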


Regards,
Matthias


