Posted to issues@systemml.apache.org by "Mike Dusenberry (JIRA)" <ji...@apache.org> on 2016/09/12 17:11:20 UTC

[jira] [Comment Edited] (SYSTEMML-906) Use of `df.first` in MLContextUtil is a Bottleneck

    [ https://issues.apache.org/jira/browse/SYSTEMML-906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15484656#comment-15484656 ] 

Mike Dusenberry edited comment on SYSTEMML-906 at 9/12/16 5:11 PM:
-------------------------------------------------------------------

Well, a good question is: why would someone pass in a double as a string within a DataFrame?  I would say that we just require doubles and/or vectors if the user wishes for a matrix.  I think that's a reasonable expectation.


was (Author: mwdusenb@us.ibm.com):
Well, a good question is: why would someone pass in a double as a string within a DataFrame?  I would say that we just require doubles or vectors if the user wishes for a matrix.  I think that's a reasonable expectation.

> Use of `df.first` in MLContextUtil is a Bottleneck
> --------------------------------------------------
>
>                 Key: SYSTEMML-906
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-906
>             Project: SystemML
>          Issue Type: Improvement
>            Reporter: Mike Dusenberry
>
> The use of {{dataframe.first()}} at {{MLContextUtil.java:497}} causes a severe bottleneck if the input DataFrame is the lazily evaluated result of a compute-intensive chain of transformations, because fetching the first row forces that chain to execute.  We should change this to use {{dataframe.schema}}, and then iterate over the types in the schema.
> cc [~deron]
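
To illustrate the proposal, here is a minimal sketch in plain Java. It does not use Spark or the actual MLContextUtil code: `LazyFrame`, `isMatrixCompatible`, and the type names "double" and "vector" are hypothetical stand-ins, chosen only to show that a check driven by schema metadata never triggers the deferred computation, while `first()` does. In Spark itself the analogous calls would be `DataFrame.schema()`, then `StructType.fields()` and `StructField.dataType()`.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

class SchemaCheckSketch {

    /** Hypothetical stand-in for a lazily evaluated DataFrame (NOT the Spark API). */
    static final class LazyFrame {
        private final List<String> columnTypes;         // schema metadata, known up front
        private final Supplier<List<Object[]>> compute; // expensive, deferred row computation

        LazyFrame(List<String> columnTypes, Supplier<List<Object[]>> compute) {
            this.columnTypes = columnTypes;
            this.compute = compute;
        }

        /** Returns metadata only; the deferred computation is not touched. */
        List<String> schema() {
            return columnTypes;
        }

        /** Forces the deferred computation in order to return the first row. */
        Object[] first() {
            return compute.get().get(0);
        }
    }

    /** Decides from the schema alone whether every column is a double or a vector. */
    static boolean isMatrixCompatible(LazyFrame frame) {
        for (String type : frame.schema()) {
            if (!type.equals("double") && !type.equals("vector")) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        AtomicInteger evaluations = new AtomicInteger();
        LazyFrame frame = new LazyFrame(Arrays.asList("double", "vector"), () -> {
            evaluations.incrementAndGet(); // stands in for the compute-intensive chain
            return Collections.singletonList(new Object[] {1.0, new double[] {2.0, 3.0}});
        });

        System.out.println(isMatrixCompatible(frame)); // prints "true"
        System.out.println(evaluations.get());         // prints "0": nothing was computed
    }
}
```

Under these assumptions, a column declared as a string is rejected from the schema alone, which matches the suggestion in the comment above to require doubles and/or vectors rather than inspecting row values.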


