Posted to issues@systemml.apache.org by "Mike Dusenberry (JIRA)" <ji...@apache.org> on 2016/09/23 05:02:20 UTC

[jira] [Created] (SYSTEMML-952) Efficient Counts During Conversions

Mike Dusenberry created SYSTEMML-952:
----------------------------------------

             Summary: Efficient Counts During Conversions
                 Key: SYSTEMML-952
                 URL: https://issues.apache.org/jira/browse/SYSTEMML-952
             Project: SystemML
          Issue Type: Improvement
            Reporter: Mike Dusenberry


Currently, we spend a lot of time on {{count}} during conversions from wide DataFrames. When calling {{count}} in Spark on these DataFrames directly, it is much quicker to first select one of the simple double columns (say, the id column) and then call {{count}}, because Spark then does not read in the heavy vector column as well.

Therefore, we should compute the row count from the index column alone, and determine the column count from the first row only.
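
As a rough illustration of the idea (not the actual SystemML converter code), the two counts could be sketched in PySpark as follows. The helper names {{row_count}} and {{column_count}} and the column names {{id}} and {{features}} are hypothetical, and the sketch assumes the DataFrame holds one vector column whose length gives the column count.

```python
# Minimal sketch; `df` is assumed to be a PySpark DataFrame.
# The helper names and the default column names are hypothetical.

def row_count(df, id_col="id"):
    """Count rows after projecting down to the lightweight id column."""
    return df.select(id_col).count()


def column_count(df, vector_col="features"):
    """Infer the number of columns from the vector in the first row only."""
    first = df.select(vector_col).first()
    if first is None:  # empty DataFrame: no row to inspect
        return 0
    # Assumes the vector type supports len(), as PySpark ML vectors do.
    return len(first[0])
```

Whether the projection actually avoids reading the vector column depends on the storage format and on Spark's optimizer, so the gain should be measured on the real workload.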

cc [~mboehm7]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)