Posted to issues@systemml.apache.org by "Matthias Boehm (JIRA)" <ji...@apache.org> on 2016/05/10 02:11:12 UTC

[jira] [Created] (SYSTEMML-677) Random data generator for decision tree fails w/ data type mismatch

Matthias Boehm created SYSTEMML-677:
---------------------------------------

             Summary: Random data generator for decision tree fails w/ data type mismatch 
                 Key: SYSTEMML-677
                 URL: https://issues.apache.org/jira/browse/SYSTEMML-677
             Project: SystemML
          Issue Type: Bug
            Reporter: Matthias Boehm
             Fix For: SystemML 0.9


The data generator for decision trees consists of a shell script that calls two DML scripts in sequence. The split is needed because the second script applies the file-based transform, which requires an existing file at compile time. However, there is a data type mismatch: the first script outputs a matrix, whereas the second script expects a frame.

This task covers (1) a script-level change to output a frame from the first script, and (2) a fix for writing the frame metadata file with a value type accepted by the subsequent transform.

Note that the script-level change already exploits matrix-frame casting, which was introduced as part of SYSTEMML-554, but as of today this builtin function is only available in CP. This means the data generator only works for small data that fits into the driver memory. Once the Spark/MR converters from SYSTEMML- are fully integrated, the script will run for large data too, without further script changes.
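
For illustration, the cast in the first script might look roughly like the sketch below. It assumes the `as.frame` builtin is the matrix-frame cast referred to above. The dimensions, the `$out` argument name, and the CSV output format are hypothetical and are not taken from the actual generator scripts.

```
# Hypothetical sketch, not the actual generator script.
# Generate a random matrix (placeholder dimensions and value range).
X = rand(rows=1000, cols=10, min=0, max=1);

# Cast the matrix to a frame so that the second script, which
# expects a frame as input to transform, can read the output.
F = as.frame(X);

# Write the frame; $out is an assumed named command-line argument.
write(F, $out, format="csv");
```

Because the cast currently runs only in CP, a script along these lines would be limited to data that fits into the driver memory.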



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)