Posted to issues@madlib.apache.org by "Nandish Jayaram (JIRA)" <ji...@apache.org> on 2018/06/08 19:29:00 UTC

[jira] [Created] (MADLIB-1245) Randomize data after standardization

Nandish Jayaram created MADLIB-1245:
---------------------------------------

             Summary: Randomize data after standardization
                 Key: MADLIB-1245
                 URL: https://issues.apache.org/jira/browse/MADLIB-1245
             Project: Apache MADlib
          Issue Type: Improvement
          Components: Module: Utilities
            Reporter: Nandish Jayaram


The functions `utils_ind_var_scales` and `utils_ind_var_scales_grouping` in `convex.utils_regularization` standardize the input data, which is then fed to the underlying gradient descent solver. Gradient descent usually works better when the rows are presented in random order.
The current functions create a temp table containing the standardized version of the input data, but its rows are not randomly distributed. Can we distribute them randomly? This change might affect multiple modules, so every affected module must be tested thoroughly to ensure the change is acceptable.
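
To make the request concrete, here is a minimal in-memory sketch of "standardize, then randomize the row order". It is only an illustration, not MADlib code: the function name `standardize_then_shuffle`, the list-of-lists input, and the use of the population standard deviation are all assumptions, and the real change would be made against the temp table the existing functions create.

```python
import random
import statistics


def standardize_then_shuffle(rows, seed=None):
    """Standardize each column, then return the rows in random order.

    `rows` is a list of equal-length lists of floats, a hypothetical
    in-memory stand-in for the input table.
    """
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Assumption: population standard deviation. Constant columns fall
    # back to 1.0 so we never divide by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]

    scaled = [
        [(x - m) / s for x, m, s in zip(row, means, stds)]
        for row in rows
    ]

    # `scaled` is a fresh list, so shuffling it in place leaves the
    # caller's `rows` untouched.
    rng = random.Random(seed)
    rng.shuffle(scaled)
    return scaled


rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
shuffled = standardize_then_shuffle(rows, seed=0)
```

The shuffle changes only the order of the rows, not their values, so any module that consumes the standardized table should see the same set of rows as before. That is the property the affected modules would need to be tested against.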



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)