Posted to issues@madlib.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/04/06 20:41:25 UTC

[jira] [Commented] (MADLIB-978) Implement skipping of arrays-with-NULL for elastic net training

    [ https://issues.apache.org/jira/browse/MADLIB-978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15228859#comment-15228859 ] 

ASF GitHub Bot commented on MADLIB-978:
---------------------------------------

GitHub user njayaram2 opened a pull request:

    https://github.com/apache/incubator-madlib/pull/35

            Elastic Net: Skip arrays with NULL values in train

            Jira: MADLIB-978
    
            NULL values in an input array of the training data
            led to an unhandled exception. This fix catches the
            exception and skips such input arrays.
            The fix also changes the code that normalizes the input
            data (independent variable) so that it ignores arrays
            containing NULLs.
            Normalization uses the number of rows in the input
            table. The query that computes this count now counts
            only rows whose array has no NULL values.
            The mean of the dependent variable was still computed
            by an SQL command that did not ignore rows whose
            independent-variable arrays contain NULLs. Those rows
            are now ignored when computing the mean.
    
    @mktal 
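
The row filtering described above can be sketched in plain PostgreSQL. This is an illustration only, not the code in the pull request: the table train_data and its columns features and y are hypothetical, and the NULL check is written with unnest instead of whatever helper the patch itself uses.

```sql
-- Hypothetical training table; names are illustrative, not from the PR.
CREATE TEMP TABLE train_data (
    features double precision[],  -- independent variables
    y        double precision     -- dependent variable
);

INSERT INTO train_data VALUES
    (ARRAY[1.0, 2.0],  10.0),
    (ARRAY[3.0, NULL], 20.0),   -- skipped below: array contains a NULL
    (ARRAY[5.0, 6.0],  30.0);

-- Row count and dependent-variable mean over rows whose array is
-- present and has no NULL element.
SELECT count(*) AS n_rows,
       avg(y)   AS y_mean
FROM train_data t
WHERE t.features IS NOT NULL
  AND NOT EXISTS (
        SELECT 1
        FROM unnest(t.features) AS v(elem)
        WHERE elem IS NULL
      );
```

With the three sample rows, only the first and third pass the filter, so the query returns n_rows = 2 and y_mean = 20.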

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/njayaram2/incubator-madlib elasticnet_train

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-madlib/pull/35.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #35
    
----

----


> Implement skipping of arrays-with-NULL for elastic net training
> ---------------------------------------------------------------
>
>                 Key: MADLIB-978
>                 URL: https://issues.apache.org/jira/browse/MADLIB-978
>             Project: Apache MADlib
>          Issue Type: Improvement
>          Components: Module: Regularized Regression
>            Reporter: Frank McQuillan
>            Priority: Minor
>             Fix For: v1.9.1
>
>
> This JIRA is related to https://issues.apache.org/jira/browse/MADLIB-919, which is for the predict function:
> Implement skipping of arrays-with-NULL for elastic net predict.  Some context for this JIRA is below…
> (Q)
> Question came in this week from a MADlib user:
> Function "madlib.elastic_net_gaussian_predict(double precision[],double precision,double precision[])": Error converting an array w/ NULL values to dense format. (UDF_impl.hpp:210)
> Is there a typical pattern for handling nulls in such a scenario, perhaps converting to 0.0 or something like this?
> (A)
> Answer:
> The skipping of arrays-with-NULL has not been implemented for elastic net predict yet.
> You can work around it by creating the function below: 
> http://stackoverflow.com/questions/7819021/replace-null-values-in-an-array-in-postgresql
> -- Return a copy of the array $1 with every NULL element replaced by $2.
> CREATE OR REPLACE FUNCTION f_array_replace_null (double precision[], double precision)
> RETURNS double precision[] AS
> $$
>     SELECT ARRAY (
>         SELECT COALESCE(x, $2)
>         FROM unnest($1) x
>     );
> $$ LANGUAGE SQL IMMUTABLE;
> They'll have to wrap the feature array with this function in the elastic_net statement: 
> f_array_replace_null(array["pf_calc_fdy_position", ...], 0)
> This would replace each NULL with a 0. The downside is that it could be slower, since the unnest and re-nest happen on every call. If performance is a concern and they run over this data multiple times, I would create a new table with the NULLs replaced and run elastic_net_xxx in the regular way.
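
The "new table with the NULLs replaced" suggestion could look roughly like the sketch below. It assumes f_array_replace_null has been created as shown in the quoted answer; the table names train_data and train_data_clean and the columns id, features and y are hypothetical placeholders, not names from the issue.

```sql
-- One-time materialization: replace NULL array elements with 0.0 so the
-- unnest/re-nest cost is paid once instead of on every elastic net call.
-- All table and column names here are hypothetical.
CREATE TABLE train_data_clean AS
SELECT id,
       f_array_replace_null(features, 0.0) AS features,
       y
FROM train_data;
```

The cleaned table can then be passed to the elastic net functions in place of the original one. Note that replacing NULLs with 0 changes the data, which is a different choice from skipping those rows entirely.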



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)