You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@madlib.apache.org by iyerr3 <gi...@git.apache.org> on 2018/05/01 22:53:02 UTC

[GitHub] madlib pull request #268: DT: Don't use NULL value to get dep_var type

GitHub user iyerr3 opened a pull request:

    https://github.com/apache/madlib/pull/268

    DT: Don't use NULL value to get dep_var type

    JIRA: MADLIB-1233
    
    Function `_is_dep_categorical` is used to obtain the type of the
    dependent variable expression. This function gets a random value using
    `LIMIT 1` and checks the type of the corresponding value in Python.
    Further this does not filter out NULL values.
    Since NULL values are not filtered out,
    it's possible the `LIMIT 1` returns a "None" type in Python, leading to
    incorrect results.
    
    This commit updates the type extraction by checking the type in the
    database instead of in Python and also filters out NULL values.
    Additionally it checks if at least one non-NULL value is obtained, else
    throws an appropriate error.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/madlib/madlib bugfix/dt_dep_var_type_null

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/madlib/pull/268.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #268
    
----
commit 6570a27278aca301c0c8869899a8fad26c69bb7d
Author: Rahul Iyer <ri...@...>
Date:   2018-05-01T21:24:34Z

    DT: Don't use NULL value to get dep_var type
    
    JIRA: MADLIB-1233
    
    Function `_is_dep_categorical` is used to obtain the type of the
    dependent variable expression. This function gets a random value using
    `LIMIT 1` and checks the type of the corresponding value in Python.
    Further this does not filter out NULL values.
    Since NULL values are not filtered out,
    it's possible the `LIMIT 1` returns a "None" type in Python, leading to
    incorrect results.
    
    This commit updates the type extraction by checking the type in the
    database instead of in Python and also filters out NULL values.
    Additionally it checks if at least one non-NULL value is obtained, else
    throws an appropriate error.

----


---

[GitHub] madlib issue #268: DT: Don't use NULL value to get dep_var type

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit commented on the issue:

    https://github.com/apache/madlib/pull/268
  
    
    Refer to this link for build results (access rights to CI server needed): 
    https://builds.apache.org/job/madlib-pr-build/477/



---

[GitHub] madlib pull request #268: DT: Don't use NULL value to get dep_var type

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/madlib/pull/268


---

[GitHub] madlib issue #268: DT: Don't use NULL value to get dep_var type

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit commented on the issue:

    https://github.com/apache/madlib/pull/268
  
    
    Refer to this link for build results (access rights to CI server needed): 
    https://builds.apache.org/job/madlib-pr-build/463/



---

[GitHub] madlib issue #268: DT: Don't use NULL value to get dep_var type

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit commented on the issue:

    https://github.com/apache/madlib/pull/268
  
    
    Refer to this link for build results (access rights to CI server needed): 
    https://builds.apache.org/job/madlib-pr-build/475/



---

[GitHub] madlib issue #268: DT: Don't use NULL value to get dep_var type

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit commented on the issue:

    https://github.com/apache/madlib/pull/268
  
    
    Refer to this link for build results (access rights to CI server needed): 
    https://builds.apache.org/job/madlib-pr-build/466/



---

[GitHub] madlib issue #268: DT: Don't use NULL value to get dep_var type

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit commented on the issue:

    https://github.com/apache/madlib/pull/268
  
    
    Refer to this link for build results (access rights to CI server needed): 
    https://builds.apache.org/job/madlib-pr-build/468/



---

[GitHub] madlib issue #268: DT: Don't use NULL value to get dep_var type

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit commented on the issue:

    https://github.com/apache/madlib/pull/268
  
    
    Refer to this link for build results (access rights to CI server needed): 
    https://builds.apache.org/job/madlib-pr-build/467/



---

[GitHub] madlib pull request #268: DT: Don't use NULL value to get dep_var type

Posted by iyerr3 <gi...@git.apache.org>.
Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/268#discussion_r192268593
  
    --- Diff: src/ports/postgres/modules/utilities/validate_args.py_in ---
    @@ -368,8 +375,10 @@ def get_expr_type(expr, tbl):
             SELECT pg_typeof({0}) AS type
             FROM {1}
             LIMIT 1
    -        """.format(expr, tbl))[0]['type']
    -    return expr_type.lower()
    +        """.format(expr, tbl))
    +    if not expr_type:
    +        plpy.error("Table {0} does not contain any valid tuples".format(tbl))
    --- End diff --
    
    Sure, I'll add the text and merge the PR. 


---

[GitHub] madlib issue #268: DT: Don't use NULL value to get dep_var type

Posted by iyerr3 <gi...@git.apache.org>.
Github user iyerr3 commented on the issue:

    https://github.com/apache/madlib/pull/268
  
    jenkins, please retest


---

[GitHub] madlib pull request #268: DT: Don't use NULL value to get dep_var type

Posted by kaknikhil <gi...@git.apache.org>.
Github user kaknikhil commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/268#discussion_r185970102
  
    --- Diff: src/ports/postgres/modules/utilities/validate_args.py_in ---
    @@ -368,8 +375,10 @@ def get_expr_type(expr, tbl):
             SELECT pg_typeof({0}) AS type
             FROM {1}
             LIMIT 1
    -        """.format(expr, tbl))[0]['type']
    -    return expr_type.lower()
    +        """.format(expr, tbl))
    +    if not expr_type:
    +        plpy.error("Table {0} does not contain any valid tuples".format(tbl))
    --- End diff --
    
    maybe the error message should also mention that we failed to get the expression type of column foo in table bar. what do you think ?


---

[GitHub] madlib issue #268: DT: Don't use NULL value to get dep_var type

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit commented on the issue:

    https://github.com/apache/madlib/pull/268
  
    
    Refer to this link for build results (access rights to CI server needed): 
    https://builds.apache.org/job/madlib-pr-build/465/



---