Posted to issues@trafodion.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/01/29 19:35:00 UTC

[jira] [Commented] (TRAFODION-2912) Non-deterministic scalar UDFs not executed once per row

    [ https://issues.apache.org/jira/browse/TRAFODION-2912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16343873#comment-16343873 ] 

ASF GitHub Bot commented on TRAFODION-2912:
-------------------------------------------

GitHub user zellerh opened a pull request:

    https://github.com/apache/trafodion/pull/1420

    [TRAFODION-2912] Better handling of non-deterministic scalar UDFs

    Fix some issues found by Andy Yang and others while writing a
    non-deterministic scalar UDF (a random generator in this case).
    
    The UDF invocation was transformed into a hash join, which executes
    the UDF only once instead of once per row. Another problem is the
    probe cache, which can also lead to a single execution instead of
    once per row.
    
    The fix records the non-deterministic UDF attribute in the group
    attributes and it adds checks in the normalizer to suppress the
    conversion from a TSJ to a non-TSJ when non-deterministic UDFs are
    present. The probe cache logic already had this check, so all that was
    needed was to set the attribute.
    
    Note that there may be more complex queries where we still won't
    execute the UDF once per row. In general, there is no absolute
    guarantee that a non-deterministic scalar UDF is executed once per row
    (it is not even clear whether that would mean each row of the
    cartesian product of all the joined tables). However, in simple cases
    like the added test, we should try to call the UDF for every row that
    satisfies the join predicates.
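
As an illustration of the difference described above, the following C
sketch contrasts the two evaluation strategies. It is not Trafodion code:
fake_rand_udf, eval_per_row and eval_once are hypothetical names, and a
call counter stands in for rand() so that the behavior is reproducible.

```c
#include <assert.h>

/* Hypothetical stand-in for a non-deterministic scalar UDF: every call
   bumps a counter and returns a value that differs from the last one. */
static int udf_calls = 0;

static int fake_rand_udf(void)
{
  udf_calls++;
  return 1000 + udf_calls;
}

/* Nested-join (TSJ) style evaluation: the UDF is invoked once for
   each of the n outer rows. */
static void eval_per_row(int n, int *out)
{
  int i;

  assert(n >= 0 && out != 0);
  for (i = 0; i < n; i++)
    out[i] = fake_rand_udf();
}

/* Hash-join style evaluation: the UDF side is evaluated a single
   time and its one result row is joined to every outer row. */
static void eval_once(int n, int *out)
{
  int i;
  int v;

  assert(n >= 0 && out != 0);
  v = fake_rand_udf();
  for (i = 0; i < n; i++)
    out[i] = v;
}
```

With three outer rows, eval_per_row calls the stand-in UDF three times and
fills out with three different values, while eval_once calls it a single
time and repeats that value in every row, which is the symptom reported
in the JIRA.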

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zellerh/trafodion bug/R23

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/trafodion/pull/1420.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1420
    
----
commit a725654560fabfafc2b81568720f43a24a1b3007
Author: Hans Zeller <hz...@...>
Date:   2018-01-29T19:20:50Z

    [TRAFODION-2912] Better handling of non-deterministic scalar UDFs
    

----


> Non-deterministic scalar UDFs not executed once per row
> -------------------------------------------------------
>
>                 Key: TRAFODION-2912
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-2912
>             Project: Apache Trafodion
>          Issue Type: Bug
>          Components: sql-cmp
>    Affects Versions: 2.0-incubating
>            Reporter: Hans Zeller
>            Assignee: Hans Zeller
>            Priority: Major
>             Fix For: 2.3
>
>
> This problem was found by Andy Yang.
> Andy created a random generator scalar UDF and found that it did not return a different random value for each row:
> {noformat}
> >>select scalar_rand_udf(), scalar_rand_udf()
> +>from (values (1), (2), (3)) T(s);
> RND          RND
> -----------  -----------
>   846930886   1804289383
>   846930886   1804289383
>   846930886   1804289383
> --- 3 row(s) selected.
> >>
> {noformat}
> Here is the explain output. It shows that we are using hash joins, not nested joins, to evaluate the UDFs:
> {noformat}
> >>explain options 'f' s;
> LC   RC   OP   OPERATOR              OPT       DESCRIPTION           CARD
> ---- ---- ---- --------------------  --------  --------------------  ---------
> 5    .    6    root                                                  3.00E+000
> 4    1    5    hybrid_hash_join                                      3.00E+000
> 3    2    4    hybrid_hash_join                                      1.00E+000
> .    .    3    isolated_scalar_udf             SCALAR_RAND_UDF       1.00E+000
> .    .    2    isolated_scalar_udf             SCALAR_RAND_UDF       1.00E+000
> .    .    1    tuplelist                                             3.00E+000
> --- SQL operation complete.
> >>
> {noformat}
> The problem is that we don't check for non-deterministic UDFs when we transform a TSJ to a regular join in the transformer or normalizer. We don't even set the non-deterministic flag in the group attributes of the IsolatedScalarUDF node.
> The fix is to set this flag correctly and to add a check so that routine joins for non-deterministic isolated scalar UDFs are not transformed into regular joins.
> To recreate, start with the source code of the UDF:
> {noformat}
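> #include "sqludr.h"
> #include <stdlib.h>
> /* Scalar UDF with no inputs and one INT output: a new rand() value per call. */
> SQLUDR_LIBFUNC SQLUDR_INT32 scalar_rand_udf(SQLUDR_INT32 *out1,
>                                             SQLUDR_INT16 *outInd1,
>                                             SQLUDR_TRAIL_ARGS)
> {
>   /* Nothing to clean up on the final call. */
>   if (calltype == SQLUDR_CALLTYPE_FINAL)
>     return SQLUDR_SUCCESS;
>   /* Not deterministic: each invocation returns the next pseudo-random number. */
>   (*out1) = rand();
>   return SQLUDR_SUCCESS;
> }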
> {noformat}
> Compile the UDF:
> {noformat}
> gcc -g -Wall -I$TRAF_HOME/export/include/sql -shared -fPIC -o scalar_rand_udf.so scalar_rand_udf.c
> {noformat}
> Create the UDF and run it:
> {noformat}
> drop function scalar_rand_udf;
> drop library scalar_rand_udf_lib;
> create library scalar_rand_udf_lib
>  file '/home/zellerh/src/scalar_rand_udf/scalar_rand_udf.so';
> create function scalar_rand_udf() returns (rnd int)
>   external name 'scalar_rand_udf' library scalar_rand_udf_lib
>   not deterministic no sql no transaction required;
> prepare s from
> select scalar_rand_udf(), scalar_rand_udf()
> from (values (1), (2), (3)) T(s);
> explain options 'f' s;
> execute s;
> {noformat}
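
The fix described in the issue (set the flag, then check it before
converting a TSJ) can be sketched roughly as follows. This is a simplified
model under stated assumptions, not the Trafodion implementation: PlanNode,
synthesize_attrs and may_convert_tsj_to_join are hypothetical names, and
the choice to check both children of the join is an assumption made for
the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified plan node; NOT one of Trafodion's classes. */
typedef struct PlanNode {
  bool is_non_deterministic_udf;   /* set on the UDF leaf itself */
  bool has_non_deterministic_udf;  /* "group attribute": true if this
                                      subtree contains such a UDF */
  struct PlanNode *left;
  struct PlanNode *right;
} PlanNode;

/* Bottom-up pass: record in every node whether its subtree contains
   a non-deterministic UDF. */
static bool synthesize_attrs(PlanNode *n)
{
  bool l;
  bool r;

  if (n == NULL)
    return false;
  l = synthesize_attrs(n->left);
  r = synthesize_attrs(n->right);
  n->has_non_deterministic_udf = n->is_non_deterministic_udf || l || r;
  return n->has_non_deterministic_udf;
}

/* Normalizer-style check: refuse to turn a TSJ into a regular join
   when either child carries the non-deterministic flag (assumption:
   the sketch checks both children). */
static bool may_convert_tsj_to_join(const PlanNode *tsj)
{
  assert(tsj != NULL);
  if (tsj->left != NULL && tsj->left->has_non_deterministic_udf)
    return false;
  if (tsj->right != NULL && tsj->right->has_non_deterministic_udf)
    return false;
  return true;
}
```

In this model, a TSJ whose inner child is a non-deterministic UDF leaf
stays a TSJ after synthesize_attrs has run, while the same tree with a
deterministic UDF may still be converted.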



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)