You are viewing a plain text version of this content. The canonical link for it is here.
Posted to codereview@trafodion.apache.org by zellerh <gi...@git.apache.org> on 2018/01/29 19:34:08 UTC

[GitHub] trafodion pull request #1420: [TRAFODION-2912] Better handling of non-determ...

GitHub user zellerh opened a pull request:

    https://github.com/apache/trafodion/pull/1420

    [TRAFODION-2912] Better handling of non-deterministic scalar UDFs

    Fix some issues found by Andy Yang and others while writing a
    non-deterministic scalar UDF (a random generator in this case).
    
    This UDF was transformed into a hash join, which executes the UDF
    only once and not once per row. Another problem is the probe cache,
    which can also lead to a single execution instead of once per row.
    
    The fix records the non-deterministic UDF attribute in the group
    attributes and it adds checks in the normalizer to suppress the
    conversion from a TSJ to a non-TSJ when non-deterministic UDFs are
    present. The probe cache logic already had this check, so all that was
    needed was to set the attribute.
    
    Note that there may be some more complex queries where we still won't
    execute the UDF once per row. In general, there is no absolute
    guarantee that a non-deterministic scalar UDF is executed once per row
    (of the cartesian product of all the tables joined??). However, in
    simple cases like the added test we should try to call the UDF for
    every row that satisfies the join predicates.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zellerh/trafodion bug/R23

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/trafodion/pull/1420.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1420
    
----
commit a725654560fabfafc2b81568720f43a24a1b3007
Author: Hans Zeller <hz...@...>
Date:   2018-01-29T19:20:50Z

    [TRAFODION-2912] Better handling of non-deterministic scalar UDFs
    
    Fix some issues found by Andy Yang and others while writing a
    non-deterministic scalar UDF (a random generator in this case).
    
    This UDF was transformed into a hash join, which executes the UDF
    only once and not once per row. Another problem is the probe cache,
    which can also lead to a single execution instead of once per row.
    
    The fix records the non-deterministic UDF attribute in the group
    attributes and it adds checks in the normalizer to suppress the
    conversion from a TSJ to a non-TSJ when non-deterministic UDFs are
    present. The probe cache logic already had this check, so all that was
    needed was to set the attribute.
    
    Note that there may be some more complex queries where we still won't
    execute the UDF once per row. In general, there is no absolute
    guarantee that a non-deterministic scalar UDF is executed once per row
    (of the cartesian product of all the tables joined??). However, in
    simple cases like the added test we should try to call the UDF for
    every row that satisfies the join predicates.

----


---

[GitHub] trafodion pull request #1420: [TRAFODION-2912] Better handling of non-determ...

Posted by sureshsubbiah <gi...@git.apache.org>.
Github user sureshsubbiah commented on a diff in the pull request:

    https://github.com/apache/trafodion/pull/1420#discussion_r164610299
  
    --- Diff: core/sql/regress/udr/EXPECTED103 ---
    @@ -396,6 +458,68 @@ CREATE LIBRARY TRAFODION.UDR103SCH.FUNCTIONSFORTEST103 FILE '/mnt2/ansharma/ansh
     +>  library functionsForTest103
     +>  deterministic no sql final call allow any parallelism state area size 1024 ;
     
    +--- SQL operation complete.
    +>>
    +>>create function nonDeterministicRandom
    ++>  () returns (r integer)
    ++>  language c parameter style sql external name 'nonDeterministicRandom'
    ++>  library functionsForTest103
    ++>  not deterministic no sql final call allow any parallelism state area size 1024 ;
    +
    +--- SQL operation complete.
    +>>
    +>>cqd nested_joins 'off';
    +
    +--- SQL operation complete.
    +>>cqd merge_joins 'off';
    +
    +--- SQL operation complete.
    +>>cqd join_order_by_user 'on';
    +
    +--- SQL operation complete.
    +>>prepare s from
    ++>select a, nonDeterministicRandom() r1,
    ++>       generateRandomNumber(a, 4) r2, generateRandomNumber(123, 4) r3
    ++>from (values (1), (2), (3)) T(a);
    +
    +--- SQL command prepared.
    +>>explain options 'f' s;
    --- End diff --
    
    Will we get the same join order every time? I wonder what in the compiler affects the join order for such a query. This is a general question, not specific to this change.


---

[GitHub] trafodion pull request #1420: [TRAFODION-2912] Better handling of non-determ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/trafodion/pull/1420


---

[GitHub] trafodion pull request #1420: [TRAFODION-2912] Better handling of non-determ...

Posted by zellerh <gi...@git.apache.org>.
Github user zellerh commented on a diff in the pull request:

    https://github.com/apache/trafodion/pull/1420#discussion_r164610400
  
    --- Diff: core/sql/regress/udr/TEST103_functions.cpp ---
    @@ -117,11 +117,41 @@ SQLUDR_LIBFUNC SQLUDR_INT32 genRandomNumber(SQLUDR_INT32 *in1,
         }
       }
       
    -  strcpy(out, result.c_str());
    +  memcpy(out, result.c_str(), result.length());
       return SQLUDR_SUCCESS;
     }
     
     
    +SQLUDR_LIBFUNC SQLUDR_INT32 nonDeterministicRandom(SQLUDR_INT32 *out1,
    +                                                   SQLUDR_INT16 *outInd1,
    +                                                   SQLUDR_TRAIL_ARGS)
    +{
    +  if (calltype == SQLUDR_CALLTYPE_FINAL)
    +    return SQLUDR_SUCCESS;
    +
    +  // pointer to the buffer in the state area that is
    +  // available for the lifetime of this statement,
    +  // this can be used by the UDF to maintain state
    +  int *my_state = (int *) statearea->stmt_data.data;
    +
    +  if (calltype == SQLUDR_CALLTYPE_INITIAL && *my_state == 0)
    +    {
    +      *my_state = 555;
    +    }
    +  else
    +    // Use a simple linear congruential generator, we
    +    // want deterministic regression results, despite
    +    // the name of this function. Note that a
    --- End diff --
    
    Thanks, I think I probably fixed the issues I wanted to comment on with some CQDs. Will delete the half sentence.


---

[GitHub] trafodion pull request #1420: [TRAFODION-2912] Better handling of non-determ...

Posted by sureshsubbiah <gi...@git.apache.org>.
Github user sureshsubbiah commented on a diff in the pull request:

    https://github.com/apache/trafodion/pull/1420#discussion_r164609598
  
    --- Diff: core/sql/regress/udr/TEST103_functions.cpp ---
    @@ -117,11 +117,41 @@ SQLUDR_LIBFUNC SQLUDR_INT32 genRandomNumber(SQLUDR_INT32 *in1,
         }
       }
       
    -  strcpy(out, result.c_str());
    +  memcpy(out, result.c_str(), result.length());
       return SQLUDR_SUCCESS;
     }
     
     
    +SQLUDR_LIBFUNC SQLUDR_INT32 nonDeterministicRandom(SQLUDR_INT32 *out1,
    +                                                   SQLUDR_INT16 *outInd1,
    +                                                   SQLUDR_TRAIL_ARGS)
    +{
    +  if (calltype == SQLUDR_CALLTYPE_FINAL)
    +    return SQLUDR_SUCCESS;
    +
    +  // pointer to the buffer in the state area that is
    +  // available for the lifetime of this statement,
    +  // this can be used by the UDF to maintain state
    +  int *my_state = (int *) statearea->stmt_data.data;
    +
    +  if (calltype == SQLUDR_CALLTYPE_INITIAL && *my_state == 0)
    +    {
    +      *my_state = 555;
    +    }
    +  else
    +    // Use a simple linear congruential generator, we
    +    // want deterministic regression results, despite
    +    // the name of this function. Note that a
    --- End diff --
    
    Typo. Extra "a". Smallest nit possible :)


---

[GitHub] trafodion pull request #1420: [TRAFODION-2912] Better handling of non-determ...

Posted by zellerh <gi...@git.apache.org>.
Github user zellerh commented on a diff in the pull request:

    https://github.com/apache/trafodion/pull/1420#discussion_r164610617
  
    --- Diff: core/sql/regress/udr/EXPECTED103 ---
    @@ -396,6 +458,68 @@ CREATE LIBRARY TRAFODION.UDR103SCH.FUNCTIONSFORTEST103 FILE '/mnt2/ansharma/ansh
     +>  library functionsForTest103
     +>  deterministic no sql final call allow any parallelism state area size 1024 ;
     
    +--- SQL operation complete.
    +>>
    +>>create function nonDeterministicRandom
    ++>  () returns (r integer)
    ++>  language c parameter style sql external name 'nonDeterministicRandom'
    ++>  library functionsForTest103
    ++>  not deterministic no sql final call allow any parallelism state area size 1024 ;
    +
    +--- SQL operation complete.
    +>>
    +>>cqd nested_joins 'off';
    +
    +--- SQL operation complete.
    +>>cqd merge_joins 'off';
    +
    +--- SQL operation complete.
    +>>cqd join_order_by_user 'on';
    +
    +--- SQL operation complete.
    +>>prepare s from
    ++>select a, nonDeterministicRandom() r1,
    ++>       generateRandomNumber(a, 4) r2, generateRandomNumber(123, 4) r3
    ++>from (values (1), (2), (3)) T(a);
    +
    +--- SQL command prepared.
    +>>explain options 'f' s;
    --- End diff --
    
    Yes, I was concerned about that as well and I saw different join orders. For that reason I used a bunch of CQDs - HJ, MJ off and JOBU, to prevent random plan changes.


---