Posted to reviews@impala.apache.org by "Kim Jin Chul (Code Review)" <ge...@cloudera.org> on 2017/11/06 06:38:57 UTC

[Impala-ASF-CR] IMPALA-5754: Improve randomness of rand()/random()

Kim Jin Chul has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/8355 )

Change subject: IMPALA-5754: Improve randomness of rand()/random()
......................................................................

IMPALA-5754: Improve randomness of rand()/random()

The current implementation of the rand()/random() built-in functions
uses rand_r from the C library. We found its randomness to be poor.
std::mt19937 from the C++11 library shows better randomness than
rand_r because its period is much longer than that of rand in C.
(More details at http://www.pcg-random.org/)
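
For illustration, the period argument can be made concrete. rand_r keeps all of its state in the single unsigned int passed by pointer, so on a platform with a 32-bit unsigned int its period cannot exceed 2^32, while std::mt19937 carries 624 words of 32 bits and has a period of 2^19937 - 1. The sketch below is not the code from this patch: SeededRand and kMtStateBits are hypothetical names, and the mapping to [0, 1) by dividing by 2^32 is an assumption chosen for simplicity.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>

// Bits of internal state in std::mt19937: 624 words of 32 bits each.
constexpr std::size_t kMtStateBits =
    std::mt19937::state_size * std::mt19937::word_size;

// Hypothetical wrapper (not the class added by this patch): a seeded
// generator that returns doubles in [0, 1).
class SeededRand {
 public:
  explicit SeededRand(std::uint32_t seed) : gen_(seed) {}

  // std::mt19937 yields values in [0, 2^32 - 1]; dividing by 2^32 keeps the
  // result strictly below 1.0.
  double Next() { return static_cast<double>(gen_()) / 4294967296.0; }

 private:
  std::mt19937 gen_;
};
```

Two SeededRand objects constructed with the same seed return the same sequence, which is the behavior a seeded call such as rand(1) relies on.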

Here is the comparison between before and after:
* Before
> select count(distinct(rand(1))), count(*) from t1
+---------------------------+-----------+
| count(distinct (rand(1))) | count(*)  |
+---------------------------+-----------+
| 17053                     | 103809024 |
+---------------------------+-----------+

* After
> select count(distinct(rand(1))), count(*) from t1
+---------------------------+-----------+
| count(distinct (rand(1))) | count(*)  |
+---------------------------+-----------+
| 34603008                  | 103809024 |
+---------------------------+-----------+

You might expect the maximum number of distinct values (e.g. 103809024).
Due to IMPALA-6117, the number of distinct values can instead be
"maximum / n", where "n" is the number of Impala execution engines.
n is 3 in this example (103809024 / 3 = 34603008), and each
execution engine loads and processes data in parallel.
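
The "maximum / n" effect can be reproduced in a small simulation. This is only a sketch under assumptions: CountDistinct is a hypothetical helper, rows are assumed to be split evenly across executors, and every executor is assumed to seed its own generator with the same value, which is one plausible reading of IMPALA-6117 rather than a description of Impala's actual code path.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <unordered_set>

// Hypothetical simulation: `executors` workers each process rows / executors
// rows, and each one seeds its own generator with the same seed.
std::size_t CountDistinct(std::size_t rows, std::size_t executors,
                          std::uint32_t seed) {
  std::unordered_set<std::uint32_t> seen;
  const std::size_t rows_per_executor = rows / executors;
  for (std::size_t e = 0; e < executors; ++e) {
    std::mt19937 gen(seed);  // same seed on every executor
    for (std::size_t i = 0; i < rows_per_executor; ++i) {
      seen.insert(static_cast<std::uint32_t>(gen()));
    }
  }
  return seen.size();
}
```

Because every simulated executor replays the same sequence, CountDistinct(3000, 3, 1) returns the same count as CountDistinct(1000, 1, 1): the distinct count is capped at rows / n, matching 103809024 / 3 = 34603008 in the example above.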

This change introduces new utility code for random number generation
because we plan to replace the legacy code in IMPALA-4954 with
this utility code.

Testing:
rand-util-test is newly added. It checks randomness,
determinism, and range.
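
As a rough idea of what such checks can look like (this is not the content of rand-util-test.cc), the sketch below tests the three properties against a stand-in generator. UnitRand, IsDeterministic, StaysInRange and DistinctDraws are hypothetical names, and counting distinct values is only a crude proxy for randomness.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <set>

// Hypothetical stand-in for the generator under test (not the patch's class):
// maps each 32-bit mt19937 output to a double in [0, 1).
struct UnitRand {
  explicit UnitRand(std::uint32_t seed) : gen(seed) {}
  double operator()() { return static_cast<double>(gen()) / 4294967296.0; }
  std::mt19937 gen;
};

// Determinism: two generators built from the same seed agree on every draw.
bool IsDeterministic(std::uint32_t seed, std::size_t n) {
  UnitRand a(seed);
  UnitRand b(seed);
  for (std::size_t i = 0; i < n; ++i) {
    if (a() != b()) return false;
  }
  return true;
}

// Range: every draw falls in [0, 1).
bool StaysInRange(std::uint32_t seed, std::size_t n) {
  UnitRand r(seed);
  for (std::size_t i = 0; i < n; ++i) {
    const double x = r();
    if (x < 0.0 || x >= 1.0) return false;
  }
  return true;
}

// Randomness (crudely): the number of distinct values among n draws.
std::size_t DistinctDraws(std::uint32_t seed, std::size_t n) {
  UnitRand r(seed);
  std::set<double> seen;
  for (std::size_t i = 0; i < n; ++i) seen.insert(r());
  return seen.size();
}
```

A test could then require IsDeterministic and StaysInRange to hold for a few seeds, and require DistinctDraws to stay close to the number of draws.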

Change-Id: Idafdd5fe7502ff242c76a91a815c565146108684
---
M be/src/exprs/expr-test.cc
M be/src/exprs/math-functions-ir.cc
M be/src/util/CMakeLists.txt
A be/src/util/rand-util-test.cc
A be/src/util/rand-util.cc
A be/src/util/rand-util.h
6 files changed, 187 insertions(+), 29 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/8355/3
-- 
To view, visit http://gerrit.cloudera.org:8080/8355
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idafdd5fe7502ff242c76a91a815c565146108684
Gerrit-Change-Number: 8355
Gerrit-PatchSet: 3
Gerrit-Owner: Kim Jin Chul <ji...@gmail.com>