You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Zach Amsden (Code Review)" <ge...@cloudera.org> on 2017/02/04 01:28:49 UTC

[Impala-ASF-CR] IMPALA-2020: Make it easy to work with big numbers

Zach Amsden has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5902

Change subject: IMPALA-2020: Make it easy to work with big numbers
......................................................................

IMPALA-2020: Make it easy to work with big numbers

We're going to be working with some really big numbers that
can't always be expressed in legal C++ but can be generated
at compile time.

Testing: Ran this in godbolt:

int lotsanines = LargeNum<int>(9, 9);
__int128 bignines = LargeNum<__int128>(9, 38);`

lotsanines:
        .long   999999999
bignines:
        .quad   687399551400673279
        .quad   5421010862427522170

Ran this query in Python:

>>> print 687399551400673279 + (5421010862427522170 * 2**64)
99999999999999999999999999999999999999
>>>

Looks legit.

Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30
---
M be/src/common/init.cc
M be/src/util/decimal-util.cc
M be/src/util/decimal-util.h
3 files changed, 8 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/5902/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-2020: Inline big number strings

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has posted comments on this change.

Change subject: IMPALA-2020: Inline big number strings
......................................................................


Patch Set 5:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/5902/5//COMMIT_MSG
Commit Message:

PS5, Line 38: 0.83s
> I mean run the test a few times and provide something like https://en.wikip
Do we have any nice scripts to do that?  I'd rather not go to that trouble when all that is left is the obvious win of getting the constants declared in the header and not requiring DecimalUtil::InitMaxUnscaledDecimal()


http://gerrit.cloudera.org:8080/#/c/5902/5/be/src/util/decimal-util.h
File be/src/util/decimal-util.h:

Line 57:   template<typename T>
> Neither looks done in PS9: private or rename
Moot point now - killed the code


PS5, Line 58: T
> I'm not so sure: it would lose precision there in the low bits.
Sadly, other uses GetScaleMultiplier below for exactly this - to get a power of 10 in double.


PS5, Line 59: boost::
> It helps me keep track of comments on patches when they are responded to on
Duly noted


Line 60:     return scale == 0 ? 1 : 10 * GetConstScaleMultiplier<T>(scale-1);
> Why not just another '?:' operator?
moot


Line 136:   if (__builtin_const_p(scale) && p < 10) return GetConstScaleMultiplier<int32_t>(scale);
> Please post the numbers here when you have decided.
Wash.  The only win was from getting the MAX_UNSCALED_DECIMAL constants inline.


-- 
To view, visit http://gerrit.cloudera.org:8080/5902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020: Inline big number strings

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.

Change subject: IMPALA-2020: Inline big number strings
......................................................................


Patch Set 9: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5902/9/be/src/util/decimal-util.h
File be/src/util/decimal-util.h:

PS9, Line 59: -
style nit: spaces: "scale - 1"


-- 
To view, visit http://gerrit.cloudera.org:8080/5902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020: Make it easy to work with big numbers

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has uploaded a new patch set (#2).

Change subject: IMPALA-2020: Make it easy to work with big numbers
......................................................................

IMPALA-2020: Make it easy to work with big numbers

We're going to be working with some really big numbers that
can't always be expressed in legal C++ but can be generated
at compile time.

Testing: Ran this in godbolt:

int lotsanines = LargeNum<int>(9, 9);
__int128 bignines = LargeNum<__int128>(9, 38);`

lotsanines:
        .long   999999999
bignines:
        .quad   687399551400673279
        .quad   5421010862427522170

Ran this query in Python:

>>> print 687399551400673279 + (5421010862427522170 * 2**64)
99999999999999999999999999999999999999
>>>

Looks legit.

Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30
---
M be/src/benchmarks/overflow-benchmark.cc
M be/src/common/init.cc
M be/src/util/decimal-util.cc
M be/src/util/decimal-util.h
4 files changed, 8 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/5902/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho

[Impala-ASF-CR] IMPALA-2020: Inline big number strings

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has posted comments on this change.

Change subject: IMPALA-2020: Inline big number strings
......................................................................


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5902/12/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

Line 42:   // This means the multiplication can cause an unwanted decimal overflow.
> What does this mean in terms of regressions when decimal_v2=false? Will we 
There's no regression in precision, this has not changed and both old and new versions are equally imprecise.  The old code did the scale check first, and the multiply second, which has the potential for overflow.  Now we will at least detect the overflow.  Fixing the precision is harder.  We could use long doubles or extract this out to an integral type and do the multiplication ourselves if we want this to be more precise, but it's not a regression.


-- 
To view, visit http://gerrit.cloudera.org:8080/5902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020: Inline big number strings

Posted by "Jim Apple (Code Review)" <ge...@cloudera.org>.
Jim Apple has posted comments on this change.

Change subject: IMPALA-2020: Inline big number strings
......................................................................


Patch Set 5:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/5902/5//COMMIT_MSG
Commit Message:

PS5, Line 27: Query: select sum(cast(cast(l_extendedprice as double) as
            : decimal(14,2))) from tpch10_parquet.lineitem
            : Query submitted at: 2017-02-18 01:23:54 (Coordinator:
            : http://impala-dev:25000)
            : Query progress can be monitored at:
            : http://impala-dev:25000/query_plan?query_id=f0410a35e981a86b:e6d9207000000000
elide


PS5, Line 38: 0.83s
error bars?


http://gerrit.cloudera.org:8080/#/c/5902/5/be/src/exprs/expr-value.h
File be/src/exprs/expr-value.h:

Line 177:             decimal4_val = +DecimalUtil::MAX_UNSCALED_DECIMAL4;
> Unary plus necessary to convert to an Rvalue
Why do you want to convert it to an Rvalue? Just to avoid having to define it in a .cc file? If so, does defining (not declaring or initializing) in a .cc file slow down performance?

Also, add that comment in the file for future readers.


http://gerrit.cloudera.org:8080/#/c/5902/5/be/src/util/decimal-util.h
File be/src/util/decimal-util.h:

Line 57:   template<typename T>
Describe what it does.

Maybe call it Pow10 and call the parameter exponent?

private:


PS5, Line 58: T
Always is_integral?


PS5, Line 59: boost::
static_assert is now part of the language, but I'm not sure if it works on parameters, since parameters can't be constexpr.


Line 60:     return scale == 0 ? 1 : 10 * GetConstScaleMultiplier<T>(scale-1);
Check for no overflow, if it can be done simply.


Line 136:   if (__builtin_const_p(scale) && p < 10) return GetConstScaleMultiplier<int32_t>(scale);
Does this by itself speed anything up? If scale is constant at compile time, since this is inlined, won't it just inline to the right value?

https://godbolt.org/g/Tlfl8l


-- 
To view, visit http://gerrit.cloudera.org:8080/5902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020: Inline big number strings

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has uploaded a new patch set (#8).

Change subject: IMPALA-2020: Inline big number strings
......................................................................

IMPALA-2020: Inline big number strings

We can directly spcecify these in the header instead of keeping the
constants out in a .o.  This lets the compiler inline them in a lot
more places, potentially giving better behavior than a table lookup
by directly inserting the constant.  This in turn allows divides by
now known compile time constants to be converted into multiply.

This is likely to become even more imporant with big integer divide,
which requires additional computation for DECIMAL_V2.

Testing: Compare the binary output of decimal-util.cc before and
after; ran expr test suite.

Using the following query I observe a small but statistically present
win of ~3% over an unpatched build.

select sum(l_extendedprice / l_discount) from tpch10_parquet.lineitem;

Before:

+-----------------------------------+
| sum(l_extendedprice / l_discount) |
+-----------------------------------+
| 61076151920731.010714279183910    |
+-----------------------------------+
Fetched 1 row(s) in 9.05s

After:

+-----------------------------------+
| sum(l_extendedprice / l_discount) |
+-----------------------------------+
| 61076151920731.010714279183910    |
+-----------------------------------+
Fetched 1 row(s) in 8.79s

Will do some additional benchmarking after the V2 stuff and rounding
changes land.

Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30
---
M be/src/benchmarks/overflow-benchmark.cc
M be/src/common/init.cc
M be/src/util/decimal-util.cc
M be/src/util/decimal-util.h
4 files changed, 18 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/5902/8
-- 
To view, visit http://gerrit.cloudera.org:8080/5902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-2020: Make it easy to work with big numbers

Posted by "Jim Apple (Code Review)" <ge...@cloudera.org>.
Jim Apple has posted comments on this change.

Change subject: IMPALA-2020: Make it easy to work with big numbers
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5902/4/be/src/util/decimal-util.cc
File be/src/util/decimal-util.cc:

Line 25: static constexpr T AllNines(unsigned precision) {
If it's going to be this specific, why not

    const int128_t DecimalUtil::MAX_UNSCALED_DECIMAL16 = 99 + 100 *
        (MAX_UNSCALED_DECIMAL8 + (1 + MAX_UNSCALED_DECIMAL8) *
         static_cast<int128_t>(MAX_UNSCALED_DECIMAL8));


-- 
To view, visit http://gerrit.cloudera.org:8080/5902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020: Inline big number strings

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.

Change subject: IMPALA-2020: Inline big number strings
......................................................................


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5902/8/be/src/util/decimal-util.h
File be/src/util/decimal-util.h:

Line 233:     return MAX_UNSCALED_DECIMAL16 / GetConstScaleMultiplier<int128_t>(scale);
multi-line if-stmts need braces per our style


-- 
To view, visit http://gerrit.cloudera.org:8080/5902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020: Inline big number strings

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has uploaded a new patch set (#5).

Change subject: IMPALA-2020: Inline big number strings
......................................................................

IMPALA-2020: Inline big number strings

We can directly spcecify these in the header instead of keeping the
constants out in a .o.  This lets the compiler inline them in a lot
more places, potentially giving better behavior than a table lookup
by directly inserting the constant.  This in turn allows divides by
now known compile time constants to be converted into multiply.
Avoiding the table lookup seems to be a win for other types as well,
giving anywhere from 18% to ~33% speedup over baseline.

This is likely to become even more imporant with big integer divide,
which requires additional computation for DECIMAL_V2.

Testing: Compare the binary output of decimal-util.cc before and
after; ran expr test suite.

Before:

[localhost:21000] > select sum(cast(cast(l_extendedprice as double) as
decimal(14,2))) from tpch10_parquet.lineitem;
Query: select sum(cast(cast(l_extendedprice as double) as
decimal(14,2))) from tpch10_parquet.lineitem
Query submitted at: 2017-02-18 01:23:54 (Coordinator:
http://impala-dev:25000)
Query progress can be monitored at:
http://impala-dev:25000/query_plan?query_id=f0410a35e981a86b:e6d9207000000000
+-------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-------------------------------------------------------------+
| 2293813121802.09                                            |
+-------------------------------------------------------------+
Fetched 1 row(s) in 0.83s

After:

[localhost:21000] > select sum(cast(cast(l_extendedprice as double) as
decimal(14,2))) from tpch10_parquet.lineitem;
Query: select sum(cast(cast(l_extendedprice as double) as
decimal(14,2))) from tpch10_parquet.lineitem
Query submitted at: 2017-02-18 02:23:33 (Coordinator:
http://impala-dev:25000)
Query progress can be monitored at:
http://impala-dev:25000/query_plan?query_id=97438c98a3d950a4:1a45eeb900000000
+-------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-------------------------------------------------------------+
| 2293813121802.09                                            |
+-------------------------------------------------------------+
Fetched 1 row(s) in 0.63s

Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30
---
M be/src/benchmarks/overflow-benchmark.cc
M be/src/common/init.cc
M be/src/exprs/expr-value.h
M be/src/runtime/decimal-test.cc
M be/src/util/CMakeLists.txt
D be/src/util/decimal-util.cc
M be/src/util/decimal-util.h
7 files changed, 20 insertions(+), 50 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/5902/5
-- 
To view, visit http://gerrit.cloudera.org:8080/5902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-2020: Inline big number strings

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has uploaded a new patch set (#9).

Change subject: IMPALA-2020: Inline big number strings
......................................................................

IMPALA-2020: Inline big number strings

We can directly spcecify these in the header instead of keeping the
constants out in a .o.  This lets the compiler inline them in a lot
more places, potentially giving better behavior than a table lookup
by directly inserting the constant.  This in turn allows divides by
now known compile time constants to be converted into multiply.

This is likely to become even more imporant with big integer divide,
which requires additional computation for DECIMAL_V2.

Testing: Compare the binary output of decimal-util.cc before and
after; ran expr test suite.

Using the following query I observe a small but statistically present
win of ~3% over an unpatched build.

select sum(l_extendedprice / l_discount) from tpch10_parquet.lineitem;

Before:

+-----------------------------------+
| sum(l_extendedprice / l_discount) |
+-----------------------------------+
| 61076151920731.010714279183910    |
+-----------------------------------+
Fetched 1 row(s) in 9.05s

After:

+-----------------------------------+
| sum(l_extendedprice / l_discount) |
+-----------------------------------+
| 61076151920731.010714279183910    |
+-----------------------------------+
Fetched 1 row(s) in 8.79s

Will do some additional benchmarking after the V2 stuff and rounding
changes land.

Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30
---
M be/src/benchmarks/overflow-benchmark.cc
M be/src/common/init.cc
M be/src/util/decimal-util.cc
M be/src/util/decimal-util.h
4 files changed, 25 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/5902/9
-- 
To view, visit http://gerrit.cloudera.org:8080/5902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-2020: Inline big number strings

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has posted comments on this change.

Change subject: IMPALA-2020: Inline big number strings
......................................................................


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5902/5/be/src/exprs/expr-value.h
File be/src/exprs/expr-value.h:

Line 177:             decimal4_val = +DecimalUtil::MAX_UNSCALED_DECIMAL4;
Unary plus necessary to convert to an Rvalue


http://gerrit.cloudera.org:8080/#/c/5902/4/be/src/util/decimal-util.cc
File be/src/util/decimal-util.cc:

Line 25
> If it's going to be this specific, why not
Let's go full tilt and actually fully inline this :)


-- 
To view, visit http://gerrit.cloudera.org:8080/5902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020: Inline big number strings

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has uploaded a new patch set (#7).

Change subject: IMPALA-2020: Inline big number strings
......................................................................

IMPALA-2020: Inline big number strings

We can directly spcecify these in the header instead of keeping the
constants out in a .o.  This lets the compiler inline them in a lot
more places, potentially giving better behavior than a table lookup
by directly inserting the constant.  This in turn allows divides by
now known compile time constants to be converted into multiply.
Avoiding the table lookup seems to be a win for other types as well,
giving anywhere from 18% to ~33% speedup over baseline.

This is likely to become even more imporant with big integer divide,
which requires additional computation for DECIMAL_V2.

Testing: Compare the binary output of decimal-util.cc before and
after; ran expr test suite.

Before:

[localhost:21000] > select sum(cast(cast(l_extendedprice as double) as
decimal(14,2))) from tpch10_parquet.lineitem;
Query: select sum(cast(cast(l_extendedprice as double) as
decimal(14,2))) from tpch10_parquet.lineitem
Query submitted at: 2017-02-18 01:23:54 (Coordinator:
http://impala-dev:25000)
Query progress can be monitored at:
http://impala-dev:25000/query_plan?query_id=f0410a35e981a86b:e6d9207000000000
+-------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-------------------------------------------------------------+
| 2293813121802.09                                            |
+-------------------------------------------------------------+
Fetched 1 row(s) in 0.83s

After:

[localhost:21000] > select sum(cast(cast(l_extendedprice as double) as
decimal(14,2))) from tpch10_parquet.lineitem;
Query: select sum(cast(cast(l_extendedprice as double) as
decimal(14,2))) from tpch10_parquet.lineitem
Query submitted at: 2017-02-18 02:23:33 (Coordinator:
http://impala-dev:25000)
Query progress can be monitored at:
http://impala-dev:25000/query_plan?query_id=97438c98a3d950a4:1a45eeb900000000
+-------------------------------------------------------------+
| sum(cast(cast(l_extendedprice as double) as decimal(14,2))) |
+-------------------------------------------------------------+
| 2293813121802.09                                            |
+-------------------------------------------------------------+
Fetched 1 row(s) in 0.63s

Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30
---
M be/src/benchmarks/overflow-benchmark.cc
M be/src/common/init.cc
M be/src/exprs/expr-value.h
M be/src/runtime/decimal-test.cc
M be/src/util/CMakeLists.txt
D be/src/util/decimal-util.cc
M be/src/util/decimal-util.h
7 files changed, 19 insertions(+), 50 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/5902/7
-- 
To view, visit http://gerrit.cloudera.org:8080/5902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-2020: Inline big number strings

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-2020: Inline big number strings
......................................................................


Patch Set 13:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/338/

-- 
To view, visit http://gerrit.cloudera.org:8080/5902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-2020: Inline big number strings

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has posted comments on this change.

Change subject: IMPALA-2020: Inline big number strings
......................................................................


Patch Set 8:

(1 comment)

Also fixing the too long lines

http://gerrit.cloudera.org:8080/#/c/5902/8/be/src/util/decimal-util.h
File be/src/util/decimal-util.h:

Line 233:     return MAX_UNSCALED_DECIMAL16 / GetConstScaleMultiplier<int128_t>(scale);
> multi-line if-stmts need braces per our style
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020: Inline big number strings

Posted by "Jim Apple (Code Review)" <ge...@cloudera.org>.
Jim Apple has posted comments on this change.

Change subject: IMPALA-2020: Inline big number strings
......................................................................


Patch Set 9:

Zach, could you please hold on this until responding to my issues from Feb 17?

-- 
To view, visit http://gerrit.cloudera.org:8080/5902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-2020: Inline big number strings

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Hello Dan Hecht,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5902

to look at the new patch set (#12).

Change subject: IMPALA-2020: Inline big number strings
......................................................................

IMPALA-2020: Inline big number strings

We can directly spcecify these in the header instead of keeping the
constants out in a .o.  This lets the compiler inline them in a lot
more places, potentially giving better behavior than a table lookup
by directly inserting the constant.  This in turn allows divides by
now known compile time constants to be converted into multiply.

This is likely to become even more imporant with big integer divide,
which requires additional computation for DECIMAL_V2.

Testing: Compare the binary output of decimal-util.cc before and
after; ran expr test suite.

Using the following query I observe a small but statistically present
win of ~3% over an unpatched build.

select sum(l_extendedprice / l_discount) from tpch10_parquet.lineitem;

Before:

+-----------------------------------+
| sum(l_extendedprice / l_discount) |
+-----------------------------------+
| 61076151920731.010714279183910    |
+-----------------------------------+
Fetched 1 row(s) in 9.05s

After:

+-----------------------------------+
| sum(l_extendedprice / l_discount) |
+-----------------------------------+
| 61076151920731.010714279183910    |
+-----------------------------------+
Fetched 1 row(s) in 8.79s

Will do some additional benchmarking after the V2 stuff and rounding
changes land.

Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30
---
M be/src/benchmarks/overflow-benchmark.cc
M be/src/common/init.cc
M be/src/runtime/decimal-value.inline.h
M be/src/util/decimal-util.cc
M be/src/util/decimal-util.h
5 files changed, 10 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/5902/12
-- 
To view, visit http://gerrit.cloudera.org:8080/5902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-2020: Inline big number strings

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-2020: Inline big number strings
......................................................................


IMPALA-2020: Inline big number strings

We can directly spcecify these in the header instead of keeping the
constants out in a .o.  This lets the compiler inline them in a lot
more places, potentially giving better behavior than a table lookup
by directly inserting the constant.  This in turn allows divides by
now known compile time constants to be converted into multiply.

This is likely to become even more imporant with big integer divide,
which requires additional computation for DECIMAL_V2.

Testing: Compare the binary output of decimal-util.cc before and
after; ran expr test suite.

Using the following query I observe a small but statistically present
win of ~3% over an unpatched build.

select sum(l_extendedprice / l_discount) from tpch10_parquet.lineitem;

Before:

+-----------------------------------+
| sum(l_extendedprice / l_discount) |
+-----------------------------------+
| 61076151920731.010714279183910    |
+-----------------------------------+
Fetched 1 row(s) in 9.05s

After:

+-----------------------------------+
| sum(l_extendedprice / l_discount) |
+-----------------------------------+
| 61076151920731.010714279183910    |
+-----------------------------------+
Fetched 1 row(s) in 8.79s

Will do some additional benchmarking after the V2 stuff and rounding
changes land.

Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30
Reviewed-on: http://gerrit.cloudera.org:8080/5902
Reviewed-by: Dan Hecht <dh...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M be/src/benchmarks/overflow-benchmark.cc
M be/src/common/init.cc
M be/src/runtime/decimal-value.inline.h
M be/src/util/decimal-util.cc
M be/src/util/decimal-util.h
5 files changed, 10 insertions(+), 26 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Dan Hecht: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/5902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-2020: Inline big number strings

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has posted comments on this change.

Change subject: IMPALA-2020: Inline big number strings
......................................................................


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5902/12/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

Line 42:   // This means the multiplication can cause an unwanted decimal overflow.
<slight facepalm>
This comment was incorrect.  The utility I was using to convert large numbers to binary apparently went through a double conversion itself somewhere, and truncated all binary numbers to the leading 53 bit representation, in turn leading me to believe we had precision available that was not utilized.

</slight facepalm>


-- 
To view, visit http://gerrit.cloudera.org:8080/5902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020: Make it easy to work with big numbers

Posted by "Jim Apple (Code Review)" <ge...@cloudera.org>.
Jim Apple has posted comments on this change.

Change subject: IMPALA-2020: Make it easy to work with big numbers
......................................................................


Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/5902/2//COMMIT_MSG
Commit Message:

PS2, Line 7: Make it easy to work with big numbers
More specifically: allow writing 128-bit literals


PS2, Line 10: legal C++
This is a gcc/clang issue, not a C++ one


Line 16: __int128 bignines = LargeNum<__int128>(9, 38);`
What happens if you pass parameters that make it overflow?


http://gerrit.cloudera.org:8080/#/c/5902/2/be/src/util/decimal-util.h
File be/src/util/decimal-util.h:

Line 44:   template<typename T>
If this is only supposed to be called at compile-time, all the parameters can be template parameters.


Line 45:   static constexpr T LargeNum(int digit, int precision, int base = 10) {
This is the kind of API that I would foul up by swapping the params. Is there anything that be done to make this more fool-proof? Some ideas:

1. digit should be less than base.
2. digit should never be 0 or negative.
3. The top-level recursive call should always produce a number that doesn't fit in 64 bits and does fit in 128 bits
4. Can it be called on a const char *, perhaps, reducing digit and precision to one parameter?


PS2, Line 46: (
> nit: extraneous parens
Handle non-positive precision.


-- 
To view, visit http://gerrit.cloudera.org:8080/5902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020: Make it easy to work with big numbers

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has posted comments on this change.

Change subject: IMPALA-2020: Make it easy to work with big numbers
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5902/2/be/src/util/decimal-util.h
File be/src/util/decimal-util.h:

Line 44:   template<typename T>
> If this is only supposed to be called at compile-time, all the parameters c
Didn't think of that.


Line 45:   static constexpr T LargeNum(int digit, int precision, int base = 10) {
> This is the kind of API that I would foul up by swapping the params. Is the
digit < base would be a good check - can be a static assertion


-- 
To view, visit http://gerrit.cloudera.org:8080/5902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020: Make it easy to work with big numbers

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has uploaded a new patch set (#3).

Change subject: IMPALA-2020: Make it easy to work with big numbers
......................................................................

IMPALA-2020: Make it easy to work with big numbers

We're going to be working with some really big numbers that
can't always be expressed in legal C++ but can be generated
at compile time.

Testing: Ran this in godbolt:

int lotsanines = LargeNum<int>(9, 9);
__int128 bignines = LargeNum<__int128>(9, 38);`

lotsanines:
        .long   999999999
bignines:
        .quad   687399551400673279
        .quad   5421010862427522170

Ran this query in Python:

>>> print 687399551400673279 + (5421010862427522170 * 2**64)
99999999999999999999999999999999999999
>>>

Looks legit.

Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30
---
M be/src/benchmarks/overflow-benchmark.cc
M be/src/common/init.cc
M be/src/util/decimal-util.cc
M be/src/util/decimal-util.h
4 files changed, 8 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/5902/3
-- 
To view, visit http://gerrit.cloudera.org:8080/5902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-2020: Inline big number strings

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.

Change subject: IMPALA-2020: Inline big number strings
......................................................................


Patch Set 5:

Zach, can you do a rebase on top of master and then I can kick a build off?

-- 
To view, visit http://gerrit.cloudera.org:8080/5902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-2020: Inline big number strings

Posted by "Jim Apple (Code Review)" <ge...@cloudera.org>.
Jim Apple has posted comments on this change.

Change subject: IMPALA-2020: Inline big number strings
......................................................................


Patch Set 12: Code-Review+1

Thanks for simplifying so much.

-- 
To view, visit http://gerrit.cloudera.org:8080/5902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-2020: Inline big number strings

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.

Change subject: IMPALA-2020: Inline big number strings
......................................................................


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5902/12/be/src/runtime/decimal-value.inline.h
File be/src/runtime/decimal-value.inline.h:

Line 42:   // This means the multiplication can cause an unwanted decimal overflow.
> <slight facepalm>
What does this mean in terms of regressions when decimal_v2=false? Will we incorrectly overflow too soon now?  Or was the old code not overflowing soon enough?  Or something different?


-- 
To view, visit http://gerrit.cloudera.org:8080/5902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020: Inline big number strings

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.

Change subject: IMPALA-2020: Inline big number strings
......................................................................


Patch Set 13: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/5902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-2020: Make it easy to work with big numbers

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.

Change subject: IMPALA-2020: Make it easy to work with big numbers
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5902/2/be/src/util/decimal-util.h
File be/src/util/decimal-util.h:

PS2, Line 46: (
nit: extraneous parens


PS2, Line 47: -
nit: spaces around -


-- 
To view, visit http://gerrit.cloudera.org:8080/5902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020: Inline big number strings

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has posted comments on this change.

Change subject: IMPALA-2020: Inline big number strings
......................................................................


Patch Set 5:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/5902/5//COMMIT_MSG
Commit Message:

PS5, Line 27: Query: select sum(cast(cast(l_extendedprice as double) as
            : decimal(14,2))) from tpch10_parquet.lineitem
            : Query submitted at: 2017-02-18 01:23:54 (Coordinator:
            : http://impala-dev:25000)
            : Query progress can be monitored at:
            : http://impala-dev:25000/query_plan?query_id=f0410a35e981a86b:e6d9207000000000
> elide
Done


PS5, Line 38: 0.83s
> error bars?
what do you mean exactly?


http://gerrit.cloudera.org:8080/#/c/5902/5/be/src/exprs/expr-value.h
File be/src/exprs/expr-value.h:

Line 177:             decimal4_val = +DecimalUtil::MAX_UNSCALED_DECIMAL4;
> Why do you want to convert it to an Rvalue? Just to avoid having to define 
I just defined it in the .cc - not doing so (and not converting to an rvalue) breaks the debug build, but not the release build, as this does actually get inlined if defined in the header.


http://gerrit.cloudera.org:8080/#/c/5902/5/be/src/util/decimal-util.h
File be/src/util/decimal-util.h:

Line 57:   template<typename T>
> Describe what it does.
Done


PS5, Line 58: T
> Always is_integral?
Not necessarily, this would work for float / double as well.


PS5, Line 59: boost::
> static_assert is now part of the language, but I'm not sure if it works on 
I'll try it


Line 60:     return scale == 0 ? 1 : 10 * GetConstScaleMultiplier<T>(scale-1);
> Check for no overflow, if it can be done simply.
I think it can be done, not sure how simply.  Since this is constexpr, we are pretty constrained in what we can use but I think another static assert could do it.


Line 136:   if (__builtin_const_p(scale) && p < 10) return GetConstScaleMultiplier<int32_t>(scale);
> Does this by itself speed anything up? If scale is constant at compile time
Let's remove it and find out.


-- 
To view, visit http://gerrit.cloudera.org:8080/5902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020: Inline big number strings

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Hello Dan Hecht,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5902

to look at the new patch set (#10).

Change subject: IMPALA-2020: Inline big number strings
......................................................................

IMPALA-2020: Inline big number strings

We can directly spcecify these in the header instead of keeping the
constants out in a .o.  This lets the compiler inline them in a lot
more places, potentially giving better behavior than a table lookup
by directly inserting the constant.  This in turn allows divides by
now known compile time constants to be converted into multiply.

This is likely to become even more imporant with big integer divide,
which requires additional computation for DECIMAL_V2.

Testing: Compare the binary output of decimal-util.cc before and
after; ran expr test suite.

Using the following query I observe a small but statistically present
win of ~3% over an unpatched build.

select sum(l_extendedprice / l_discount) from tpch10_parquet.lineitem;

Before:

+-----------------------------------+
| sum(l_extendedprice / l_discount) |
+-----------------------------------+
| 61076151920731.010714279183910    |
+-----------------------------------+
Fetched 1 row(s) in 9.05s

After:

+-----------------------------------+
| sum(l_extendedprice / l_discount) |
+-----------------------------------+
| 61076151920731.010714279183910    |
+-----------------------------------+
Fetched 1 row(s) in 8.79s

Will do some additional benchmarking after the V2 stuff and rounding
changes land.

Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30
---
M be/src/benchmarks/overflow-benchmark.cc
M be/src/common/init.cc
M be/src/util/decimal-util.cc
M be/src/util/decimal-util.h
4 files changed, 8 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/5902/10
-- 
To view, visit http://gerrit.cloudera.org:8080/5902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-2020: Inline big number strings

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-2020: Inline big number strings
......................................................................


Patch Set 13: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/5902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-2020: Make it easy to work with big numbers

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has posted comments on this change.

Change subject: IMPALA-2020: Make it easy to work with big numbers
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5902/2/be/src/util/decimal-util.h
File be/src/util/decimal-util.h:

PS2, Line 46: (
> nit: extraneous parens
Done


PS2, Line 47: -
> nit: spaces around -
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-2020: Make it easy to work with big numbers

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has uploaded a new patch set (#4).

Change subject: IMPALA-2020: Make it easy to work with big numbers
......................................................................

IMPALA-2020: Make it easy to work with big numbers

Well the generic form turned out to not be super useful because
we generally only use these inside strings during tests.  In
lieu of having a general purpose function, let's just get rid of
the need to call some special large number init function.

Testing: Ran this in godbolt:

__int128_t val = AllNines<__int128_t>(38);

bignines:
        .quad   687399551400673279
        .quad   5421010862427522170

Ran this in Python:

>>> print 687399551400673279 + (5421010862427522170 * 2**64)
99999999999999999999999999999999999999
>>>

Looks legit.

Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30
---
M be/src/benchmarks/overflow-benchmark.cc
M be/src/common/init.cc
M be/src/util/decimal-util.cc
M be/src/util/decimal-util.h
4 files changed, 8 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/5902/4
-- 
To view, visit http://gerrit.cloudera.org:8080/5902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-2020: Inline big number strings

Posted by "Jim Apple (Code Review)" <ge...@cloudera.org>.
Jim Apple has posted comments on this change.

Change subject: IMPALA-2020: Inline big number strings
......................................................................


Patch Set 5:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/5902/5//COMMIT_MSG
Commit Message:

PS5, Line 38: 0.83s
> what do you mean exactly?
I mean run the test a few times and provide something like https://en.wikipedia.org/wiki/Interquartile_range so we know if it sped things up or slowed them down.


http://gerrit.cloudera.org:8080/#/c/5902/5/be/src/util/decimal-util.h
File be/src/util/decimal-util.h:

Line 57:   template<typename T>
> Done
Neither looks done in PS9: private or rename


PS5, Line 58: T
> Not necessarily, this would work for float / double as well.
I'm not so sure: it would lose precision there in the low bits.


PS5, Line 59: boost::
> I'll try it
It helps me keep track of comments on patches when they are responded to only when the ball is back in my court.


Line 60:     return scale == 0 ? 1 : 10 * GetConstScaleMultiplier<T>(scale-1);
> I think it can be done, not sure how simply.  Since this is constexpr, we a
Why not just another '?:' operator?


Line 136:   if (__builtin_const_p(scale) && p < 10) return GetConstScaleMultiplier<int32_t>(scale);
> Let's remove it and find out.
Please post the numbers here when you have decided.


-- 
To view, visit http://gerrit.cloudera.org:8080/5902
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5095a366d914cebb0b64bd434a08dbb55c90ed30
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes