You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org> on 2018/02/21 16:33:09 UTC

[Impala-ASF-CR] IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/9381


Change subject: IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics
......................................................................

IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

Quick fix of Parquet write path until the Parquet community
agrees on the ordering of floating point numbers.

This commit ignores NaNs during min/max statistics
calculation. Only write NaN as 'min_value' and 'max_value'
if all the values are NaNs in the column chunk.

Added e2e tests as well.

Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
---
M be/src/exec/parquet-column-stats.inline.h
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
2 files changed, 36 insertions(+), 2 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/9381/1
-- 
To view, visit http://gerrit.cloudera.org:8080/9381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Gerrit-Change-Number: 9381
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Lars Volker, Gabor Kaszab, Tim Armstrong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9381

to look at the new patch set (#2).

Change subject: IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics
......................................................................

IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

Quick fix of Parquet write path until the Parquet community
agrees on the ordering of floating point numbers.

This commit ignores NaNs during min/max statistics
calculation. Only write NaN as 'min_value' and 'max_value'
if all the values are NaNs in the column chunk.

Added e2e tests as well.

Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
---
M be/src/exec/parquet-column-stats.inline.h
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
2 files changed, 39 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/9381/2
-- 
To view, visit http://gerrit.cloudera.org:8080/9381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Gerrit-Change-Number: 9381
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9381 )

Change subject: IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics
......................................................................


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/9381/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9381/5//COMMIT_MSG@12
PS5, Line 12: ignores
> If two NaN's are inserted, it wouldn't ignore them but write them into the 
Yeah, the next sentence describes that case. Let me rephrase it.

I'm not sure if this is the best option, but it was intentional. I had a discussion with the parquet-cpp community about the behavior of the quick fix. We agreed to follow what fmax/fmin does: they only return NaN when both arguments are NaNs.

IMPALA-6539 is about the long-term fix, but it can't be implemented until PARQUET-1222 is resolved.


http://gerrit.cloudera.org:8080/#/c/9381/5/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
File testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test:

http://gerrit.cloudera.org:8080/#/c/9381/5/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test@580
PS5, Line 580: ====
> Can you add a test that inserts multiple NaNs, with and without following t
Added some tests for these cases.



-- 
To view, visit http://gerrit.cloudera.org:8080/9381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Gerrit-Change-Number: 9381
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 06 Mar 2018 17:25:56 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

Posted by "Lars Volker (Code Review)" <ge...@cloudera.org>.
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/9381 )

Change subject: IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/9381/1/be/src/exec/parquet-column-stats.inline.h
File be/src/exec/parquet-column-stats.inline.h:

http://gerrit.cloudera.org:8080/#/c/9381/1/be/src/exec/parquet-column-stats.inline.h@37
PS1, Line 37: std::is_floating_point<T>::value>
> If T is not a numeric type, then it can be a problem, e.g. you have the fol
Thanks for checking and for the explanations.


http://gerrit.cloudera.org:8080/#/c/9381/1/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
File testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test:

http://gerrit.cloudera.org:8080/#/c/9381/1/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test@507
PS1, Line 507: # the min filter
nit: .



-- 
To view, visit http://gerrit.cloudera.org:8080/9381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Gerrit-Change-Number: 9381
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 22 Feb 2018 19:50:15 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9381 )

Change subject: IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/9381/1/be/src/exec/parquet-column-stats.inline.h
File be/src/exec/parquet-column-stats.inline.h:

http://gerrit.cloudera.org:8080/#/c/9381/1/be/src/exec/parquet-column-stats.inline.h@37
PS1, Line 37: std::is_floating_point<T>::value>
> I'm not sure I get this. I played around a little bit with putting is_float
If T is not a numeric type, then it can be a problem, e.g. you have the following code:

 template <typename T>
 void printer(const T& t) {
   if (is_floating_point<T>::value) {
     cout << fmax(t, t) << endl;
   } else {
     cout << "no" << endl;
   }
 }

As long as you invoke this function with numbers, your code compiles. But, when you invoke this function with T = std::string, you'll get a compiler error because there is no fmax() function that takes std::string as a parameter.

However, the C++17 constexpr ifs will enable these code structures.


http://gerrit.cloudera.org:8080/#/c/9381/2/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
File testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test:

http://gerrit.cloudera.org:8080/#/c/9381/2/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test@495
PS2, Line 495: # IMPALA-6527: NaN values lead to incorrect filtering.
> The following 3 tests were introduced with the read path fix. Do you think 
Yeah I think you are right. I'll drop the equality test.
* 'val > 0' tests that the max filter doesn't reject the column chunk.
* 'van < 100' tests that the min filter doesn't reject the column chunk.
So, I keep these two.


http://gerrit.cloudera.org:8080/#/c/9381/2/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test@530
PS2, Line 530: nof
> nit: Typo or some abbreviation I don't know? :)
typo :)



-- 
To view, visit http://gerrit.cloudera.org:8080/9381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Gerrit-Change-Number: 9381
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 22 Feb 2018 16:20:11 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/9381 )

Change subject: IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics
......................................................................


Patch Set 3:

(1 comment)

Thanks for the explanation, Zoli!
If there is no way to merge the general and the floating point cases to one place, than I'm fine with it.

http://gerrit.cloudera.org:8080/#/c/9381/3/be/src/exec/parquet-column-stats.inline.h
File be/src/exec/parquet-column-stats.inline.h:

http://gerrit.cloudera.org:8080/#/c/9381/3/be/src/exec/parquet-column-stats.inline.h@29
PS3, Line 29: template <typename T, typename Enable = void>
nit: Shouldn't these live inside either ColumnStats or somewhere in util dir?



-- 
To view, visit http://gerrit.cloudera.org:8080/9381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Gerrit-Change-Number: 9381
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 22 Feb 2018 16:53:09 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

Posted by "Lars Volker (Code Review)" <ge...@cloudera.org>.
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/9381 )

Change subject: IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9381/4/be/src/util/stat-util.h
File be/src/util/stat-util.h:

http://gerrit.cloudera.org:8080/#/c/9381/4/be/src/util/stat-util.h@29
PS4, Line 29: struct MinMaxTrait {
I think this would be better placed in some parquet-specific files since it is the min/max ordering we use for parquet files. The other function in stat-util looks like it's for things to compute statistical values. My preference would be to have it as a class member of ColumnStatsBase.



-- 
To view, visit http://gerrit.cloudera.org:8080/9381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Gerrit-Change-Number: 9381
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 23 Feb 2018 17:15:49 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

Posted by "Lars Volker (Code Review)" <ge...@cloudera.org>.
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/9381 )

Change subject: IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9381/1/be/src/exec/parquet-column-stats.inline.h
File be/src/exec/parquet-column-stats.inline.h:

http://gerrit.cloudera.org:8080/#/c/9381/1/be/src/exec/parquet-column-stats.inline.h@37
PS1, Line 37: std::is_floating_point<T>::value>
Wouldn't normal function overloads work here, too?

Alternatively, wouldn't this also work using a if inside a method?

  if (std::is_floating_point<T>::value) ... else ...;

The compiler should still be able to optimize it all away but it might be more concise and easier to follow.



-- 
To view, visit http://gerrit.cloudera.org:8080/9381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Gerrit-Change-Number: 9381
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 21 Feb 2018 19:31:55 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/9381 )

Change subject: IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9381/2/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
File testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test:

http://gerrit.cloudera.org:8080/#/c/9381/2/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test@495
PS2, Line 495: # IMPALA-6527: NaN values lead to incorrect filtering.
The following 3 tests were introduced with the read path fix. Do you think these are still required since we have a separate test suite to test the reader (with the inputs written by an incorrect writer)?



-- 
To view, visit http://gerrit.cloudera.org:8080/9381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Gerrit-Change-Number: 9381
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 22 Feb 2018 15:53:31 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

Posted by "Jim Apple (Code Review)" <ge...@cloudera.org>.
Jim Apple has posted comments on this change. ( http://gerrit.cloudera.org:8080/9381 )

Change subject: IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9381/1/be/src/exec/parquet-column-stats.inline.h
File be/src/exec/parquet-column-stats.inline.h:

http://gerrit.cloudera.org:8080/#/c/9381/1/be/src/exec/parquet-column-stats.inline.h@37
PS1, Line 37: std::is_floating_point<T>::value>
> you're welcome!
If you prefer, you can use

    template<typename T>
    const std::enable_if_t<!std::is_floating_point<T>{}, T> &
    Bigger(const T& x, const T& y) {
      return std::max(x, y);
    }
    
    template<typename T>
    std::enable_if_t<std::is_floating_point<T>{}, T>
    Bigger(T x, T y) {
      return std::fmax(x, y);
    }



-- 
To view, visit http://gerrit.cloudera.org:8080/9381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Gerrit-Change-Number: 9381
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Sat, 24 Feb 2018 19:12:55 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

Posted by "Lars Volker (Code Review)" <ge...@cloudera.org>.
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/9381 )

Change subject: IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics
......................................................................


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/9381/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9381/5//COMMIT_MSG@12
PS5, Line 12: ignores
If two NaN's are inserted, it wouldn't ignore them but write them into the stats, no? Is this desired?


http://gerrit.cloudera.org:8080/#/c/9381/5/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
File testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test:

http://gerrit.cloudera.org:8080/#/c/9381/5/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test@580
PS5, Line 580: ====
Can you add a test that inserts multiple NaNs, with and without following them with a non-NaN?



-- 
To view, visit http://gerrit.cloudera.org:8080/9381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Gerrit-Change-Number: 9381
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 05 Mar 2018 21:55:52 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/9381 )

Change subject: IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics
......................................................................


Patch Set 5: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/9381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Gerrit-Change-Number: 9381
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 05 Mar 2018 11:13:29 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

Posted by "Lars Volker (Code Review)" <ge...@cloudera.org>.
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/9381 )

Change subject: IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9381/1/be/src/exec/parquet-column-stats.inline.h
File be/src/exec/parquet-column-stats.inline.h:

http://gerrit.cloudera.org:8080/#/c/9381/1/be/src/exec/parquet-column-stats.inline.h@37
PS1, Line 37: std::is_floating_point<T>::value>
> Yes, but I would need to do it for float and double as well.
sgtm, thanks.



-- 
To view, visit http://gerrit.cloudera.org:8080/9381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Gerrit-Change-Number: 9381
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 21 Feb 2018 22:52:02 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Lars Volker, Gabor Kaszab, Jim Apple, Tim Armstrong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9381

to look at the new patch set (#5).

Change subject: IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics
......................................................................

IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

Quick fix of Parquet write path until the Parquet community
agrees on the ordering of floating point numbers.

This commit ignores NaNs during min/max statistics
calculation. Only write NaN as 'min_value' and 'max_value'
if all the values are NaNs in the column chunk.

Added e2e tests as well.

Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
---
M be/src/exec/parquet-column-stats.h
M be/src/exec/parquet-column-stats.inline.h
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
3 files changed, 41 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/9381/5
-- 
To view, visit http://gerrit.cloudera.org:8080/9381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Gerrit-Change-Number: 9381
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

Posted by "Lars Volker (Code Review)" <ge...@cloudera.org>.
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/9381 )

Change subject: IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics
......................................................................


Patch Set 7: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/9381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Gerrit-Change-Number: 9381
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 08 Mar 2018 03:54:07 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/9381 )

Change subject: IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics
......................................................................


Patch Set 7: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/9381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Gerrit-Change-Number: 9381
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 08 Mar 2018 07:34:39 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Lars Volker, Gabor Kaszab, Jim Apple, Tim Armstrong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9381

to look at the new patch set (#6).

Change subject: IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics
......................................................................

IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

Quick fix of Parquet write path until the Parquet community
agrees on the ordering of floating point numbers.

The behavior follows the way fmax()/fmin() works, ie. Impala
will only write NaN into the stats when all the values are NaNs.
This behavior is aligned with the quick fix of Parquet-CPP.

Added e2e tests as well.

Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
---
M be/src/exec/parquet-column-stats.h
M be/src/exec/parquet-column-stats.inline.h
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
3 files changed, 90 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/9381/6
-- 
To view, visit http://gerrit.cloudera.org:8080/9381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Gerrit-Change-Number: 9381
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/9381 )

Change subject: IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics
......................................................................

IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

Quick fix of Parquet write path until the Parquet community
agrees on the ordering of floating point numbers.

The behavior follows the way fmax()/fmin() works, ie. Impala
will only write NaN into the stats when all the values are NaNs.
This behavior is aligned with the quick fix of Parquet-CPP.

Added e2e tests as well.

Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Reviewed-on: http://gerrit.cloudera.org:8080/9381
Reviewed-by: Lars Volker <lv...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M be/src/exec/parquet-column-stats.h
M be/src/exec/parquet-column-stats.inline.h
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
3 files changed, 90 insertions(+), 17 deletions(-)

Approvals:
  Lars Volker: Looks good to me, approved
  Impala Public Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/9381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Gerrit-Change-Number: 9381
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9381 )

Change subject: IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9381/1/be/src/exec/parquet-column-stats.inline.h
File be/src/exec/parquet-column-stats.inline.h:

http://gerrit.cloudera.org:8080/#/c/9381/1/be/src/exec/parquet-column-stats.inline.h@37
PS1, Line 37: std::is_floating_point<T>::value>
> Wouldn't normal function overloads work here, too?
Yes, but I would need to do it for float and double as well.

The 'if' can also work, however the return type must be 'T' in that case. 

Currently the return type is 'const T&' for std::min/max, and 'T' for fmin/fmax. But, I suspect it doesn't really matter because the extra work will be optimized away anyway.

I'll choose the 'if' if you don't have a strong preference for the other.



-- 
To view, visit http://gerrit.cloudera.org:8080/9381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Gerrit-Change-Number: 9381
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 21 Feb 2018 19:45:28 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/9381 )

Change subject: IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/9381/1/be/src/exec/parquet-column-stats.inline.h
File be/src/exec/parquet-column-stats.inline.h:

http://gerrit.cloudera.org:8080/#/c/9381/1/be/src/exec/parquet-column-stats.inline.h@37
PS1, Line 37: std::is_floating_point<T>::value>
> I just realized that these kind of constexpr ifs will only be valid in C++1
I'm not sure I get this. I played around a little bit with putting is_floating_point<T>::value to an if condition and for me this seems working even with c++11. Am I missing something?


http://gerrit.cloudera.org:8080/#/c/9381/2/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
File testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test:

http://gerrit.cloudera.org:8080/#/c/9381/2/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test@530
PS2, Line 530: nof
nit: Typo or some abbreviation I don't know? :)



-- 
To view, visit http://gerrit.cloudera.org:8080/9381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Gerrit-Change-Number: 9381
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 22 Feb 2018 15:32:12 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Lars Volker, Gabor Kaszab, Tim Armstrong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9381

to look at the new patch set (#3).

Change subject: IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics
......................................................................

IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

Quick fix of Parquet write path until the Parquet community
agrees on the ordering of floating point numbers.

This commit ignores NaNs during min/max statistics
calculation. Only write NaN as 'min_value' and 'max_value'
if all the values are NaNs in the column chunk.

Added e2e tests as well.

Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
---
M be/src/exec/parquet-column-stats.inline.h
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
2 files changed, 32 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/9381/3
-- 
To view, visit http://gerrit.cloudera.org:8080/9381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Gerrit-Change-Number: 9381
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/9381 )

Change subject: IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics
......................................................................


Patch Set 7:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2062/


-- 
To view, visit http://gerrit.cloudera.org:8080/9381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Gerrit-Change-Number: 9381
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 08 Mar 2018 03:54:21 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9381 )

Change subject: IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics
......................................................................


Patch Set 5:

(2 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/9381/1/be/src/exec/parquet-column-stats.inline.h
File be/src/exec/parquet-column-stats.inline.h:

http://gerrit.cloudera.org:8080/#/c/9381/1/be/src/exec/parquet-column-stats.inline.h@37
PS1, Line 37: 
> If you prefer, you can use
Yes, it is also a viable approach, however I'd stick with the current solution because of the followings:
* more concise
* fewer enable_ifs
* decltype(auto) chooses the right return type we need, and I think it is more readable as well


http://gerrit.cloudera.org:8080/#/c/9381/4/be/src/util/stat-util.h
File be/src/util/stat-util.h:

http://gerrit.cloudera.org:8080/#/c/9381/4/be/src/util/stat-util.h@29
PS4, Line 29:   template <typename T>
> I think this would be better placed in some parquet-specific files since it
Yeah I agree, btw ColumnStatsBase was my other candidate as well.

Fixed.



-- 
To view, visit http://gerrit.cloudera.org:8080/9381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Gerrit-Change-Number: 9381
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 02 Mar 2018 20:08:46 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

Posted by "Lars Volker (Code Review)" <ge...@cloudera.org>.
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/9381 )

Change subject: IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics
......................................................................


Patch Set 6: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/9381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Gerrit-Change-Number: 9381
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 08 Mar 2018 03:53:51 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9381 )

Change subject: IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9381/1/be/src/exec/parquet-column-stats.inline.h
File be/src/exec/parquet-column-stats.inline.h:

http://gerrit.cloudera.org:8080/#/c/9381/1/be/src/exec/parquet-column-stats.inline.h@37
PS1, Line 37: std::is_floating_point<T>::value>
> sgtm, thanks.
I just realized that these kind of constexpr ifs will only be valid in C++17.

In C++14 it would generate a compile-time error since there is no fmax() for StringValue and so on.

I think the current solution with the comment is readable, but if you think normal function overloads or full template specializations would be better, I'll change the code.



-- 
To view, visit http://gerrit.cloudera.org:8080/9381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Gerrit-Change-Number: 9381
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 21 Feb 2018 22:54:58 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9381 )

Change subject: IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics
......................................................................


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/9381/3/be/src/exec/parquet-column-stats.inline.h
File be/src/exec/parquet-column-stats.inline.h:

http://gerrit.cloudera.org:8080/#/c/9381/3/be/src/exec/parquet-column-stats.inline.h@29
PS3, Line 29: template <typename T, typename Enable = void>
> nit: Shouldn't these live inside either ColumnStats or somewhere in util di
Yeah, probably it is not the best place for them.

ColumnStats is a template class and putting it there would be confusing I think.
Maybe I could nest it inside ColumnStatsBase.

Anyway, I found an unused header in the util dir called stat-util.h, I moved the definitions there for now.


http://gerrit.cloudera.org:8080/#/c/9381/1/be/src/exec/parquet-column-stats.inline.h
File be/src/exec/parquet-column-stats.inline.h:

http://gerrit.cloudera.org:8080/#/c/9381/1/be/src/exec/parquet-column-stats.inline.h@37
PS1, Line 37: std::is_floating_point<T>::value>
> Thanks for checking and for the explanations.
you're welcome!


http://gerrit.cloudera.org:8080/#/c/9381/1/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
File testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test:

http://gerrit.cloudera.org:8080/#/c/9381/1/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test@507
PS1, Line 507: select * from test_nan where val < 100
> nit: .
added . here, and also to some other sentences.



-- 
To view, visit http://gerrit.cloudera.org:8080/9381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Gerrit-Change-Number: 9381
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 23 Feb 2018 11:31:53 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Lars Volker, Gabor Kaszab, Tim Armstrong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9381

to look at the new patch set (#4).

Change subject: IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics
......................................................................

IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

Quick fix of Parquet write path until the Parquet community
agrees on the ordering of floating point numbers.

This commit ignores NaNs during min/max statistics
calculation. Only write NaN as 'min_value' and 'max_value'
if all the values are NaNs in the column chunk.

Added e2e tests as well.

Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
---
M be/src/exec/parquet-column-stats.inline.h
M be/src/util/stat-util.h
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
3 files changed, 44 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/9381/4
-- 
To view, visit http://gerrit.cloudera.org:8080/9381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Gerrit-Change-Number: 9381
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>