You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Norbert Luksa (Code Review)" <ge...@cloudera.org> on 2019/09/26 09:19:42 UTC

[Impala-ASF-CR] IMPALA-8498: Write column index for floating types when NaN is not present

Norbert Luksa has uploaded this change for review. ( http://gerrit.cloudera.org:8080/14264


Change subject: IMPALA-8498: Write column index for floating types when NaN is not present
......................................................................

IMPALA-8498: Write column index for floating types when NaN is not present

IMPALA-7307 disabled column index writing for floating point columns
until PARQUET-1222 is resolved. However, the problematic values are
only the NaNs. Therefore we can write column index if NaNs are not
present in data.

Testing:
  * Added tests which should fail if a column index is
    present while having NaN values in the column.

Change-Id: Ic9d367500243c8ca142a16ebfeef6c841f013434
---
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M tests/query_test/test_parquet_page_index.py
2 files changed, 106 insertions(+), 23 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/14264/2
-- 
To view, visit http://gerrit.cloudera.org:8080/14264
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic9d367500243c8ca142a16ebfeef6c841f013434
Gerrit-Change-Number: 14264
Gerrit-PatchSet: 2
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8498: Write column index for floating types when NaN is not present

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/14264 )

Change subject: IMPALA-8498: Write column index for floating types when NaN is not present
......................................................................

IMPALA-8498: Write column index for floating types when NaN is not present

IMPALA-7307 disabled column index writing for floating point columns
until PARQUET-1222 is resolved. However, the problematic values are
only the NaNs. Therefore we can write column index if NaNs are not
present in data.

Testing:
  * Added tests which should fail if a column index is
    present while having NaN values in the column.

Change-Id: Ic9d367500243c8ca142a16ebfeef6c841f013434
Reviewed-on: http://gerrit.cloudera.org:8080/14264
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M testdata/workloads/functional-query/queries/QueryTest/stats-extrapolation.test
M tests/query_test/test_parquet_page_index.py
3 files changed, 96 insertions(+), 37 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/14264
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ic9d367500243c8ca142a16ebfeef6c841f013434
Gerrit-Change-Number: 14264
Gerrit-PatchSet: 11
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8498: Write column index for floating types when NaN is not present

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14264 )

Change subject: IMPALA-8498: Write column index for floating types when NaN is not present
......................................................................


Patch Set 7:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4764/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14264
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9d367500243c8ca142a16ebfeef6c841f013434
Gerrit-Change-Number: 14264
Gerrit-PatchSet: 7
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 09 Oct 2019 11:40:00 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8498: Write column index for floating types when NaN is not present

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14264 )

Change subject: IMPALA-8498: Write column index for floating types when NaN is not present
......................................................................


Patch Set 8: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/14264
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9d367500243c8ca142a16ebfeef6c841f013434
Gerrit-Change-Number: 14264
Gerrit-PatchSet: 8
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 09 Oct 2019 14:57:20 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8498: Write column index for floating types when NaN is not present

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/14264 )

Change subject: IMPALA-8498: Write column index for floating types when NaN is not present
......................................................................


Patch Set 9: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/14264
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9d367500243c8ca142a16ebfeef6c841f013434
Gerrit-Change-Number: 14264
Gerrit-PatchSet: 9
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 14 Oct 2019 11:04:25 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8498: Write column index for floating types when NaN is not present

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14264 )

Change subject: IMPALA-8498: Write column index for floating types when NaN is not present
......................................................................


Patch Set 9:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4779/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14264
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9d367500243c8ca142a16ebfeef6c841f013434
Gerrit-Change-Number: 14264
Gerrit-PatchSet: 9
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 10 Oct 2019 12:07:59 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8498: Write column index for floating types when NaN is not present

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14264 )

Change subject: IMPALA-8498: Write column index for floating types when NaN is not present
......................................................................


Patch Set 10: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/14264
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9d367500243c8ca142a16ebfeef6c841f013434
Gerrit-Change-Number: 14264
Gerrit-PatchSet: 10
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 14 Oct 2019 11:04:36 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8498: Write column index for floating types when NaN is not present

Posted by "Norbert Luksa (Code Review)" <ge...@cloudera.org>.
Norbert Luksa has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/14264 )

Change subject: IMPALA-8498: Write column index for floating types when NaN is not present
......................................................................

IMPALA-8498: Write column index for floating types when NaN is not present

IMPALA-7307 disabled column index writing for floating point columns
until PARQUET-1222 is resolved. However, the problematic values are
only the NaNs. Therefore we can write column index if NaNs are not
present in data.

Testing:
  * Added tests which should fail if a column index is
    present while having NaN values in the column.

Change-Id: Ic9d367500243c8ca142a16ebfeef6c841f013434
---
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M tests/query_test/test_parquet_page_index.py
2 files changed, 110 insertions(+), 30 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/14264/6
-- 
To view, visit http://gerrit.cloudera.org:8080/14264
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic9d367500243c8ca142a16ebfeef6c841f013434
Gerrit-Change-Number: 14264
Gerrit-PatchSet: 6
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8498: Write column index for floating types when NaN is not present

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/14264 )

Change subject: IMPALA-8498: Write column index for floating types when NaN is not present
......................................................................


Patch Set 7: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/14264
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9d367500243c8ca142a16ebfeef6c841f013434
Gerrit-Change-Number: 14264
Gerrit-PatchSet: 7
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 09 Oct 2019 14:34:06 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8498: Write column index for floating types when NaN is not present

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14264 )

Change subject: IMPALA-8498: Write column index for floating types when NaN is not present
......................................................................


Patch Set 6:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4715/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14264
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9d367500243c8ca142a16ebfeef6c841f013434
Gerrit-Change-Number: 14264
Gerrit-PatchSet: 6
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 03 Oct 2019 14:25:11 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8498: Write column index for floating types when NaN is not present

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14264 )

Change subject: IMPALA-8498: Write column index for floating types when NaN is not present
......................................................................


Patch Set 3:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/4645/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/14264
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9d367500243c8ca142a16ebfeef6c841f013434
Gerrit-Change-Number: 14264
Gerrit-PatchSet: 3
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 26 Sep 2019 10:03:54 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8498: Write column index for floating types when NaN is not present

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14264 )

Change subject: IMPALA-8498: Write column index for floating types when NaN is not present
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14264/5/tests/query_test/test_parquet_page_index.py
File tests/query_test/test_parquet_page_index.py:

http://gerrit.cloudera.org:8080/#/c/14264/5/tests/query_test/test_parquet_page_index.py@498
PS5, Line 498:  
flake8: E222 multiple spaces after operator



-- 
To view, visit http://gerrit.cloudera.org:8080/14264
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9d367500243c8ca142a16ebfeef6c841f013434
Gerrit-Change-Number: 14264
Gerrit-PatchSet: 5
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 03 Oct 2019 07:06:40 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8498: Write column index for floating types when NaN is not present

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/14264 )

Change subject: IMPALA-8498: Write column index for floating types when NaN is not present
......................................................................


Patch Set 6: Code-Review+1

(4 comments)

http://gerrit.cloudera.org:8080/#/c/14264/6/be/src/exec/parquet/hdfs-parquet-table-writer.cc
File be/src/exec/parquet/hdfs-parquet-table-writer.cc:

http://gerrit.cloudera.org:8080/#/c/14264/6/be/src/exec/parquet/hdfs-parquet-table-writer.cc@436
PS6, Line 436: //UNLIKELY(
remove?


http://gerrit.cloudera.org:8080/#/c/14264/6/be/src/exec/parquet/hdfs-parquet-table-writer.cc@470
PS6, Line 470:     // IMPALA-7304: Don't write column index for floating-point columns until
             :     // PARQUET-1222 is resolved. This is modified by:
             :     // IMPALA-8498: Write column index for floating types when NaN is not present
nit: I think that mentioning IMPALA-7304 just make this comment more confusing.


http://gerrit.cloudera.org:8080/#/c/14264/6/tests/query_test/test_parquet_page_index.py
File tests/query_test/test_parquet_page_index.py:

http://gerrit.cloudera.org:8080/#/c/14264/6/tests/query_test/test_parquet_page_index.py@305
PS6, Line 305: 
nit: extra line (we generally leave only 1 line between functions)


http://gerrit.cloudera.org:8080/#/c/14264/6/tests/query_test/test_parquet_page_index.py@466
PS6, Line 466:       unique_database, no_nan_tbl,
             :       "(1.5), (2.3), (4.5), (42.42), (3.1415), (0.0)")
here and at several other lines: needs + 4 indentation



-- 
To view, visit http://gerrit.cloudera.org:8080/14264
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9d367500243c8ca142a16ebfeef6c841f013434
Gerrit-Change-Number: 14264
Gerrit-PatchSet: 6
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 03 Oct 2019 15:56:50 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8498: Write column index for floating types when NaN is not present

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/14264 )

Change subject: IMPALA-8498: Write column index for floating types when NaN is not present
......................................................................


Patch Set 4:

(3 comments)

Sorry for joining late when the tests are already running.

http://gerrit.cloudera.org:8080/#/c/14264/4/be/src/exec/parquet/hdfs-parquet-table-writer.cc
File be/src/exec/parquet/hdfs-parquet-table-writer.cc:

http://gerrit.cloudera.org:8080/#/c/14264/4/be/src/exec/parquet/hdfs-parquet-table-writer.cc@438
PS4, Line 438:     // IMPALA-7304: Don't write column index for floating-point columns until
             :     // PARQUET-1222 is resolved. This is modified by:
             :     // IMPALA-8498: Write column index for floating types when NaN is not present
             :     if (std::is_floating_point<T>::value) {
             :       if (type().type == TYPE_FLOAT) {
             :         float* v = static_cast<float*>(value);
             :         if (std::isnan(*v)) valid_column_index_ = false;
             :       } else if (type().type == TYPE_DOUBLE) {
             :         double* v = static_cast<double*>(value);
             :         if (std::isnan(*v)) valid_column_index_ = false;
             :       } else {
             :         DCHECK(false);
             :       }
             :     }
I have 3 minor issues about this block:

- as it is related to stats, it seems more logical to me to move it to the end of this function, near page_stats_->Update(*CastValue(value));

- as T contains the info whether this a FLOAT or DOUBLE, I think this could simplified to if (std::is_floating_point<T>::value && std::isnan(*CastValue(value))) valid_column_index_ = false;

- if (type().type == TYPE_FLOAT) seems like a runtime check and the change above would turn it to a compile time check

+ as we use CastValue at several places, I think it would be better to add T* v = CastValue(value); to the beginning of the  function.


http://gerrit.cloudera.org:8080/#/c/14264/4/be/src/exec/parquet/hdfs-parquet-table-writer.cc@452
PS4, Line 452: if (current_encoding_ == parquet::Encoding::PLAIN_DICTIONARY) 
optional: a potential optimization would be to do the NaN check here only for PLAIN pages, and add a similar check to dictionary finalization for DICTIONARY pages. With tests this may be too much work, but I think it worth a TODO.


http://gerrit.cloudera.org:8080/#/c/14264/4/tests/query_test/test_parquet_page_index.py
File tests/query_test/test_parquet_page_index.py:

http://gerrit.cloudera.org:8080/#/c/14264/4/tests/query_test/test_parquet_page_index.py@461
PS4, Line 461: col_type = "FLOAT"
I am bit worried about double being untested, and it seems simple to run the same tests with col_type = "DOUBLE" for both table creation and to be substituted in CAST expressions.



-- 
To view, visit http://gerrit.cloudera.org:8080/14264
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9d367500243c8ca142a16ebfeef6c841f013434
Gerrit-Change-Number: 14264
Gerrit-PatchSet: 4
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 26 Sep 2019 15:27:46 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8498: Write column index for floating types when NaN is not present

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/14264 )

Change subject: IMPALA-8498: Write column index for floating types when NaN is not present
......................................................................


Patch Set 3: Code-Review+2

LGTM. The clang-tidy job failed with a maven server error.


-- 
To view, visit http://gerrit.cloudera.org:8080/14264
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9d367500243c8ca142a16ebfeef6c841f013434
Gerrit-Change-Number: 14264
Gerrit-PatchSet: 3
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 26 Sep 2019 14:05:15 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8498: Write column index for floating types when NaN is not present

Posted by "Norbert Luksa (Code Review)" <ge...@cloudera.org>.
Norbert Luksa has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/14264 )

Change subject: IMPALA-8498: Write column index for floating types when NaN is not present
......................................................................

IMPALA-8498: Write column index for floating types when NaN is not present

IMPALA-7307 disabled column index writing for floating point columns
until PARQUET-1222 is resolved. However, the problematic values are
only the NaNs. Therefore we can write column index if NaNs are not
present in data.

Testing:
  * Added tests which should fail if a column index is
    present while having NaN values in the column.

Change-Id: Ic9d367500243c8ca142a16ebfeef6c841f013434
---
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M tests/query_test/test_parquet_page_index.py
2 files changed, 110 insertions(+), 30 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/14264/5
-- 
To view, visit http://gerrit.cloudera.org:8080/14264
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic9d367500243c8ca142a16ebfeef6c841f013434
Gerrit-Change-Number: 14264
Gerrit-PatchSet: 5
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8498: Write column index for floating types when NaN is not present

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14264 )

Change subject: IMPALA-8498: Write column index for floating types when NaN is not present
......................................................................


Patch Set 8: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/5079/


-- 
To view, visit http://gerrit.cloudera.org:8080/14264
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9d367500243c8ca142a16ebfeef6c841f013434
Gerrit-Change-Number: 14264
Gerrit-PatchSet: 8
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 09 Oct 2019 19:10:26 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8498: Write column index for floating types when NaN is not present

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/14264 )

Change subject: IMPALA-8498: Write column index for floating types when NaN is not present
......................................................................


Patch Set 9: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/14264
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9d367500243c8ca142a16ebfeef6c841f013434
Gerrit-Change-Number: 14264
Gerrit-PatchSet: 9
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 11 Oct 2019 11:34:07 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8498: Write column index for floating types when NaN is not present

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14264 )

Change subject: IMPALA-8498: Write column index for floating types when NaN is not present
......................................................................


Patch Set 10:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5086/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/14264
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9d367500243c8ca142a16ebfeef6c841f013434
Gerrit-Change-Number: 14264
Gerrit-PatchSet: 10
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 14 Oct 2019 11:04:37 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8498: Write column index for floating types when NaN is not present

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/14264 )

Change subject: IMPALA-8498: Write column index for floating types when NaN is not present
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14264/4/be/src/exec/parquet/hdfs-parquet-table-writer.cc
File be/src/exec/parquet/hdfs-parquet-table-writer.cc:

http://gerrit.cloudera.org:8080/#/c/14264/4/be/src/exec/parquet/hdfs-parquet-table-writer.cc@444
PS4, Line 444: std::isnan(*v)
std::isnan() could be put in an UNLIKELY macro.



-- 
To view, visit http://gerrit.cloudera.org:8080/14264
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9d367500243c8ca142a16ebfeef6c841f013434
Gerrit-Change-Number: 14264
Gerrit-PatchSet: 4
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 30 Sep 2019 16:16:35 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8498: Write column index for floating types when NaN is not present

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14264 )

Change subject: IMPALA-8498: Write column index for floating types when NaN is not present
......................................................................


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/14264/2/tests/query_test/test_parquet_page_index.py
File tests/query_test/test_parquet_page_index.py:

http://gerrit.cloudera.org:8080/#/c/14264/2/tests/query_test/test_parquet_page_index.py@429
PS2, Line 429: s
flake8: E501 line too long (93 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/14264/2/tests/query_test/test_parquet_page_index.py@462
PS2, Line 462: #
flake8: E116 unexpected indentation (comment)


http://gerrit.cloudera.org:8080/#/c/14264/2/tests/query_test/test_parquet_page_index.py@473
PS2, Line 473: #
flake8: E116 unexpected indentation (comment)


http://gerrit.cloudera.org:8080/#/c/14264/2/tests/query_test/test_parquet_page_index.py@483
PS2, Line 483: #
flake8: E116 unexpected indentation (comment)



-- 
To view, visit http://gerrit.cloudera.org:8080/14264
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9d367500243c8ca142a16ebfeef6c841f013434
Gerrit-Change-Number: 14264
Gerrit-PatchSet: 2
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 26 Sep 2019 09:20:25 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8498: Write column index for floating types when NaN is not present

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14264 )

Change subject: IMPALA-8498: Write column index for floating types when NaN is not present
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4644/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14264
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9d367500243c8ca142a16ebfeef6c841f013434
Gerrit-Change-Number: 14264
Gerrit-PatchSet: 2
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 26 Sep 2019 09:38:23 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8498: Write column index for floating types when NaN is not present

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has removed a vote on this change.

Change subject: IMPALA-8498: Write column index for floating types when NaN is not present
......................................................................


Removed Code-Review+2 by Impala Public Jenkins <im...@cloudera.com>
-- 
To view, visit http://gerrit.cloudera.org:8080/14264
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: Ic9d367500243c8ca142a16ebfeef6c841f013434
Gerrit-Change-Number: 14264
Gerrit-PatchSet: 4
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8498: Write column index for floating types when NaN is not present

Posted by "Norbert Luksa (Code Review)" <ge...@cloudera.org>.
Norbert Luksa has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/14264 )

Change subject: IMPALA-8498: Write column index for floating types when NaN is not present
......................................................................

IMPALA-8498: Write column index for floating types when NaN is not present

IMPALA-7307 disabled column index writing for floating point columns
until PARQUET-1222 is resolved. However, the problematic values are
only the NaNs. Therefore we can write column index if NaNs are not
present in data.

Testing:
  * Added tests which should fail if a column index is
    present while having NaN values in the column.

Change-Id: Ic9d367500243c8ca142a16ebfeef6c841f013434
---
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M testdata/workloads/functional-query/queries/QueryTest/stats-extrapolation.test
M tests/query_test/test_parquet_page_index.py
3 files changed, 96 insertions(+), 37 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/14264/9
-- 
To view, visit http://gerrit.cloudera.org:8080/14264
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic9d367500243c8ca142a16ebfeef6c841f013434
Gerrit-Change-Number: 14264
Gerrit-PatchSet: 9
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8498: Write column index for floating types when NaN is not present

Posted by "Norbert Luksa (Code Review)" <ge...@cloudera.org>.
Norbert Luksa has posted comments on this change. ( http://gerrit.cloudera.org:8080/14264 )

Change subject: IMPALA-8498: Write column index for floating types when NaN is not present
......................................................................


Patch Set 5:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/14264/4/be/src/exec/parquet/hdfs-parquet-table-writer.cc
File be/src/exec/parquet/hdfs-parquet-table-writer.cc:

http://gerrit.cloudera.org:8080/#/c/14264/4/be/src/exec/parquet/hdfs-parquet-table-writer.cc@444
PS4, Line 444: 
> std::isnan() could be put in an UNLIKELY macro.
Done


http://gerrit.cloudera.org:8080/#/c/14264/4/be/src/exec/parquet/hdfs-parquet-table-writer.cc@438
PS4, Line 438:     T* val = CastValue(value);
             :     if (current_encoding_ == parquet::Encoding::PLAIN_DICTIONARY) {
             :       if (UNLIKELY(num_values_since_dict_size_check_ >=
             :                    DICTIONARY_DATA_PAGE_SIZE_CHECK_PERIOD)) {
             :         num_values_since_dict_size_check_ = 0;
             :         if (dict_encoder_->EstimatedDataEncodedSize() >= page_size_) return false;
             :       }
             :       ++num_values_since_dict_size_check_;
             :       *bytes_needed = dict_encoder_->Put(*val);
             :       // If the dictionary contains the maximum number of values, switch to plain
             :       // encoding for the next page. The current page is full and must be written out.
             :       if (UNLIKELY(*bytes_needed < 0)) {
             :         next_page_encoding_ = parquet::Encoding::PLAIN;
             :      
> I have 3 minor issues about this block:
I have moved the block to the end, right before updating page stats.
The problem I run into with your suggestion was that I cannot use CastValue since it returns a T*, and the compiler can't deduce from std::is_floating_point that T can be passed to std::isnan.
But I understand your point regard compile/runtime checks, proposed a solution with only compile time checks. It separates the double and float type checks using std::is_same.

+: Added the cast to the beginning of the function.


http://gerrit.cloudera.org:8080/#/c/14264/4/be/src/exec/parquet/hdfs-parquet-table-writer.cc@452
PS4, Line 452:   }
> optional: a potential optimization would be to do the NaN check here only f
I would skip this optimization for now, not sure if it's worth the extra work, since this is only a workaround until the aforementioned Parquet issue is solved.


http://gerrit.cloudera.org:8080/#/c/14264/2/tests/query_test/test_parquet_page_index.py
File tests/query_test/test_parquet_page_index.py:

http://gerrit.cloudera.org:8080/#/c/14264/2/tests/query_test/test_parquet_page_index.py@429
PS2, Line 429: 
> flake8: E501 line too long (93 > 90 characters)
Done


http://gerrit.cloudera.org:8080/#/c/14264/2/tests/query_test/test_parquet_page_index.py@462
PS2, Line 462:  
> flake8: E116 unexpected indentation (comment)
Done


http://gerrit.cloudera.org:8080/#/c/14264/2/tests/query_test/test_parquet_page_index.py@473
PS2, Line 473: 
> flake8: E116 unexpected indentation (comment)
Done


http://gerrit.cloudera.org:8080/#/c/14264/2/tests/query_test/test_parquet_page_index.py@483
PS2, Line 483: 
> flake8: E116 unexpected indentation (comment)
Done


http://gerrit.cloudera.org:8080/#/c/14264/4/tests/query_test/test_parquet_page_index.py
File tests/query_test/test_parquet_page_index.py:

http://gerrit.cloudera.org:8080/#/c/14264/4/tests/query_test/test_parquet_page_index.py@461
PS4, Line 461: for col_type in ["
> I am bit worried about double being untested, and it seems simple to run th
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/14264
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9d367500243c8ca142a16ebfeef6c841f013434
Gerrit-Change-Number: 14264
Gerrit-PatchSet: 5
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 03 Oct 2019 07:05:57 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8498: Write column index for floating types when NaN is not present

Posted by "Norbert Luksa (Code Review)" <ge...@cloudera.org>.
Norbert Luksa has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/14264 )

Change subject: IMPALA-8498: Write column index for floating types when NaN is not present
......................................................................

IMPALA-8498: Write column index for floating types when NaN is not present

IMPALA-7307 disabled column index writing for floating point columns
until PARQUET-1222 is resolved. However, the problematic values are
only the NaNs. Therefore we can write column index if NaNs are not
present in data.

Testing:
  * Added tests which should fail if a column index is
    present while having NaN values in the column.

Change-Id: Ic9d367500243c8ca142a16ebfeef6c841f013434
---
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M tests/query_test/test_parquet_page_index.py
2 files changed, 89 insertions(+), 30 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/14264/7
-- 
To view, visit http://gerrit.cloudera.org:8080/14264
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic9d367500243c8ca142a16ebfeef6c841f013434
Gerrit-Change-Number: 14264
Gerrit-PatchSet: 7
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8498: Write column index for floating types when NaN is not present

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14264 )

Change subject: IMPALA-8498: Write column index for floating types when NaN is not present
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4711/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/14264
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9d367500243c8ca142a16ebfeef6c841f013434
Gerrit-Change-Number: 14264
Gerrit-PatchSet: 5
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 03 Oct 2019 07:46:10 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8498: Write column index for floating types when NaN is not present

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14264 )

Change subject: IMPALA-8498: Write column index for floating types when NaN is not present
......................................................................


Patch Set 10: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/14264
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9d367500243c8ca142a16ebfeef6c841f013434
Gerrit-Change-Number: 14264
Gerrit-PatchSet: 10
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 14 Oct 2019 15:21:15 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8498: Write column index for floating types when NaN is not present

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14264 )

Change subject: IMPALA-8498: Write column index for floating types when NaN is not present
......................................................................


Patch Set 4: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/14264
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9d367500243c8ca142a16ebfeef6c841f013434
Gerrit-Change-Number: 14264
Gerrit-PatchSet: 4
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 26 Sep 2019 14:06:00 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8498: Write column index for floating types when NaN is not present

Posted by "Norbert Luksa (Code Review)" <ge...@cloudera.org>.
Norbert Luksa has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/14264 )

Change subject: IMPALA-8498: Write column index for floating types when NaN is not present
......................................................................

IMPALA-8498: Write column index for floating types when NaN is not present

IMPALA-7307 disabled column index writing for floating point columns
until PARQUET-1222 is resolved. However, the problematic values are
only the NaNs. Therefore we can write column index if NaNs are not
present in data.

Testing:
  * Added tests which should fail if a column index is
    present while having NaN values in the column.

Change-Id: Ic9d367500243c8ca142a16ebfeef6c841f013434
---
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M tests/query_test/test_parquet_page_index.py
2 files changed, 107 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/14264/3
-- 
To view, visit http://gerrit.cloudera.org:8080/14264
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic9d367500243c8ca142a16ebfeef6c841f013434
Gerrit-Change-Number: 14264
Gerrit-PatchSet: 3
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8498: Write column index for floating types when NaN is not present

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/14264 )

Change subject: IMPALA-8498: Write column index for floating types when NaN is not present
......................................................................


Patch Set 5: Code-Review+1

(3 comments)

http://gerrit.cloudera.org:8080/#/c/14264/5/be/src/exec/parquet/hdfs-parquet-table-writer.cc
File be/src/exec/parquet/hdfs-parquet-table-writer.cc:

http://gerrit.cloudera.org:8080/#/c/14264/5/be/src/exec/parquet/hdfs-parquet-table-writer.cc@473
PS5, Line 473: typename std::remove_cv<T>::type
nit: can be shortened to 'std::remove_cv_t<T>'


http://gerrit.cloudera.org:8080/#/c/14264/5/tests/query_test/test_parquet_page_index.py
File tests/query_test/test_parquet_page_index.py:

http://gerrit.cloudera.org:8080/#/c/14264/5/tests/query_test/test_parquet_page_index.py@461
PS5, Line 461: FLOAT", "DOUBLE
you can use lower cased 'float' and 'double', the CAST below will work just fine. And you won't need to call 'lower()' all the time.


http://gerrit.cloudera.org:8080/#/c/14264/5/tests/query_test/test_parquet_page_index.py@475
PS5, Line 475: "
nit: you could add '_' here and below



-- 
To view, visit http://gerrit.cloudera.org:8080/14264
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9d367500243c8ca142a16ebfeef6c841f013434
Gerrit-Change-Number: 14264
Gerrit-PatchSet: 5
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 03 Oct 2019 12:29:12 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8498: Write column index for floating types when NaN is not present

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14264 )

Change subject: IMPALA-8498: Write column index for floating types when NaN is not present
......................................................................


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5003/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/14264
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9d367500243c8ca142a16ebfeef6c841f013434
Gerrit-Change-Number: 14264
Gerrit-PatchSet: 4
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 26 Sep 2019 14:06:01 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8498: Write column index for floating types when NaN is not present

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14264 )

Change subject: IMPALA-8498: Write column index for floating types when NaN is not present
......................................................................


Patch Set 8:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5079/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/14264
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9d367500243c8ca142a16ebfeef6c841f013434
Gerrit-Change-Number: 14264
Gerrit-PatchSet: 8
Gerrit-Owner: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <no...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 09 Oct 2019 14:57:21 +0000
Gerrit-HasComments: No