Posted to reviews@impala.apache.org by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org> on 2019/02/28 13:57:39 UTC

[Impala-ASF-CR] IMPALA-8257: Parquet writer sometimes hits DCHECK when handling empty string

Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12636 )


Change subject: IMPALA-8257: Parquet writer sometimes hits DCHECK when handling empty string
......................................................................

IMPALA-8257: Parquet writer sometimes hits DCHECK when handling empty string

ColumnStats<StringValue>::Merge() contained an overly strict DCHECK.
The DCHECK verifies that the StringValues have been copied out of
RowBatch memory into their own buffer. Otherwise their values could be
overwritten by subsequent row batches.

The internal pointer of an empty StringValue is NULL, so there is
nothing to copy into another buffer. For such values the DCHECKs are
unnecessary and, worse, they can cause crashes.

The DCHECKs are now evaluated only when the corresponding StringValues
are non-empty.
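
For readers without the diff at hand, the idea can be sketched as below.
This is only an illustration under stated assumptions, not the code in
parquet-column-stats.inline.h: StringRef, StringStats, StrictCheck() and
RelaxedCheck() are hypothetical stand-ins, and plain assert() stands in
for DCHECK. The sketch assumes, as described above, that an empty value
carries a null pointer and is never copied.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical non-owning string reference (a stand-in, not Impala's StringValue).
struct StringRef {
  const char* ptr;
  int len;
};

// Hypothetical min/max tracker that copies values into buffers it owns,
// so they stay valid after the caller's batch memory is reused.
class StringStats {
 public:
  void Update(const StringRef& v) {
    if (!has_values_ || Compare(v, min_) < 0) CopyInto(v, &min_buf_, &min_);
    if (!has_values_ || Compare(v, max_) > 0) CopyInto(v, &max_buf_, &max_);
    has_values_ = true;
  }

  // Overly strict: demands that every value points into our buffer.
  // An empty value has ptr == nullptr, so this fails for it.
  bool StrictCheck() const {
    return PointsInto(min_, min_buf_) && PointsInto(max_, max_buf_);
  }

  // Relaxed: an empty value has nothing to copy, so it is skipped.
  bool RelaxedCheck() const {
    return (min_.len == 0 || PointsInto(min_, min_buf_)) &&
           (max_.len == 0 || PointsInto(max_, max_buf_));
  }

  // Plays the role of the guarded DCHECKs in this sketch.
  void CheckInvariants() const { assert(RelaxedCheck()); }

  const StringRef& min() const { return min_; }
  const StringRef& max() const { return max_; }

 private:
  static bool PointsInto(const StringRef& v, const std::string& buf) {
    return v.ptr == buf.data();
  }

  // Byte-wise comparison; never dereferences the pointer of an empty value.
  static int Compare(const StringRef& a, const StringRef& b) {
    int n = std::min(a.len, b.len);
    int c = (n == 0) ? 0 : std::memcmp(a.ptr, b.ptr, n);
    return c != 0 ? c : a.len - b.len;
  }

  // Copies a non-empty value into buf; an empty value keeps a null pointer.
  static void CopyInto(const StringRef& v, std::string* buf, StringRef* out) {
    if (v.len == 0) {
      buf->clear();
      out->ptr = nullptr;
      out->len = 0;
      return;
    }
    buf->assign(v.ptr, v.len);
    out->ptr = buf->data();
    out->len = v.len;
  }

  bool has_values_ = false;
  std::string min_buf_;
  std::string max_buf_;
  StringRef min_{nullptr, 0};
  StringRef max_{nullptr, 0};
};
```

In this sketch, calling Update() with an empty value leaves StrictCheck()
false while RelaxedCheck() stays true, which mirrors the failure mode
described above.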

Testing:
I added an e2e test that inserts a lot of empty strings into a table.

Change-Id: I934b53c17720e41231e4d614fbc70f1937e19289
---
M be/src/exec/parquet/parquet-column-stats.inline.h
M testdata/workloads/tpch/queries/insert_parquet.test
2 files changed, 19 insertions(+), 5 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/12636/1
-- 
To view, visit http://gerrit.cloudera.org:8080/12636
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I934b53c17720e41231e4d614fbc70f1937e19289
Gerrit-Change-Number: 12636
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8257: Parquet writer sometimes hits DCHECK when handling empty string

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12636 )

Change subject: IMPALA-8257: Parquet writer sometimes hits DCHECK when handling empty string
......................................................................

IMPALA-8257: Parquet writer sometimes hits DCHECK when handling empty string

ColumnStats<StringValue>::Merge() contained an overly strict DCHECK.
The DCHECK verifies that the StringValues have been copied out of
RowBatch memory into their own buffer. Otherwise their values could be
overwritten by subsequent row batches.

The internal pointer of an empty StringValue is NULL, so there is
nothing to copy into another buffer. For such values the DCHECKs are
unnecessary and, worse, they can cause crashes.

The DCHECKs are now evaluated only when the corresponding StringValues
are non-empty.

Testing:
I added an e2e test that inserts a lot of empty strings into a table.

Change-Id: I934b53c17720e41231e4d614fbc70f1937e19289
Reviewed-on: http://gerrit.cloudera.org:8080/12636
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/exec/parquet/parquet-column-stats.inline.h
M testdata/workloads/tpch/queries/insert_parquet.test
2 files changed, 19 insertions(+), 5 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I934b53c17720e41231e4d614fbc70f1937e19289
Gerrit-Change-Number: 12636
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-8257: Parquet writer sometimes hits DCHECK when handling empty string

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12636 )

Change subject: IMPALA-8257: Parquet writer sometimes hits DCHECK when handling empty string
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/2297/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I934b53c17720e41231e4d614fbc70f1937e19289
Gerrit-Change-Number: 12636
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 28 Feb 2019 14:41:14 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8257: Parquet writer sometimes hits DCHECK when handling empty string

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12636 )

Change subject: IMPALA-8257: Parquet writer sometimes hits DCHECK when handling empty string
......................................................................


Patch Set 2: Verified+1


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I934b53c17720e41231e4d614fbc70f1937e19289
Gerrit-Change-Number: 12636
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 28 Feb 2019 20:20:13 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8257: Parquet writer sometimes hits DCHECK when handling empty string

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12636 )

Change subject: IMPALA-8257: Parquet writer sometimes hits DCHECK when handling empty string
......................................................................


Patch Set 2: Code-Review+2


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I934b53c17720e41231e4d614fbc70f1937e19289
Gerrit-Change-Number: 12636
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 28 Feb 2019 16:19:05 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8257: Parquet writer sometimes hits DCHECK when handling empty string

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12636 )

Change subject: IMPALA-8257: Parquet writer sometimes hits DCHECK when handling empty string
......................................................................


Patch Set 1: Code-Review+2


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I934b53c17720e41231e4d614fbc70f1937e19289
Gerrit-Change-Number: 12636
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 28 Feb 2019 16:13:29 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8257: Parquet writer sometimes hits DCHECK when handling empty string

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12636 )

Change subject: IMPALA-8257: Parquet writer sometimes hits DCHECK when handling empty string
......................................................................


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3850/ DRY_RUN=false


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I934b53c17720e41231e4d614fbc70f1937e19289
Gerrit-Change-Number: 12636
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 28 Feb 2019 16:19:06 +0000
Gerrit-HasComments: No