Posted to reviews@impala.apache.org by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org> on 2021/09/07 14:35:21 UTC

[Impala-ASF-CR] IMPALA-10900: Add Iceberg tests that write many files

Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17831 )


Change subject: IMPALA-10900: Add Iceberg tests that write many files
......................................................................

IMPALA-10900: Add Iceberg tests that write many files

In earlier versions of Impala we had a bug that affected
insertions to Iceberg tables. When Impala wrote multiple
files during a single INSERT statement it could crash, or
even worse, it could silently omit data files from the
Iceberg metadata.

The current master doesn't have this bug, but we don't
really have tests for this case.

This patch adds tests that write many files during inserts
to an Iceberg table. Both non-partitioned and partitioned
Iceberg tables are tested.

We achieve writing lots of files by setting 'parquet_file_size'
to 8 megabytes.
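
As a rough illustration of why a small 'parquet_file_size' leads to many
files, the sketch below estimates how many data files a single writer would
produce for a given amount of output. The function name, the example data
sizes, and the reading of "8 megabytes" as 8 MiB are assumptions made for
this illustration and are not taken from the patch. The real file count also
depends on compression, the number of writers, and partitioning.

```python
MIB = 1024 * 1024


def estimate_file_count(total_bytes: int, parquet_file_size: int = 8 * MIB) -> int:
    """Estimate the files one writer emits when it rolls over at a size limit.

    Hypothetical helper: it assumes the writer starts a new file each time
    the current one reaches parquet_file_size bytes.
    """
    if total_bytes < 0:
        raise ValueError("total_bytes must be non-negative")
    if parquet_file_size <= 0:
        raise ValueError("parquet_file_size must be positive")
    # Integer ceiling division: a partially filled last file still counts.
    return (total_bytes + parquet_file_size - 1) // parquet_file_size


if __name__ == "__main__":
    # Example sizes are made up for the illustration.
    for size_mib in (8, 80, 500):
        print(size_mib, "MiB ->", estimate_file_count(size_mib * MIB), "file(s)")
```

Under these assumptions the estimate grows linearly with the amount of data
written, which is why lowering the file size limit is a cheap way to force
many files from a modest data set.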

Testing:
 * added an e2e test that writes many data files
 * added exhaustive e2e test that writes even more data files

Change-Id: Ia2dbc2c5f9574153842af308a61f9d91994d067b
---
A testdata/workloads/functional-query/queries/QueryTest/iceberg-write-many-files-stress.test
A testdata/workloads/functional-query/queries/QueryTest/iceberg-write-many-files.test
M tests/query_test/test_iceberg.py
3 files changed, 193 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/17831/1
-- 
To view, visit http://gerrit.cloudera.org:8080/17831
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia2dbc2c5f9574153842af308a61f9d91994d067b
Gerrit-Change-Number: 17831
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-10900: Add Iceberg tests that write many files

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17831 )

Change subject: IMPALA-10900: Add Iceberg tests that write many files
......................................................................


Patch Set 3: Code-Review+2



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia2dbc2c5f9574153842af308a61f9d91994d067b
Gerrit-Change-Number: 17831
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Sep 2021 12:21:34 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10900: Add Iceberg tests that write many files

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17831 )

Change subject: IMPALA-10900: Add Iceberg tests that write many files
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17831/1/tests/query_test/test_iceberg.py
File tests/query_test/test_iceberg.py:

http://gerrit.cloudera.org:8080/#/c/17831/1/tests/query_test/test_iceberg.py@459
PS1, Line 459: 
flake8: W292 no newline at end of file
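
For readers unfamiliar with W292, the condition it reports can be
approximated as follows. This is a simplified sketch with a hypothetical
function name, not the checker's actual implementation.

```python
def missing_final_newline(source: str) -> bool:
    """Return True if non-empty source text lacks a trailing line ending."""
    # An empty file is not flagged; otherwise the last character must be
    # a line terminator.
    return bool(source) and not source.endswith(("\n", "\r"))


if __name__ == "__main__":
    print(missing_final_newline("x = 1"))    # last line has no newline
    print(missing_final_newline("x = 1\n"))  # properly terminated
```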




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia2dbc2c5f9574153842af308a61f9d91994d067b
Gerrit-Change-Number: 17831
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Tue, 07 Sep 2021 14:36:08 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10900: Add Iceberg tests that write many files

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/17831 )

Change subject: IMPALA-10900: Add Iceberg tests that write many files
......................................................................


Patch Set 2: Code-Review+2

Thx for adding more tests!



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia2dbc2c5f9574153842af308a61f9d91994d067b
Gerrit-Change-Number: 17831
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Sep 2021 11:33:40 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10900: Add Iceberg tests that write many files

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17831 )

Change subject: IMPALA-10900: Add Iceberg tests that write many files
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9426/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia2dbc2c5f9574153842af308a61f9d91994d067b
Gerrit-Change-Number: 17831
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Tue, 07 Sep 2021 14:57:31 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10900: Add Iceberg tests that write many files

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17831 )

Change subject: IMPALA-10900: Add Iceberg tests that write many files
......................................................................


Patch Set 2: Code-Review+1

(1 comment)

Thanks for taking a look!

Carry +1

http://gerrit.cloudera.org:8080/#/c/17831/1/tests/query_test/test_iceberg.py
File tests/query_test/test_iceberg.py:

http://gerrit.cloudera.org:8080/#/c/17831/1/tests/query_test/test_iceberg.py@459
PS1, Line 459: 
> flake8: W292 no newline at end of file
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia2dbc2c5f9574153842af308a61f9d91994d067b
Gerrit-Change-Number: 17831
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 07 Sep 2021 15:46:41 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10900: Add Iceberg tests that write many files

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Riza Suminto, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17831

to look at the new patch set (#2).

Change subject: IMPALA-10900: Add Iceberg tests that write many files
......................................................................

IMPALA-10900: Add Iceberg tests that write many files

In earlier versions of Impala we had a bug that affected
insertions to Iceberg tables. When Impala wrote multiple
files during a single INSERT statement it could crash, or
even worse, it could silently omit data files from the
Iceberg metadata.

The current master doesn't have this bug, but we don't
really have tests for this case.

This patch adds tests that write many files during inserts
to an Iceberg table. Both non-partitioned and partitioned
Iceberg tables are tested.

We achieve writing lots of files by setting 'parquet_file_size'
to 8 megabytes.

Testing:
 * added an e2e test that writes many data files
 * added exhaustive e2e test that writes even more data files

Change-Id: Ia2dbc2c5f9574153842af308a61f9d91994d067b
---
A testdata/workloads/functional-query/queries/QueryTest/iceberg-write-many-files-stress.test
A testdata/workloads/functional-query/queries/QueryTest/iceberg-write-many-files.test
M tests/query_test/test_iceberg.py
3 files changed, 193 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/17831/2

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia2dbc2c5f9574153842af308a61f9d91994d067b
Gerrit-Change-Number: 17831
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>

[Impala-ASF-CR] IMPALA-10900: Add Iceberg tests that write many files

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17831 )

Change subject: IMPALA-10900: Add Iceberg tests that write many files
......................................................................


Patch Set 1: Code-Review+1

Other than flake8 warning, the test looks good to me.
Thanks Zoltan!



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia2dbc2c5f9574153842af308a61f9d91994d067b
Gerrit-Change-Number: 17831
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Tue, 07 Sep 2021 15:16:01 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10900: Add Iceberg tests that write many files

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17831 )

Change subject: IMPALA-10900: Add Iceberg tests that write many files
......................................................................


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7458/ DRY_RUN=false



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia2dbc2c5f9574153842af308a61f9d91994d067b
Gerrit-Change-Number: 17831
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Sep 2021 12:21:34 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10900: Add Iceberg tests that write many files

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17831 )

Change subject: IMPALA-10900: Add Iceberg tests that write many files
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9428/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia2dbc2c5f9574153842af308a61f9d91994d067b
Gerrit-Change-Number: 17831
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 07 Sep 2021 16:07:24 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10900: Add Iceberg tests that write many files

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17831 )

Change subject: IMPALA-10900: Add Iceberg tests that write many files
......................................................................

IMPALA-10900: Add Iceberg tests that write many files

In earlier versions of Impala we had a bug that affected
insertions to Iceberg tables. When Impala wrote multiple
files during a single INSERT statement it could crash, or
even worse, it could silently omit data files from the
Iceberg metadata.

The current master doesn't have this bug, but we don't
really have tests for this case.

This patch adds tests that write many files during inserts
to an Iceberg table. Both non-partitioned and partitioned
Iceberg tables are tested.

We achieve writing lots of files by setting 'parquet_file_size'
to 8 megabytes.

Testing:
 * added an e2e test that writes many data files
 * added exhaustive e2e test that writes even more data files

Change-Id: Ia2dbc2c5f9574153842af308a61f9d91994d067b
Reviewed-on: http://gerrit.cloudera.org:8080/17831
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
A testdata/workloads/functional-query/queries/QueryTest/iceberg-write-many-files-stress.test
A testdata/workloads/functional-query/queries/QueryTest/iceberg-write-many-files.test
M tests/query_test/test_iceberg.py
3 files changed, 193 insertions(+), 0 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ia2dbc2c5f9574153842af308a61f9d91994d067b
Gerrit-Change-Number: 17831
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-10900: Add Iceberg tests that write many files

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17831 )

Change subject: IMPALA-10900: Add Iceberg tests that write many files
......................................................................


Patch Set 3: Verified+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia2dbc2c5f9574153842af308a61f9d91994d067b
Gerrit-Change-Number: 17831
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Sep 2021 18:38:36 +0000
Gerrit-HasComments: No