You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Quanlong Huang (Code Review)" <ge...@cloudera.org> on 2022/08/01 03:35:57 UTC

[Impala-ASF-CR] IMPALA-11464: Skip listing staging dirs to avoid failures on them

Quanlong Huang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/18801


Change subject: IMPALA-11464: Skip listing staging dirs to avoid failures on them
......................................................................

IMPALA-11464: Skip listing staging dirs to avoid failures on them

Hive or other systems will generate staging/tmp dirs under the
table/partition folders while loading/inserting data. They are removed
when the operation is done. File metadata loading in catalogd could fail
if it's listing files of such dirs. This is found on HDFS where file
listing is done in batches. Each batch contains a partial list of 1000
items (configured by "dfs.ls.limit"). If the dir is removed, the next
listing, e.g. the next hasNext() call on the RemoteIterator, will fail
with FileNotFoundException. Such error on staging/tmp dirs should not
fail the metadata loading. However, if it happens on a partition dir,
the metadata loading should fail to avoid stale metadata.

This patch adds a check before listing the dir. If it's a staging/tmp
dir, catalogd will just ignore it. Also adds a debug action,
catalogd_pause_after_hdfs_remote_iterator_creation, to inject
sleeps after the first partial listing (happens in creating the
RemoteIterator). So we can reproduce the FileNotFoundException stably.

Tests:
 - Add test on removing a large staging dir (contains 1024 files) during
   REFRESH. Metadata loading fails consistently before this fix.
 - Add test on removing a large partition dir (contains 1024 files)
   during REFRESH. Verify metadata loading fails as expected.

Change-Id: Ic848e6c8563a1e0bf294cd50167dfc40f66a56cb
---
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M fe/src/main/java/org/apache/impala/util/DebugUtils.java
M tests/metadata/test_recursive_listing.py
M tests/util/filesystem_base.py
M tests/util/hdfs_util.py
5 files changed, 194 insertions(+), 5 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/01/18801/1
-- 
To view, visit http://gerrit.cloudera.org:8080/18801
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic848e6c8563a1e0bf294cd50167dfc40f66a56cb
Gerrit-Change-Number: 18801
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-11464: Skip listing staging dirs to avoid failures on them

Posted by "Xiang Yang (Code Review)" <ge...@cloudera.org>.
Xiang Yang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18801 )

Change subject: IMPALA-11464: Skip listing staging dirs to avoid failures on them
......................................................................


Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/18801/4/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java:

http://gerrit.cloudera.org:8080/#/c/18801/4/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@942
PS4, Line 942:       if (LOG.isTraceEnabled()) {
Hi quanlong, why isTraceEnabled and then LOG.info() ?


http://gerrit.cloudera.org:8080/#/c/18801/4/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@988
PS4, Line 988:       if (LOG.isTraceEnabled()) {
generally for log4j, there is no need to add additional log level judgement.


http://gerrit.cloudera.org:8080/#/c/18801/4/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@1006
PS4, Line 1006:       if (LOG.isTraceEnabled()) {
same as above


http://gerrit.cloudera.org:8080/#/c/18801/4/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@1007
PS4, Line 1007:         LOG.trace("listed sub dir: " + fileStatus.getPath());
better to use '{}' to substitute args, see https://www.slf4j.org/faq.html#logging_performance



-- 
To view, visit http://gerrit.cloudera.org:8080/18801
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic848e6c8563a1e0bf294cd50167dfc40f66a56cb
Gerrit-Change-Number: 18801
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Xiang Yang <yx...@126.com>
Gerrit-Comment-Date: Wed, 03 Aug 2022 03:18:21 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11464: Skip listing staging dirs to avoid failures on them

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18801 )

Change subject: IMPALA-11464: Skip listing staging dirs to avoid failures on them
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11066/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18801
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic848e6c8563a1e0bf294cd50167dfc40f66a56cb
Gerrit-Change-Number: 18801
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 01 Aug 2022 03:59:21 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11464: Skip listing staging dirs to avoid failures on them

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18801 )

Change subject: IMPALA-11464: Skip listing staging dirs to avoid failures on them
......................................................................


Patch Set 6: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/18801
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic848e6c8563a1e0bf294cd50167dfc40f66a56cb
Gerrit-Change-Number: 18801
Gerrit-PatchSet: 6
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Xiang Yang <yx...@126.com>
Gerrit-Comment-Date: Fri, 05 Aug 2022 02:32:04 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11464: Skip listing staging dirs to avoid failures on them

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18801 )

Change subject: IMPALA-11464: Skip listing staging dirs to avoid failures on them
......................................................................


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8403/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/18801
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic848e6c8563a1e0bf294cd50167dfc40f66a56cb
Gerrit-Change-Number: 18801
Gerrit-PatchSet: 6
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Xiang Yang <yx...@126.com>
Gerrit-Comment-Date: Thu, 04 Aug 2022 21:43:29 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11464: Skip listing staging dirs to avoid failures on them

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18801 )

Change subject: IMPALA-11464: Skip listing staging dirs to avoid failures on them
......................................................................


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8386/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/18801
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic848e6c8563a1e0bf294cd50167dfc40f66a56cb
Gerrit-Change-Number: 18801
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 02 Aug 2022 01:34:20 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11464: Skip listing staging dirs to avoid failures on them

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18801 )

Change subject: IMPALA-11464: Skip listing staging dirs to avoid failures on them
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11079/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18801
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic848e6c8563a1e0bf294cd50167dfc40f66a56cb
Gerrit-Change-Number: 18801
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 03 Aug 2022 03:14:23 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11464: Skip listing staging dirs to avoid failures on them

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Hello Daniel Becker, Kurt Deschler, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18801

to look at the new patch set (#3).

Change subject: IMPALA-11464: Skip listing staging dirs to avoid failures on them
......................................................................

IMPALA-11464: Skip listing staging dirs to avoid failures on them

Hive or other systems will generate staging/tmp dirs under the
table/partition folders while loading/inserting data. They are removed
when the operation is done. File metadata loading in catalogd could fail
if it's listing files of such dirs. This is found on HDFS where file
listing is done in batches. Each batch contains a partial list of 1000
items (configured by "dfs.ls.limit"). If the dir is removed, the next
listing, e.g. the next hasNext() call on the RemoteIterator, will fail
with FileNotFoundException. Such error on staging/tmp dirs should not
fail the metadata loading. However, if it happens on a partition dir,
the metadata loading should fail to avoid stale metadata.

This patch adds a check before listing the dir. If it's a staging/tmp
dir, catalogd will just ignore it. Also adds a debug action,
catalogd_pause_after_hdfs_remote_iterator_creation, to inject
sleeps after the first partial listing (happens in creating the
RemoteIterator). So we can reproduce the FileNotFoundException stably.

Tests:
 - Add test on removing a large staging dir (contains 1024 files) during
   REFRESH. Metadata loading fails consistently before this fix.
 - Add test on removing a large partition dir (contains 1024 files)
   during REFRESH. Verify metadata loading fails as expected.

Change-Id: Ic848e6c8563a1e0bf294cd50167dfc40f66a56cb
---
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M fe/src/main/java/org/apache/impala/util/DebugUtils.java
M tests/metadata/test_recursive_listing.py
M tests/util/filesystem_base.py
M tests/util/hdfs_util.py
5 files changed, 156 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/01/18801/3
-- 
To view, visit http://gerrit.cloudera.org:8080/18801
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic848e6c8563a1e0bf294cd50167dfc40f66a56cb
Gerrit-Change-Number: 18801
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-11464: Skip listing staging dirs to avoid failures on them

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18801 )

Change subject: IMPALA-11464: Skip listing staging dirs to avoid failures on them
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18801/3/tests/metadata/test_recursive_listing.py
File tests/metadata/test_recursive_listing.py:

http://gerrit.cloudera.org:8080/#/c/18801/3/tests/metadata/test_recursive_listing.py@204
PS3, Line 204: 
> flake8: W504 line break after binary operator
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/18801
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic848e6c8563a1e0bf294cd50167dfc40f66a56cb
Gerrit-Change-Number: 18801
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 03 Aug 2022 02:53:35 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11464: Skip listing staging dirs to avoid failures on them

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18801 )

Change subject: IMPALA-11464: Skip listing staging dirs to avoid failures on them
......................................................................


Patch Set 5:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/18801/4/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java:

http://gerrit.cloudera.org:8080/#/c/18801/4/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@988
PS4, Line 988:         LOG.debug("Ignoring {} since it is either a hidden directory or a temporary "
> generally for log4j, there is no need to add additional log level judgement
Done


http://gerrit.cloudera.org:8080/#/c/18801/4/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@1006
PS4, Line 1006: 
> same as above
Done


http://gerrit.cloudera.org:8080/#/c/18801/4/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@1007
PS4, Line 1007:     @Override
> Thanks for sharing the link. It's an interesting question and I did think a
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/18801
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic848e6c8563a1e0bf294cd50167dfc40f66a56cb
Gerrit-Change-Number: 18801
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Xiang Yang <yx...@126.com>
Gerrit-Comment-Date: Wed, 03 Aug 2022 09:17:50 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11464: Skip listing staging dirs to avoid failures on them

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18801 )

Change subject: IMPALA-11464: Skip listing staging dirs to avoid failures on them
......................................................................


Patch Set 2:

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8386/


-- 
To view, visit http://gerrit.cloudera.org:8080/18801
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic848e6c8563a1e0bf294cd50167dfc40f66a56cb
Gerrit-Change-Number: 18801
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Xiang Yang <yx...@126.com>
Gerrit-Comment-Date: Wed, 03 Aug 2022 12:07:09 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11464: Skip listing staging dirs to avoid failures on them

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18801 )

Change subject: IMPALA-11464: Skip listing staging dirs to avoid failures on them
......................................................................


Patch Set 2: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8384/


-- 
To view, visit http://gerrit.cloudera.org:8080/18801
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic848e6c8563a1e0bf294cd50167dfc40f66a56cb
Gerrit-Change-Number: 18801
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 02 Aug 2022 01:35:27 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11464: Skip listing staging dirs to avoid failures on them

Posted by "Kurt Deschler (Code Review)" <ge...@cloudera.org>.
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/18801 )

Change subject: IMPALA-11464: Skip listing staging dirs to avoid failures on them
......................................................................


Patch Set 2:

(2 comments)

Should any of the file iterator changes made for IMPALA-9122 be reverted as well as part of this fix?

http://gerrit.cloudera.org:8080/#/c/18801/2/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java:

http://gerrit.cloudera.org:8080/#/c/18801/2/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@995
PS2, Line 995:       if (isIgnoredDir(fileStatus.getPath())) {
Should this check be performed first so that temporary files are consistently ignored?


http://gerrit.cloudera.org:8080/#/c/18801/2/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@998
PS2, Line 998:         return;
This case should set curFile_ to something since the original function set it in all cases.



-- 
To view, visit http://gerrit.cloudera.org:8080/18801
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic848e6c8563a1e0bf294cd50167dfc40f66a56cb
Gerrit-Change-Number: 18801
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 02 Aug 2022 11:44:46 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11464: Skip listing staging dirs to avoid failures on them

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Hello Daniel Becker, Kurt Deschler, Impala Public Jenkins, Xiang Yang, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18801

to look at the new patch set (#5).

Change subject: IMPALA-11464: Skip listing staging dirs to avoid failures on them
......................................................................

IMPALA-11464: Skip listing staging dirs to avoid failures on them

Hive or other systems will generate staging/tmp dirs under the
table/partition folders while loading/inserting data. They are removed
when the operation is done. File metadata loading in catalogd could fail
if it's listing files of such dirs. This is found on HDFS where file
listing is done in batches. Each batch contains a partial list of 1000
items (configured by "dfs.ls.limit"). If the dir is removed, the next
listing, e.g. the next hasNext() call on the RemoteIterator, will fail
with FileNotFoundException. Such error on staging/tmp dirs should not
fail the metadata loading. However, if it happens on a partition dir,
the metadata loading should fail to avoid stale metadata.

This patch adds a check before listing the dir. If it's a staging/tmp
dir, catalogd will just ignore it. Also adds a debug action,
catalogd_pause_after_hdfs_remote_iterator_creation, to inject
sleeps after the first partial listing (happens in creating the
RemoteIterator). So we can reproduce the FileNotFoundException stably.

Tests:
 - Add test on removing a large staging dir (contains 1024 files) during
   REFRESH. Metadata loading fails consistently before this fix.
 - Add test on removing a large partition dir (contains 1024 files)
   during REFRESH. Verify metadata loading fails as expected.

Change-Id: Ic848e6c8563a1e0bf294cd50167dfc40f66a56cb
---
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M fe/src/main/java/org/apache/impala/util/DebugUtils.java
M tests/metadata/test_recursive_listing.py
M tests/util/filesystem_base.py
M tests/util/hdfs_util.py
5 files changed, 150 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/01/18801/5
-- 
To view, visit http://gerrit.cloudera.org:8080/18801
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic848e6c8563a1e0bf294cd50167dfc40f66a56cb
Gerrit-Change-Number: 18801
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Xiang Yang <yx...@126.com>

[Impala-ASF-CR] IMPALA-11464: Skip listing staging dirs to avoid failures on them

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18801 )

Change subject: IMPALA-11464: Skip listing staging dirs to avoid failures on them
......................................................................


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8384/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/18801
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic848e6c8563a1e0bf294cd50167dfc40f66a56cb
Gerrit-Change-Number: 18801
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 01 Aug 2022 06:30:06 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11464: Skip listing staging dirs to avoid failures on them

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18801

to look at the new patch set (#2).

Change subject: IMPALA-11464: Skip listing staging dirs to avoid failures on them
......................................................................

IMPALA-11464: Skip listing staging dirs to avoid failures on them

Hive or other systems will generate staging/tmp dirs under the
table/partition folders while loading/inserting data. They are removed
when the operation is done. File metadata loading in catalogd could fail
if it's listing files of such dirs. This is found on HDFS where file
listing is done in batches. Each batch contains a partial list of 1000
items (configured by "dfs.ls.limit"). If the dir is removed, the next
listing, e.g. the next hasNext() call on the RemoteIterator, will fail
with FileNotFoundException. Such error on staging/tmp dirs should not
fail the metadata loading. However, if it happens on a partition dir,
the metadata loading should fail to avoid stale metadata.

This patch adds a check before listing the dir. If it's a staging/tmp
dir, catalogd will just ignore it. Also adds a debug action,
catalogd_pause_after_hdfs_remote_iterator_creation, to inject
sleeps after the first partial listing (happens in creating the
RemoteIterator). So we can reproduce the FileNotFoundException stably.

Tests:
 - Add test on removing a large staging dir (contains 1024 files) during
   REFRESH. Metadata loading fails consistently before this fix.
 - Add test on removing a large partition dir (contains 1024 files)
   during REFRESH. Verify metadata loading fails as expected.

Change-Id: Ic848e6c8563a1e0bf294cd50167dfc40f66a56cb
---
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M fe/src/main/java/org/apache/impala/util/DebugUtils.java
M tests/metadata/test_recursive_listing.py
M tests/util/filesystem_base.py
M tests/util/hdfs_util.py
5 files changed, 191 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/01/18801/2
-- 
To view, visit http://gerrit.cloudera.org:8080/18801
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic848e6c8563a1e0bf294cd50167dfc40f66a56cb
Gerrit-Change-Number: 18801
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-11464: Skip listing staging dirs to avoid failures on them

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18801 )

Change subject: IMPALA-11464: Skip listing staging dirs to avoid failures on them
......................................................................


Patch Set 2:

(5 comments)

> Should any of the file iterator changes made for IMPALA-9122 be reverted as well as part of this fix?

We have reverted them in IMPALA-10579.

http://gerrit.cloudera.org:8080/#/c/18801/2/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java:

http://gerrit.cloudera.org:8080/#/c/18801/2/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@995
PS2, Line 995:       if (isIgnoredDir(fileStatus.getPath())) {
> Should this check be performed first so that temporary files are consistent
That's a good point. Normally tmp files will be in tmp dirs and the dirs will be accessed first since we are doing a recursive listing. So after this change we should not see tmp files in this method. However, if some systems tend to add tmp files directly into the partition/table folder, we should be able to torelate it. I'll move this check above.


http://gerrit.cloudera.org:8080/#/c/18801/2/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@998
PS2, Line 998:         return;
> This case should set curFile_ to something since the original function set 
We just keep curFile_ as null in this case. So the while-loop in hasNext() can continue. I will set it to null to improve readability.


http://gerrit.cloudera.org:8080/#/c/18801/2/tests/metadata/test_recursive_listing.py
File tests/metadata/test_recursive_listing.py:

http://gerrit.cloudera.org:8080/#/c/18801/2/tests/metadata/test_recursive_listing.py@147
PS2, Line 147:     """Test REFRESH survives with concurrent add/remove ops on large staging/tmp dirs
> Nit: we could mention also here that this is a regression test for IMPALA-1
Done. BTW, we mentioned it at line 187 as well.


http://gerrit.cloudera.org:8080/#/c/18801/2/tests/metadata/test_recursive_listing.py@211
PS2, Line 211:   def test_partition_dir_removed_inflight(self, vector, unique_database):
> This function and test_large_staging_dirs() are almost completely the same.
Done


http://gerrit.cloudera.org:8080/#/c/18801/2/tests/util/hdfs_util.py
File tests/util/hdfs_util.py:

http://gerrit.cloudera.org:8080/#/c/18801/2/tests/util/hdfs_util.py@325
PS2, Line 325:   def move(self, src, dst):
> Do we use it somewhere? Or would you like to add it in case it's useful in 
Oops, I forget to remove it. It was used in an abandaned idea.



-- 
To view, visit http://gerrit.cloudera.org:8080/18801
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic848e6c8563a1e0bf294cd50167dfc40f66a56cb
Gerrit-Change-Number: 18801
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 03 Aug 2022 02:51:55 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11464: Skip listing staging dirs to avoid failures on them

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/18801 )

Change subject: IMPALA-11464: Skip listing staging dirs to avoid failures on them
......................................................................

IMPALA-11464: Skip listing staging dirs to avoid failures on them

Hive or other systems will generate staging/tmp dirs under the
table/partition folders while loading/inserting data. They are removed
when the operation is done. File metadata loading in catalogd could fail
if it's listing files of such dirs. This is found on HDFS where file
listing is done in batches. Each batch contains a partial list of 1000
items (configured by "dfs.ls.limit"). If the dir is removed, the next
listing, e.g. the next hasNext() call on the RemoteIterator, will fail
with FileNotFoundException. Such error on staging/tmp dirs should not
fail the metadata loading. However, if it happens on a partition dir,
the metadata loading should fail to avoid stale metadata.

This patch adds a check before listing the dir. If it's a staging/tmp
dir, catalogd will just ignore it. Also adds a debug action,
catalogd_pause_after_hdfs_remote_iterator_creation, to inject
sleeps after the first partial listing (happens in creating the
RemoteIterator). So we can reproduce the FileNotFoundException stably.

Tests:
 - Add test on removing a large staging dir (contains 1024 files) during
   REFRESH. Metadata loading fails consistently before this fix.
 - Add test on removing a large partition dir (contains 1024 files)
   during REFRESH. Verify metadata loading fails as expected.

Change-Id: Ic848e6c8563a1e0bf294cd50167dfc40f66a56cb
Reviewed-on: http://gerrit.cloudera.org:8080/18801
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M fe/src/main/java/org/apache/impala/util/DebugUtils.java
M tests/metadata/test_recursive_listing.py
M tests/util/filesystem_base.py
M tests/util/hdfs_util.py
5 files changed, 150 insertions(+), 10 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/18801
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ic848e6c8563a1e0bf294cd50167dfc40f66a56cb
Gerrit-Change-Number: 18801
Gerrit-PatchSet: 7
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Xiang Yang <yx...@126.com>

[Impala-ASF-CR] IMPALA-11464: Skip listing staging dirs to avoid failures on them

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18801 )

Change subject: IMPALA-11464: Skip listing staging dirs to avoid failures on them
......................................................................


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/18801/4/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java:

http://gerrit.cloudera.org:8080/#/c/18801/4/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@942
PS4, Line 942:       if (LOG.isTraceEnabled()) {
> Hi quanlong, why isTraceEnabled and then LOG.info() ?
Oops, it's a mistake.


http://gerrit.cloudera.org:8080/#/c/18801/4/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@1007
PS4, Line 1007:         LOG.trace("listed sub dir: " + fileStatus.getPath());
Thanks for sharing the link. It's an interesting question and I did think about it.

Although the current code has an additional log-level check, I think the simple string concatenating is faster than parsing the string format. The doc mentioned 

> The following two lines will yield the exact same output. However, the second form will outperform the first form by a factor of at least 30, in case of a disabled logging statement.
> logger.debug("The new entry is "+entry+".");
> logger.debug("The new entry is {}.", entry);

However, it doesn't compare the case if we check log-level in the first line. Without the check, it will always concatenate a new string, which introduce significant overhead so the speedup can be 30.

In our case, I'm not sure which is faster:

* Two identical log-level checks + one string concatenating
* One log-level check + string format parsing

I can change to the popular pattern for consistency with other logging codes.



-- 
To view, visit http://gerrit.cloudera.org:8080/18801
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic848e6c8563a1e0bf294cd50167dfc40f66a56cb
Gerrit-Change-Number: 18801
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Xiang Yang <yx...@126.com>
Gerrit-Comment-Date: Wed, 03 Aug 2022 09:16:46 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11464: Skip listing staging dirs to avoid failures on them

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18801 )

Change subject: IMPALA-11464: Skip listing staging dirs to avoid failures on them
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18801/3/tests/metadata/test_recursive_listing.py
File tests/metadata/test_recursive_listing.py:

http://gerrit.cloudera.org:8080/#/c/18801/3/tests/metadata/test_recursive_listing.py@204
PS3, Line 204: +
flake8: W504 line break after binary operator



-- 
To view, visit http://gerrit.cloudera.org:8080/18801
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic848e6c8563a1e0bf294cd50167dfc40f66a56cb
Gerrit-Change-Number: 18801
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 03 Aug 2022 02:50:47 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11464: Skip listing staging dirs to avoid failures on them

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18801 )

Change subject: IMPALA-11464: Skip listing staging dirs to avoid failures on them
......................................................................


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/18801/1/tests/metadata/test_recursive_listing.py
File tests/metadata/test_recursive_listing.py:

http://gerrit.cloudera.org:8080/#/c/18801/1/tests/metadata/test_recursive_listing.py@16
PS1, Line 16: from multiprocessing.pool import ThreadPool
flake8: F401 'multiprocessing.pool.ThreadPool' imported but unused


http://gerrit.cloudera.org:8080/#/c/18801/1/tests/metadata/test_recursive_listing.py@17
PS1, Line 17: from multiprocessing import TimeoutError
flake8: F401 'multiprocessing.TimeoutError' imported but unused


http://gerrit.cloudera.org:8080/#/c/18801/1/tests/metadata/test_recursive_listing.py@22
PS1, Line 22: from tests.common.skip import (SkipIfLocal, SkipIfS3, SkipIfGCS, SkipIfCOS,
flake8: F401 'tests.common.skip.SkipIfABFS' imported but unused


http://gerrit.cloudera.org:8080/#/c/18801/1/tests/metadata/test_recursive_listing.py@234
PS1, Line 234: s
flake8: F841 local variable 'show_files_stmt' is assigned to but never used



-- 
To view, visit http://gerrit.cloudera.org:8080/18801
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic848e6c8563a1e0bf294cd50167dfc40f66a56cb
Gerrit-Change-Number: 18801
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Mon, 01 Aug 2022 03:36:45 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11464: Skip listing staging dirs to avoid failures on them

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18801 )

Change subject: IMPALA-11464: Skip listing staging dirs to avoid failures on them
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11078/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18801
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic848e6c8563a1e0bf294cd50167dfc40f66a56cb
Gerrit-Change-Number: 18801
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 03 Aug 2022 03:10:05 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11464: Skip listing staging dirs to avoid failures on them

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/18801 )

Change subject: IMPALA-11464: Skip listing staging dirs to avoid failures on them
......................................................................


Patch Set 5: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18801/2/tests/metadata/test_recursive_listing.py
File tests/metadata/test_recursive_listing.py:

http://gerrit.cloudera.org:8080/#/c/18801/2/tests/metadata/test_recursive_listing.py@147
PS2, Line 147:   @SkipIfGCS.variable_listing_times
> Done. BTW, we mentioned it at line 187 as well.
Thanks, I think it's clearer this way, if it is mentioned in the doc comment too.



-- 
To view, visit http://gerrit.cloudera.org:8080/18801
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic848e6c8563a1e0bf294cd50167dfc40f66a56cb
Gerrit-Change-Number: 18801
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Xiang Yang <yx...@126.com>
Gerrit-Comment-Date: Wed, 03 Aug 2022 11:54:32 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11464: Skip listing staging dirs to avoid failures on them

Posted by "Kurt Deschler (Code Review)" <ge...@cloudera.org>.
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/18801 )

Change subject: IMPALA-11464: Skip listing staging dirs to avoid failures on them
......................................................................


Patch Set 5: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/18801
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic848e6c8563a1e0bf294cd50167dfc40f66a56cb
Gerrit-Change-Number: 18801
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Xiang Yang <yx...@126.com>
Gerrit-Comment-Date: Wed, 03 Aug 2022 15:10:44 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11464: Skip listing staging dirs to avoid failures on them

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18801 )

Change subject: IMPALA-11464: Skip listing staging dirs to avoid failures on them
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11081/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18801
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic848e6c8563a1e0bf294cd50167dfc40f66a56cb
Gerrit-Change-Number: 18801
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Xiang Yang <yx...@126.com>
Gerrit-Comment-Date: Wed, 03 Aug 2022 09:38:50 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11464: Skip listing staging dirs to avoid failures on them

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Hello Daniel Becker, Kurt Deschler, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18801

to look at the new patch set (#4).

Change subject: IMPALA-11464: Skip listing staging dirs to avoid failures on them
......................................................................

IMPALA-11464: Skip listing staging dirs to avoid failures on them

Hive or other systems will generate staging/tmp dirs under the
table/partition folders while loading/inserting data. They are removed
when the operation is done. File metadata loading in catalogd could fail
if it's listing files of such dirs. This is found on HDFS where file
listing is done in batches. Each batch contains a partial list of 1000
items (configured by "dfs.ls.limit"). If the dir is removed, the next
listing, e.g. the next hasNext() call on the RemoteIterator, will fail
with FileNotFoundException. Such error on staging/tmp dirs should not
fail the metadata loading. However, if it happens on a partition dir,
the metadata loading should fail to avoid stale metadata.

This patch adds a check before listing the dir. If it's a staging/tmp
dir, catalogd will just ignore it. Also adds a debug action,
catalogd_pause_after_hdfs_remote_iterator_creation, to inject
sleeps after the first partial listing (happens in creating the
RemoteIterator). So we can reproduce the FileNotFoundException stably.

Tests:
 - Add test on removing a large staging dir (contains 1024 files) during
   REFRESH. Metadata loading fails consistently before this fix.
 - Add test on removing a large partition dir (contains 1024 files)
   during REFRESH. Verify metadata loading fails as expected.

Change-Id: Ic848e6c8563a1e0bf294cd50167dfc40f66a56cb
---
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M fe/src/main/java/org/apache/impala/util/DebugUtils.java
M tests/metadata/test_recursive_listing.py
M tests/util/filesystem_base.py
M tests/util/hdfs_util.py
5 files changed, 156 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/01/18801/4
-- 
To view, visit http://gerrit.cloudera.org:8080/18801
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic848e6c8563a1e0bf294cd50167dfc40f66a56cb
Gerrit-Change-Number: 18801
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-11464: Skip listing staging dirs to avoid failures on them

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18801 )

Change subject: IMPALA-11464: Skip listing staging dirs to avoid failures on them
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11065/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18801
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic848e6c8563a1e0bf294cd50167dfc40f66a56cb
Gerrit-Change-Number: 18801
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 01 Aug 2022 03:56:25 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11464: Skip listing staging dirs to avoid failures on them

Posted by "Daniel Becker (Code Review)" <ge...@cloudera.org>.
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/18801 )

Change subject: IMPALA-11464: Skip listing staging dirs to avoid failures on them
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/18801/2/tests/metadata/test_recursive_listing.py
File tests/metadata/test_recursive_listing.py:

http://gerrit.cloudera.org:8080/#/c/18801/2/tests/metadata/test_recursive_listing.py@147
PS2, Line 147:     """Test REFRESH survives with concurrent add/remove ops on large staging/tmp dirs
Nit: we could mention also here that this is a regression test for IMPALA-11464.


http://gerrit.cloudera.org:8080/#/c/18801/2/tests/metadata/test_recursive_listing.py@211
PS2, Line 211:   def test_partition_dir_removed_inflight(self, vector, unique_database):
This function and test_large_staging_dirs() are almost completely the same. Can we make a general parametrized helper function that, based on the parameters, produces the behaviour of test_large_staging_dirs() or test_partition_dir_removed_inflight()? Then the top level test functions would only need to call that helper with different parameters.


http://gerrit.cloudera.org:8080/#/c/18801/2/tests/util/hdfs_util.py
File tests/util/hdfs_util.py:

http://gerrit.cloudera.org:8080/#/c/18801/2/tests/util/hdfs_util.py@325
PS2, Line 325:   def move(self, src, dst):
Do we use it somewhere? Or would you like to add it in case it's useful in the future?



-- 
To view, visit http://gerrit.cloudera.org:8080/18801
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic848e6c8563a1e0bf294cd50167dfc40f66a56cb
Gerrit-Change-Number: 18801
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 02 Aug 2022 11:52:14 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11464: Skip listing staging dirs to avoid failures on them

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18801 )

Change subject: IMPALA-11464: Skip listing staging dirs to avoid failures on them
......................................................................


Patch Set 6: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/18801
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic848e6c8563a1e0bf294cd50167dfc40f66a56cb
Gerrit-Change-Number: 18801
Gerrit-PatchSet: 6
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Xiang Yang <yx...@126.com>
Gerrit-Comment-Date: Thu, 04 Aug 2022 21:43:28 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11464: Skip listing staging dirs to avoid failures on them

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/18801 )

Change subject: IMPALA-11464: Skip listing staging dirs to avoid failures on them
......................................................................


Patch Set 5: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/18801
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic848e6c8563a1e0bf294cd50167dfc40f66a56cb
Gerrit-Change-Number: 18801
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Xiang Yang <yx...@126.com>
Gerrit-Comment-Date: Thu, 04 Aug 2022 12:15:06 +0000
Gerrit-HasComments: No