You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Todd Lipcon (Code Review)" <ge...@cloudera.org> on 2019/07/01 22:40:45 UTC

[Impala-ASF-CR] IMPALA-8663 : FileMetadataLoader should skip hidden and tmp directories

Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13665 )

Change subject: IMPALA-8663 : FileMetadataLoader should skip hidden and tmp directories
......................................................................


Patch Set 11:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/13665/8/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java:

http://gerrit.cloudera.org:8080/#/c/13665/8/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@545
PS8, Line 545: 
             :    * If 'recursive' is true, all 
> Actually, I added this line in the doc since I noticed that the RecursingIt
hrm, were the semantics already broken then? See the unit test AcidUtilsTest.testAcidStateNoBase:

    assertFiltering(new String[]{
            "base_01.txt",
            "post_upgrade.txt",
            "delta_000005_000005_0000/",
            "delta_000005_000005_0000/lmn.txt",
            "base_000010/",
            "delta_0000012_0000012_0000/",
            "delta_0000012_0000012_0000/0000_0",
            "delta_0000012_0000012_0000/0000_1"},
        "", // writeIdList that accepts all transactions as valid
        new String[]{
            // Post upgrade files are ignored if there is a valid base.
            "delta_0000012_0000012_0000/0000_0",
            "delta_0000012_0000012_0000/0000_1"});


ie here the empty base directory has important semantics. Maybe we don't have a valid e2e test case to reproduce this yet because of this Hive bug: https://issues.apache.org/jira/browse/HIVE-21750 but it certainly seems like this is an issue. Let's file a JIRA to make sure we validate this case before releasing this feature.


http://gerrit.cloudera.org:8080/#/c/13665/11/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java:

http://gerrit.cloudera.org:8080/#/c/13665/11/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@731
PS11, Line 731:       // we use listFiles API to natively recurse on S3 for performance reasons, so we
Are we really gaining much from using this RecursiveIteratorWithFilter class for the S3 case now? It seems like this is getting to be a bit of spaghetti, with this being named "Recursive" but having a flag that makes it non-recursive.

What about just having a separate class like 'FilteringIterator' which wraps RemoteIterator and does filtering, and keep this one as always-recursive, so the code path is easier to follow?


http://gerrit.cloudera.org:8080/#/c/13665/11/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@736
PS11, Line 736:         LOG.info("Ignoring {} since its contained in a ignored directory",
This probably shouldn't be at INFO level



-- 
To view, visit http://gerrit.cloudera.org:8080/13665
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2c4a22908304fe9e377d77d6c18d401c3f3294aa
Gerrit-Change-Number: 13665
Gerrit-PatchSet: 11
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Mon, 01 Jul 2019 22:40:45 +0000
Gerrit-HasComments: Yes