You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Sudhanshu Arora (Code Review)" <ge...@cloudera.org> on 2019/04/23 22:12:50 UTC

[Impala-ASF-CR] [acid] Predicate to test if dir must be included.

Sudhanshu Arora has uploaded this change for review. ( http://gerrit.cloudera.org:8080/13091


Change subject: [acid] Predicate to test if dir must be included.
......................................................................

[acid] Predicate to test if dir must be included.

The predicate is currently not set anywhere but the plan is to modify
HdfsTable and use the validWriteIdList there in the predicate
implementation and remove the directories that should not be used.

Testing Done:
- mvn test -Dtest=RecursingIteratorTest
Added RecursingIteratorTest. Could have also used FileMetadataLoaderTest
but oh! well

Change-Id: If0e88281d277127c9499d37b95fbba55dcc7761c
---
M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
M fe/src/test/java/org/apache/impala/catalog/HdfsPartitionTest.java
A fe/src/test/java/org/apache/impala/common/RecursingIteratorTest.java
A fe/src/test/java/org/apache/impala/planner/ExplainTest.java
8 files changed, 418 insertions(+), 20 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/13091/2
-- 
To view, visit http://gerrit.cloudera.org:8080/13091
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: If0e88281d277127c9499d37b95fbba55dcc7761c
Gerrit-Change-Number: 13091
Gerrit-PatchSet: 2
Gerrit-Owner: Sudhanshu Arora <su...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yongzhi Chen <yc...@cloudera.com>

[Impala-ASF-CR] [acid] Predicate to test if dir must be included.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13091 )

Change subject: [acid] Predicate to test if dir must be included.
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/3222/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/13091
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If0e88281d277127c9499d37b95fbba55dcc7761c
Gerrit-Change-Number: 13091
Gerrit-PatchSet: 3
Gerrit-Owner: Sudhanshu Arora <su...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yongzhi Chen <yc...@cloudera.com>
Gerrit-Comment-Date: Tue, 14 May 2019 21:58:54 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] [acid] Predicate to test if dir must be included.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13091 )

Change subject: [acid] Predicate to test if dir must be included.
......................................................................


Patch Set 2:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
File fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java:

http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java@123
PS2, Line 123:         fileStatuses = FileSystemUtil.listStatus(fs, partDir_, recursive_, pathPredicate_);
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@563
PS2, Line 563:       FileMetadataLoader loader = new FileMetadataLoader(e.getKey(), /*recursive=*/isRecursive,
line too long (95 > 90)


http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@607
PS2, Line 607:    * TODO:Sudhanshu: Move this to interface and then use this implementation in BaseTableRef
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java:

http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@567
PS2, Line 567:     RecursingIterator(FileSystem fs, Path startPath, Predicate<String> predicate) throws IOException {
line too long (102 > 90)


http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/test/java/org/apache/impala/common/RecursingIteratorTest.java
File fe/src/test/java/org/apache/impala/common/RecursingIteratorTest.java:

http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/test/java/org/apache/impala/common/RecursingIteratorTest.java@129
PS2, Line 129:     when(fs.listStatusIterator(new Path("/user/hive/warehouse/test/delta_0000023_0000023_0000"))).
line too long (98 > 90)


http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/test/java/org/apache/impala/common/RecursingIteratorTest.java@131
PS2, Line 131:     when(fs.listStatusIterator(new Path("/user/hive/warehouse/test/delta_0000024_0000024_0000"))).
line too long (98 > 90)


http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/test/java/org/apache/impala/common/RecursingIteratorTest.java@138
PS2, Line 138:             new FileSystemUtil.RecursingIterator(fs, new Path("/user/hive/warehouse/test"), x->true);
line too long (101 > 90)


http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/test/java/org/apache/impala/common/RecursingIteratorTest.java@153
PS2, Line 153:             new FileSystemUtil.RecursingIterator(fs, new Path("/user/hive/warehouse/test"),
line too long (91 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/13091
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If0e88281d277127c9499d37b95fbba55dcc7761c
Gerrit-Change-Number: 13091
Gerrit-PatchSet: 2
Gerrit-Owner: Sudhanshu Arora <su...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yongzhi Chen <yc...@cloudera.com>
Gerrit-Comment-Date: Tue, 23 Apr 2019 22:13:31 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] [acid] Predicate to test if dir must be included.

Posted by "Fredy Wijaya (Code Review)" <ge...@cloudera.org>.
Fredy Wijaya has posted comments on this change. ( http://gerrit.cloudera.org:8080/13091 )

Change subject: [acid] Predicate to test if dir must be included.
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13091/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13091/3//COMMIT_MSG@7
PS3, Line 7: [acid] Predicate to test if dir must be included.
Just passing by, but I think this CR should warrant a JIRA.



-- 
To view, visit http://gerrit.cloudera.org:8080/13091
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If0e88281d277127c9499d37b95fbba55dcc7761c
Gerrit-Change-Number: 13091
Gerrit-PatchSet: 3
Gerrit-Owner: Sudhanshu Arora <su...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yongzhi Chen <yc...@cloudera.com>
Gerrit-Comment-Date: Tue, 14 May 2019 22:02:56 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] [acid] Predicate to test if dir must be included.

Posted by "Sudhanshu Arora (Code Review)" <ge...@cloudera.org>.
Sudhanshu Arora has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/13091 )

Change subject: [acid] Predicate to test if dir must be included.
......................................................................

[acid] Predicate to test if dir must be included.

The predicate is currently not set anywhere but the plan is to modify
HdfsTable and use the validWriteIdList there in the predicate
implementation and remove the directories that should not be used.

Testing Done:
- mvn test -Dtest=FileMetadataLoaderTest

Change-Id: If0e88281d277127c9499d37b95fbba55dcc7761c
---
M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
M fe/src/test/java/org/apache/impala/catalog/HdfsPartitionTest.java
6 files changed, 66 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/13091/3
-- 
To view, visit http://gerrit.cloudera.org:8080/13091
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If0e88281d277127c9499d37b95fbba55dcc7761c
Gerrit-Change-Number: 13091
Gerrit-PatchSet: 3
Gerrit-Owner: Sudhanshu Arora <su...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yongzhi Chen <yc...@cloudera.com>

[Impala-ASF-CR] [acid] Predicate to test if dir must be included.

Posted by "Sudhanshu Arora (Code Review)" <ge...@cloudera.org>.
Sudhanshu Arora has abandoned this change. ( http://gerrit.cloudera.org:8080/13091 )

Change subject: [acid] Predicate to test if dir must be included.
......................................................................


Abandoned
-- 
To view, visit http://gerrit.cloudera.org:8080/13091
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: If0e88281d277127c9499d37b95fbba55dcc7761c
Gerrit-Change-Number: 13091
Gerrit-PatchSet: 7
Gerrit-Owner: Sudhanshu Arora <su...@cloudera.com>
Gerrit-Reviewer: Fredy Wijaya <fw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yongzhi Chen <yc...@cloudera.com>

[Impala-ASF-CR] [acid] Predicate to test if dir must be included.

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13091 )

Change subject: [acid] Predicate to test if dir must be included.
......................................................................


Patch Set 2:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
File fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java:

http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java@73
PS2, Line 73: pathPredicate
Can you clarify in the javadocs whether this predicate is applied to all paths or just to directories (to limit recursion)? Also, why use String here instead of Path? If using String we should also clarify what the string is (full URL? just reltaive path name to the part dir?)


http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java@121
PS2, Line 121:         fileStatuses = FileSystemUtil.listFiles(fs, partDir_, recursive_);
I'd think we'd need the predicate here too.


http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@485
PS2, Line 485:   public void initializePartitionMetadata(
was this necessary>?


http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@607
PS2, Line 607:    * TODO:Sudhanshu: Move this to interface and then use this implementation in BaseTableRef
yea FeFsTable.Utils already has some similar stuff (it's in that Utils class because it predcates java 8 default interfaces)


http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@616
PS2, Line 616: "true"
perhaps we should be using Boolean.parseValue here? Maybe we need a tolower as well? is 'TRUE' supported?


http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1562
PS2, Line 1562:     Preconditions.checkNotNull(hdfsBaseDir_, "HdfsTable base dir is null");
how did we get to this state?


http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1726
PS2, Line 1726:   public FileSystemUtil.FsType getFsType() {
seems to have been pulled into your patch by mistake?


http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
File fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java:

http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java@293
PS2, Line 293:         //TODO:Sudhanshu: Understand how DirectMetadataProvider is used.
sadly it's not used right now. Basically dead code from a half-finished attempt to excise catalogd. Perhaps we should remove it for the time being.


http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java:

http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@525
PS2, Line 525:           return listFiles(fs, p, recursive);
We need to postprocess this result to evaluate the predicate, right?


http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@603
PS2, Line 603:       boolean includeDir = predicate_.test(fileStatus.getPath().toString());
We should clarify throughout that the predicate only applies to limit recursion and does not actually limit the returned _files_.


http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
File fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java:

http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java@45
PS2, Line 45:         /* oldFds = */Collections.emptyList(), hostIndex, x->true);
maybe add a new case to this test that passes a predicate to exclude one of the partitions or something?


http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/test/java/org/apache/impala/common/RecursingIteratorTest.java
File fe/src/test/java/org/apache/impala/common/RecursingIteratorTest.java:

http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/test/java/org/apache/impala/common/RecursingIteratorTest.java@46
PS2, Line 46:   @Mock
impressive mocking, but I wonder whether it isn't simpler to just change one of the other tests that actually runs against the real filesystem. It's still relatively fast to run, and we always assume that unit tests run in an envronment with FileSystem set up


http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/test/java/org/apache/impala/planner/ExplainTest.java
File fe/src/test/java/org/apache/impala/planner/ExplainTest.java:

http://gerrit.cloudera.org:8080/#/c/13091/2/fe/src/test/java/org/apache/impala/planner/ExplainTest.java@59
PS2, Line 59: public class ExplainTest extends FrontendTestBase {
somehow this got pulled into your patch?



-- 
To view, visit http://gerrit.cloudera.org:8080/13091
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If0e88281d277127c9499d37b95fbba55dcc7761c
Gerrit-Change-Number: 13091
Gerrit-PatchSet: 2
Gerrit-Owner: Sudhanshu Arora <su...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yongzhi Chen <yc...@cloudera.com>
Gerrit-Comment-Date: Fri, 26 Apr 2019 06:04:18 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] [acid] Predicate to test if dir must be included.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13091 )

Change subject: [acid] Predicate to test if dir must be included.
......................................................................


Patch Set 2:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/2865/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/13091
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If0e88281d277127c9499d37b95fbba55dcc7761c
Gerrit-Change-Number: 13091
Gerrit-PatchSet: 2
Gerrit-Owner: Sudhanshu Arora <su...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yongzhi Chen <yc...@cloudera.com>
Gerrit-Comment-Date: Tue, 23 Apr 2019 22:56:35 +0000
Gerrit-HasComments: No