You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Todd Lipcon (Code Review)" <ge...@cloudera.org> on 2019/04/11 07:19:50 UTC

[Impala-ASF-CR] Initial support for recursive file listing within a partition

Hello Bharath Vissapragada, Vihang Karajgaonkar, Anonymous Coward (486),

I'd like you to do a code review. Please visit

    http://gerrit.cloudera.org:8080/12991

to review the following change.


Change subject: Initial support for recursive file listing within a partition
......................................................................

Initial support for recursive file listing within a partition

This adds support to FileMetadataLoader to recursively list a directory
and create file descriptors. The changes are as follows:

* FileMetadataLoader can now take a 'recursive' argument to trigger the
  new behavior. All the non-test code paths still use non-recursive
  (i.e. this new feature isn't exposed for real tables as of yet).

* FileSystemUtil has some functionality for recursive directory listing.
  There are a few notes there around unexpected optimizations for S3 vs
  HDFS.

* Renamed the 'file_name' field to 'relative_path' for FileDescriptor
  and HDFS splits, since now the file descriptors may be more than a
  single path component.

The new functionality is just unit tested at the moment. Later, this
functionality will be used in a couple cases, including:

- ability to access "bucketed" tables written by Hive or Spark in a
  read-only manner. Today we ignore the bucketing and they end up being
  read as empty tables.

- ability to list files inside the hierarchical layout for ACID tables.

We may want to expose recursive listing support for user tables as well
(as suggested in IMPALA-4596). However, the global configuration flag
suggested in that JIRA doesn't seem so great, so I'm leaving that out
for now as well.

Change-Id: I9b151d7abb8443c0d9de0a0d82a9f13e07ad5109
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler.cc
M common/fbs/CatalogObjects.fbs
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
A fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
M fe/src/test/java/org/apache/impala/catalog/HdfsPartitionTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M fe/src/test/java/org/apache/impala/testutil/BlockIdGenerator.java
16 files changed, 253 insertions(+), 59 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/12991/1
-- 
To view, visit http://gerrit.cloudera.org:8080/12991
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I9b151d7abb8443c0d9de0a0d82a9f13e07ad5109
Gerrit-Change-Number: 12991
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Anonymous Coward (486)
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>

[Impala-ASF-CR] IMPALA-8454 (part 2): Initial support for recursive file listing within a partition

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/12991 )

Change subject: IMPALA-8454 (part 2): Initial support for recursive file listing within a partition
......................................................................


Patch Set 3: Code-Review+2

(2 comments)

We should probably wait for part 1 to be merged before kicking off a GVO on this.

http://gerrit.cloudera.org:8080/#/c/12991/1/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java:

http://gerrit.cloudera.org:8080/#/c/12991/1/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@556
PS1, Line 556:         // protocol doesn't provide any such recursive support anyway. In other words,
> I was hesitant to add a unit test which is more or less redundant with the 
Fair point.


http://gerrit.cloudera.org:8080/#/c/12991/1/fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
File fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java:

http://gerrit.cloudera.org:8080/#/c/12991/1/fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java@36
PS1, Line 36: 
> Per the commit message, we don't currently actually have the recursive load
sorry, confused about the first part. I missed that recursive is false in every callsite.



-- 
To view, visit http://gerrit.cloudera.org:8080/12991
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b151d7abb8443c0d9de0a0d82a9f13e07ad5109
Gerrit-Change-Number: 12991
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sudhanshu Arora <su...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Fri, 26 Apr 2019 06:12:16 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8454 (part 2): Initial support for recursive file listing within a partition

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12991 )

Change subject: IMPALA-8454 (part 2): Initial support for recursive file listing within a partition
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/2926/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/12991
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b151d7abb8443c0d9de0a0d82a9f13e07ad5109
Gerrit-Change-Number: 12991
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sudhanshu Arora <su...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Fri, 26 Apr 2019 00:44:03 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] Initial support for recursive file listing within a partition

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12991 )

Change subject: Initial support for recursive file listing within a partition
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/2734/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/12991
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b151d7abb8443c0d9de0a0d82a9f13e07ad5109
Gerrit-Change-Number: 12991
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Anonymous Coward (486)
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Thu, 11 Apr 2019 08:04:50 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8454 (part 2): Initial support for recursive file listing within a partition

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12991 )

Change subject: IMPALA-8454 (part 2): Initial support for recursive file listing within a partition
......................................................................


Patch Set 3: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/12991
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b151d7abb8443c0d9de0a0d82a9f13e07ad5109
Gerrit-Change-Number: 12991
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sudhanshu Arora <su...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Fri, 26 Apr 2019 09:56:23 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] Initial support for recursive file listing within a partition

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/12991 )

Change subject: Initial support for recursive file listing within a partition
......................................................................


Patch Set 1:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/12991/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12991/1//COMMIT_MSG@7
PS1, Line 7: Initial support for recursive file listing within a partition
> nit: Create a jira, since the change is substantial?
Done


http://gerrit.cloudera.org:8080/#/c/12991/1/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

http://gerrit.cloudera.org:8080/#/c/12991/1/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@154
PS1, Line 154:       FlatBufferBuilder fbb = new FlatBufferBuilder(1);
> Why remove the Precondition?
it was sort of ugly to pass the FileSystem in here, since the FileSystem was only used for the purpose of this precondition. I think it's sort of redundant anyway since it would be tough to call this method (which takes BlockLocations) from a filesystem which doesn't yield BlockLocations


http://gerrit.cloudera.org:8080/#/c/12991/1/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java:

http://gerrit.cloudera.org:8080/#/c/12991/1/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@521
PS1, Line 521:         if (isS3AFileSystem(fs)) {
> Do you happen to know if Azure has a similar optimization builtin?
It doesn't (I checked)


http://gerrit.cloudera.org:8080/#/c/12991/1/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@556
PS1, Line 556:   private static class RecursingIterator implements RemoteIterator<FileStatus> {
> Should we add some unit tests for this?
I was hesitant to add a unit test which is more or less redundant with the test which uses this by way of FileMetadataLoader. i.e this code is already well covered by another relatively-small test. I'm not sure there's a lot of value in an additional one here.


http://gerrit.cloudera.org:8080/#/c/12991/1/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@560
PS1, Line 560:     private FileStatus curFile;
> nit:use _ for class members
Done


http://gerrit.cloudera.org:8080/#/c/12991/1/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@564
PS1, Line 564: startPath
> Preconditions.checkNotNull()?
Done


http://gerrit.cloudera.org:8080/#/c/12991/1/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@569
PS1, Line 569: curFile == nul
> mind adding docs? (what it means to be null)
Done


http://gerrit.cloudera.org:8080/#/c/12991/1/fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
File fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java:

http://gerrit.cloudera.org:8080/#/c/12991/1/fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java@36
PS1, Line 36: 
> Add an e-e test to read a bucketized hive table?
Per the commit message, we don't currently actually have the recursive loading support hooked in anywhere. I didn't want to complicate this patch with the complexities of trying to handle bucketed tables during analysis, REFRESH, etc.



-- 
To view, visit http://gerrit.cloudera.org:8080/12991
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b151d7abb8443c0d9de0a0d82a9f13e07ad5109
Gerrit-Change-Number: 12991
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sudhanshu Arora <su...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Thu, 25 Apr 2019 07:17:46 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8454 (part 2): Initial support for recursive file listing within a partition

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/12991 )

Change subject: IMPALA-8454 (part 2): Initial support for recursive file listing within a partition
......................................................................


Patch Set 4: Verified+1

The failed build was due to IMPALA-8466 (known issue on trunk). Checked with Tim and he says we should be fine to override the -1.


-- 
To view, visit http://gerrit.cloudera.org:8080/12991
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b151d7abb8443c0d9de0a0d82a9f13e07ad5109
Gerrit-Change-Number: 12991
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sudhanshu Arora <su...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Mon, 29 Apr 2019 18:15:00 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8454 (part 2): Initial support for recursive file listing within a partition

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Bharath Vissapragada, Vihang Karajgaonkar, Sudhanshu Arora, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12991

to look at the new patch set (#3).

Change subject: IMPALA-8454 (part 2): Initial support for recursive file listing within a partition
......................................................................

IMPALA-8454 (part 2): Initial support for recursive file listing within a partition

This adds support to FileMetadataLoader to recursively list a directory
and create file descriptors. The changes are as follows:

* FileMetadataLoader can now take a 'recursive' argument to trigger the
  new behavior. All the non-test code paths still use non-recursive
  (i.e. this new feature isn't exposed for real tables as of yet).

* FileSystemUtil has some functionality for recursive directory listing.
  There are a few notes there around unexpected optimizations for S3 vs
  HDFS.

* Renamed the 'file_name' field to 'relative_path' for FileDescriptor
  and HDFS splits, since now the file descriptors may be more than a
  single path component.

The new functionality is just unit tested at the moment. Later, this
functionality will be tied into the actual table code paths to solve
issues with Hive interop, along with end-to-end tests.

Change-Id: I9b151d7abb8443c0d9de0a0d82a9f13e07ad5109
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler.cc
M common/fbs/CatalogObjects.fbs
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
A fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
M fe/src/test/java/org/apache/impala/catalog/HdfsPartitionTest.java
M fe/src/test/java/org/apache/impala/planner/ExplainTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M fe/src/test/java/org/apache/impala/testutil/BlockIdGenerator.java
17 files changed, 267 insertions(+), 59 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/12991/3
-- 
To view, visit http://gerrit.cloudera.org:8080/12991
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9b151d7abb8443c0d9de0a0d82a9f13e07ad5109
Gerrit-Change-Number: 12991
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sudhanshu Arora <su...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>

[Impala-ASF-CR] IMPALA-8454 (part 2): Initial support for recursive file listing within a partition

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12991 )

Change subject: IMPALA-8454 (part 2): Initial support for recursive file listing within a partition
......................................................................


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4076/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/12991
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b151d7abb8443c0d9de0a0d82a9f13e07ad5109
Gerrit-Change-Number: 12991
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sudhanshu Arora <su...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Fri, 26 Apr 2019 04:30:38 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] Initial support for recursive file listing within a partition

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/12991 )

Change subject: Initial support for recursive file listing within a partition
......................................................................


Patch Set 1:

(7 comments)

Generally looks good to me.

http://gerrit.cloudera.org:8080/#/c/12991/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12991/1//COMMIT_MSG@7
PS1, Line 7: Initial support for recursive file listing within a partition
nit: Create a jira, since the change is substantial?


http://gerrit.cloudera.org:8080/#/c/12991/1/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java:

http://gerrit.cloudera.org:8080/#/c/12991/1/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@521
PS1, Line 521:         if (isS3AFileSystem(fs)) {
Do you happen to know if Azure has a similar optimization builtin?


http://gerrit.cloudera.org:8080/#/c/12991/1/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@556
PS1, Line 556:   private static class RecursingIterator implements RemoteIterator<FileStatus> {
Should we add some unit tests for this?


http://gerrit.cloudera.org:8080/#/c/12991/1/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@560
PS1, Line 560:     private FileStatus curFile;
nit:use _ for class members


http://gerrit.cloudera.org:8080/#/c/12991/1/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@564
PS1, Line 564: startPath
Preconditions.checkNotNull()?


http://gerrit.cloudera.org:8080/#/c/12991/1/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@569
PS1, Line 569: curFile == nul
mind adding docs? (what it means to be null)


http://gerrit.cloudera.org:8080/#/c/12991/1/fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
File fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java:

http://gerrit.cloudera.org:8080/#/c/12991/1/fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java@36
PS1, Line 36: 
Add an e-e test to read a bucketized hive table?



-- 
To view, visit http://gerrit.cloudera.org:8080/12991
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b151d7abb8443c0d9de0a0d82a9f13e07ad5109
Gerrit-Change-Number: 12991
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Anonymous Coward (486)
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Sun, 14 Apr 2019 19:30:31 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8454 (part 2): Initial support for recursive file listing within a partition

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Bharath Vissapragada, Vihang Karajgaonkar, Sudhanshu Arora, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12991

to look at the new patch set (#2).

Change subject: IMPALA-8454 (part 2): Initial support for recursive file listing within a partition
......................................................................

IMPALA-8454 (part 2): Initial support for recursive file listing within a partition

This adds support to FileMetadataLoader to recursively list a directory
and create file descriptors. The changes are as follows:

* FileMetadataLoader can now take a 'recursive' argument to trigger the
  new behavior. All the non-test code paths still use non-recursive
  (i.e. this new feature isn't exposed for real tables as of yet).

* FileSystemUtil has some functionality for recursive directory listing.
  There are a few notes there around unexpected optimizations for S3 vs
  HDFS.

* Renamed the 'file_name' field to 'relative_path' for FileDescriptor
  and HDFS splits, since now the file descriptors may be more than a
  single path component.

The new functionality is just unit tested at the moment. Later, this
functionality will be used in a couple cases, including:

- ability to access "bucketed" tables written by Hive or Spark in a
  read-only manner. Today we ignore the bucketing and they end up being
  read as empty tables.

- ability to list files inside the hierarchical layout for ACID tables.

Fully supporting those use cases will require some other changes (eg to
the REFRESH code path which currently assumes that a top-level partition
modification timestamp is sufficient to determine if files changed).
I'll handle those separately to keep the patches small.

We may want to expose recursive listing support for user tables as well
(as suggested in IMPALA-4596). However, the global configuration flag
suggested in that JIRA doesn't seem so great, so I'm leaving that out
for now as well until we can find a more reasonable table-level way to
specify it (eg a table property)

Change-Id: I9b151d7abb8443c0d9de0a0d82a9f13e07ad5109
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler.cc
M common/fbs/CatalogObjects.fbs
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
A fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
M fe/src/test/java/org/apache/impala/catalog/HdfsPartitionTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M fe/src/test/java/org/apache/impala/testutil/BlockIdGenerator.java
16 files changed, 266 insertions(+), 58 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/12991/2
-- 
To view, visit http://gerrit.cloudera.org:8080/12991
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9b151d7abb8443c0d9de0a0d82a9f13e07ad5109
Gerrit-Change-Number: 12991
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sudhanshu Arora <su...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>

[Impala-ASF-CR] IMPALA-8454 (part 2): Initial support for recursive file listing within a partition

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12991 )

Change subject: IMPALA-8454 (part 2): Initial support for recursive file listing within a partition
......................................................................

IMPALA-8454 (part 2): Initial support for recursive file listing within a partition

This adds support to FileMetadataLoader to recursively list a directory
and create file descriptors. The changes are as follows:

* FileMetadataLoader can now take a 'recursive' argument to trigger the
  new behavior. All the non-test code paths still use non-recursive
  (i.e. this new feature isn't exposed for real tables as of yet).

* FileSystemUtil has some functionality for recursive directory listing.
  There are a few notes there around unexpected optimizations for S3 vs
  HDFS.

* Renamed the 'file_name' field to 'relative_path' for FileDescriptor
  and HDFS splits, since now the file descriptors may be more than a
  single path component.

The new functionality is just unit tested at the moment. Later, this
functionality will be tied into the actual table code paths to solve
issues with Hive interop, along with end-to-end tests.

Change-Id: I9b151d7abb8443c0d9de0a0d82a9f13e07ad5109
Reviewed-on: http://gerrit.cloudera.org:8080/12991
Tested-by: Todd Lipcon <to...@apache.org>
Reviewed-by: Todd Lipcon <to...@apache.org>
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler.cc
M common/fbs/CatalogObjects.fbs
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
A fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
M fe/src/test/java/org/apache/impala/catalog/HdfsPartitionTest.java
M fe/src/test/java/org/apache/impala/planner/ExplainTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M fe/src/test/java/org/apache/impala/testutil/BlockIdGenerator.java
17 files changed, 267 insertions(+), 59 deletions(-)

Approvals:
  Todd Lipcon: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/12991
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I9b151d7abb8443c0d9de0a0d82a9f13e07ad5109
Gerrit-Change-Number: 12991
Gerrit-PatchSet: 5
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sudhanshu Arora <su...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>

[Impala-ASF-CR] IMPALA-8454 (part 2): Initial support for recursive file listing within a partition

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/12991 )

Change subject: IMPALA-8454 (part 2): Initial support for recursive file listing within a partition
......................................................................


Patch Set 4: Code-Review+2

Forwarding Bharath's +2


-- 
To view, visit http://gerrit.cloudera.org:8080/12991
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b151d7abb8443c0d9de0a0d82a9f13e07ad5109
Gerrit-Change-Number: 12991
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sudhanshu Arora <su...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Mon, 29 Apr 2019 18:15:18 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] Initial support for recursive file listing within a partition

Posted by "Sudhanshu Arora (Code Review)" <ge...@cloudera.org>.
Sudhanshu Arora has posted comments on this change. ( http://gerrit.cloudera.org:8080/12991 )

Change subject: Initial support for recursive file listing within a partition
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12991/1/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

http://gerrit.cloudera.org:8080/#/c/12991/1/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@154
PS1, Line 154:       FlatBufferBuilder fbb = new FlatBufferBuilder(1);
Why remove the Precondition?



-- 
To view, visit http://gerrit.cloudera.org:8080/12991
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b151d7abb8443c0d9de0a0d82a9f13e07ad5109
Gerrit-Change-Number: 12991
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sudhanshu Arora <su...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Tue, 16 Apr 2019 22:26:03 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8454 (part 2): Initial support for recursive file listing within a partition

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12991 )

Change subject: IMPALA-8454 (part 2): Initial support for recursive file listing within a partition
......................................................................


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4086/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/12991
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b151d7abb8443c0d9de0a0d82a9f13e07ad5109
Gerrit-Change-Number: 12991
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sudhanshu Arora <su...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Fri, 26 Apr 2019 17:53:05 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8454 (part 2): Initial support for recursive file listing within a partition

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12991 )

Change subject: IMPALA-8454 (part 2): Initial support for recursive file listing within a partition
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/2910/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/12991
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b151d7abb8443c0d9de0a0d82a9f13e07ad5109
Gerrit-Change-Number: 12991
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sudhanshu Arora <su...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Thu, 25 Apr 2019 09:44:42 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8454 (part 2): Initial support for recursive file listing within a partition

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12991 )

Change subject: IMPALA-8454 (part 2): Initial support for recursive file listing within a partition
......................................................................


Patch Set 4: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/4086/


-- 
To view, visit http://gerrit.cloudera.org:8080/12991
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9b151d7abb8443c0d9de0a0d82a9f13e07ad5109
Gerrit-Change-Number: 12991
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sudhanshu Arora <su...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Fri, 26 Apr 2019 23:11:08 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8454 (part 2): Initial support for recursive file listing within a partition

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has removed a vote on this change.

Change subject: IMPALA-8454 (part 2): Initial support for recursive file listing within a partition
......................................................................


Removed Verified-1 by Impala Public Jenkins <im...@cloudera.com>
-- 
To view, visit http://gerrit.cloudera.org:8080/12991
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I9b151d7abb8443c0d9de0a0d82a9f13e07ad5109
Gerrit-Change-Number: 12991
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sudhanshu Arora <su...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>