Posted to reviews@impala.apache.org by "Todd Lipcon (Code Review)" <ge...@cloudera.org> on 2019/04/25 08:53:14 UTC

[Impala-ASF-CR] IMPALA-8454 (part 2): Initial support for recursive file listing within a partition

Hello Bharath Vissapragada, Vihang Karajgaonkar, Sudhanshu Arora, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12991

to look at the new patch set (#2).

Change subject: IMPALA-8454 (part 2): Initial support for recursive file listing within a partition
......................................................................

IMPALA-8454 (part 2): Initial support for recursive file listing within a partition

This adds support to FileMetadataLoader to recursively list a directory
and create file descriptors. The changes are as follows:

* FileMetadataLoader can now take a 'recursive' argument to trigger the
  new behavior. All non-test code paths still use non-recursive listing
  (i.e. this new feature isn't exposed for real tables yet).

* FileSystemUtil gains some functionality for recursive directory
  listing. Comments there note a few unexpected optimizations for S3 vs
  HDFS.

* Renamed the 'file_name' field to 'relative_path' for FileDescriptor
  and HDFS splits, since a file descriptor's path may now contain more
  than a single path component.
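
To illustrate why 'relative_path' is a better name than 'file_name', here
is a minimal sketch of recursive vs non-recursive listing. It is not the
code in this patch: it uses java.nio.file on a local directory rather
than the Hadoop FileSystem API, and the class and method names
(RecursiveListingSketch, listRelativePaths) are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch; not Impala's FileMetadataLoader or FileSystemUtil.
final class RecursiveListingSketch {

  // Returns the regular files under partDir as '/'-separated paths relative
  // to partDir, sorted. With recursive=false, subdirectories are skipped,
  // so every result is a single path component.
  static List<String> listRelativePaths(Path partDir, boolean recursive)
      throws IOException {
    List<String> result = new ArrayList<>();
    collect(partDir, partDir, recursive, result);
    Collections.sort(result);
    return result;
  }

  private static void collect(Path root, Path dir, boolean recursive,
      List<String> out) throws IOException {
    List<Path> children;
    // Files.list returns a stream that must be closed to release the
    // underlying directory handle.
    try (Stream<Path> s = Files.list(dir)) {
      children = s.collect(Collectors.toList());
    }
    for (Path child : children) {
      if (Files.isDirectory(child)) {
        if (recursive) collect(root, child, true, out);
      } else {
        // Record the path relative to the partition directory.
        out.add(root.relativize(child).toString()
            .replace(File.separatorChar, '/'));
      }
    }
  }
}
```

For a directory holding 'a.txt' and 'subdir/nested/b.txt', the
non-recursive call yields only 'a.txt', while the recursive call also
yields 'subdir/nested/b.txt', which no longer fits in a plain file name.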

The new functionality is only unit tested at the moment. Later, it
will be used in a couple of cases, including:

- ability to access "bucketed" tables written by Hive or Spark in a
  read-only manner. Today we ignore the bucketing and they end up being
  read as empty tables.

- ability to list files inside the hierarchical layout for ACID tables.

Fully supporting those use cases will require some other changes (e.g.
to the REFRESH code path, which currently assumes that a top-level
partition modification timestamp is sufficient to determine whether
files changed). I'll handle those separately to keep the patches small.
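
As a rough illustration of the REFRESH concern: on POSIX file systems
and HDFS, a directory's modification time typically changes only when
its direct children are added or removed, so a file written into a
nested subdirectory would not show up in the top-level timestamp. The
sketch below shows one possible alternative signal, the newest
modification time of any file in the tree. It is an assumption made for
illustration, not the planned fix; it uses java.nio.file rather than
the Hadoop API, and the names (NewestFileMtimeSketch,
newestFileMtimeMillis) are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical sketch; not part of this patch or of Impala.
final class NewestFileMtimeSketch {

  // Returns the largest last-modified time (epoch millis) among all regular
  // files under partDir, at any depth, or -1 if there are no files.
  static long newestFileMtimeMillis(Path partDir) throws IOException {
    try (Stream<Path> paths = Files.walk(partDir)) {
      return paths
          .filter(Files::isRegularFile)
          .mapToLong(NewestFileMtimeSketch::mtimeMillis)
          .max()
          .orElse(-1L);
    }
  }

  // Wraps the checked IOException so the method can be used in a stream.
  private static long mtimeMillis(Path p) {
    try {
      return Files.getLastModifiedTime(p).toMillis();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Note that this costs a full recursive listing on every check, which is
exactly the kind of overhead a real design would need to weigh.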

We may want to expose recursive listing support for user tables as well
(as suggested in IMPALA-4596). However, the global configuration flag
suggested in that JIRA doesn't seem so great, so I'm leaving that out
for now as well until we can find a more reasonable table-level way to
specify it (e.g. a table property).

Change-Id: I9b151d7abb8443c0d9de0a0d82a9f13e07ad5109
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler.cc
M common/fbs/CatalogObjects.fbs
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
A fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
M fe/src/test/java/org/apache/impala/catalog/HdfsPartitionTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M fe/src/test/java/org/apache/impala/testutil/BlockIdGenerator.java
16 files changed, 266 insertions(+), 58 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/12991/2
-- 
To view, visit http://gerrit.cloudera.org:8080/12991
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9b151d7abb8443c0d9de0a0d82a9f13e07ad5109
Gerrit-Change-Number: 12991
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sudhanshu Arora <su...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>