Posted to hdfs-dev@hadoop.apache.org by "Miklos Szurap (JIRA)" <ji...@apache.org> on 2018/05/25 12:09:00 UTC

[jira] [Created] (HDFS-13623) getContentSummary to return ContentSummary without hidden files

Miklos Szurap created HDFS-13623:
------------------------------------

             Summary: getContentSummary to return ContentSummary without hidden files
                 Key: HDFS-13623
                 URL: https://issues.apache.org/jira/browse/HDFS-13623
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: hdfs, namenode
    Affects Versions: 3.1.0
            Reporter: Miklos Szurap


Improve the [FileSystem.getContentSummary()|http://hadoop.apache.org/docs/r3.1.0/api/org/apache/hadoop/fs/FileSystem.html#getContentSummary-org.apache.hadoop.fs.Path-] method so that the returned ContentSummary object also provides "getFileCountWithoutHiddenFiles()" and "getLengthWithoutHiddenFiles()".

These two new counters should exclude hidden files and hidden directories (including everything beneath a hidden directory). A path counts as hidden when it is rejected by the following filter:
{code:java}
// Accepts only non-hidden paths: a name starting with "_" or "." is treated as hidden.
public static final PathFilter HIDDEN_FILES_PATH_FILTER = new PathFilter() {
  @Override
  public boolean accept(Path p) {
    String name = p.getName();
    return !name.startsWith("_") && !name.startsWith(".");
  }
};{code}
This would be especially useful for Hive: it could compute table statistics with a single {{contentSummary}} call instead of calling {{globStatus}} (which issues multiple {{listStatus}} calls) and iterating over many thousands of objects on the client side.
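
The intended semantics of the two counters can be sketched as follows. This is only an illustration, not a proposed patch: it walks a local directory tree with java.nio as a stand-in for HDFS, and the class name VisibleContentSummary and the method summarize() are hypothetical names that do not exist in Hadoop. Hidden directories are skipped together with everything beneath them, and the starting directory itself is always visited.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical helper, local file system only; not part of the Hadoop API.
final class VisibleContentSummary {

    // Same rule as the filter in the issue: names starting with "_" or "." are hidden.
    static boolean isHidden(Path p) {
        Path fileName = p.getFileName();
        if (fileName == null) {
            return false; // a file system root has no name
        }
        String name = fileName.toString();
        return name.startsWith("_") || name.startsWith(".");
    }

    // Returns {fileCount, totalLength} over the non-hidden regular files under root.
    static long[] summarize(final Path root) throws IOException {
        final long[] totals = new long[2];
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
                // Skip a hidden directory and its whole subtree; never skip the root.
                if (!dir.equals(root) && isHidden(dir)) {
                    return FileVisitResult.SKIP_SUBTREE;
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (attrs.isRegularFile() && !isHidden(file)) {
                    totals[0]++;               // visible file count
                    totals[1] += attrs.size(); // visible length in bytes
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return totals;
    }

    public static void main(String[] args) throws IOException {
        long[] t = summarize(Paths.get(args.length > 0 ? args[0] : "."));
        System.out.println("files=" + t[0] + " length=" + t[1]);
    }
}
```

On HDFS the same traversal would have to happen inside the NameNode for the single-call benefit described above to apply; doing it from the client would reproduce the listing overhead the issue wants to avoid.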



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
