Posted to commits@hudi.apache.org by "sivabalan narayanan (Jira)" <ji...@apache.org> on 2021/10/04 19:15:00 UTC

[jira] [Comment Edited] (HUDI-2005) Audit and remove references of fs.listStatus() and fs.getFileStatus() or fs.exists()

    [ https://issues.apache.org/jira/browse/HUDI-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17408836#comment-17408836 ] 

sivabalan narayanan edited comment on HUDI-2005 at 10/4/21, 7:14 PM:
---------------------------------------------------------------------

{code:java}
grep -irl ".listStatus" hudi-*/* | grep -v Test | grep .java
hudi-cli/src/main/java/org/apache/hudi/cli/commands/MetadataCommand.java
hudi-cli/src/main/java/org/apache/hudi/cli/commands/HoodieLogFileCommand.java
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/upgrade/BaseTwoToOneDowngradeHandler.java
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/bootstrap/BootstrapUtils.java
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/marker/DirectWriteMarkers.java
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/heartbeat/HoodieHeartbeatClient.java
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/CompactionAdminClient.java
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/table/action/rollback/ListingBasedRollbackHelper.java
hudi-client/hudi-java-client/src/main/java/org/apache/hudi/table/action/rollback/JavaListingBasedRollbackHelper.java
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/rollback/ListingBasedRollbackHelper.java
hudi-common/src/main/java/org/apache/hudi/common/util/MarkerUtils.java
hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java
hudi-common/src/main/java/org/apache/hudi/common/fs/HoodieWrapperFileSystem.java
hudi-common/src/main/java/org/apache/hudi/common/fs/inline/InLineFileSystem.java
hudi-common/src/main/java/org/apache/hudi/common/fs/inline/InMemoryFileSystem.java
hudi-common/src/main/java/org/apache/hudi/common/fs/FailSafeConsistencyGuard.java
hudi-common/src/main/java/org/apache/hudi/common/fs/FSUtils.java
hudi-common/src/main/java/org/apache/hudi/metadata/FileSystemBackedTableMetadata.java
hudi-flink/src/main/java/org/apache/hudi/table/format/FilePathUtils.java
hudi-flink/src/main/java/org/apache/hudi/table/format/cow/CopyOnWriteInputFormat.java
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieHFileInputFormat.java
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieParquetInputFormat.java
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieInputFormatUtils.java
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieHFileRealtimeInputFormat.java
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieParquetRealtimeInputFormat.java
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/hive/HoodieCombineHiveInputFormat.java
hudi-integ-test/src/main/java/org/apache/hudi/integ/testsuite/reader/DFSHoodieDatasetInputReader.java
hudi-integ-test/src/main/java/org/apache/hudi/integ/testsuite/dag/nodes/ValidateAsyncOperations.java
hudi-integ-test/src/main/java/org/apache/hudi/integ/testsuite/dag/nodes/ValidateDatasetNode.java
hudi-timeline-service/src/main/java/org/apache/hudi/timeline/service/handlers/marker/MarkerDirState.java
hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieSnapshotExporter.java
hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieSnapshotCopier.java
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/HiveIncrPullSource.java
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/DatePartitionPathSelector.java
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/DFSPathSelector.java
hudi-utilities/src/main/java/org/apache/hudi/utilities/checkpointing/KafkaConnectHdfsProvider.java
{code}
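For anyone who wants to repeat this audit without a shell, here is a rough Java sketch of a similar search. It is not an exact equivalent of the grep above: it matches the literal text ".listStatus(" case-sensitively, whereas the grep pattern is case-insensitive and treats "." as a wildcard. The class and method names (ListStatusAudit, findCallers) are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ListStatusAudit {

  // Returns the .java files under root that contain ".listStatus(" and whose
  // path relative to root does not contain "Test" (mirrors `grep -v Test`).
  static List<Path> findCallers(Path root) throws IOException {
    try (Stream<Path> paths = Files.walk(root)) {
      return paths
          .filter(Files::isRegularFile)
          .filter(p -> p.getFileName().toString().endsWith(".java"))
          .filter(p -> !root.relativize(p).toString().contains("Test"))
          .filter(ListStatusAudit::mentionsListStatus)
          .sorted()
          .collect(Collectors.toList());
    }
  }

  // True if the file text contains a literal ".listStatus(" call.
  static boolean mentionsListStatus(Path file) {
    try {
      String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
      return text.contains(".listStatus(");
    } catch (IOException e) {
      return false; // unreadable files are simply skipped in this sketch
    }
  }

  public static void main(String[] args) throws IOException {
    // Defaults to the current directory when no root is given.
    Path root = Paths.get(args.length > 0 ? args[0] : ".");
    findCallers(root).forEach(System.out::println);
  }
}
```

Running it from the repository root should print one path per line, similar in spirit to the list above, though the exact set of hits may differ because of the stricter match.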
 

Potential places that need to be fixed: 

1. ListingBasedRollbackHelper
{code:java}
deleteBaseAndLogFiles( ...) {
  ...
  FileStatus[] toBeDeleted = fs.listStatus(FSUtils.getPartitionPath(config.getBasePath(), partitionPath), filter);
  ...
}
{code}
2. HoodieTableMetaClient calls fs.listStatus only on the meta path. This should be fine. 
{code:java}
public static FileStatus[] scanFiles(FileSystem fs, Path metaPath, PathFilter nameFilter) throws IOException {
  return fs.listStatus(metaPath, nameFilter);
}
{code}
 

3. AbstractTableFileSystemView. This method has quite a few callers, which need to be understood in detail. 
{code:java}
protected FileStatus[] listPartition(Path partitionPath) throws IOException {
  return metaClient.getFs().listStatus(partitionPath);
}
{code}
This is called from within ensurePartitionLoadedCorrectly() in the same class. The listPartition() method is overridden by HoodieMetadataFileSystemView.
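To make the override relationship concrete, here is a minimal, self-contained sketch of the pattern, using java.nio instead of the Hadoop FileSystem API. All class names here (DirectListingView, IndexBackedView) are hypothetical stand-ins and are not Hudi classes; the point is only that callers of the base class automatically avoid the direct listing once a subclass overrides listPartition().

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ListingOverrideSketch {

  // Hypothetical stand-in for a view that lists a partition directly on storage.
  static class DirectListingView {
    protected List<String> listPartition(Path partitionPath) throws IOException {
      try (Stream<Path> files = Files.list(partitionPath)) {
        return files.map(p -> p.getFileName().toString()).sorted().collect(Collectors.toList());
      }
    }

    // Callers go through this method, so they use whichever listing is in effect.
    public int countFiles(Path partitionPath) throws IOException {
      return listPartition(partitionPath).size();
    }
  }

  // Hypothetical stand-in for a view that answers from a pre-built index
  // and never touches the file system.
  static class IndexBackedView extends DirectListingView {
    private final Map<Path, List<String>> index;

    IndexBackedView(Map<Path, List<String>> index) {
      this.index = index;
    }

    @Override
    protected List<String> listPartition(Path partitionPath) {
      return index.getOrDefault(partitionPath, Collections.emptyList());
    }
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("partition");
    Files.createFile(dir.resolve("f1.parquet"));
    Files.createFile(dir.resolve("f2.parquet"));

    DirectListingView direct = new DirectListingView();
    DirectListingView indexed =
        new IndexBackedView(Collections.singletonMap(dir, Arrays.asList("f1.parquet")));

    System.out.println("direct listing:  " + direct.countFiles(dir));
    System.out.println("indexed listing: " + indexed.countFiles(dir));
  }
}
```

Under this pattern, the remaining audit work is to confirm that no caller bypasses the overridable method and lists storage on its own.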

 

4. SparkMarkerBasedRollbackStrategy
{code:java}
protected Map<FileStatus, Long> getWrittenLogFileSizeMap(String partitionPathStr, String baseCommitTime, String fileId) throws IOException {
  // collect all log files that are supposed to be deleted with this rollback
  return FSUtils.getAllLogFiles(table.getMetaClient().getFs(),
      FSUtils.getPartitionPath(config.getBasePath(), partitionPathStr), fileId, HoodieFileFormat.HOODIE_LOG.getFileExtension(), baseCommitTime)
      .collect(Collectors.toMap(HoodieLogFile::getFileStatus, value -> value.getFileStatus().getLen()));
}
{code}
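As a rough illustration of what a helper like this does, the sketch below lists a partition directory, keeps files that look like log files for one file group, and collects them into a file-to-size map with Collectors.toMap. It uses java.nio rather than the Hadoop FileSystem API, the name filter (starts with "." plus the file ID, contains the log extension) is an assumption made for this example, and the baseCommitTime argument is left out. LogFileSizeSketch and logFileSizes are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LogFileSizeSketch {

  // Lists partitionDir once and maps each matching log file to its size in bytes.
  // The naming filter below is an assumption for illustration only.
  static Map<Path, Long> logFileSizes(Path partitionDir, String fileId, String logExtension)
      throws IOException {
    try (Stream<Path> files = Files.list(partitionDir)) {
      return files
          .filter(p -> {
            String name = p.getFileName().toString();
            return name.startsWith("." + fileId) && name.contains(logExtension);
          })
          .collect(Collectors.toMap(p -> p, LogFileSizeSketch::sizeOf));
    }
  }

  // Wraps the checked exception so the method can be used inside a stream.
  static long sizeOf(Path p) {
    try {
      return Files.size(p);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("partition");
    Files.write(dir.resolve(".f1_001.log.1"), new byte[5]);
    Files.write(dir.resolve("f1_001.parquet"), new byte[9]);
    logFileSizes(dir, "f1", ".log").forEach((file, size) ->
        System.out.println(file.getFileName() + " -> " + size + " bytes"));
  }
}
```

The directory listing inside the helper is the part an audit like this one would care about, since it is a storage listing even though the caller never invokes listStatus itself.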


> Audit and remove references of fs.listStatus() and fs.getFileStatus() or fs.exists()
> ------------------------------------------------------------------------------------
>
>                 Key: HUDI-2005
>                 URL: https://issues.apache.org/jira/browse/HUDI-2005
>             Project: Apache Hudi
>          Issue Type: Sub-task
>            Reporter: Nishith Agarwal
>            Assignee: sivabalan narayanan
>            Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)