You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by "sivabalan narayanan (Jira)" <ji...@apache.org> on 2021/10/04 19:15:00 UTC
[jira] [Comment Edited] (HUDI-2005) Audit and remove references of
fs.listStatus() and fs.getFileStatus() or fs.exists()
[ https://issues.apache.org/jira/browse/HUDI-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17408836#comment-17408836 ]
sivabalan narayanan edited comment on HUDI-2005 at 10/4/21, 7:14 PM:
---------------------------------------------------------------------
{code:java}
grep -irl ".listStatus" hudi-*/* | grep -v Test | grep .java
hudi-cli/src/main/java/org/apache/hudi/cli/commands/MetadataCommand.java
hudi-cli/src/main/java/org/apache/hudi/cli/commands/HoodieLogFileCommand.java
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/upgrade/BaseTwoToOneDowngradeHandler.java
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/bootstrap/BootstrapUtils.java
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/marker/DirectWriteMarkers.java
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/heartbeat/HoodieHeartbeatClient.java
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/CompactionAdminClient.java
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/table/action/rollback/ListingBasedRollbackHelper.java
hudi-client/hudi-java-client/src/main/java/org/apache/hudi/table/action/rollback/JavaListingBasedRollbackHelper.java
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/rollback/ListingBasedRollbackHelper.java
hudi-common/src/main/java/org/apache/hudi/common/util/MarkerUtils.java
hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java
hudi-common/src/main/java/org/apache/hudi/common/fs/HoodieWrapperFileSystem.java
hudi-common/src/main/java/org/apache/hudi/common/fs/inline/InLineFileSystem.java
hudi-common/src/main/java/org/apache/hudi/common/fs/inline/InMemoryFileSystem.java
hudi-common/src/main/java/org/apache/hudi/common/fs/FailSafeConsistencyGuard.java
hudi-common/src/main/java/org/apache/hudi/common/fs/FSUtils.java
hudi-common/src/main/java/org/apache/hudi/metadata/FileSystemBackedTableMetadata.java
hudi-flink/src/main/java/org/apache/hudi/table/format/FilePathUtils.java
hudi-flink/src/main/java/org/apache/hudi/table/format/cow/CopyOnWriteInputFormat.java
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieHFileInputFormat.java
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieParquetInputFormat.java
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieInputFormatUtils.java
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieHFileRealtimeInputFormat.java
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieParquetRealtimeInputFormat.java
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/hive/HoodieCombineHiveInputFormat.java
hudi-integ-test/src/main/java/org/apache/hudi/integ/testsuite/reader/DFSHoodieDatasetInputReader.java
hudi-integ-test/src/main/java/org/apache/hudi/integ/testsuite/dag/nodes/ValidateAsyncOperations.java
hudi-integ-test/src/main/java/org/apache/hudi/integ/testsuite/dag/nodes/ValidateDatasetNode.java
hudi-timeline-service/src/main/java/org/apache/hudi/timeline/service/handlers/marker/MarkerDirState.java
hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieSnapshotExporter.java
hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieSnapshotCopier.java
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/HiveIncrPullSource.java
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/DatePartitionPathSelector.java
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/DFSPathSelector.java
hudi-utilities/src/main/java/org/apache/hudi/utilities/checkpointing/KafkaConnectHdfsProvider.java
{code}
Potential places that needs to be fixed:
1. ListingBasedRollbackHelper
{code:java}
deleteBaseAndLogFiles( ...) {
...
FileStatus[] toBeDeleted = fs.listStatus(FSUtils.getPartitionPath(config.getBasePath(), partitionPath), filter);
...
}{code}
2. HoodieTableMetaCient does fs.listStatus only for meta path. This should be fine.
{code:java}
public static FileStatus[] scanFiles(FileSystem fs, Path metaPath, PathFilter nameFilter) throws IOException {
return fs.listStatus(metaPath, nameFilter);
}
{code}
3. AbstractTableFileSystemView. There are quite a few callers to this. Need to understand in detail.
{code:java}
protected FileStatus[] listPartition(Path partitionPath) throws IOException {
return metaClient.getFs().listStatus(partitionPath);
}
{code}
This is called from within
ensurePartitionLoadedCorrectly() in the same class. This method is overriden by HoodieMetadataFileSystemView.
4. SparkMarkerBasedrollbackStrategy
{code:java}
protected Map<FileStatus, Long> getWrittenLogFileSizeMap(String partitionPathStr, String baseCommitTime, String fileId) throws IOException {
// collect all log files that is supposed to be deleted with this rollback
return FSUtils.getAllLogFiles(table.getMetaClient().getFs(),
FSUtils.getPartitionPath(config.getBasePath(), partitionPathStr), fileId, HoodieFileFormat.HOODIE_LOG.getFileExtension(), baseCommitTime)
.collect(Collectors.toMap(HoodieLogFile::getFileStatus, value -> value.getFileStatus().getLen()));
}
{code}
was (Author: shivnarayan):
{code:java}
grep -irl ".listStatus" hudi-*/* | grep -v Test | grep .java
hudi-cli/src/main/java/org/apache/hudi/cli/commands/MetadataCommand.java
hudi-cli/src/main/java/org/apache/hudi/cli/commands/HoodieLogFileCommand.java
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/upgrade/BaseTwoToOneDowngradeHandler.java
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/bootstrap/BootstrapUtils.java
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/marker/DirectWriteMarkers.java
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/heartbeat/HoodieHeartbeatClient.java
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/CompactionAdminClient.java
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/table/action/rollback/ListingBasedRollbackHelper.java
hudi-client/hudi-java-client/src/main/java/org/apache/hudi/table/action/rollback/JavaListingBasedRollbackHelper.java
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/rollback/ListingBasedRollbackHelper.java
hudi-common/src/main/java/org/apache/hudi/common/util/MarkerUtils.java
hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java
hudi-common/src/main/java/org/apache/hudi/common/fs/HoodieWrapperFileSystem.java
hudi-common/src/main/java/org/apache/hudi/common/fs/inline/InLineFileSystem.java
hudi-common/src/main/java/org/apache/hudi/common/fs/inline/InMemoryFileSystem.java
hudi-common/src/main/java/org/apache/hudi/common/fs/FailSafeConsistencyGuard.java
hudi-common/src/main/java/org/apache/hudi/common/fs/FSUtils.java
hudi-common/src/main/java/org/apache/hudi/metadata/FileSystemBackedTableMetadata.java
hudi-flink/src/main/java/org/apache/hudi/table/format/FilePathUtils.java
hudi-flink/src/main/java/org/apache/hudi/table/format/cow/CopyOnWriteInputFormat.java
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieHFileInputFormat.java
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieParquetInputFormat.java
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieInputFormatUtils.java
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieHFileRealtimeInputFormat.java
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieParquetRealtimeInputFormat.java
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/hive/HoodieCombineHiveInputFormat.java
hudi-integ-test/src/main/java/org/apache/hudi/integ/testsuite/reader/DFSHoodieDatasetInputReader.java
hudi-integ-test/src/main/java/org/apache/hudi/integ/testsuite/dag/nodes/ValidateAsyncOperations.java
hudi-integ-test/src/main/java/org/apache/hudi/integ/testsuite/dag/nodes/ValidateDatasetNode.java
hudi-timeline-service/src/main/java/org/apache/hudi/timeline/service/handlers/marker/MarkerDirState.java
hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieSnapshotExporter.java
hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieSnapshotCopier.java
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/HiveIncrPullSource.java
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/DatePartitionPathSelector.java
hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/helpers/DFSPathSelector.java
hudi-utilities/src/main/java/org/apache/hudi/utilities/checkpointing/KafkaConnectHdfsProvider.java
{code}
Potential places that needs to be fixed:
1. ListingBasedRollbackHelper
{code:java}
deleteBaseAndLogFiles( ...) {
...
FileStatus[] toBeDeleted = fs.listStatus(FSUtils.getPartitionPath(config.getBasePath(), partitionPath), filter);
...
}{code}
2. HoodieTableMetaCient does fs.listStatus only for meta path. This should be fine.
{code:java}
public static FileStatus[] scanFiles(FileSystem fs, Path metaPath, PathFilter nameFilter) throws IOException {
return fs.listStatus(metaPath, nameFilter);
}
{code}
3. AbstractTableFileSystemView. There are quite a few callers to this. Need to understand in detail.
{code:java}
protected FileStatus[] listPartition(Path partitionPath) throws IOException {
return metaClient.getFs().listStatus(partitionPath);
}
{code}
This is called from within
ensurePartitionLoadedCorrectly() in the same class.
4. SparkMarkerBasedrollbackStrategy
{code:java}
protected Map<FileStatus, Long> getWrittenLogFileSizeMap(String partitionPathStr, String baseCommitTime, String fileId) throws IOException {
// collect all log files that is supposed to be deleted with this rollback
return FSUtils.getAllLogFiles(table.getMetaClient().getFs(),
FSUtils.getPartitionPath(config.getBasePath(), partitionPathStr), fileId, HoodieFileFormat.HOODIE_LOG.getFileExtension(), baseCommitTime)
.collect(Collectors.toMap(HoodieLogFile::getFileStatus, value -> value.getFileStatus().getLen()));
}
{code}
> Audit and remove references of fs.listStatus() and fs.getFileStatus() or fs.exists()
> ------------------------------------------------------------------------------------
>
> Key: HUDI-2005
> URL: https://issues.apache.org/jira/browse/HUDI-2005
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: Nishith Agarwal
> Assignee: sivabalan narayanan
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)