You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@carbondata.apache.org by ku...@apache.org on 2020/12/03 14:26:32 UTC
[carbondata] branch master updated: [CARBONDATA-4050]Avoid
redundant RPC calls to get file status when CarbonFile is instantiated with
fileStatus constructor
This is an automated email from the ASF dual-hosted git repository.
kunalkapoor pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git
The following commit(s) were added to refs/heads/master by this push:
new c5f464f [CARBONDATA-4050]Avoid redundant RPC calls to get file status when CarbonFile is instantiated with fileStatus constructor
c5f464f is described below
commit c5f464fab7117000aa76150c30a616ca02b5d443
Author: Venu Reddy <k....@gmail.com>
AuthorDate: Thu Nov 19 00:33:20 2020 +0530
[CARBONDATA-4050]Avoid redundant RPC calls to get file status when CarbonFile is
instantiated with fileStatus constructor
Why is this PR needed?
In createCarbonDataFileBlockMetaInfoMapping method, we get list of carbondata
files in the segment, loop through all the carbon files and make a map of
fileNameToMetaInfoMapping<path-string, BlockMetaInfo>
In that carbon files loop, if the file is of AbstractDFSCarbonFile type, we get
the org.apache.hadoop.fs.FileStatus thrice for each file. And the method to get
file status is an RPC call(fileSystem.getFileStatus(path)). It takes ~2ms in the
cluster for each call. Thus, incur an overhead of ~6ms per file. So overall driver
side query processing time has increased significantly when there are more carbon
files. Hence caused TPC-DS queries performance degradation.
What changes were proposed in this PR?
Avoided redundant RPC calls to get file status in getAbsolutePath(), getSize() and
getLocations() methods when CarbonFile is instantiated with FileStatus constructor
This closes #4010
---
.../core/datastore/filesystem/AbstractDFSCarbonFile.java | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/core/src/main/java/org/apache/carbondata/core/datastore/filesystem/AbstractDFSCarbonFile.java b/core/src/main/java/org/apache/carbondata/core/datastore/filesystem/AbstractDFSCarbonFile.java
index 338076e..25337c7 100644
--- a/core/src/main/java/org/apache/carbondata/core/datastore/filesystem/AbstractDFSCarbonFile.java
+++ b/core/src/main/java/org/apache/carbondata/core/datastore/filesystem/AbstractDFSCarbonFile.java
@@ -107,6 +107,9 @@ public abstract class AbstractDFSCarbonFile implements CarbonFile {
@Override
public String getAbsolutePath() {
try {
+ if (fileStatus != null) {
+ return fileStatus.getPath().toString();
+ }
return fileSystem.getFileStatus(path).getPath().toString();
} catch (IOException e) {
throw new CarbonFileException("Unable to get file status: ", e);
@@ -155,6 +158,9 @@ public abstract class AbstractDFSCarbonFile implements CarbonFile {
@Override
public long getSize() {
try {
+ if (fileStatus != null) {
+ return fileStatus.getLen();
+ }
return fileSystem.getFileStatus(path).getLen();
} catch (IOException e) {
throw new CarbonFileException("Unable to get file status for " + path.toString(), e);
@@ -541,7 +547,10 @@ public abstract class AbstractDFSCarbonFile implements CarbonFile {
@Override
public String[] getLocations() throws IOException {
BlockLocation[] blkLocations;
- FileStatus fileStatus = fileSystem.getFileStatus(path);
+ FileStatus fileStatus = this.fileStatus;
+ if (fileStatus == null) {
+ fileStatus = fileSystem.getFileStatus(path);
+ }
if (fileStatus instanceof LocatedFileStatus) {
blkLocations = ((LocatedFileStatus) fileStatus).getBlockLocations();
} else {