Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2021/04/26 15:47:01 UTC

[jira] [Commented] (IMPALA-10658) LOAD DATA INPATH silently fails between HDFS and Azure ABFS

    [ https://issues.apache.org/jira/browse/IMPALA-10658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332523#comment-17332523 ] 

ASF subversion and git services commented on IMPALA-10658:
----------------------------------------------------------

Commit 8336b7b3cd8ba90d37c3f7454a9c9c4074bca1f0 in impala's branch refs/heads/master from Zoltan Borok-Nagy
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=8336b7b ]

IMPALA-10658: LOAD DATA INPATH silently fails between HDFS and Azure ABFS

LOAD DATA INPATH silently fails when Impala tries to move files from
HDFS to ABFS. The problem is that we use FileSystem.makeQualified(Path)
to decide whether a path is on a given filesystem. We expect to get an
IllegalArgumentException if the path is on a different filesystem.
However, the Azure FileSystem implementation doesn't throw this
exception.

Because of that, Impala thinks that an 'hdfs://' path and an 'abfs://'
path are on the same filesystem, so it tries to move files with
FileSystem.rename(). On failure, rename() might either throw an
exception or return false. Impala doesn't check the return value, so
if rename() returns false the error remains silent.

This patch fixes Impala's isPathOnFileSystem() and also adds a check
for the return value of rename().
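
One way to decide whether a path belongs to a filesystem without relying on
makeQualified() throwing is to compare URI schemes and authorities directly.
The sketch below is only an illustration of that idea using java.net.URI; it
is not the code from the patch, and the class name, method signature, and
example URIs are all hypothetical.

```java
import java.net.URI;

public class PathFsCheck {
    /**
     * Returns true if 'path' appears to live on the filesystem identified by
     * 'fsUri'. Hypothetical helper: the real FileSystemUtil may differ.
     */
    public static boolean isPathOnFileSystem(URI path, URI fsUri) {
        String pathScheme = path.getScheme();
        // A path without a scheme is relative to the filesystem it is used with.
        if (pathScheme == null) return true;
        // Different schemes (e.g. hdfs vs. abfs) always mean different filesystems.
        if (!pathScheme.equalsIgnoreCase(fsUri.getScheme())) return false;
        String pathAuthority = path.getAuthority();
        // Same scheme and no authority on the path: treat it as the same filesystem.
        if (pathAuthority == null) return true;
        // Otherwise the authorities (host:port, container@account, ...) must match.
        return pathAuthority.equalsIgnoreCase(fsUri.getAuthority());
    }

    public static void main(String[] args) {
        // Example URIs are made up for illustration.
        URI abfs = URI.create("abfs://container@account.dfs.core.windows.net/");
        URI src = URI.create("hdfs://namenode:8020/tmp/data/file.parq");
        System.out.println(isPathOnFileSystem(src, abfs)); // false
    }
}
```

With a check like this, an 'hdfs://' source is reported as being on a
different filesystem than an 'abfs://' destination, so the caller can fall
back to copying instead of calling rename().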

Testing:
 * tested manually between HDFS and Azure ABFS.
 * added JUnit test to FileSystemUtilTest

Change-Id: Id807e8a200b83283a09d3a917185cabab930017d
Reviewed-on: http://gerrit.cloudera.org:8080/17316
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> LOAD DATA INPATH silently fails between HDFS and Azure ABFS
> -----------------------------------------------------------
>
>                 Key: IMPALA-10658
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10658
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Zoltán Borók-Nagy
>            Priority: Major
>
> LOAD DATA INPATH silently fails when Impala tries to move files from HDFS to ABFS.
> The problem is that in 'relocateFile()' we try to figure out if 'sourceFile' is on the destination filesystem:
> https://github.com/apache/impala/blob/6b16df9e9a4696b46b6f9c7fe2fc0aaded285623/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java#L246
> We use the following code to decide this:
> https://github.com/apache/impala/blob/6b16df9e9a4696b46b6f9c7fe2fc0aaded285623/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java#L581-L591
> However, the Azure FileSystem implementation doesn't throw an exception in 'fs.makeQualified(path);'. It just happily returns a new Path, replacing the prefix "hdfs://" with "abfs://".
> So in relocateFile() Impala thinks the 'sourceFile' and 'destFile' are on the same filesystem, and it tries to invoke 'destFs.rename()':
> https://github.com/apache/impala/blob/6b16df9e9a4696b46b6f9c7fe2fc0aaded285623/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java#L266
> From https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/filesystem.html#boolean_rename.28Path_src.2C_Path_d.29 : "In terms of its implementation, it is the one with the most ambiguity regarding when to return false versus raising an exception."
> It seems that the Azure FileSystem implementation doesn't throw an exception on failure, but returns false instead. Unfortunately, Impala doesn't check the return value of destFs.rename() (see above), so the error remains silent.
> To fix this issue we need to do two things:
> * fix FileSystemUtil.isPathOnFileSystem()
> * check the return value of destFs.rename() and throw an exception when it's false
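
The second item above can be sketched as a small wrapper that turns a false
return value into an exception. This is a minimal illustration, not the
actual change in FileSystemUtil: the Renamer interface is a hypothetical
stand-in for the rename() method of a Hadoop FileSystem, and the paths are
made up.

```java
import java.io.IOException;

public class RenameCheck {
    /** Hypothetical stand-in for the rename() part of a filesystem API. */
    public interface Renamer {
        boolean rename(String src, String dst) throws IOException;
    }

    /**
     * Calls rename() and throws if it reports failure by returning false,
     * so the error can no longer be silently ignored.
     */
    public static void renameOrThrow(Renamer fs, String src, String dst)
            throws IOException {
        if (!fs.rename(src, dst)) {
            throw new IOException(String.format(
                "Failed to move '%s' to '%s': rename() returned false", src, dst));
        }
    }

    public static void main(String[] args) {
        // A fake filesystem whose rename() always fails by returning false.
        Renamer alwaysFails = (src, dst) -> false;
        try {
            renameOrThrow(alwaysFails, "hdfs://nn/tmp/f", "abfs://c@a/tbl/f");
        } catch (IOException e) {
            System.out.println("Caught: " + e.getMessage());
        }
    }
}
```

Wrapping the call this way makes both failure modes of rename(), an exception
or a false return value, surface to the caller in the same way.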


