Posted to issues-all@impala.apache.org by "Zoltán Borók-Nagy (Jira)" <ji...@apache.org> on 2021/04/26 15:51:00 UTC

[jira] [Resolved] (IMPALA-10658) LOAD DATA INPATH silently fails between HDFS and Azure ABFS

     [ https://issues.apache.org/jira/browse/IMPALA-10658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zoltán Borók-Nagy resolved IMPALA-10658.
----------------------------------------
    Fix Version/s: Impala 4.0
       Resolution: Fixed

> LOAD DATA INPATH silently fails between HDFS and Azure ABFS
> -----------------------------------------------------------
>
>                 Key: IMPALA-10658
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10658
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Zoltán Borók-Nagy
>            Priority: Major
>             Fix For: Impala 4.0
>
>
> LOAD DATA INPATH silently fails when Impala tries to move files from HDFS to ABFS.
> The problem is that in 'relocateFile()' we try to figure out if 'sourceFile' is on the destination filesystem:
> https://github.com/apache/impala/blob/6b16df9e9a4696b46b6f9c7fe2fc0aaded285623/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java#L246
> We use the following code to decide this:
> https://github.com/apache/impala/blob/6b16df9e9a4696b46b6f9c7fe2fc0aaded285623/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java#L581-L591
> However, the Azure FileSystem implementation doesn't throw an exception in 'fs.makeQualified(path);'. It just happily returns a new Path, replacing the prefix "hdfs://" with "abfs://".
> So in relocateFile() Impala thinks 'sourceFile' and 'destFile' are on the same filesystem, and it tries to invoke 'destFs.rename()':
> https://github.com/apache/impala/blob/6b16df9e9a4696b46b6f9c7fe2fc0aaded285623/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java#L266
> From https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/filesystem.html#boolean_rename.28Path_src.2C_Path_d.29 : "In terms of its implementation, it is the one with the most ambiguity regarding when to return false versus raising an exception."
> It seems the Azure FileSystem implementation doesn't throw an exception on failure, but returns false instead. Unfortunately, Impala doesn't check the return value of destFs.rename() (see above), so the error remains silent.
> To fix this issue we need to do two things:
> * fix FileSystemUtil.isPathOnFileSystem()
> * check the return value of destFs.rename() and throw an exception when it's false
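
The two fixes listed above could be sketched roughly as follows. This is only an illustration and not the actual Impala patch: it uses plain java.net.URI instead of Hadoop's FileSystem and Path so that it runs without any dependencies, and the class RelocateSketch, the Renamer interface, and the method renameOrThrow are hypothetical names invented for this example. The idea is to compare the scheme and authority of the path with those of the filesystem rather than relying on makeQualified() to throw, and to turn a false return from rename() into an exception.

```java
import java.io.IOException;
import java.net.URI;

public class RelocateSketch {
  /** Hypothetical stand-in for the rename() part of a filesystem API. */
  public interface Renamer {
    boolean rename(String src, String dst) throws IOException;
  }

  /**
   * Returns true if 'path' appears to live on the filesystem identified by 'fsUri'.
   * Assumption for this sketch: a path without a scheme or without an authority
   * is resolved against the filesystem's defaults and therefore counts as a match.
   */
  public static boolean isPathOnFileSystem(URI path, URI fsUri) {
    String pathScheme = path.getScheme();
    if (pathScheme == null) return true;
    // Different schemes (e.g. hdfs vs. abfs) can never be the same filesystem.
    if (!pathScheme.equalsIgnoreCase(fsUri.getScheme())) return false;
    String pathAuthority = path.getAuthority();
    if (pathAuthority == null) return true;
    return pathAuthority.equalsIgnoreCase(fsUri.getAuthority());
  }

  /** Calls rename() and raises an error instead of silently ignoring 'false'. */
  public static void renameOrThrow(Renamer fs, String src, String dst) throws IOException {
    if (!fs.rename(src, dst)) {
      throw new IOException(
          "Failed to move '" + src + "' to '" + dst + "': rename() returned false");
    }
  }

  public static void main(String[] args) {
    URI source = URI.create("hdfs://namenode:8020/tmp/data.parq");
    URI destFs = URI.create("abfs://container@account.dfs.core.windows.net/");
    System.out.println("same filesystem: " + isPathOnFileSystem(source, destFs));

    // A filesystem whose rename() reports failure by returning false.
    Renamer alwaysFails = (src, dst) -> false;
    try {
      renameOrThrow(alwaysFails, "/tmp/data.parq", "/warehouse/data.parq");
    } catch (IOException e) {
      System.out.println("rename failed loudly: " + e.getMessage());
    }
  }
}
```

With a check like this, an HDFS source and an ABFS destination are recognized as different filesystems, so the caller can fall back to a copy instead of a rename, and a failed rename surfaces as an error rather than a silent no-op.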



--
This message was sent by Atlassian Jira
(v8.3.4#803005)