Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2020/10/24 10:28:00 UTC

[jira] [Commented] (IMPALA-10266) Replace instanceof *FileSystem with FS scheme checks

    [ https://issues.apache.org/jira/browse/IMPALA-10266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17220043#comment-17220043 ] 

ASF subversion and git services commented on IMPALA-10266:
----------------------------------------------------------

Commit 4b5c66f329cdd818dd11cd1a9c68b58c84bcf45c in impala's branch refs/heads/master from Riza Suminto
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=4b5c66f ]

IMPALA-10266: Identify FileSystem type based on the protocol scheme.

Frontend identifies the type of FileSystem in two ways. The first is
done using the instanceof operator with subclasses of
org.apache.hadoop.fs.FileSystem. The second is by checking the
FileSystem protocol scheme. This patch standardizes the FileSystem
identification based on the scheme only.

Testing:
- Add several tests in FileSystemUtilTest to check validity of some
  FileSystemUtil functions.
- Run and pass core tests.

Change-Id: I04492326a6e84895eef369fc11a3ec11f1536b6b
Reviewed-on: http://gerrit.cloudera.org:8080/16628
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Replace instanceof *FileSystem with FS scheme checks
> ----------------------------------------------------
>
>                 Key: IMPALA-10266
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10266
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>            Reporter: Tim Armstrong
>            Assignee: Riza Suminto
>            Priority: Major
>              Labels: newbie, ramp-up
>
> The Impala code checks in various places which filesystem implementation is in use. In the frontend, for example, many of these checks are in https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
> Some of these frontend checks use the instanceof operator with subclasses of org.apache.hadoop.fs.FileSystem, e.g.:
> {code}
>   public static boolean supportsStorageIds(FileSystem fs) {
>     // Common case.
>     if (isDistributedFileSystem(fs)) return true;
>     // Blacklist FileSystems that are known not to include storage UUIDs.
>     return !(fs instanceof S3AFileSystem || fs instanceof LocalFileSystem ||
>         fs instanceof AzureBlobFileSystem || fs instanceof SecureAzureBlobFileSystem ||
>         fs instanceof AdlFileSystem);
>   }
> {code}
> We also identify the filesystem based on the scheme, e.g. s3a in a URL like s3a://path/:
> {code}
>     private static final Map<String, FsType> SCHEME_TO_FS_MAPPING =
>         ImmutableMap.<String, FsType>builder()
>             .put("abfs", ADLS)
>             .put("abfss", ADLS)
>             .put("adl", ADLS)
>             .put("file", LOCAL)
>             .put("hdfs", HDFS)
>             .put("s3a", S3)
>             .put("o3fs", OZONE)
>             .put("alluxio", ALLUXIO)
>             .build();
> {code}
> The proposal is to replace all uses of instanceof with checks based on the scheme, which we can get from the FileSystem: https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/fs/FileSystem.html#getScheme--
> Checking the Java class and checking the scheme are not exactly equivalent, because in some cases a new scheme is handled by a known class (or a subclass of that class). That is what happened with Alluxio in IMPALA-10087, where we accidentally supported it for a while until we broke it. But since IMPALA-6050 we need to check both the scheme and the class, so AFAICT it would be better at this point to just standardise on the scheme.
> In the future we could then conceivably remove some of this hardcoded logic and consolidate the information about filesystem capabilities in one place.
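
To illustrate the proposal in the quoted description, here is a minimal sketch of a scheme-only check. It is not the code from the commit: the class name SchemeCheckSketch, the helper names, and the NO_STORAGE_ID_SCHEMES set are hypothetical, and it reads the scheme from a java.net.URI rather than from a Hadoop FileSystem so that it stays self-contained. The scheme-to-type entries and the blacklisted schemes are taken from the two snippets quoted above. Unknown schemes are assumed to support storage IDs, mirroring the original method, which only rejects the listed classes.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

public class SchemeCheckSketch {
  // Hypothetical enum mirroring the FsType values in the quoted mapping.
  public enum FsType { ADLS, LOCAL, HDFS, S3, OZONE, ALLUXIO }

  // Same scheme-to-type entries as the quoted SCHEME_TO_FS_MAPPING.
  static final Map<String, FsType> SCHEME_TO_FS = Map.of(
      "abfs", FsType.ADLS,
      "abfss", FsType.ADLS,
      "adl", FsType.ADLS,
      "file", FsType.LOCAL,
      "hdfs", FsType.HDFS,
      "s3a", FsType.S3,
      "o3fs", FsType.OZONE,
      "alluxio", FsType.ALLUXIO);

  // Assumption: schemes standing in for the classes blacklisted in the
  // quoted supportsStorageIds() (S3A, local, ABFS, secure ABFS, ADL).
  static final Set<String> NO_STORAGE_ID_SCHEMES =
      Set.of("s3a", "file", "abfs", "abfss", "adl");

  // Extracts the lower-cased scheme, rejecting locations without one.
  static String schemeOf(URI location) {
    String scheme = location.getScheme();
    if (scheme == null) {
      throw new IllegalArgumentException("No scheme in: " + location);
    }
    return scheme.toLowerCase(Locale.ROOT);
  }

  // Looks up the filesystem type by scheme only; empty if the scheme is unknown.
  public static Optional<FsType> fsTypeOf(URI location) {
    return Optional.ofNullable(SCHEME_TO_FS.get(schemeOf(location)));
  }

  // Scheme-based counterpart of the instanceof-based check quoted above.
  public static boolean supportsStorageIds(URI location) {
    return !NO_STORAGE_ID_SCHEMES.contains(schemeOf(location));
  }

  public static void main(String[] args) {
    URI hdfs = URI.create("hdfs://namenode:8020/warehouse/t1");
    URI s3 = URI.create("s3a://bucket/warehouse/t1");
    System.out.println(fsTypeOf(hdfs) + " " + supportsStorageIds(hdfs));
    System.out.println(fsTypeOf(s3) + " " + supportsStorageIds(s3));
  }
}
```

Keying both lookups on the scheme string means a new scheme served by an existing class is no longer matched by accident; it has to be added to the tables explicitly.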


