You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@impala.apache.org by "Tim Armstrong (Jira)" <ji...@apache.org> on 2020/10/20 16:59:00 UTC

[jira] [Created] (IMPALA-10266) Replace instanceof *FileSystem with FS scheme checks

Tim Armstrong created IMPALA-10266:
--------------------------------------

             Summary: Replace instanceof *FileSystem with FS scheme checks
                 Key: IMPALA-10266
                 URL: https://issues.apache.org/jira/browse/IMPALA-10266
             Project: IMPALA
          Issue Type: Improvement
          Components: Frontend
            Reporter: Tim Armstrong


In the Impala code we have checks in various places about which filesystem implementation we are using. E.g in the frontend, many of these checks are here - https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java.

In the frontend, some of these checks are done using the instanceof operator with subclasses of  org.apache.hadoop.fs.FileSystem. E.g.
{code}
  public static boolean supportsStorageIds(FileSystem fs) {
    // Common case.
    if (isDistributedFileSystem(fs)) return true;
    // Blacklist FileSystems that are known to not to include storage UUIDs.
    return !(fs instanceof S3AFileSystem || fs instanceof LocalFileSystem ||
        fs instanceof AzureBlobFileSystem || fs instanceof SecureAzureBlobFileSystem ||
        fs instanceof AdlFileSystem);
  }
{code}

We also identify filesystem based on the scheme, e.g. s3a in a URL like s3a://path/
{code}
    private static final Map<String, FsType> SCHEME_TO_FS_MAPPING =
        ImmutableMap.<String, FsType>builder()
            .put("abfs", ADLS)
            .put("abfss", ADLS)
            .put("adl", ADLS)
            .put("file", LOCAL)
            .put("hdfs", HDFS)
            .put("s3a", S3)
            .put("o3fs", OZONE)
            .put("alluxio", ALLUXIO)
            .build();
{code}

The proposal is to replace all instanceof use with checks based on the scheme, which we can get from the FileSystem - https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/fs/FileSystem.html#getScheme--

Checking the java class and the scheme are not exactly equivalent because there are some cases where a new scheme is handled by a known class (or subclass of that class) - that's what happened with Alluxio with IMPALA-10087 where we accidentally supported it for a bit until we broke it. But since IMPALA-6050 we need to check both the scheme and the class, so it would be better at this point to just standardise on the scheme AFAICT.

In future we could conceivably then remove some of this hardcoded logic and consolidate the information about filesystem capabilities into one place.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)