Posted to issues@impala.apache.org by "Dan Hecht (JIRA)" <ji...@apache.org> on 2017/09/13 20:49:00 UTC

[jira] [Created] (IMPALA-5931) Don't synthesize block metadata in the catalog for S3/ADLS

Dan Hecht created IMPALA-5931:
---------------------------------

             Summary: Don't synthesize block metadata in the catalog for S3/ADLS
                 Key: IMPALA-5931
                 URL: https://issues.apache.org/jira/browse/IMPALA-5931
             Project: IMPALA
          Issue Type: Improvement
          Components: Catalog
            Reporter: Dan Hecht


Today, the catalog synthesizes block metadata for S3/ADLS by simply breaking up splittable files into "blocks" of the FileSystem's default block size. Rather than carrying these blocks around in the catalog and distributing them to all impalads, we might as well generate the scan ranges on the fly during planning. That would save the memory and network bandwidth currently spent on these synthetic blocks.
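
As a rough illustration of what generating the ranges during planning could look like, here is a minimal Python sketch (not Impala code). The ScanRange type and the synthesize_scan_ranges function are hypothetical names invented for this example, and the treatment of empty and non-splittable files is an assumption rather than a statement about what the planner does.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ScanRange:
    """Hypothetical stand-in for a planner scan range: a byte span of one file."""
    path: str
    offset: int
    length: int


def synthesize_scan_ranges(path: str, file_length: int, block_size: int,
                           splittable: bool = True) -> List[ScanRange]:
    """Split a file into block_size-sized ranges instead of storing blocks.

    Assumptions for this sketch: an empty file yields no ranges, and a
    non-splittable file yields a single range covering the whole file.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if file_length <= 0:
        return []
    if not splittable:
        return [ScanRange(path, 0, file_length)]

    ranges = []
    offset = 0
    while offset < file_length:
        # The last range may be shorter than block_size.
        length = min(block_size, file_length - offset)
        ranges.append(ScanRange(path, offset, length))
        offset += length
    return ranges
```

Since the ranges are a pure function of the file length and the block size, nothing per-block would need to be stored in the catalog or shipped to the impalads; only the file length (already part of the file metadata) is required.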

That does mean the planner will have to instantiate the FileSystem and call it to get the default block size, but for these FileSystems that is just a matter of reading the config.
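
To make "reading the config" concrete, a small Python sketch follows. It models the configuration as a plain dict; the default_block_size helper, the use of the fs.s3a.block.size key as the example property, and the 32 MiB fallback are all assumptions for illustration, not a description of how the planner or any particular FileSystem implementation resolves the value.

```python
# Assumed fallback for this sketch only (32 MiB).
FALLBACK_BLOCK_SIZE_BYTES = 32 * 1024 * 1024


def default_block_size(conf: dict, key: str = "fs.s3a.block.size") -> int:
    """Return the configured block size in bytes, or the fallback if unset.

    Only plain integer byte counts are handled here; size suffixes such as
    "32M" are deliberately out of scope for this sketch.
    """
    raw = conf.get(key)
    if raw is None:
        return FALLBACK_BLOCK_SIZE_BYTES
    value = int(raw)
    if value <= 0:
        raise ValueError(f"{key} must be positive, got {value}")
    return value
```

The point is only that no remote call is involved: the lookup is a local read of a property that is already loaded.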

Perhaps the same can be done for erasure coding, though that depends on what a block location actually means in that context and whether block locations contain useful info.


