Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2019/09/19 17:32:00 UTC

[jira] [Commented] (IMPALA-5931) Don't synthesize block metadata in the catalog for S3/ADLS

    [ https://issues.apache.org/jira/browse/IMPALA-5931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16933615#comment-16933615 ] 

ASF subversion and git services commented on IMPALA-5931:
---------------------------------------------------------

Commit feed25084a999fe0a4e7b58b5264fce5829c43e7 in impala's branch refs/heads/master from stakiar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=feed250 ]

IMPALA-8944: Update and re-enable S3PlannerTest

Addresses several test infra issues that were preventing the
S3PlannerTest from running successfully. Disables a few tests that are
no longer working, and removes some planner checks that are no longer
applicable when running on S3. Specifically, this patch removes the
checks in PlannerTestBase#checkScanRangeLocations when running against
S3, because the planner no longer generates scan ranges; generation is
deferred to the scheduler (IMPALA-5931).

Replaces the old logic of specifying S3-specific fe/ tests with a
combination of JUnit Categories and Maven Profiles. The previous method
was broken and assumed that all S3-specific fe/ tests started with S3*.
The new approach removes that restriction and only requires S3-specific
JUnit tests to be tagged with the Java annotation
'@Category(S3Tests.class)' (entire classes or individual tests can be
tagged with the annotation).
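
To illustrate why category tagging removes the S3* naming restriction, here is a
minimal, self-contained sketch. It is not the code from this patch: the Category
annotation below is a local stand-in for JUnit's
org.junit.experimental.categories.Category so the sketch compiles without JUnit,
and the test classes, method names, and the select() helper are hypothetical.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class CategorySketch {
  // Local stand-in for JUnit 4's @Category (assumption: used here only so the
  // sketch has no external dependency).
  @Retention(RetentionPolicy.RUNTIME)
  @Target({ElementType.TYPE, ElementType.METHOD})
  public @interface Category { Class<?>[] value(); }

  // Marker interface named in the commit message.
  public interface S3Tests {}

  // Hypothetical test class tagged as a whole: all of its methods are S3 tests.
  @Category(S3Tests.class)
  public static class S3PlannerTest {
    public void testS3ScanRanges() {}
    public void testPredicatePushdown() {}
  }

  // Hypothetical test class whose name does not start with "S3"; only one
  // method is tagged.
  public static class PlannerTest {
    @Category(S3Tests.class)
    public void testScanOnS3() {}
    public void testJoinOrder() {}
  }

  private static boolean hasCategory(Category c, Class<?> marker) {
    return c != null && Arrays.asList(c.value()).contains(marker);
  }

  // Returns the sorted "Class.method" names that carry the marker, either on
  // the method itself or on the enclosing class.
  public static List<String> select(Class<?> marker, Class<?>... testClasses) {
    List<String> selected = new ArrayList<>();
    for (Class<?> cls : testClasses) {
      boolean classTagged = hasCategory(cls.getAnnotation(Category.class), marker);
      for (Method m : cls.getDeclaredMethods()) {
        if (m.isSynthetic()) continue;  // skip compiler- or tool-generated methods
        if (classTagged || hasCategory(m.getAnnotation(Category.class), marker)) {
          selected.add(cls.getSimpleName() + "." + m.getName());
        }
      }
    }
    Collections.sort(selected);
    return selected;
  }
}
```

Selection here depends only on the annotation, so PlannerTest.testScanOnS3 is
picked up even though its class name lacks the S3 prefix. In a real build the
filtering would be done by the test runner or build tool rather than by
hand-written reflection like this.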

Testing:
* Ran fe/ tests with TARGET_FILESYSTEM=s3

Change-Id: I1690b6c5346376c1111fd4845c72062cc237e0f9
Reviewed-on: http://gerrit.cloudera.org:8080/14248
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Don't synthesize block metadata in the catalog for S3/ADLS
> ----------------------------------------------------------
>
>                 Key: IMPALA-5931
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5931
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog
>            Reporter: Dan Hecht
>            Assignee: Vuk Ercegovac
>            Priority: Major
>             Fix For: Impala 2.13.0, Impala 3.1.0
>
>
> Today, the catalog synthesizes block metadata for S3/ADLS by just breaking up splittable files into "blocks" of the FileSystem's default block size. Rather than carrying these blocks around in the catalog and distributing them to all impalads, we might as well generate the scan ranges on the fly during planning. That would save the memory and network bandwidth the blocks consume.
> That does mean that the planner will have to instantiate and call the filesystem to get the default block size, but for these FileSystems, that's just a matter of reading the config.
> Perhaps the same can be done for HDFS erasure coding, though that depends on what a block location actually means in that context and whether block locations contain useful info.
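
The splitting described in the issue can be sketched as follows. This is a
minimal illustration, not Impala's implementation: the class, the Range type,
and the split() method are hypothetical names, and the block size is passed in
as a plain parameter rather than read from a FileSystem. The same arithmetic
applies whether the ranges are produced during planning (as the issue proposes)
or in the scheduler (as the commit message above describes).

```java
import java.util.ArrayList;
import java.util.List;

public class ScanRangeSketch {
  // Hypothetical value type: a contiguous byte range within one file.
  public static final class Range {
    public final long offset;
    public final long length;

    public Range(long offset, long length) {
      this.offset = offset;
      this.length = length;
    }
  }

  // Breaks a file of fileLength bytes into ranges of at most blockSize bytes.
  // A non-splittable file is returned as a single range covering the whole
  // file; an empty file yields no ranges.
  public static List<Range> split(long fileLength, long blockSize, boolean splittable) {
    if (fileLength < 0) throw new IllegalArgumentException("fileLength must be >= 0");
    if (blockSize <= 0) throw new IllegalArgumentException("blockSize must be > 0");

    List<Range> ranges = new ArrayList<>();
    if (fileLength == 0) return ranges;
    if (!splittable) {
      ranges.add(new Range(0, fileLength));
      return ranges;
    }
    for (long offset = 0; offset < fileLength; offset += blockSize) {
      // The last range may be shorter than blockSize.
      ranges.add(new Range(offset, Math.min(blockSize, fileLength - offset)));
    }
    return ranges;
  }
}
```

Because the ranges are a pure function of file length and block size, nothing
needs to be stored per block; only the file length and the configured block
size are required at the point where the ranges are generated.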



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org