Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2019/09/23 23:38:00 UTC

[jira] [Commented] (IMPALA-3453) S3 : Uneven split sizes are generated for Parquet causing execution skew

    [ https://issues.apache.org/jira/browse/IMPALA-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16936254#comment-16936254 ] 

ASF subversion and git services commented on IMPALA-3453:
---------------------------------------------------------

Commit 3984c69f03c14355731dd6a518aa64dfc8219450 in impala's branch refs/heads/master from stakiar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=3984c69 ]

IMPALA-8942: Set file format specific split sizes on non-block stores

On non-block based stores (e.g. S3, ADLS), the planner sizes splits
based on the value of FileSystem.getDefaultBlockSize(Path). This does
not work well for Parquet, because a scanner only processes a split if
the data range defined by the split contains the midpoint of a Parquet
row group. This ensures that scanners treat Parquet row groups as the
unit of processing. The default block size for non-block based stores is
typically much smaller than the Parquet row group size, so many dummy
Parquet splits are created and processed, most of which end up doing
nothing. The main consequence is skew: scanners end up processing
unequal amounts of data (see IMPALA-3453 for details on the skew issue).
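
The midpoint rule described above can be illustrated with a small Python
sketch. This is a toy model, not Impala's planner or scanner code: the
helper names, the 1 GB file size, and the 32 MB split size are
hypothetical values chosen for illustration, and row groups are assumed
to be laid out back to back with the Parquet footer ignored.

```python
MB = 1024 * 1024

def make_splits(file_len, split_size):
    """Cut [0, file_len) into consecutive byte ranges of at most split_size."""
    return [(off, min(off + split_size, file_len))
            for off in range(0, file_len, split_size)]

def row_group_midpoints(file_len, row_group_size):
    """Midpoints of back-to-back row groups (footer and metadata ignored)."""
    mids = []
    for start in range(0, file_len, row_group_size):
        end = min(start + row_group_size, file_len)
        mids.append(start + (end - start) // 2)
    return mids

def row_groups_per_split(file_len, row_group_size, split_size):
    """Count, for each split, the row groups whose midpoint falls inside it."""
    splits = make_splits(file_len, split_size)
    mids = row_group_midpoints(file_len, row_group_size)
    return [sum(1 for m in mids if lo <= m < hi) for lo, hi in splits]

# Hypothetical example: 1 GB file, 256 MB row groups, 32 MB splits.
counts = row_groups_per_split(1024 * MB, 256 * MB, 32 * MB)
print("splits:", len(counts))
print("splits that own a row group:", sum(1 for c in counts if c > 0))
print("splits that do nothing:", counts.count(0))
```

In this model only 4 of the 32 splits own a row group; the other 28 are
the dummy splits the commit message refers to.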

This patch adds a new query option PARQUET_OBJECT_STORE_SPLIT_SIZE
(defaults to 256 MB) that controls the size of Parquet splits on
non-block stores.

Impala docs actually recommend setting fs.s3a.block.size to 128 MB
(row group size used by Hive / Spark) or 256 MB (row group size used by
Impala). Setting the block size to the row group size results in ideal
split assignment, but experiments show that using a 256 MB block size
for 128 MB row groups is better than using a 128 MB block size for 256
MB row groups, so the default value of PARQUET_OBJECT_STORE_SPLIT_SIZE is
256 MB. Updated the docs accordingly.
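
The two mismatched configurations compared above can be sketched with
the same midpoint rule. This is a toy model under stated assumptions (a
hypothetical 1 GB file, back-to-back row groups, footer ignored); it
does not reproduce the experiments mentioned in the commit message and
only shows how row groups would map to splits in each case.

```python
MB = 1024 * 1024

def split_loads(file_len, row_group_size, split_size):
    """Return the number of row groups each split would own in this toy model."""
    loads = []
    for lo in range(0, file_len, split_size):
        hi = min(lo + split_size, file_len)
        owned = 0
        for start in range(0, file_len, row_group_size):
            end = min(start + row_group_size, file_len)
            mid = start + (end - start) // 2
            # A split owns a row group only if it contains the midpoint.
            if lo <= mid < hi:
                owned += 1
        loads.append(owned)
    return loads

FILE_LEN = 1024 * MB  # hypothetical file size

# 256 MB splits over 128 MB row groups.
big_split = split_loads(FILE_LEN, 128 * MB, 256 * MB)
# 128 MB splits over 256 MB row groups.
small_split = split_loads(FILE_LEN, 256 * MB, 128 * MB)

print(big_split)    # [2, 2, 2, 2]
print(small_split)  # [0, 1, 0, 1, 0, 1, 0, 1]
```

In this model the 256 MB split size gives every split two row groups,
while the 128 MB split size over 256 MB row groups leaves every other
split empty. That is consistent with the choice of default, though the
commit message attributes the choice to experiments rather than to this
reasoning.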

Testing:
* Ran core tests
* Added tests to test_scanners.py

Change-Id: I0995b2a3b732d39d6f58e9b3bb04111ac04601e6
Reviewed-on: http://gerrit.cloudera.org:8080/14247
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> S3 : Uneven split sizes are generated for Parquet causing execution skew
> ------------------------------------------------------------------------
>
>                 Key: IMPALA-3453
>                 URL: https://issues.apache.org/jira/browse/IMPALA-3453
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.6.0
>            Reporter: Mostafa Mokhtar
>            Assignee: Mostafa Mokhtar
>            Priority: Critical
>              Labels: performance, s3
>             Fix For: Impala 2.6.0
>
>         Attachments: FullScanQueryParquet.txt, SplitSkewProfile.txt, TPC-H Q6 profile.txt, image.png, profile_after_IMPALA-3453.txt, profile_after_IMPALA-3453_128MB.txt, profile_after_IMPALA-3453_256MB.txt, profile_before_IMPALA-3453.txt
>
>
> With Impala on S3, unevenly sized splits are assigned to the scan nodes, which introduces execution skew:
> {code}
>   Averaged Fragment F00:(Total: 1m17s, non-child: 0.000ns, % non-child: 0.00%)
>       split sizes:  min: 5.01 GB, max: 11.63 GB, avg: 5.91 GB, stddev: 1.08 GB
>       completion times: min:5s442ms  max:2m17s  mean: 1m17s  stddev:48s312ms
>       execution rates: min:47.64 MB/sec  max:1.06 GB/sec  mean:324.41 MB/sec  stddev:406.41 MB/sec
>       num instances: 32
> {code}
> Running the same query against the exact same data layout on HDFS does not produce skew.


