Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2022/07/30 01:28:00 UTC

[jira] [Commented] (IMPALA-10545) Tune data_cache_write_concurrency based on the type of IO device

    [ https://issues.apache.org/jira/browse/IMPALA-10545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17573189#comment-17573189 ] 

ASF subversion and git services commented on IMPALA-10545:
----------------------------------------------------------

Commit 89c3e1f821ccd335c3c5507496bb53b80c1cc07a in impala's branch refs/heads/master from Michael Smith
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=89c3e1f82 ]

IMPALA-10545: Higher data_cache_write_concurrency for SSDs

Provide defaults for `data_cache_write_concurrency` based on device
type. Rotational disks continue to use a default of 1, while
non-rotational disks use a default of 8. The option's default value of 0
selects this device-based mode.
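
The selection rule described above can be sketched as follows. This is a minimal illustration in Python, not the code from the commit: the function name `resolve_write_concurrency` and the two constant names are hypothetical, and only the values 1 and 8 and the meaning of 0 come from the commit message. It also assumes that an explicit non-zero setting is used as given.

```python
# Hypothetical names for illustration; the values 1 and 8 come from the
# commit message, everything else is an assumption.
ROTATIONAL_DEFAULT = 1
NON_ROTATIONAL_DEFAULT = 8


def resolve_write_concurrency(flag_value: int, is_rotational: bool) -> int:
    """Return the effective write concurrency for a cache device.

    A flag value of 0 selects the device-based default; any positive
    value is assumed to be an explicit override and is returned as is.
    """
    if flag_value < 0:
        raise ValueError("write concurrency must be non-negative")
    if flag_value == 0:
        return ROTATIONAL_DEFAULT if is_rotational else NON_ROTATIONAL_DEFAULT
    return flag_value
```

With the flag left at 0, a non-rotational device such as the nvme0n1 disk mentioned below would resolve to 8 under this sketch, and a rotational disk to 1.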

Added unit test confirming concurrency based on mocked partitions and
block device info. Replaced FRIEND_TEST macros for a test that no longer
exists.

Started cluster with
    start-impala-cluster.py --data_cache_dir=/home/michael/cache \
      --data_cache_size=1G --impalad_args=--always_use_data_cache=true

and observed
> Default data_cache_write_concurrency=8 for non-rotational disk nvme0n1

Change-Id: I60761faa2710f4795f1f3eaf66da866b5553f609
Reviewed-on: http://gerrit.cloudera.org:8080/18616
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-by: Joe McDonnell <jo...@cloudera.com>


> Tune data_cache_write_concurrency based on the type of IO device
> ----------------------------------------------------------------
>
>                 Key: IMPALA-10545
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10545
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 4.0.0
>            Reporter: Joe McDonnell
>            Assignee: Michael Smith
>            Priority: Major
>              Labels: ramp-up
>         Attachments: test.sh, test1.out, test8.out
>
>
> The data cache limits concurrent writes to the cache to avoid overwhelming the underlying IO device. This is controlled by the data_cache_write_concurrency flag, which defaults to 1. For SSDs, we should be able to increase this to allow more concurrent writes to the data cache. This would allow the data cache to warm up faster and stay more up to date.
> One option is to detect the underlying IO device (similar to how we do this for other parts of Disk IO Mgr) and tune this parameter higher for SSDs.
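
One way to detect the device type on Linux is the sysfs attribute `/sys/block/<device>/queue/rotational`, which the kernel sets to `1` for rotational disks and `0` otherwise. The sketch below reads that attribute. It is an illustration only: the issue does not say which mechanism the Disk IO Mgr uses, the helper name `is_rotational` and its `sysfs_root` parameter are hypothetical, and falling back to "rotational" when the attribute cannot be read is an assumed conservative choice.

```python
from pathlib import Path


def is_rotational(device: str, sysfs_root: str = "/sys/block") -> bool:
    """Report whether a block device (e.g. "sda", "nvme0n1") is rotational.

    Reads <sysfs_root>/<device>/queue/rotational. If the attribute is
    missing or unreadable, the device is treated as rotational, which
    keeps the lower, safer concurrency (an assumption of this sketch).
    """
    path = Path(sysfs_root) / device / "queue" / "rotational"
    try:
        return path.read_text().strip() != "0"
    except OSError:
        return True
```

The `sysfs_root` parameter exists so the lookup can be pointed at a temporary directory in tests, in the same spirit as the mocked block device info mentioned in the commit.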


