Posted to issues-all@impala.apache.org by "David Rorke (Jira)" <ji...@apache.org> on 2021/10/20 18:25:00 UTC

[jira] [Commented] (IMPALA-10481) Lack of TServer affinity in remote Kudu scans results in bad OS buffer cache behavior on tablet servers

    [ https://issues.apache.org/jira/browse/IMPALA-10481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17431415#comment-17431415 ] 

David Rorke commented on IMPALA-10481:
--------------------------------------

KUDU-3248 changed the behavior of the default CLOSEST_REPLICA option so that replica selection is deterministic for a given client process.  This improves affinity/locality and resolves the buffer cache warmup problem without requiring any changes to Impala.  Updated results for the TPC-DS query 33 warmup test (times in seconds) are below, showing the impact of the Kudu fix:
||Config||Iteration 1||Iter 2||Iter 3||Iter 4||Iter 5||Iter 6||Iter 7||Iter 8||Iter 9||
|Local|111.4|14.6| | | | | | | |
|Remote (default config without KUDU-3248)|110.8|56.9|49.9|43.3|37.3|44.0|20.0|28.9|14.9|
|Remote (LEADER_ONLY)|120.1|16.2|15.7|14.2| | | | | |
|Remote with KUDU-3248 fix|144.0|23.9|18.8| | | | | | |
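
The warmup pattern in the table can be reproduced with a toy model. The sketch below is only an illustration, not Kudu's actual selection logic: the replication factor of 3, the hash-based "sticky" choice, the tablet count, and the client name "impalad-1" are all assumptions made for the example. It counts how many tablet reads per iteration land on a replica whose buffer cache has not yet seen that tablet.

```python
import hashlib
import random

REPLICAS_PER_TABLET = 3  # assumed replication factor, not taken from the issue


def random_replica(client_id: str, tablet_id: int, rng: random.Random) -> int:
    """Pick any replica uniformly at random on every scan."""
    return rng.randrange(REPLICAS_PER_TABLET)


def sticky_replica(client_id: str, tablet_id: int, rng: random.Random) -> int:
    """Pick the same replica every time for a given (client, tablet) pair.

    Hashing is a stand-in for whatever deterministic rule the client uses.
    """
    digest = hashlib.sha256(f"{client_id}:{tablet_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % REPLICAS_PER_TABLET


def cold_reads_per_iteration(pick, num_tablets=300, iterations=9, seed=0):
    """Return the number of cold-cache tablet reads in each query iteration."""
    rng = random.Random(seed)
    warm = set()  # (tablet, replica) pairs already held in a server's cache
    history = []
    for _ in range(iterations):
        cold = 0
        for tablet in range(num_tablets):
            key = (tablet, pick("impalad-1", tablet, rng))
            if key not in warm:
                cold += 1
                warm.add(key)
        history.append(cold)
    return history


if __name__ == "__main__":
    print("random:", cold_reads_per_iteration(random_replica))
    print("sticky:", cold_reads_per_iteration(sticky_replica))
```

In this model the sticky policy pays for cold reads only in the first iteration, while the random policy keeps hitting cold replicas for several more iterations, which is qualitatively the shape of the "Remote (default config without KUDU-3248)" row.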

Overall benchmark results with a concurrent TPC-DS workload also show a major improvement in remote reads from the KUDU-3248 fix, with remote throughput now exceeding local.  Note that the remote configuration used separate 10-node clusters for Kudu and Impala, versus a single 10-node cluster in the local case:
|| ||Local||Remote without KUDU-3248||Remote with KUDU-3248||
|Queries per Hour|378|237|409|
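
For reference, the relative differences implied by the throughput table can be worked out as follows. This is a small arithmetic sketch using only the three Queries per Hour figures above; the dictionary keys and the helper name are made up for the example.

```python
# Queries per hour, copied from the table above.
qph = {"local": 378, "remote_without_fix": 237, "remote_with_fix": 409}


def pct_change(new: float, old: float) -> float:
    """Percentage change of `new` relative to `old`."""
    return (new - old) / old * 100.0


fix_vs_no_fix = pct_change(qph["remote_with_fix"], qph["remote_without_fix"])
fix_vs_local = pct_change(qph["remote_with_fix"], qph["local"])
no_fix_vs_local = pct_change(qph["remote_without_fix"], qph["local"])

print(f"remote with fix vs remote without fix: {fix_vs_no_fix:+.1f}%")  # +72.6%
print(f"remote with fix vs local:              {fix_vs_local:+.1f}%")   # +8.2%
print(f"remote without fix vs local:           {no_fix_vs_local:+.1f}%")  # -37.3%
```

Because the remote case ran on twice as many nodes in total, the comparison against local is not like-for-like; the remote-with-fix versus remote-without-fix figure is the one that isolates the effect of the Kudu change.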

> Lack of TServer affinity in remote Kudu scans results in bad OS buffer cache behavior on tablet servers
> -------------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-10481
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10481
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 4.0.0
>            Reporter: David Rorke
>            Priority: Major
>              Labels: kudu, performance
>
> Remote Kudu scans can take many iterations against the same scan range before achieving good performance if the OS buffer cache is initially cold on the tablet servers.  The slow warmup of the buffer cache is exacerbated by the fact that remote scans in the default Impala config choose a tablet server at random from the replica candidates.  The Kudu client supports a LEADER_ONLY option that provides hard affinity to the leader replica, and Impala allows this to be configured using the --pick_only_leaders_for_tests option, but this is currently considered a testing only option and by default Impala will connect to a random replica.
> The following is a series of iterations of TPC-DS query 33 (times in seconds) against a freshly started Kudu cluster in 3 configurations: (1) local reads, with Impala running on the Kudu cluster; (2) remote reads from a separate Impala cluster with the default config; (3) remote reads with pick_only_leaders_for_tests=true (LEADER_ONLY affinity).
>  
> ||Config||Iteration 1||Iter 2||Iter 3||Iter 4||Iter 5||Iter 6||Iter 7||Iter 8||Iter 9||
> |Local|111.4|14.6| | | | | | | |
> |Remote (default config)|110.8|56.9|49.9|43.3|37.3|44.0|20.0|28.9|14.9|
> |Remote (LEADER_ONLY)|120.1|16.2|15.7|14.2| | | | | |
> With pick_only_leaders_for_tests, the remote performance improves quickly, approaching local performance on the second iteration and warming up fully by iteration 4.   In the default config it takes 9 iterations of the query before we see the same performance.
> Running similar experiments after explicitly dropping the buffer cache on the tablet servers confirmed that this slow warmup is caused by poor buffer cache hit rates until the cache is fully warm.
> I suspect that slow warmup isn't the only consequence of this.  Caching a given tablet in the buffer cache on multiple tablet servers increases the overall buffer cache footprint and will increase tserver memory pressure under load.
> We should consider setting the LEADER_ONLY option by default for remote Kudu reads.  The only concern would be that this might result in worse load balancing and hotspots, in which case Kudu might need to implement some additional connection option that provides a better mix of affinity and load balancing.
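
The buffer cache footprint concern in the description can be quantified with a simple estimate. Assuming each scan of a tablet picks one of r replicas uniformly at random and independently (a simplification; the real client may weigh locality), the expected number of tablet servers that end up caching that tablet after n scans is r * (1 - (1 - 1/r)^n). The function name and the choice of r = 3 below are assumptions for illustration.

```python
def expected_cached_copies(replicas: int, scans: int) -> float:
    """Expected number of distinct replicas touched after `scans` uniform random picks."""
    if replicas <= 0 or scans < 0:
        raise ValueError("replicas must be positive and scans non-negative")
    return replicas * (1.0 - (1.0 - 1.0 / replicas) ** scans)


if __name__ == "__main__":
    for scans in (1, 2, 4, 9):
        copies = expected_cached_copies(3, scans)
        print(f"after {scans} scans: {copies:.2f} cached copies expected")
```

Under these assumptions the footprint approaches the full replication factor within a handful of scans, whereas a policy with hard affinity (LEADER_ONLY, or a deterministic choice per client) keeps it at one copy per client.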


