Posted to issues@impala.apache.org by "Thomas Tauber-Marshall (JIRA)" <ji...@apache.org> on 2017/07/24 14:37:00 UTC

[jira] [Resolved] (IMPALA-5167) Reduce number of Kudu clients that get created

     [ https://issues.apache.org/jira/browse/IMPALA-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Tauber-Marshall resolved IMPALA-5167.
--------------------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.10.0

commit 399b184bbcf5a1fb06b5afbebf9062e69d02beed
Author: Thomas Tauber-Marshall <tm...@cloudera.com>
Date:   Tue May 16 09:37:03 2017 -0700

    IMPALA-5167: Reduce the number of Kudu clients created (FE)
    
    Creating Kudu clients is very expensive as each will fetch
    metadata from the Kudu master, so we should minimize the
    number of Kudu clients that get created.
    
    This patch stores a map from Kudu master addresses to Kudu
    clients in KuduUtil to be used across the FE and catalog.
    Another patch has already addressed the BE.
    
    Future work will consider providing a way to invalidate
    the stored Kudu clients in case something goes wrong
    (IMPALA-5685).
    
    This relies on two changes on the Kudu side: one that clears
    non-covered range entries from the client's cache on table
    open (d07ecd6ded01201c912d2e336611a6a941f48d98), and one
    that automatically refreshes auth tokens when they expire
    (603c1578c78c0377ffafdd9c427ebfd8a206bda3).
    
    This patch disables some tests that no longer work: they
    relied on Kudu metadata loading operations timing out, but
    because clients are now reused, the metadata is already
    loaded when the tests run.
    
    Testing:
    - Ran a stress test on a 10 node cluster: scan of a small
      Kudu table, 1000 concurrent queries. Load on the Kudu
      master was reduced significantly, from ~50% CPU to ~5%
      (with the BE changes included).
    - Ran the Kudu e2e tests.
    - Manually ran a test with concurrent INSERTs and
      'ALTER TABLE ADD PARTITION' (which is affected by the
      Kudu side change mentioned above) and verified
      correctness.
    
    Change-Id: I9b0b346f37ee43f7f0eefe34a093eddbbdcf2a5e
    Reviewed-on: http://gerrit.cloudera.org:8080/6898
    Reviewed-by: Thomas Tauber-Marshall <tm...@cloudera.com>
    Tested-by: Impala Public Jenkins
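
The map described in the commit message (master addresses to a shared client) can be sketched roughly as follows. This is only an illustration of the idea, not the code from the patch: Impala's frontend is written in Java, and the names ClientCache, FakeClient, and get, along with the example host names, are hypothetical stand-ins chosen for this sketch.

```python
import threading
from typing import Callable, Dict, Generic, TypeVar

C = TypeVar("C")


class ClientCache(Generic[C]):
    """Hands out one shared client per master-address string (hypothetical helper)."""

    def __init__(self, factory: Callable[[str], C]) -> None:
        self._factory = factory          # builds a client; assumed to be expensive
        self._clients: Dict[str, C] = {}
        self._lock = threading.Lock()

    def get(self, master_addresses: str) -> C:
        # Hold the lock across lookup and creation so concurrent callers
        # asking for the same masters never build two clients.
        with self._lock:
            client = self._clients.get(master_addresses)
            if client is None:
                client = self._factory(master_addresses)
                self._clients[master_addresses] = client
            return client


class FakeClient:
    """Stand-in for a real Kudu client; counts how many were constructed."""
    created = 0

    def __init__(self, master_addresses: str) -> None:
        FakeClient.created += 1
        self.master_addresses = master_addresses


cache = ClientCache(FakeClient)
a = cache.get("master-1:7051")
b = cache.get("master-1:7051")   # reuses the client created above
c = cache.get("master-2:7051")   # different masters, so a new client
```

Note that this sketch never evicts an entry, which mirrors the limitation the commit message defers to IMPALA-5685. It also keys on the raw address string, so the same masters listed in a different order would get a separate client.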

> Reduce number of Kudu clients that get created
> ----------------------------------------------
>
>                 Key: IMPALA-5167
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5167
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 2.8.0
>            Reporter: Matthew Jacobs
>            Assignee: Thomas Tauber-Marshall
>              Labels: kudu
>             Fix For: Impala 2.10.0
>
>
> Creating Kudu clients is very expensive as each will fetch metadata from the Kudu master. We can reduce the load on the Kudu master by reusing Kudu clients when possible. To start, we can use a single client for the entire BE and another for the entire FE.
> This depends on a metadata invalidation improvement from Kudu (https://gerrit.cloudera.org/#/c/6719/).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)