Posted to commits@impala.apache.org by cs...@apache.org on 2021/04/28 09:10:15 UTC

[impala] 02/02: IMPALA-10682: Add buffering to hs2-http client in impala-shell

This is an automated email from the ASF dual-hosted git repository.

csringhofer pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit f672c315bc4d08d56cc7399b86767d30c9676287
Author: Csaba Ringhofer <cs...@cloudera.com>
AuthorDate: Tue Apr 27 16:35:27 2021 +0200

    IMPALA-10682: Add buffering to hs2-http client in impala-shell
    
    This change reduces the runtime of the following command from 8.5s to
    1.5s on my machine:
    shell/impala_shell.py -B -q "select * from tpch_parquet.lineitem limit 100000;" --protocol hs2-http > /dev/null
    
    This nearly eliminates the speed difference between hs2 and hs2-http.
    
    The root cause of the original slowness is the large number of
    calls to socket.recv(). The query above used to call it 2809090 times;
    now it calls it only 9007 times.
    
    Testing:
    - ran shell tests
    
    Change-Id: If11f287be65b10bee2b0afffea118e3dc70fdbbd
    Reviewed-on: http://gerrit.cloudera.org:8080/17346
    Reviewed-by: Quanlong Huang <hu...@gmail.com>
    Tested-by: Csaba Ringhofer <cs...@cloudera.com>
---
 shell/impala_client.py | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/shell/impala_client.py b/shell/impala_client.py
index 65e8033..5fae0af 100755
--- a/shell/impala_client.py
+++ b/shell/impala_client.py
@@ -412,6 +412,9 @@ class ImpalaClient(object):
       auth = base64.encodestring(user_passwd.encode()).decode().strip('\n')
       transport.setCustomHeaders({"Authorization": "Basic {0}".format(auth)})
 
+    # Without buffering Thrift would call socket.recv() each time it deserializes
+    # something (e.g. a member in a struct).
+    transport = TBufferedTransport(transport)
     transport.open()
     return transport
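
The effect described in the commit message can be illustrated without Thrift. The sketch below is only an analogy built on Python's standard `io` module: `CountingRaw` and `read_fields` are hypothetical names that stand in for the socket and for Thrift's field-by-field deserialization, and `io.BufferedReader` plays the role that `TBufferedTransport` plays in the patch. It is not the impala-shell code path.

```python
import io


class CountingRaw(io.RawIOBase):
    """Hypothetical stand-in for a socket: counts low-level read calls."""

    def __init__(self, payload):
        super().__init__()
        self._src = io.BytesIO(payload)
        self.calls = 0

    def readable(self):
        return True

    def readinto(self, buf):
        # Each call here models one socket.recv().
        self.calls += 1
        return self._src.readinto(buf)


def read_fields(stream, n_fields, field_size=4):
    """Read many small fixed-size fields, as a deserializer would."""
    for _ in range(n_fields):
        data = stream.read(field_size)
        assert len(data) == field_size


N_FIELDS = 10000
payload = b"\x00" * (4 * N_FIELDS)

# Unbuffered: every small field read goes straight to the raw stream.
unbuffered = CountingRaw(payload)
read_fields(unbuffered, N_FIELDS)

# Buffered: small reads are served from an in-memory buffer that is
# refilled from the raw stream in larger chunks.
raw = CountingRaw(payload)
buffered = io.BufferedReader(raw, buffer_size=4096)
read_fields(buffered, N_FIELDS)

print("raw reads without buffering:", unbuffered.calls)
print("raw reads with buffering:   ", raw.calls)
```

Without buffering, the number of raw reads equals the number of fields; with buffering, it depends roughly on payload size divided by buffer size, which matches the kind of drop in socket.recv() calls reported above.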