Posted to commits@impala.apache.org by cs...@apache.org on 2021/04/28 09:10:15 UTC
[impala] 02/02: IMPALA-10682: Add buffering to hs2-http client in impala-shell
This is an automated email from the ASF dual-hosted git repository.
csringhofer pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git
commit f672c315bc4d08d56cc7399b86767d30c9676287
Author: Csaba Ringhofer <cs...@cloudera.com>
AuthorDate: Tue Apr 27 16:35:27 2021 +0200
IMPALA-10682: Add buffering to hs2-http client in impala-shell
This change reduces the runtime of the following command from 8.5s to
1.5s on my machine:
shell/impala_shell.py -B -q "select * from tpch_parquet.lineitem limit 100000;" --protocol hs2-http > /dev/null
This nearly eliminates the speed difference between hs2 and hs2-http.
The root cause of the original slowness was the large number of
calls to socket.recv(). The query above used to call it 2809090 times;
now it calls it only 9007 times.
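The effect can be illustrated with a minimal sketch that does not use Thrift at all. CountingSocket, BufferedReader, read_exact, and read_int32_fields are hypothetical names invented for this example, and the 4096-byte chunk size is an assumption; this is not how TBufferedTransport is implemented, only the same general idea of serving many small reads from one large recv().

```python
import io
import struct


class CountingSocket:
    """Hypothetical stand-in for a socket that counts recv() calls."""

    def __init__(self, payload):
        self._stream = io.BytesIO(payload)
        self.recv_calls = 0

    def recv(self, size):
        self.recv_calls += 1
        return self._stream.read(size)


class BufferedReader:
    """Fetches large chunks from the socket and serves small reads from memory."""

    def __init__(self, sock, chunk_size=4096):  # chunk size is an assumption
        self._sock = sock
        self._chunk_size = chunk_size
        self._buf = b""

    def recv(self, size):
        # Touch the underlying socket only when the in-memory buffer is empty.
        if not self._buf:
            self._buf = self._sock.recv(max(size, self._chunk_size))
        data, self._buf = self._buf[:size], self._buf[size:]
        return data


def read_exact(source, size):
    """Reads exactly `size` bytes, looping over short reads."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = source.recv(remaining)
        if not chunk:
            raise EOFError("stream ended early")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_int32_fields(source, count):
    """Mimics a deserializer that reads one small field at a time."""
    return [struct.unpack("!i", read_exact(source, 4))[0] for _ in range(count)]


FIELD_COUNT = 10000
payload = b"".join(struct.pack("!i", i) for i in range(FIELD_COUNT))

# One recv() per field when reading straight from the socket.
unbuffered = CountingSocket(payload)
unbuffered_values = read_int32_fields(unbuffered, FIELD_COUNT)

# One recv() per 4096-byte chunk when a buffer sits in between.
raw = CountingSocket(payload)
buffered_values = read_int32_fields(BufferedReader(raw), FIELD_COUNT)

print("unbuffered recv() calls:", unbuffered.recv_calls)
print("buffered recv() calls:  ", raw.recv_calls)
```

With these assumed sizes, the unbuffered path issues 10000 recv() calls and the buffered path 10, while both return the same values.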
Testing:
- ran shell tests
Change-Id: If11f287be65b10bee2b0afffea118e3dc70fdbbd
Reviewed-on: http://gerrit.cloudera.org:8080/17346
Reviewed-by: Quanlong Huang <hu...@gmail.com>
Tested-by: Csaba Ringhofer <cs...@cloudera.com>
---
shell/impala_client.py | 3 +++
1 file changed, 3 insertions(+)
diff --git a/shell/impala_client.py b/shell/impala_client.py
index 65e8033..5fae0af 100755
--- a/shell/impala_client.py
+++ b/shell/impala_client.py
@@ -412,6 +412,9 @@ class ImpalaClient(object):
auth = base64.encodestring(user_passwd.encode()).decode().strip('\n')
transport.setCustomHeaders({"Authorization": "Basic {0}".format(auth)})
+ # Without buffering Thrift would call socket.recv() each time it deserializes
+ # something (e.g. a member in a struct).
+ transport = TBufferedTransport(transport)
transport.open()
return transport