Posted to issues-all@impala.apache.org by "Sahil Takiar (Jira)" <ji...@apache.org> on 2019/08/29 20:20:00 UTC
[jira] [Resolved] (IMPALA-1618) Impala server should always try to fulfill requested fetch size
[ https://issues.apache.org/jira/browse/IMPALA-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sahil Takiar resolved IMPALA-1618.
----------------------------------
Fix Version/s: Impala 3.4.0
Resolution: Fixed
> Impala server should always try to fulfill requested fetch size
> ---------------------------------------------------------------
>
> Key: IMPALA-1618
> URL: https://issues.apache.org/jira/browse/IMPALA-1618
> Project: IMPALA
> Issue Type: Sub-task
> Components: Backend
> Affects Versions: Impala 2.0.1
> Reporter: casey
> Priority: Minor
> Labels: usability
> Fix For: Impala 3.4.0
>
>
> The Thrift fetch request specifies the number of rows the client would like, but the Impala server may return fewer even though more results are available.
> For example, using the default row_batch size of 1024, if the client requests 1023 rows, the first response contains 1023 rows but the second response contains only 1 row. This is because the server internally produces rows one row_batch (1024) at a time, returns the requested count (1023), and caches the remaining row; on the next request it serves only what is in the cache.
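The behavior described above can be reproduced with a small toy model. This is only a sketch of the caching logic as the description states it, not Impala code; the class name BatchCachingServer and the helper response_sizes are made up for illustration.

```python
from collections import deque


class BatchCachingServer:
    """Toy model of the described fetch behavior (hypothetical, not Impala code)."""

    def __init__(self, total_rows, batch_size):
        self.remaining = total_rows   # rows not yet produced by the query
        self.batch_size = batch_size  # size of the internal row batch
        self.cache = deque()          # rows left over from the last batch

    def fetch(self, max_rows):
        # A new batch is produced only when the cache is completely empty.
        if not self.cache and self.remaining > 0:
            n = min(self.batch_size, self.remaining)
            self.cache.extend(range(n))
            self.remaining -= n
        # The response is served from the cache alone and is never topped up.
        count = min(max_rows, len(self.cache))
        return [self.cache.popleft() for _ in range(count)]


def response_sizes(total_rows, batch_size, max_rows):
    """Fetch until exhausted and return the row count of each response."""
    server = BatchCachingServer(total_rows, batch_size)
    sizes = []
    while True:
        rows = server.fetch(max_rows)
        if not rows:
            break
        sizes.append(len(rows))
    return sizes


if __name__ == "__main__":
    # Batch of 1024, request of 1023: responses alternate between 1023 and 1.
    print(response_sizes(2048, 1024, 1023))
```

With a batch size smaller than the request size, for instance response_sizes(3000, 1000, 1024), every response in this model has the batch size.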
> Ideally the end user would set both the row batch size and the Thrift request size together. In practice, the query writer who sets row_batch and the driver author or programmer who sets the fetch size are often different people.
> One case already works fine: setting the batch size to less than the Thrift request size. In that case the Thrift response is always the same size as the batch.
> Code example:
> {noformat}
> dev@localhost:~/impyla$ git diff
> diff --git a/impala/_rpc/hiveserver2.py b/impala/_rpc/hiveserver2.py
> index 6139002..31fdab7 100644
> --- a/impala/_rpc/hiveserver2.py
> +++ b/impala/_rpc/hiveserver2.py
> @@ -265,6 +265,7 @@ def fetch_results(service, operation_handle, hs2_protocol_version, schema=None,
> req = TFetchResultsReq(operationHandle=operation_handle,
> orientation=orientation,
> maxRows=max_rows)
> + print("req: " + str(max_rows))
> resp = service.FetchResults(req)
> err_if_rpc_not_ok(resp)
>
> @@ -273,6 +274,7 @@ def fetch_results(service, operation_handle, hs2_protocol_version, schema=None,
> for (i, col) in enumerate(resp.results.columns)]
> num_cols = len(tcols)
> num_rows = len(tcols[0].values)
> + print("rec: " + str(num_rows))
> rows = []
> for i in xrange(num_rows):
> row = []
> dev@localhost:~/impyla$ cat test.py
> from impala.dbapi import connect
> conn = connect()
> cur = conn.cursor()
> cur.set_arraysize(1024)
> cur.execute("set batch_size=1025")
> cur.execute("select * from tpch.lineitem")
> while True:
>     rows = cur.fetchmany()
>     if not rows:
>         break
> cur.close()
> conn.close()
> dev@localhost:~/impyla$ python test.py | head
> Failed to import pandas
> req: 1024
> rec: 1024
> req: 1024
> rec: 1
> req: 1024
> rec: 1024
> req: 1024
> rec: 1
> req: 1024
> {noformat}
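For contrast, here is a minimal sketch of what "always try to fulfill the requested fetch size" could look like: keep pulling batches until the requested number of rows has been collected or the result set is exhausted. This is an assumption about the intended behavior, not the change that went into Impala 3.4.0; fetch_fulfilling, make_batches and fulfilled_sizes are hypothetical names.

```python
def make_batches(total_rows, batch_size):
    """Yield lists of row ids, batch_size rows at a time (stand-in for a query)."""
    for start in range(0, total_rows, batch_size):
        yield list(range(start, min(start + batch_size, total_rows)))


def fetch_fulfilling(batches, leftover, max_rows):
    """Return up to max_rows rows, pulling further batches as needed.

    batches:  iterator that yields lists of rows
    leftover: list of rows carried over from the previous call (mutated in place)
    """
    rows = []
    while len(rows) < max_rows:
        if not leftover:
            batch = next(batches, None)
            if batch is None:
                break  # result set exhausted
            leftover.extend(batch)
        take = max_rows - len(rows)
        rows.extend(leftover[:take])
        del leftover[:take]
    return rows


def fulfilled_sizes(total_rows, batch_size, max_rows):
    """Fetch until exhausted and return the row count of each response."""
    batches = make_batches(total_rows, batch_size)
    leftover = []
    sizes = []
    while True:
        rows = fetch_fulfilling(batches, leftover, max_rows)
        if not rows:
            break
        sizes.append(len(rows))
    return sizes


if __name__ == "__main__":
    # Same settings as test.py above: batch_size=1025, fetch size 1024.
    print(fulfilled_sizes(2050, 1025, 1024))
```

In this sketch only the final response can be smaller than the requested size, which is the pattern a client setting a fetch size would normally expect.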
--
This message was sent by Atlassian Jira
(v8.3.2#803003)