Posted to issues-all@impala.apache.org by "Sahil Takiar (Jira)" <ji...@apache.org> on 2019/08/29 20:20:00 UTC
[jira] [Resolved] (IMPALA-1618) Impala server should always try to fulfill requested fetch size

     [ https://issues.apache.org/jira/browse/IMPALA-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahil Takiar resolved IMPALA-1618.
----------------------------------
    Fix Version/s: Impala 3.4.0
       Resolution: Fixed

> Impala server should always try to fulfill requested fetch size
> ---------------------------------------------------------------
>
>                 Key: IMPALA-1618
>                 URL: https://issues.apache.org/jira/browse/IMPALA-1618
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend
>    Affects Versions: Impala 2.0.1
>            Reporter: casey
>            Priority: Minor
>              Labels: usability
>             Fix For: Impala 3.4.0
>
>
> The Thrift fetch request specifies the number of rows the client would like, but the Impala server may return fewer even when more results are available.
> For example, with the default batch size of 1024, if the client requests 1023 rows, the first response contains 1023 rows but the second response contains only 1 row. This happens because the server internally works on one row batch (1024 rows), returns the requested count (1023), and caches the remaining row; the next fetch is then served only from that cache.
> Ideally, the end user would set both the row batch size and the Thrift request size. In practice, the query writer who sets the batch size and the driver developer who sets the fetch size are often different people.
> One case does behave consistently today: setting the batch size to less than the Thrift request size. In that case each Thrift response always contains exactly batch-size rows.
> Code example:
> {noformat}
> dev@localhost:~/impyla$ git diff
> diff --git a/impala/_rpc/hiveserver2.py b/impala/_rpc/hiveserver2.py
> index 6139002..31fdab7 100644
> --- a/impala/_rpc/hiveserver2.py
> +++ b/impala/_rpc/hiveserver2.py
> @@ -265,6 +265,7 @@ def fetch_results(service, operation_handle, hs2_protocol_version, schema=None,
>      req = TFetchResultsReq(operationHandle=operation_handle,
>                             orientation=orientation,
>                             maxRows=max_rows)
> +    print("req: " + str(max_rows))
>      resp = service.FetchResults(req)
>      err_if_rpc_not_ok(resp)
>  
> @@ -273,6 +274,7 @@ def fetch_results(service, operation_handle, hs2_protocol_version, schema=None,
>                   for (i, col) in enumerate(resp.results.columns)]
>          num_cols = len(tcols)
>          num_rows = len(tcols[0].values)
> +        print("rec: " + str(num_rows))
>          rows = []
>          for i in xrange(num_rows):
>              row = []
> dev@localhost:~/impyla$ cat test.py 
> from impala.dbapi import connect
> conn = connect()
> cur = conn.cursor()
> cur.set_arraysize(1024)
> cur.execute("set batch_size=1025")
> cur.execute("select * from tpch.lineitem")
> while True:
>     rows = cur.fetchmany()
>     if not rows:
>         break
> cur.close()
> conn.close()
> dev@localhost:~/impyla$ python test.py | head
> Failed to import pandas
> req: 1024
> rec: 1024
> req: 1024
> rec: 1
> req: 1024
> rec: 1024
> req: 1024
> rec: 1
> req: 1024
> {noformat}
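
The alternating 1024/1 pattern in the output above can be reproduced with a small toy model. This is only a sketch of the behavior as the description explains it, not Impala's actual code: the function name simulate_fetches, its parameters, and the fill flag are hypothetical and introduced purely for illustration. With fill=False each fetch is served from a single row batch or its cached tail; with fill=True the model keeps pulling batches until the requested fetch size is met, which is the behavior the issue title asks for.

```python
def simulate_fetches(total_rows, batch_size, fetch_size, fill=False):
    """Toy model of the fetch behavior described in the issue (hypothetical).

    Returns the number of rows the client receives on each fetch.
    fill=False: a fetch is served from one row batch, or from its cached tail.
    fill=True:  the server keeps pulling batches until fetch_size is met.
    """
    remaining = total_rows  # rows the query has not produced yet
    cached = 0              # rows left over from a partially returned batch
    responses = []
    while remaining > 0 or cached > 0:
        returned = 0
        while returned < fetch_size and (remaining > 0 or cached > 0):
            if cached == 0:
                # Produce the next row batch.
                cached = min(batch_size, remaining)
                remaining -= cached
            take = min(fetch_size - returned, cached)
            cached -= take
            returned += take
            if not fill:
                break  # reported behavior: one batch (or cache) per fetch
        responses.append(returned)
    return responses


if __name__ == "__main__":
    # Same settings as test.py above: batch_size=1025, fetch size 1024.
    print(simulate_fetches(4100, 1025, 1024))
    # Same settings, but filling each response up to the fetch size.
    print(simulate_fetches(4100, 1025, 1024, fill=True))
```

Under this model, a batch size smaller than the fetch size yields responses that all equal the batch size, which matches the one consistent case noted in the description.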


