Posted to issues@impala.apache.org by "Tianyi Wang (JIRA)" <ji...@apache.org> on 2017/10/28 00:09:00 UTC

[jira] [Resolved] (IMPALA-2758) Remove BufferedTupleStream::GetRows()

     [ https://issues.apache.org/jira/browse/IMPALA-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tianyi Wang resolved IMPALA-2758.
---------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.11.0

IMPALA-2758: Remove BufferedTupleStream::GetRows

This patch removes BufferedTupleStream::GetRows(), which pins a stream
and reads all of its rows into a single batch. That makes it a poor API,
because the resulting row batch can be arbitrarily large. With this
patch, the call sites pin the stream themselves and then call GetNext()
to retrieve one batch at a time.
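
The batch-at-a-time pattern can be sketched roughly as follows. This is a
minimal, self-contained illustration and not Impala code: ToyStream,
DrainStats and DrainInBatches are hypothetical names, rows are plain ints,
and the GetNext() signature here is a simplification invented for the
example rather than the real BufferedTupleStream interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for an already-pinned tuple stream (not Impala's class).
class ToyStream {
 public:
  explicit ToyStream(std::vector<int> rows) : rows_(std::move(rows)) {}

  // Replaces the contents of 'batch' with at most 'capacity' rows and sets
  // '*eos' to true once every row has been returned.
  void GetNext(std::size_t capacity, std::vector<int>* batch, bool* eos) {
    assert(capacity > 0);
    batch->clear();
    const std::size_t n = std::min(capacity, rows_.size() - pos_);
    const auto first = rows_.begin() + static_cast<std::ptrdiff_t>(pos_);
    batch->insert(batch->end(), first, first + static_cast<std::ptrdiff_t>(n));
    pos_ += n;
    *eos = (pos_ == rows_.size());
  }

 private:
  std::vector<int> rows_;
  std::size_t pos_ = 0;
};

struct DrainStats {
  long sum = 0;                    // Result of "processing" every row.
  std::size_t max_batch_rows = 0;  // Largest batch held in memory at once.
  std::size_t num_batches = 0;     // Number of non-empty batches seen.
};

// Consumes the whole stream one bounded batch at a time, so the number of
// rows held by the caller never exceeds 'capacity'.
DrainStats DrainInBatches(ToyStream* stream, std::size_t capacity) {
  DrainStats stats;
  std::vector<int> batch;
  bool eos = false;
  while (!eos) {
    stream->GetNext(capacity, &batch, &eos);
    if (batch.empty()) continue;  // Only possible for an empty stream.
    for (int row : batch) stats.sum += row;
    stats.max_batch_rows = std::max(stats.max_batch_rows, batch.size());
    ++stats.num_batches;
  }
  return stats;
}
```

In this sketch the caller's peak footprint is bounded by the batch capacity
rather than by the total number of rows in the stream, which is the property
a GetRows()-style call cannot offer.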

Testing: Existing tests pass. A test case for GetRows() was removed.

Change-Id: I3831c38994da2b69775a9809ff01de5d23584414
Reviewed-on: http://gerrit.cloudera.org:8080/8226
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Impala Public Jenkins

> Remove BufferedTupleStream::GetRows()
> -------------------------------------
>
>                 Key: IMPALA-2758
>                 URL: https://issues.apache.org/jira/browse/IMPALA-2758
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 2.3.0
>            Reporter: Tim Armstrong
>            Assignee: Tianyi Wang
>            Priority: Minor
>              Labels: ramp-up, resource-management
>             Fix For: Impala 2.11.0
>
>
> The BufferedTupleStream::GetRows() API returns all of the rows from a pinned stream in a single row batch. This is not a very good API, particularly if we wish to simplify memory management, because it can create an arbitrarily large RowBatch. Such a batch carries unbounded memory overhead for tuple pointers and can cause problems in code that does not anticipate very large batches.
> Instead, we should convert call sites to call BufferedTupleStream::GetNext() and process one batch at a time.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)