Posted to issues-all@impala.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2018/09/05 20:13:00 UTC

[jira] [Commented] (IMPALA-7477) Improve QueryResultSet interface to allow appending a batch of rows at a time

    [ https://issues.apache.org/jira/browse/IMPALA-7477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16604883#comment-16604883 ] 

ASF subversion and git services commented on IMPALA-7477:
---------------------------------------------------------

Commit b288a6af2eda9631b2bad91896ae4bfd2a3fdf30 in impala's branch refs/heads/master from [~tarmstrong@cloudera.com]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=b288a6a ]

IMPALA-7477: Batch-oriented query set construction

Rework the row-by-row construction of query result sets in PlanRootSink
so that it materialises an output column at a time. Make some minor
optimisations like preallocating output vectors and initialising
strings more efficiently.

My intent is both to make this faster and to make the QueryResultSet
interface better before IMPALA-4268 does a bunch of surgery on this
part of the code.

Testing:
Ran core tests.

Perf:
Downloaded tpch_parquet.orders via the JDBC driver.
Before: 3.01s, After: 2.57s.

Downloaded l_orderkey from tpch_parquet.lineitem.
Before: 1.21s, After: 1.08s.

Change-Id: Ibc87a84c34935d0d5841c7f5528eb802527fa809
Reviewed-on: http://gerrit.cloudera.org:8080/11297
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
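The commit message describes the row-by-row versus column-at-a-time change only in prose. The Python sketch below shows the difference in loop order. It assumes that a batch is a list of tuples, that each output column comes from a hypothetical expression callable, and that the output is one list per column. The function names are invented for illustration and are not Impala's C++ code. The sketch shows the shape of the change, not its performance.

```python
from typing import Any, Callable, List, Sequence

Row = Sequence[Any]
# Hypothetical output expression: maps one input row to one output value.
Expr = Callable[[Row], Any]


def materialise_row_by_row(exprs: List[Expr], batch: List[Row]) -> List[List[Any]]:
    """Row-at-a-time: evaluate every expression for one row, then move on.

    Each output column list grows by one append per row.
    """
    columns: List[List[Any]] = [[] for _ in exprs]
    for row in batch:
        for column, expr in zip(columns, exprs):
            column.append(expr(row))
    return columns


def materialise_column_at_a_time(exprs: List[Expr], batch: List[Row]) -> List[List[Any]]:
    """Column-at-a-time: evaluate one expression over the whole batch.

    Each output list is preallocated to the batch size, so it is never
    resized while values are written into it.
    """
    num_rows = len(batch)
    columns: List[List[Any]] = []
    for expr in exprs:
        column: List[Any] = [None] * num_rows  # preallocate the output vector
        for i, row in enumerate(batch):
            column[i] = expr(row)
        columns.append(column)
    return columns


if __name__ == "__main__":
    batch = [(1, "a"), (2, "b"), (3, "c")]
    exprs = [lambda r: r[0] * 10, lambda r: r[1].upper()]
    print(materialise_column_at_a_time(exprs, batch))  # [[10, 20, 30], ['A', 'B', 'C']]
```

Both functions return the same columns. The commit does not say exactly where its speedup comes from. In a compiled implementation, common reasons this kind of rewrite helps are a tight inner loop over a single expression and writes into a preallocated vector.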


> Improve QueryResultSet interface to allow appending a batch of rows at a time
> -----------------------------------------------------------------------------
>
>                 Key: IMPALA-7477
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7477
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend
>            Reporter: Tim Armstrong
>            Assignee: Tim Armstrong
>            Priority: Major
>
> The QueryResultSet interface used from PlanRootSink operates on a row at a time and is inefficient and inelegant. We can improve code readability by moving some logic from PlanRootSink to QueryResultSet and improve perf if we switch to an interface that allows appending a batch.
> This will make IMPALA-4268 incrementally easier.
> There are some TODOs in the code related to this.
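The issue asks for a QueryResultSet interface that accepts a whole batch per call instead of one row, with layout logic moved out of PlanRootSink and into the result set. The following minimal Python sketch shows that interface shape. `ResultSetSketch`, `add_rows`, `RowResultSet`, `ColumnarResultSet` and `send_batch` are hypothetical names, not Impala's actual API. The caller passes a batch slice once, and each implementation decides how to store the rows.

```python
from abc import ABC, abstractmethod
from typing import Any, List, Sequence

Row = Sequence[Any]


class ResultSetSketch(ABC):
    """Hypothetical batch-append interface (illustrative, not Impala's API)."""

    @abstractmethod
    def add_rows(self, batch: List[Row], start: int, num_rows: int) -> int:
        """Append up to num_rows rows from batch[start:]; return the count added."""


class RowResultSet(ResultSetSketch):
    """Stores results row-major, e.g. for a row-oriented client protocol."""

    def __init__(self) -> None:
        self.rows: List[List[Any]] = []

    def add_rows(self, batch: List[Row], start: int, num_rows: int) -> int:
        chunk = batch[start:start + num_rows]
        self.rows.extend(list(row) for row in chunk)
        return len(chunk)


class ColumnarResultSet(ResultSetSketch):
    """Stores results column-major, filling one column at a time."""

    def __init__(self, num_cols: int) -> None:
        self.columns: List[List[Any]] = [[] for _ in range(num_cols)]

    def add_rows(self, batch: List[Row], start: int, num_rows: int) -> int:
        chunk = batch[start:start + num_rows]
        for c, column in enumerate(self.columns):
            column.extend(row[c] for row in chunk)
        return len(chunk)


def send_batch(result_set: ResultSetSketch, batch: List[Row], max_rows: int) -> int:
    """Hypothetical sink step: one call per batch instead of one call per row."""
    return result_set.add_rows(batch, 0, min(max_rows, len(batch)))
```

With this design the sink no longer needs to know whether the client wants rows or columns. It makes one call per batch, and the per-row virtual dispatch of a row-at-a-time interface goes away.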



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
