Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2019/12/03 23:34:00 UTC

[jira] [Commented] (IMPALA-9181) Serialize TQueryCtx once per query

    [ https://issues.apache.org/jira/browse/IMPALA-9181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987356#comment-16987356 ] 

ASF subversion and git services commented on IMPALA-9181:
---------------------------------------------------------

Commit a1588e44980c648cb7f9263cbd0409abfbaeacf7 in impala's branch refs/heads/master from Thomas Tauber-Marshall
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=a1588e4 ]

IMPALA-9181: Serialize TQueryCtx once per query

When issuing Exec() RPCs to backends, we currently serialize the
TQueryCtx once per backend. This is inefficient, as the TQueryCtx is
the same for all backends and really only needs to be serialized once.

Serializing the TQueryCtx can be expensive, as it contains both the
full text of the original query and the descriptor table, which can be
quite large. In a synthetic dataset I tested with, scanning a table
with 100k partitions produced a descriptor table of ~20MB.
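To put the per-backend cost in numbers, let S be the serialized size of the TQueryCtx and N the number of backends receiving an Exec() RPC. N = 100 below is a hypothetical cluster size chosen for illustration; S of roughly 20MB comes from the descriptor-table measurement above (the query text adds a little more). The 100k-partition figure also works out to about 200 bytes of descriptor per partition. Note that this counts serialization work in the coordinator only; each backend still receives its own copy of the bytes over its RPC.

```latex
% S = serialized TQueryCtx size, N = number of backends (N = 100 is hypothetical)
\frac{20\,\text{MB}}{100{,}000\ \text{partitions}} \approx 200\ \text{bytes per partition}
B_{\text{per-backend}} = N \cdot S \approx 100 \times 20\,\text{MB} = 2000\,\text{MB}
B_{\text{per-query}} = S \approx 20\,\text{MB}
```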

This patch serializes the TQueryCtx in the coordinator and then passes
it to each BackendState when calling Exec().

Follow-up work might consider whether we really need all of the info in
the TQueryCtx to be distributed to all backends.

Testing:
- Passed full run of existing tests.
- Single node perf run showed no significant change.

Change-Id: I6a4dd302fd5602ec2775492a041ddd51e7d7a6c6
Reviewed-on: http://gerrit.cloudera.org:8080/14777
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Serialize TQueryCtx once per query
> ----------------------------------
>
>                 Key: IMPALA-9181
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9181
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 3.4.0
>            Reporter: Thomas Tauber-Marshall
>            Assignee: Thomas Tauber-Marshall
>            Priority: Major
>
> When issuing Exec() rpcs to backends, we currently serialize the TQueryCtx once per backend. This is inefficient as the TQueryCtx is the same for all backends and really only needs to be serialized once.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)