Posted to dev@impala.apache.org by "Michael Ho (Code Review)" <ge...@cloudera.org> on 2016/05/12 21:21:02 UTC

[Impala-CR](cdh5-trunk) IMPALA-3286: Prefetching for PHJ probing.

Michael Ho has uploaded a new patch set (#4).

Change subject: IMPALA-3286: Prefetching for PHJ probing.
......................................................................

IMPALA-3286: Prefetching for PHJ probing.

This change pipelines the code that probes the hash tables in the
partitioned hash join (PHJ). It is based on the idea Mostafa presented
earlier. Essentially, all rows in a row batch are evaluated and hashed
first, before any of them is probed against the hash tables. Hash table
buckets are prefetched as the hash values of the rows are computed.
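The pipelining described above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the patch's code: `ToyHashTable`, `HashKey` and `ProbeBatch` are made-up names, keys are plain int64 values rather than evaluated expressions, and the prefetch uses the GCC/Clang builtin `__builtin_prefetch`. Phase 1 hashes every row of the batch and prefetches its bucket; phase 2 probes using the saved hash values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative hash function (Fibonacci hashing); not Impala's hash.
inline uint32_t HashKey(int64_t key) {
  uint64_t x = static_cast<uint64_t>(key) * 0x9E3779B97F4A7C15ULL;
  return static_cast<uint32_t>(x >> 32);
}

// Hypothetical linear-probing table of int64 keys standing in for a build-side table.
class ToyHashTable {
 public:
  explicit ToyHashTable(size_t capacity) : buckets_(capacity), mask_(capacity - 1) {
    assert(capacity >= 2 && (capacity & (capacity - 1)) == 0);  // power of two
  }

  // Refuses to fill the last empty bucket so that Find() always terminates.
  bool Insert(int64_t key) {
    size_t idx = HashKey(key) & mask_;
    while (buckets_[idx].filled) {
      if (buckets_[idx].key == key) return true;  // already present
      idx = (idx + 1) & mask_;
    }
    if (num_filled_ + 1 >= buckets_.size()) return false;
    buckets_[idx].filled = true;
    buckets_[idx].key = key;
    ++num_filled_;
    return true;
  }

  // Non-blocking hint asking the CPU to start loading the bucket's cache line.
  void PrefetchBucket(uint32_t hash) const { __builtin_prefetch(&buckets_[hash & mask_]); }

  // Probes starting at the bucket selected by a precomputed hash.
  bool Find(int64_t key, uint32_t hash) const {
    for (size_t idx = hash & mask_; buckets_[idx].filled; idx = (idx + 1) & mask_) {
      if (buckets_[idx].key == key) return true;
    }
    return false;
  }

 private:
  struct Bucket {
    bool filled;
    int64_t key;
  };
  std::vector<Bucket> buckets_;  // value-initialized: all buckets start empty
  size_t mask_;
  size_t num_filled_ = 0;
};

// Two-phase probe of one row batch; returns the number of rows with a match.
size_t ProbeBatch(const ToyHashTable& ht, const std::vector<int64_t>& batch) {
  // Phase 1: hash every row up front and prefetch its bucket, so the memory
  // loads for different rows overlap instead of being waited on one by one.
  std::vector<uint32_t> hashes(batch.size());
  for (size_t i = 0; i < batch.size(); ++i) {
    hashes[i] = HashKey(batch[i]);
    ht.PrefetchBucket(hashes[i]);
  }
  // Phase 2: probe using the saved hashes; no row is hashed twice.
  size_t matches = 0;
  for (size_t i = 0; i < batch.size(); ++i) {
    if (ht.Find(batch[i], hashes[i])) ++matches;
  }
  return matches;
}
```

The split matters because the bucket loads for many rows are in flight at once during phase 1, so phase 2 ideally finds the buckets already in cache instead of stalling on one cache miss per row.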

To avoid re-evaluating the rows during probing (they have already been
evaluated once to compute the hash values), the hash table context has
been updated to cache the evaluated expression values, the nullness of
those values, and the hash values for a bounded number of rows. The new
cache embedded in the hash table context provides an iterator-like
interface to read and write the cached values.
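A rough sketch of such a cache, assuming a single int64 join key per row. `ExprValuesCache` and its methods are hypothetical names chosen for illustration, not the interface added by the patch: a write pass stores each row's evaluated value, nullness and hash, and a later read pass walks the same rows with a cursor instead of re-evaluating them.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-batch cache with one slot per row (illustrative names only).
class ExprValuesCache {
 public:
  explicit ExprValuesCache(int capacity)
      : values_(capacity), is_null_(capacity), hashes_(capacity) {}

  // Write pass (evaluate + hash): rewind and drop previously cached rows.
  void ResetForWrite() {
    cur_ = 0;
    num_rows_ = 0;
  }

  // Stores one evaluated row at the cursor and advances the cursor.
  void Write(int64_t value, bool is_null, uint32_t hash) {
    assert(cur_ < static_cast<int>(values_.size()));
    values_[cur_] = value;
    is_null_[cur_] = is_null;
    hashes_[cur_] = hash;
    num_rows_ = ++cur_;
  }

  // Read pass (probe): rewind to the first cached row, keeping the contents.
  void ResetForRead() { cur_ = 0; }
  bool AtEnd() const { return cur_ >= num_rows_; }
  void NextRow() { ++cur_; }

  // Accessors for the row under the cursor.
  int64_t CurValue() const { return values_[cur_]; }
  bool CurIsNull() const { return is_null_[cur_]; }
  uint32_t CurHash() const { return hashes_[cur_]; }

 private:
  std::vector<int64_t> values_;
  std::vector<bool> is_null_;
  std::vector<uint32_t> hashes_;
  int cur_ = 0;       // iterator position
  int num_rows_ = 0;  // rows written in the current batch
};
```

Keeping the cursor inside the cache lets the hashing loop and the probing loop share one object without passing row indices around, which is one plausible reading of "iterator-like interface".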

A PREFETCH_MODE query option has also been added so that prefetching can
be disabled if necessary. The default mode is 1, which means hash table
buckets are prefetched. In the future, this option may be extended to
also support prefetching the data that hash table buckets refer to.
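A hypothetical sketch of how a prefetch mode could gate the prefetch in the hashing loop. The enum names `kNone`/`kHtBucket` and the `HashBatch` helper are assumptions made for illustration; only the value 1 meaning bucket prefetching (the default) comes from the description above, and treating 0 as "off" is an assumption. The hash values are identical in both modes; only the prefetch hints differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical mirror of the option's values (assumption: 0 = no prefetching).
enum class PrefetchMode : int { kNone = 0, kHtBucket = 1 };

// Hashes a batch of keys and, only when `mode` asks for it, prefetches each
// key's bucket. Returns the number of prefetches issued.
int HashBatch(const std::vector<int64_t>& keys,
              const std::vector<int64_t>& buckets, PrefetchMode mode,
              std::vector<uint32_t>* hashes) {
  assert(!buckets.empty() && (buckets.size() & (buckets.size() - 1)) == 0);
  const size_t mask = buckets.size() - 1;
  hashes->resize(keys.size());
  int prefetches = 0;
  for (size_t i = 0; i < keys.size(); ++i) {
    uint64_t x = static_cast<uint64_t>(keys[i]) * 0x9E3779B97F4A7C15ULL;
    (*hashes)[i] = static_cast<uint32_t>(x >> 32);
    if (mode == PrefetchMode::kHtBucket) {
      __builtin_prefetch(&buckets[(*hashes)[i] & mask]);  // GCC/Clang builtin
      ++prefetches;
    }
  }
  return prefetches;
}
```

Assuming 0 does mean "off", the option would presumably be toggled from impala-shell like any other query option, e.g. `SET PREFETCH_MODE=0;`.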

Combined with the build-side prefetching, this change speeds up a self-join
of table lineitem by 40% on average in a single-node run:

select count(*)
from lineitem o1, lineitem o2
where o1.l_orderkey = o2.l_orderkey and
      o1.l_linenumber = o2.l_linenumber;

Change-Id: Ib42b93d99d09c833571e39d20d58c11ef73f3cc0
---
M be/src/exec/hash-table-test.cc
M be/src/exec/hash-table.cc
M be/src/exec/hash-table.h
M be/src/exec/hash-table.inline.h
M be/src/exec/partitioned-aggregation-node-ir.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-hash-join-node-ir.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/partitioned-hash-join-node.h
M be/src/exec/partitioned-hash-join-node.inline.h
M be/src/runtime/row-batch.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Types.thrift
16 files changed, 982 insertions(+), 439 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/59/2959/4
-- 
To view, visit http://gerrit.cloudera.org:8080/2959

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib42b93d99d09c833571e39d20d58c11ef73f3cc0
Gerrit-PatchSet: 4
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>