You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@impala.apache.org by "Mostafa Mokhtar (JIRA)" <ji...@apache.org> on 2018/02/22 19:43:00 UTC
[jira] [Resolved] (IMPALA-6461) *DataStreamSender::Channel::AddRow
needs some micro-optimizations to remove per row function call and data
dependency
[ https://issues.apache.org/jira/browse/IMPALA-6461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mostafa Mokhtar resolved IMPALA-6461.
-------------------------------------
Resolution: Fixed
IMPALA-6461 : Micro-optimizations to DataStreamSender::Send
While analyzing performance of partition exchange operator, I noticed that there is dependency and a function call per row in the hot path. Optimizations in this change are: 1) Remove the data dependency between computing the hash and the channel 2) Inline DataStreamSender::Channel::AddRow 3) Save partition_exprs_.size() to save a couple of instructions This translates to improving CPI for DataStreamSender::Send by 10% Change-Id: I642a9dad531a29d4838a3537ab0e04320a69960d Reviewed-on: [http://gerrit.cloudera.org:8080/9221]Reviewed-by: Mostafa Mokhtar <mm...@cloudera.com> Tested-by: Impala Public Jenkins
> *DataStreamSender::Channel::AddRow needs some micro-optimizations to remove per row function call and data dependency
> ----------------------------------------------------------------------------------------------------------------------
>
> Key: IMPALA-6461
> URL: https://issues.apache.org/jira/browse/IMPALA-6461
> Project: IMPALA
> Issue Type: Improvement
> Components: Distributed Exec
> Reporter: Mostafa Mokhtar
> Assignee: Mostafa Mokhtar
> Priority: Minor
>
> While analyzing performance of partition exchange operator I noticed that there is dependency and a function call per row in the hot path.
> {code}
> // hash-partition batch's rows across channels
> // TODO: encapsulate this in an Expr as we've done for Kudu above and remove this case
> // once we have codegen here.
> int num_channels = channels_.size();
> for (int i = 0; i < batch->num_rows(); ++i) {
> TupleRow* row = batch->GetRow(i);
> uint64_t hash_val = EXCHANGE_HASH_SEED;
> for (int j = 0; j < partition_exprs_.size(); ++j) {
> ScalarExprEvaluator* eval = partition_expr_evals_[j];
> void* partition_val = eval->GetValue(row);
> // We can't use the crc hash function here because it does not result in
> // uncorrelated hashes with different seeds. Instead we use FastHash.
> // TODO: fix crc hash/GetHashValue()
> DCHECK(&(eval->root()) == partition_exprs_[j]);
> hash_val = RawValue::GetHashValueFastHash(
> partition_val, partition_exprs_[j]->type(), hash_val);
> }
> RETURN_IF_ERROR(channels_[hash_val % num_channels]->AddRow(row));
> }
> {code}
> Force inlining DataStreamSender::Channel::AddRow and breaking up the loop improves partition exchange performance by 5%
> Code-generation of the hash computation IMPALA-5168 should give another 10% speedup.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)