Posted to issues@impala.apache.org by "Mostafa Mokhtar (JIRA)" <ji...@apache.org> on 2017/04/04 23:51:41 UTC

[jira] [Created] (IMPALA-5168) Codegen hash computation in DataStreamSender::Send for partition exchange.

Mostafa Mokhtar created IMPALA-5168:
---------------------------------------

             Summary: Codegen hash computation in DataStreamSender::Send for partition exchange. 
                 Key: IMPALA-5168
                 URL: https://issues.apache.org/jira/browse/IMPALA-5168
             Project: IMPALA
          Issue Type: Bug
          Components: Backend
    Affects Versions: Impala 2.6.0
            Reporter: Mostafa Mokhtar


Hash partition computation for exchange operators can benefit from codegen. Profile data shows that ~20% of the CPU time in the fragment thread is consumed by RawValue::GetHashValueFnv and ExprContext::GetValue.

{code}
    // hash-partition batch's rows across channels
    int num_channels = channels_.size();
    for (int i = 0; i < batch->num_rows(); ++i) {
      TupleRow* row = batch->GetRow(i);
      uint32_t hash_val = HashUtil::FNV_SEED;
      for (int j = 0; j < partition_expr_ctxs_.size(); ++j) {
        ExprContext* ctx = partition_expr_ctxs_[j];
        void* partition_val = ctx->GetValue(row);
        // We can't use the crc hash function here because it does not result
        // in uncorrelated hashes with different seeds.  Instead we must use
        // fnv hash.
        // TODO: fix crc hash/GetHashValue()
        hash_val =
            RawValue::GetHashValueFnv(partition_val, ctx->root()->type(), hash_val);
      }
      ExprContext::FreeLocalAllocations(partition_expr_ctxs_);
      RETURN_IF_ERROR(channels_[hash_val % num_channels]->AddRow(row));
    }
{code}
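
To illustrate where codegen could help, here is a minimal, self-contained sketch, not Impala code. It contrasts an interpreted per-row hash, which dispatches on the column type at run time, with a specialized version in which the partition column types and offsets are fixed at compile time, roughly the shape a codegen'd function might take. Everything in it is an assumption made for illustration: the Row layout, ColType, PartitionCol, HashRowInterpreted, HashRowSpecialized and PartitionRows are hypothetical names, and the hash is plain 32-bit FNV-1a standing in for RawValue::GetHashValueFnv.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in hash: plain 32-bit FNV-1a (assumption; not Impala's HashUtil).
constexpr uint32_t kFnvSeed = 2166136261u;  // 32-bit FNV offset basis
constexpr uint32_t kFnvPrime = 16777619u;   // 32-bit FNV prime

inline uint32_t FnvHash(const void* data, size_t len, uint32_t seed) {
  const uint8_t* p = static_cast<const uint8_t*>(data);
  uint32_t h = seed;
  for (size_t i = 0; i < len; ++i) h = (h ^ p[i]) * kFnvPrime;
  return h;
}

// Hypothetical row with two partition columns.
struct Row {
  int32_t a;
  int64_t b;
};

enum class ColType { kInt32, kInt64 };

// Hypothetical descriptor for one partition column.
struct PartitionCol {
  ColType type;
  size_t offset;  // byte offset of the column inside Row
};

// Interpreted path: the type of every column is re-examined for every row.
uint32_t HashRowInterpreted(const Row& row, const std::vector<PartitionCol>& cols) {
  const char* base = reinterpret_cast<const char*>(&row);
  uint32_t h = kFnvSeed;
  for (const PartitionCol& col : cols) {
    switch (col.type) {
      case ColType::kInt32:
        h = FnvHash(base + col.offset, sizeof(int32_t), h);
        break;
      case ColType::kInt64:
        h = FnvHash(base + col.offset, sizeof(int64_t), h);
        break;
    }
  }
  return h;
}

// Specialized path: column count, types and offsets are known up front, so
// there is no per-row loop over expressions and no type dispatch.
uint32_t HashRowSpecialized(const Row& row) {
  uint32_t h = kFnvSeed;
  h = FnvHash(&row.a, sizeof(row.a), h);
  h = FnvHash(&row.b, sizeof(row.b), h);
  return h;
}

// Assigns each row to a channel index in [0, num_channels).
std::vector<int> PartitionRows(const std::vector<Row>& rows, int num_channels) {
  assert(num_channels > 0);
  std::vector<int> channel_ids;
  channel_ids.reserve(rows.size());
  for (const Row& row : rows) {
    uint32_t h = HashRowSpecialized(row);
    channel_ids.push_back(static_cast<int>(h % static_cast<uint32_t>(num_channels)));
  }
  return channel_ids;
}
```

Both paths produce the same hash for the same row, so the specialized version can replace the interpreted one without changing which channel a row is sent to. The sketch hashes fixed-width columns only; NULL handling and variable-length types are left out.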

|Function Stack| Effective Time % |
|Total|100%|
| clone|99%|
|  start_thread|99%|
|   thread_proxy|99%|
|    boost::detail::thread_data<boost::_bi::bind_t<>::run|99%|
|     boost::_bi::bind_t<void, void (*)(), ::operator()|99%|
|      operator()<void (*)(const std::basic_string<|99%|
|       impala::Thread::SuperviseThread|99%|
|        boost::function0<void>::operator()|99%|
|         impala::QueryExecMgr::ExecFInstance|99%|
|          impala::FragmentInstanceState::Exec|99%|
|           impala::PlanFragmentExecutor::Exec|99%|
|            impala::PlanFragmentExecutor::ExecInternal|96%|
|             impala::DataStreamSender::Send|91%|
|              impala::DataStreamSender::Channel::AddRow|56%|
|              impala::RawValue::GetHashValueFnv|11%|
|              impala::ExprContext::GetValue|11%|
|              impala::ExprContext::FreeLocalAllocations|6%|
|              impala::RowBatch::GetRow|1%|
|              std::vector<impala::ExprContext*, std::allocator<impala::ExprContext*>>::size|1%|
|              impala::Expr::type|0%|
|              impala::ExprContext::GetValue|0%|
|              impala::RuntimeState::CheckQueryState|0%|
|             impala::HdfsScanNode::GetNext|3%|
|             impala::RowBatch::Reset|1%|
|             Status|0%|
|             ~ScopedTimer|0%|
|            [Unknown stack frame(s)]|4%|



