Posted to issues@impala.apache.org by "Tim Armstrong (JIRA)" <ji...@apache.org> on 2017/09/22 15:28:00 UTC

[jira] [Resolved] (IMPALA-5965) Avoid per-value switch on NeedsConversionInline() when decoding dictionary-encoded strings and timestamps

     [ https://issues.apache.org/jira/browse/IMPALA-5965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong resolved IMPALA-5965.
-----------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.11.0

IMPALA-5965: avoid per-value switch on NeedsConversionInline() in parquet

Testing:
========
Ran core tests

Perf:
=====
Ran this query a few times:

  set num_nodes=1;
  set mt_dop=1;
  select min(l_returnflag), min(l_linestatus) from biglineitem;
  summary;

I saw scan time drop from ~2.25s to ~2.11s on my machine.

Change-Id: I04fb4ca73978d0100e1eb9835b87d37540b8b645
Reviewed-on: http://gerrit.cloudera.org:8080/8117
Reviewed-by: Lars Volker <lv...@cloudera.com>
Reviewed-by: Dan Hecht <dh...@cloudera.com>
Tested-by: Impala Public Jenkins


> Avoid per-value switch on NeedsConversionInline() when decoding dictionary-encoded strings and timestamps
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-5965
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5965
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 2.10.0
>            Reporter: Tim Armstrong
>            Assignee: Tim Armstrong
>            Priority: Minor
>              Labels: perf
>             Fix For: Impala 2.11.0
>
>
> I noticed a minor inefficiency in the handling of NeedsConversion() in the parquet scanner. In cases where it's not a runtime constant, such as dictionary-encoded strings and timestamps, we switch on it per value. This is probably only a few instructions per value, but in this part of the code that matters.
> I did a quick benchmark and saw scan time drop from ~2.25s to ~2.11s on this query:
> {code}
> use tpch_parquet;
> set num_nodes=1;
> set mt_dop=1;
> select min(l_returnflag), min(l_linestatus) from biglineitem;
> summary;
> {code}
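
For readers unfamiliar with the pattern, here is a minimal sketch of what "avoiding a per-value switch" can look like. It is not the actual Impala patch: the DictColumnReader struct, its fields, and the Convert() step are all hypothetical stand-ins, and dispatching on a bool template parameter is just one common way to hoist such a check out of a hot loop. The idea is that a flag which is fixed for a whole column gets tested once per batch rather than once per decoded value.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a dictionary-encoded column reader.
// None of these names come from the Impala code base.
struct DictColumnReader {
  std::vector<int64_t> dict;      // dictionary of decoded values
  bool needs_conversion = false;  // fixed per column, but only known at run time

  // Hypothetical conversion step (stand-in for, e.g., a timestamp adjustment).
  static int64_t Convert(int64_t v) { return v + 1; }

  // Before: the flag is tested once per value inside the hot loop.
  void DecodePerValueBranch(const std::vector<uint32_t>& codes,
                            std::vector<int64_t>* out) const {
    out->clear();
    for (uint32_t code : codes) {
      assert(code < dict.size());
      int64_t v = dict[code];
      if (needs_conversion) v = Convert(v);
      out->push_back(v);
    }
  }

  // After: the flag is tested once per batch. Each instantiation of
  // DecodeLoop sees the flag as a compile-time constant, so the compiler
  // can drop the check from the loop body.
  void DecodeHoisted(const std::vector<uint32_t>& codes,
                     std::vector<int64_t>* out) const {
    if (needs_conversion) {
      DecodeLoop<true>(codes, out);
    } else {
      DecodeLoop<false>(codes, out);
    }
  }

 private:
  template <bool NEEDS_CONVERSION>
  void DecodeLoop(const std::vector<uint32_t>& codes,
                  std::vector<int64_t>* out) const {
    out->clear();
    for (uint32_t code : codes) {
      assert(code < dict.size());
      int64_t v = dict[code];
      if (NEEDS_CONVERSION) v = Convert(v);  // resolved at compile time
      out->push_back(v);
    }
  }
};
```

Both entry points produce the same output; only the placement of the check differs. Whether the hoisted form is measurably faster depends on the compiler and the workload, so it should be benchmarked, as was done for this issue.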


