Posted to issues@impala.apache.org by "Tianyi Wang (JIRA)" <ji...@apache.org> on 2017/08/23 17:26:01 UTC
[jira] [Resolved] (IMPALA-5573) Support decimal codegen in text scanner

     [ https://issues.apache.org/jira/browse/IMPALA-5573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tianyi Wang resolved IMPALA-5573.
---------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.10.0


IMPALA-5573: Add decimal codegen in text scanner

This patch adds codegen support for decimal columns in the text scanner.
Currently, codegen is disabled if there is a decimal column. With this
patch, StringParser::StringToDecimal is called from the generated code.
A new file, util/string-parser.cc, is created and linked into libUtil.
This file contains proxy functions for StringToDecimal in order to keep
StringToDecimal out of the LLVM IR.
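
The proxy idea can be sketched roughly as follows. This is a minimal,
self-contained illustration and not the code from the patch: the names
ParseDecimalSketch, StringToDecimal4Sketch, StringToDecimal8Sketch and
ParseResult are hypothetical, and the toy parser handles only a sign,
digits and one decimal point. The point is only that the large templated
parser is instantiated inside ordinary native functions, so generated
code can call a plain symbol instead of pulling the parser body into IR.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result codes; not Impala's actual enum.
enum class ParseResult { kSuccess, kFailure, kOverflow };

// Stand-in for a large templated parser that would normally live in a header.
// Returns the unscaled value, e.g. "12.5" with scale 2 yields 1250.
// Fractional digits beyond the scale are truncated in this sketch.
template <typename T>
T ParseDecimalSketch(const char* s, int len, int precision, int scale,
                     ParseResult* result) {
  assert(scale >= 0 && precision >= scale);
  *result = ParseResult::kFailure;
  if (len <= 0) return 0;
  int i = 0;
  bool negative = false;
  if (s[0] == '-' || s[0] == '+') {
    negative = (s[0] == '-');
    ++i;
  }
  T value = 0;
  int int_digits = 0;   // integer digits, not counting leading zeros
  int frac_digits = 0;  // fractional digits kept so far
  bool seen_dot = false;
  bool seen_digit = false;
  for (; i < len; ++i) {
    const char c = s[i];
    if (c == '.') {
      if (seen_dot) return 0;  // second decimal point
      seen_dot = true;
    } else if (c >= '0' && c <= '9') {
      seen_digit = true;
      if (seen_dot) {
        if (frac_digits == scale) continue;  // drop digits beyond the scale
        ++frac_digits;
      } else if (int_digits > 0 || c != '0') {
        if (++int_digits > precision - scale) {
          *result = ParseResult::kOverflow;
          return 0;
        }
      }
      value = value * 10 + (c - '0');
    } else {
      return 0;  // unexpected character
    }
  }
  if (!seen_digit) return 0;
  // Pad with zeros so the value is always expressed at the full scale.
  for (; frac_digits < scale; ++frac_digits) value *= 10;
  *result = ParseResult::kSuccess;
  return negative ? -value : value;
}

// Non-templated proxies. Defined in a .cc file that is compiled to native
// code only, they give generated code a plain function to call.
int32_t StringToDecimal4Sketch(const char* s, int len, int precision,
                               int scale, ParseResult* result) {
  assert(precision <= 9);  // at most 9 digits fit in 4 bytes
  return ParseDecimalSketch<int32_t>(s, len, precision, scale, result);
}

int64_t StringToDecimal8Sketch(const char* s, int len, int precision,
                               int scale, ParseResult* result) {
  assert(precision <= 18);  // at most 18 digits fit in 8 bytes
  return ParseDecimalSketch<int64_t>(s, len, precision, scale, result);
}
```

A 16-byte variant would follow the same pattern; it is omitted here
because it needs a 128-bit integer type, which is compiler-specific.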

In a benchmark query:
> select l_quantity, l_extendedprice, l_discount, l_tax from biglineitem where l_quantity > 100.0;
where biglineitem is tpch.lineitem repeated 6 times, the codegen version
is 19% faster than the non-codegen version in scanning, and 8% faster in
query time. Codegen time in this simple case is 69ms.

Simple performance tests show that putting the parser in libUtil instead
of impala-sse.bc reduces codegen time by 2/3 in cases where only one
decimal column is parsed, while the scanning time is nearly the same.

Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d
Reviewed-on: http://gerrit.cloudera.org:8080/7683
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Impala Public Jenkins

> Support decimal codegen in text scanner
> ---------------------------------------
>
>                 Key: IMPALA-5573
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5573
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 2.10.0
>            Reporter: Tim Armstrong
>            Assignee: Tianyi Wang
>            Priority: Minor
>              Labels: codegen
>             Fix For: Impala 2.10.0
>
>
> Codegen is disabled when scanning text tables with decimal columns. The message is "Decimal not yet supported for codegen."
> The supported types use cross-compiled conversion functions in TextConverter::CodegenWriteSlot():
> {code}
>     IRFunction::Type parse_fn_enum;
>     Function* parse_fn = NULL;
>     switch (slot_desc->type().type) {
>       case TYPE_BOOLEAN:
>         parse_fn_enum = IRFunction::STRING_TO_BOOL;
>         break;
>       case TYPE_TINYINT:
>         parse_fn_enum = IRFunction::STRING_TO_INT8;
>         break;
>       case TYPE_SMALLINT:
>         parse_fn_enum = IRFunction::STRING_TO_INT16;
>         break;
>       case TYPE_INT:
>         parse_fn_enum = IRFunction::STRING_TO_INT32;
>         break;
>       case TYPE_BIGINT:
>         parse_fn_enum = IRFunction::STRING_TO_INT64;
>         break;
>       case TYPE_FLOAT:
>         parse_fn_enum = IRFunction::STRING_TO_FLOAT;
>         break;
>       case TYPE_DOUBLE:
>         parse_fn_enum = IRFunction::STRING_TO_DOUBLE;
>         break;
>       default:
>         DCHECK(false);
>         return NULL;
>     }
> {code}
> Decimal differs from these types because its parsing functions accept precision and scale parameters and are templated. However, I think in principle the same approach can be used as for the other types.
> *Note:* StringToDecimal() is a huge inline function and we risk running into codegen-time problems if we're scanning many decimal columns. We should experiment to see if this is a problem and, if so, try to mitigate it, e.g. by moving the function out of the header.
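
One way the switch quoted above might be extended for decimals is to
pick the parse function from the column's precision. The sketch below is
an illustration under stated assumptions, not the code that was
committed: the enum entries STRING_TO_DECIMAL4/8/16, the UNSUPPORTED
entry, ColumnTypeSketch and SelectParseFn are hypothetical names, and it
assumes decimals are stored in 4, 8 or 16 bytes for precisions up to 9,
18 and 38 digits respectively.

```cpp
#include <cassert>

// Hypothetical stand-ins for IRFunction::Type and the slot's primitive type.
enum class ParseFn {
  STRING_TO_INT32,
  STRING_TO_DECIMAL4,
  STRING_TO_DECIMAL8,
  STRING_TO_DECIMAL16,
  UNSUPPORTED
};
enum class SlotType { TYPE_INT, TYPE_DECIMAL, TYPE_OTHER };

// Minimal column type: decimals carry precision and scale with the type.
struct ColumnTypeSketch {
  SlotType type;
  int precision;
  int scale;
};

// Assumed storage rule: up to 9 digits in 4 bytes, up to 18 in 8, else 16.
int DecimalByteSize(int precision) {
  assert(precision >= 1 && precision <= 38);
  if (precision <= 9) return 4;
  if (precision <= 18) return 8;
  return 16;
}

// Chooses the parse function for a slot. For decimals, the storage width
// decides which proxy to call; precision and scale would then be passed to
// that proxy as extra arguments by the generated code.
ParseFn SelectParseFn(const ColumnTypeSketch& t) {
  switch (t.type) {
    case SlotType::TYPE_INT:
      return ParseFn::STRING_TO_INT32;
    case SlotType::TYPE_DECIMAL:
      switch (DecimalByteSize(t.precision)) {
        case 4: return ParseFn::STRING_TO_DECIMAL4;
        case 8: return ParseFn::STRING_TO_DECIMAL8;
        default: return ParseFn::STRING_TO_DECIMAL16;
      }
    default:
      return ParseFn::UNSUPPORTED;
  }
}
```

Because precision and scale are known when the code is generated, they
can be emitted as constants at the call site rather than looked up per
row.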


