Posted to issues@impala.apache.org by "Tianyi Wang (JIRA)" <ji...@apache.org> on 2017/08/23 17:26:01 UTC
[jira] [Resolved] (IMPALA-5573) Support decimal codegen in text scanner
[ https://issues.apache.org/jira/browse/IMPALA-5573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tianyi Wang resolved IMPALA-5573.
---------------------------------
Resolution: Fixed
Fix Version/s: Impala 2.10.0
IMPALA-5573: Add decimal codegen in text scanner
This patch adds codegen support for decimal columns in the text scanner.
Previously, codegen was disabled if there was a decimal column. With this
patch, StringParser::StringToDecimal is called from the generated code. A
new file, util/string-parser.cc, is created and linked into libUtil. This
file contains proxy functions for StringToDecimal in order to keep
StringToDecimal out of the LLVM IR.
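The proxy-function idea can be sketched roughly as follows. This is only
an illustration, not the code from the patch: ParseDecimalImpl,
ProxyParseDecimal4 and ProxyParseDecimal8 are hypothetical names, and the
parser is a heavily simplified stand-in for StringToDecimal (it drops
extra fractional digits and rejects values with more digits than the
precision allows). The point is that the large template stays in native
code, and callers only need the small non-templated wrappers.

```cpp
#include <cassert>
#include <cstdint>

namespace proxy_sketch {

enum ParseResult { PARSE_SUCCESS, PARSE_FAILURE };

// Hypothetical stand-in for a large templated parser defined in a header.
// Parses "[+-]digits[.digits]" into an unscaled integer that carries
// `scale` fractional digits. Extra fractional digits are dropped.
// The caller must pick T wide enough to hold `precision` decimal digits.
template <typename T>
inline T ParseDecimalImpl(const char* s, int len, int precision, int scale,
                          ParseResult* result) {
  assert(scale >= 0 && scale <= precision);
  *result = PARSE_FAILURE;
  int i = 0;
  bool negative = false;
  if (i < len && (s[i] == '-' || s[i] == '+')) {
    negative = (s[i] == '-');
    ++i;
  }
  T value = 0;
  int digits = 0;       // significant digits accumulated so far
  int frac_digits = 0;  // fractional digits kept so far
  bool seen_dot = false;
  bool seen_digit = false;
  for (; i < len; ++i) {
    const char c = s[i];
    if (c == '.') {
      if (seen_dot) return 0;  // a second '.' is invalid
      seen_dot = true;
    } else if (c >= '0' && c <= '9') {
      seen_digit = true;
      if (seen_dot) {
        if (frac_digits == scale) continue;  // drop extra fractional digits
        ++frac_digits;
      }
      const int d = c - '0';
      // Count only significant digits; fail before the value can overflow.
      if ((value != 0 || d != 0) && ++digits > precision) return 0;
      value = value * 10 + d;
    } else {
      return 0;  // unexpected character
    }
  }
  if (!seen_digit) return 0;
  // Pad with zeros so the result always has exactly `scale` fractional digits.
  for (; frac_digits < scale; ++frac_digits) {
    if (value != 0 && ++digits > precision) return 0;
    value *= 10;
  }
  *result = PARSE_SUCCESS;
  return negative ? -value : value;
}

// Proxy functions: plain, non-templated signatures. In a real build their
// definitions would live in a .cc file compiled to native code, so code
// that calls them sees only a declaration and the template body is never
// pulled into the caller.
int32_t ProxyParseDecimal4(const char* s, int len, int precision, int scale,
                           ParseResult* result) {
  return ParseDecimalImpl<int32_t>(s, len, precision, scale, result);
}

int64_t ProxyParseDecimal8(const char* s, int len, int precision, int scale,
                           ParseResult* result) {
  return ParseDecimalImpl<int64_t>(s, len, precision, scale, result);
}

}  // namespace proxy_sketch
```

With these wrappers, a caller parsing "12.5" as a decimal with scale 2
would call ProxyParseDecimal4("12.5", 4, 9, 2, &result) and receive the
unscaled value 1250.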
In a benchmark query:
> select l_quantity, l_extendedprice, l_discount, l_tax from biglineitem where l_quantity > 100.0;
where biglineitem is tpch.lineitem repeated 6 times, the codegen version
is 19% faster than the non-codegen version in scanning, and 8% faster in
query time. Codegen time in this simple case is 69ms.
Simple performance tests show that putting the parser in libUtil instead
of impala-sse.bc would reduce codegen time by 2/3 in cases where only
one decimal column is parsed while the scanning time is nearly the same.
Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d
Reviewed-on: http://gerrit.cloudera.org:8080/7683
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Impala Public Jenkins
> Support decimal codegen in text scanner
> ---------------------------------------
>
> Key: IMPALA-5573
> URL: https://issues.apache.org/jira/browse/IMPALA-5573
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Affects Versions: Impala 2.10.0
> Reporter: Tim Armstrong
> Assignee: Tianyi Wang
> Priority: Minor
> Labels: codegen
> Fix For: Impala 2.10.0
>
>
> Codegen is disabled when scanning text tables with decimal columns. The message is "Decimal not yet supported for codegen."
> The supported types use cross-compiled conversion functions in TextConverter::CodegenWriteSlot():
> {code}
>   IRFunction::Type parse_fn_enum;
>   Function* parse_fn = NULL;
>   switch (slot_desc->type().type) {
>     case TYPE_BOOLEAN:
>       parse_fn_enum = IRFunction::STRING_TO_BOOL;
>       break;
>     case TYPE_TINYINT:
>       parse_fn_enum = IRFunction::STRING_TO_INT8;
>       break;
>     case TYPE_SMALLINT:
>       parse_fn_enum = IRFunction::STRING_TO_INT16;
>       break;
>     case TYPE_INT:
>       parse_fn_enum = IRFunction::STRING_TO_INT32;
>       break;
>     case TYPE_BIGINT:
>       parse_fn_enum = IRFunction::STRING_TO_INT64;
>       break;
>     case TYPE_FLOAT:
>       parse_fn_enum = IRFunction::STRING_TO_FLOAT;
>       break;
>     case TYPE_DOUBLE:
>       parse_fn_enum = IRFunction::STRING_TO_DOUBLE;
>       break;
>     default:
>       DCHECK(false);
>       return NULL;
>   }
> {code}
> Decimal is a bit different from these types because its parsing functions are templated and take precision and scale parameters. However, I think in principle the same approach can be used as for the other types.
> *Note:* StringToDecimal() is a huge inline function and we risk running into codegen-time problems if we're scanning many decimal columns. We should experiment to see if this is a problem and if so try to mitigate it, e.g. by not moving the function out of the header.
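One way the same switch-style dispatch could extend to decimals is to
pick a parse function by the storage width implied by the column's
precision. The sketch below is an assumption for illustration only: the
ParseFn enum values and both function names are hypothetical, not taken
from the Impala source, and the 9/18/38 digit thresholds are the usual
limits for 4-, 8- and 16-byte decimal storage.

```cpp
#include <cassert>

namespace dispatch_sketch {

// Hypothetical identifiers for the three decimal parse functions.
enum ParseFn {
  STRING_TO_DECIMAL4,
  STRING_TO_DECIMAL8,
  STRING_TO_DECIMAL16,
  PARSE_FN_INVALID
};

// Storage width in bytes for a decimal with the given precision,
// assuming a maximum precision of 38 digits.
inline int DecimalByteSize(int precision) {
  assert(precision >= 1 && precision <= 38);
  if (precision <= 9) return 4;    // fits in a 32-bit integer
  if (precision <= 18) return 8;   // fits in a 64-bit integer
  return 16;                       // needs a 128-bit integer
}

// Mirrors the per-type switch above: one parse function per storage
// width. Precision and scale would then be passed to the chosen function
// as ordinary arguments rather than as template parameters.
inline ParseFn SelectDecimalParseFn(int precision) {
  switch (DecimalByteSize(precision)) {
    case 4:
      return STRING_TO_DECIMAL4;
    case 8:
      return STRING_TO_DECIMAL8;
    case 16:
      return STRING_TO_DECIMAL16;
    default:
      return PARSE_FN_INVALID;
  }
}

}  // namespace dispatch_sketch
```

Under these assumptions, a DECIMAL(12,2) column would map to the 8-byte
parse function, since 12 digits exceed the 9-digit limit of 4-byte
storage.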