You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Tianyi Wang (Code Review)" <ge...@cloudera.org> on 2017/08/16 00:35:58 UTC

[Impala-ASF-CR] IMPALA-5573: Add decimal codegen in text scanner

Tianyi Wang has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7683

Change subject: IMPALA-5573: Add decimal codegen in text scanner
......................................................................

IMPALA-5573: Add decimal codegen in text scanner

This patch adds decimal type codegen support in text scanner. Currently
codegen would be disabled if there is a decimal column. With this patch
StringParser::StringToDecimal will be called in generated code. A new
file util/string-parser.cc is created and linked into libUtil. This file
contains proxy functions to StringToDecimal in ordered to keep
StringToDecimal out of LLVM IR.
Simple performance tests show that putting the parser in libUtil.so
instead of impala-sse.bc would reduce codegen time by 2/3 in cases
where only one decimal column is parsed while the scanning time is
nearly the same.

Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/llvm-codegen.cc
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/text-converter.cc
M be/src/runtime/decimal-value.h
M be/src/util/CMakeLists.txt
A be/src/util/string-parser.cc
M be/src/util/string-parser.h
9 files changed, 148 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/7683/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>

[Impala-ASF-CR] IMPALA-5573: Add decimal codegen in text scanner

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, Tim Armstrong,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/7683

to look at the new patch set (#7).

Change subject: IMPALA-5573: Add decimal codegen in text scanner
......................................................................

IMPALA-5573: Add decimal codegen in text scanner

This patch adds decimal type codegen support in text scanner. Currently
codegen would be disabled if there is a decimal column. With this patch
StringParser::StringToDecimal will be called in generated code. A new
file util/string-parser.cc is created and linked into libUtil. This file
contains proxy functions to StringToDecimal in ordered to keep
StringToDecimal out of LLVM IR.

In a benchmark query:
> select l_quantity, l_extendedprice, l_discount, l_tax from biglineitem where l_quantity > 100.0;
where biglineitem is tpch.lineitem repeated 6 times, the codegen version
is 19% faster than non-codgen version in scanning, and 8% faster in
query time. Codegen time in this simple case is 69ms.

Simple performance tests show that putting the parser in libUtil instead
of impala-sse.bc would reduce codegen time by 2/3 in cases where only
one decimal column is parsed while the scanning time is nearly the same.

Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/text-converter.cc
M be/src/runtime/decimal-value.h
M be/src/util/CMakeLists.txt
A be/src/util/string-parser.cc
M be/src/util/string-parser.h
8 files changed, 131 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/7683/7
-- 
To view, visit http://gerrit.cloudera.org:8080/7683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-5573: Add decimal codegen in text scanner

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has posted comments on this change.

Change subject: IMPALA-5573: Add decimal codegen in text scanner
......................................................................


Patch Set 8:

After these code re-spawning and inconsistency I decide to run a full test first.

-- 
To view, visit http://gerrit.cloudera.org:8080/7683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5573: Add decimal codegen in text scanner

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5573: Add decimal codegen in text scanner
......................................................................


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1075/

-- 
To view, visit http://gerrit.cloudera.org:8080/7683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5573: Add decimal codegen in text scanner

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5573: Add decimal codegen in text scanner
......................................................................


IMPALA-5573: Add decimal codegen in text scanner

This patch adds decimal type codegen support in text scanner. Currently
codegen would be disabled if there is a decimal column. With this patch
StringParser::StringToDecimal will be called in generated code. A new
file util/string-parser.cc is created and linked into libUtil. This file
contains proxy functions to StringToDecimal in ordered to keep
StringToDecimal out of LLVM IR.

In a benchmark query:
> select l_quantity, l_extendedprice, l_discount, l_tax from biglineitem where l_quantity > 100.0;
where biglineitem is tpch.lineitem repeated 6 times, the codegen version
is 19% faster than non-codgen version in scanning, and 8% faster in
query time. Codegen time in this simple case is 69ms.

Simple performance tests show that putting the parser in libUtil instead
of impala-sse.bc would reduce codegen time by 2/3 in cases where only
one decimal column is parsed while the scanning time is nearly the same.

Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d
Reviewed-on: http://gerrit.cloudera.org:8080/7683
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/text-converter.cc
M be/src/util/CMakeLists.txt
A be/src/util/string-parser.cc
M be/src/util/string-parser.h
7 files changed, 136 insertions(+), 28 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Tim Armstrong: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/7683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-5573: Add decimal codegen in text scanner

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, Tim Armstrong,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/7683

to look at the new patch set (#9).

Change subject: IMPALA-5573: Add decimal codegen in text scanner
......................................................................

IMPALA-5573: Add decimal codegen in text scanner

This patch adds decimal type codegen support in text scanner. Currently
codegen would be disabled if there is a decimal column. With this patch
StringParser::StringToDecimal will be called in generated code. A new
file util/string-parser.cc is created and linked into libUtil. This file
contains proxy functions to StringToDecimal in ordered to keep
StringToDecimal out of LLVM IR.

In a benchmark query:
> select l_quantity, l_extendedprice, l_discount, l_tax from biglineitem where l_quantity > 100.0;
where biglineitem is tpch.lineitem repeated 6 times, the codegen version
is 19% faster than non-codgen version in scanning, and 8% faster in
query time. Codegen time in this simple case is 69ms.

Simple performance tests show that putting the parser in libUtil instead
of impala-sse.bc would reduce codegen time by 2/3 in cases where only
one decimal column is parsed while the scanning time is nearly the same.

Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/text-converter.cc
M be/src/util/CMakeLists.txt
M be/src/util/parse-util.cc
A be/src/util/string-parser.cc
M be/src/util/string-parser.h
8 files changed, 137 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/7683/9
-- 
To view, visit http://gerrit.cloudera.org:8080/7683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-5573: Add decimal codegen in text scanner

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5573: Add decimal codegen in text scanner
......................................................................


Patch Set 8:

Were you able to figure out the problem with static linking?

-- 
To view, visit http://gerrit.cloudera.org:8080/7683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5573: Add decimal codegen in text scanner

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has posted comments on this change.

Change subject: IMPALA-5573: Add decimal codegen in text scanner
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7683/5/be/src/runtime/decimal-value.h
File be/src/runtime/decimal-value.h:

Line 44:   using value_type = T;
> Not needed now.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5573: Add decimal codegen in text scanner

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has uploaded a new patch set (#3).

Change subject: IMPALA-5573: Add decimal codegen in text scanner
......................................................................

IMPALA-5573: Add decimal codegen in text scanner

This patch adds decimal type codegen support in text scanner. Currently
codegen would be disabled if there is a decimal column. With this patch
StringParser::StringToDecimal will be called in generated code. A new
file util/string-parser.cc is created and linked into libUtil. This file
contains proxy functions to StringToDecimal in ordered to keep
StringToDecimal out of LLVM IR.
Simple performance tests show that putting the parser in libUtil.so
instead of impala-sse.bc would reduce codegen time by 2/3 in cases
where only one decimal column is parsed while the scanning time is
nearly the same.

Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/text-converter.cc
M be/src/runtime/decimal-value.h
M be/src/util/CMakeLists.txt
A be/src/util/string-parser.cc
M be/src/util/string-parser.h
8 files changed, 135 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/7683/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-5573: Add decimal codegen in text scanner

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5573: Add decimal codegen in text scanner
......................................................................


Patch Set 6: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1075/

-- 
To view, visit http://gerrit.cloudera.org:8080/7683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5573: Add decimal codegen in text scanner

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5573: Add decimal codegen in text scanner
......................................................................


Patch Set 8: Code-Review+2

No problem

-- 
To view, visit http://gerrit.cloudera.org:8080/7683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5573: Add decimal codegen in text scanner

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5573: Add decimal codegen in text scanner
......................................................................


Patch Set 5:

(1 comment)

Looks good, if you fix one thing then rebase I can start the merge.

http://gerrit.cloudera.org:8080/#/c/7683/5/be/src/runtime/decimal-value.h
File be/src/runtime/decimal-value.h:

Line 44:   using value_type = T;
Not needed now.


-- 
To view, visit http://gerrit.cloudera.org:8080/7683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5573: Add decimal codegen in text scanner

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5573: Add decimal codegen in text scanner
......................................................................


Patch Set 8: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1081/

-- 
To view, visit http://gerrit.cloudera.org:8080/7683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5573: Add decimal codegen in text scanner

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has posted comments on this change.

Change subject: IMPALA-5573: Add decimal codegen in text scanner
......................................................................


Patch Set 8:

LLVM still cannot find symbols with static linking, which I haven't tested before.

-- 
To view, visit http://gerrit.cloudera.org:8080/7683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5573: Add decimal codegen in text scanner

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5573: Add decimal codegen in text scanner
......................................................................


Patch Set 6:

Also, for reference, I think the existing test coverage for this code path is adequate - we run many tests on decimal columns - test_decimal_queries, test_tpch_queries, test_tpcds_queries, etc.

-- 
To view, visit http://gerrit.cloudera.org:8080/7683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5573: Add decimal codegen in text scanner

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, Tim Armstrong,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/7683

to look at the new patch set (#10).

Change subject: IMPALA-5573: Add decimal codegen in text scanner
......................................................................

IMPALA-5573: Add decimal codegen in text scanner

This patch adds decimal type codegen support in text scanner. Currently
codegen would be disabled if there is a decimal column. With this patch
StringParser::StringToDecimal will be called in generated code. A new
file util/string-parser.cc is created and linked into libUtil. This file
contains proxy functions to StringToDecimal in ordered to keep
StringToDecimal out of LLVM IR.

In a benchmark query:
> select l_quantity, l_extendedprice, l_discount, l_tax from biglineitem where l_quantity > 100.0;
where biglineitem is tpch.lineitem repeated 6 times, the codegen version
is 19% faster than non-codgen version in scanning, and 8% faster in
query time. Codegen time in this simple case is 69ms.

Simple performance tests show that putting the parser in libUtil instead
of impala-sse.bc would reduce codegen time by 2/3 in cases where only
one decimal column is parsed while the scanning time is nearly the same.

Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/text-converter.cc
M be/src/util/CMakeLists.txt
A be/src/util/string-parser.cc
M be/src/util/string-parser.h
7 files changed, 136 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/7683/10
-- 
To view, visit http://gerrit.cloudera.org:8080/7683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-5573: Add decimal codegen in text scanner

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has posted comments on this change.

Change subject: IMPALA-5573: Add decimal codegen in text scanner
......................................................................


Patch Set 8:

> Were you able to figure out the problem with static linking?

That is fixed. I'm looking into some failed udf tests.

-- 
To view, visit http://gerrit.cloudera.org:8080/7683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5573: Add decimal codegen in text scanner

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, Tim Armstrong,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/7683

to look at the new patch set (#8).

Change subject: IMPALA-5573: Add decimal codegen in text scanner
......................................................................

IMPALA-5573: Add decimal codegen in text scanner

This patch adds decimal type codegen support in text scanner. Currently
codegen would be disabled if there is a decimal column. With this patch
StringParser::StringToDecimal will be called in generated code. A new
file util/string-parser.cc is created and linked into libUtil. This file
contains proxy functions to StringToDecimal in ordered to keep
StringToDecimal out of LLVM IR.

In a benchmark query:
> select l_quantity, l_extendedprice, l_discount, l_tax from biglineitem where l_quantity > 100.0;
where biglineitem is tpch.lineitem repeated 6 times, the codegen version
is 19% faster than non-codgen version in scanning, and 8% faster in
query time. Codegen time in this simple case is 69ms.

Simple performance tests show that putting the parser in libUtil instead
of impala-sse.bc would reduce codegen time by 2/3 in cases where only
one decimal column is parsed while the scanning time is nearly the same.

Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/text-converter.cc
M be/src/util/CMakeLists.txt
A be/src/util/string-parser.cc
M be/src/util/string-parser.h
7 files changed, 130 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/7683/8
-- 
To view, visit http://gerrit.cloudera.org:8080/7683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-5573: Add decimal codegen in text scanner

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5573: Add decimal codegen in text scanner
......................................................................


Patch Set 3:

(8 comments)

Code change looks good, just a few cleanup requests.

http://gerrit.cloudera.org:8080/#/c/7683/2//COMMIT_MSG
Commit Message:

Line 14: StringToDecimal out of LLVM IR.
Can you add a line break to separate the paragraphs to make it more readable.


PS2, Line 15: libUtil.so
nit: libUtil, since we use static linking too.


Line 19: 
Did you do an experiment to see how much scanning sped up for a larger table? tpch.lineitem has a few decimal columns. I often copy the data a few times for scanner tests like this to get good measurements.

  create table biglineitem as select * from lineitem;
  insert into biglineitem select * from lineitem;
  insert into biglineitem select * from lineitem;

I think you'll see the biggest speedup if you have a condition in the where clause, because codegen of that condition also gets disabled. E.g.

  select l_quantity, l_extendedprice, l_discount, l_tax from biglineitem where l_quantity > 100.0;


http://gerrit.cloudera.org:8080/#/c/7683/2/be/src/codegen/llvm-codegen.cc
File be/src/codegen/llvm-codegen.cc:

Line 134: 
> I'm sorry it works without this. I have no idea how I came to that conclusi
Good to know :) I was worried something subtle was going on. There are some places in the code where we resort to hacks to force modules to be linked into the output binary: https://github.com/apache/incubator-impala/blob/294d42adc117046f975665834af03ddaa53ec27e/be/src/exprs/scalar-expr-evaluator.cc#L428

Once the functions are linked in LLVM by default falls back to resolving functions in the binary's dynamic symbol table.


Line 134: 
> Without this it crashes with "Program used external function ... which coul
Maybe when you rebuilt Impala you hadn't updated the CMakeLists.txt, or the change hadn't taken affect. That would explain this.


http://gerrit.cloudera.org:8080/#/c/7683/2/be/src/exec/hdfs-scanner-ir.cc
File be/src/exec/hdfs-scanner-ir.cc:

Line 131:   if (*result != ParseResult::PARSE_SUCCESS) *result = ParseResult::PARSE_FAILURE;
> The IR generated by Clang still reads decimalblah and we will need a bitcas
Ah ok, makes sense.

I think it would also be ok to return Decimal*Value and bitcast unconditionally, but this works for me.


http://gerrit.cloudera.org:8080/#/c/7683/2/be/src/exec/text-converter.cc
File be/src/exec/text-converter.cc:

Line 234:           case 16:
> Done. This is copied from TextConverter::WriteSlot and there is an unnecess
Oh interesting :) Not sure what that was there.


http://gerrit.cloudera.org:8080/#/c/7683/3/be/src/exec/text-converter.cc
File be/src/exec/text-converter.cc:

PS3, Line 292: proper_slot
maybe "cast_slot". It's unclear what "proper" means.


-- 
To view, visit http://gerrit.cloudera.org:8080/7683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5573: Add decimal codegen in text scanner

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has posted comments on this change.

Change subject: IMPALA-5573: Add decimal codegen in text scanner
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7683/2/be/src/codegen/llvm-codegen.cc
File be/src/codegen/llvm-codegen.cc:

Line 134:   string path;
> We shouldn't need to do this (it always won't work on a statically linked b
Without this it crashes with "Program used external function ... which could not be resolved" during codegen.


http://gerrit.cloudera.org:8080/#/c/7683/2/be/src/exec/hdfs-scanner-ir.cc
File be/src/exec/hdfs-scanner-ir.cc:

Line 131: Decimal4Value::value_type IrStringToDecimal4(const char* s, int len, int type_precision,
> What happens if you just return a Decimal4Value?
The IR generated by Clang still reads decimalblah and we will need a bitcast in CodegenWriteSlot() just like the int128 case.


-- 
To view, visit http://gerrit.cloudera.org:8080/7683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5573: Add decimal codegen in text scanner

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5573: Add decimal codegen in text scanner
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7683/7/be/src/runtime/decimal-value.h
File be/src/runtime/decimal-value.h:

Line 44:   using value_type = T;
This came back from the dead.


-- 
To view, visit http://gerrit.cloudera.org:8080/7683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5573: Add decimal codegen in text scanner

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has posted comments on this change.

Change subject: IMPALA-5573: Add decimal codegen in text scanner
......................................................................


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/7683/2//COMMIT_MSG
Commit Message:

Line 14: StringToDecimal out of LLVM IR.
> Can you add a line break to separate the paragraphs to make it more readabl
Done


PS2, Line 15: libUtil.so
> nit: libUtil, since we use static linking too.
Done


Line 19: 
> Did you do an experiment to see how much scanning sped up for a larger tabl
Statistics added


http://gerrit.cloudera.org:8080/#/c/7683/2/be/src/exec/hdfs-scanner-ir.cc
File be/src/exec/hdfs-scanner-ir.cc:

Line 131: Decimal4Value::value_type IrStringToDecimal4(const char* s, int len, int type_precision,
> Ah ok, makes sense.
Applied. Good idea


-- 
To view, visit http://gerrit.cloudera.org:8080/7683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5573: Add decimal codegen in text scanner

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5573: Add decimal codegen in text scanner
......................................................................


Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/7683/2/be/src/codegen/llvm-codegen.cc
File be/src/codegen/llvm-codegen.cc:

Line 134:   string path;
We shouldn't need to do this (it always won't work on a statically linked build). Without this is there some problem resolving the symbols? Generally codegen'd code should be able to call any non-inlined functions in the impalad binary without doing anything special.


http://gerrit.cloudera.org:8080/#/c/7683/2/be/src/exec/hdfs-scanner-ir.cc
File be/src/exec/hdfs-scanner-ir.cc:

PS2, Line 84: FlowAsFailure
IMO it would be clearer to inline this function at the 3 callsites.


Line 131: Decimal4Value::value_type IrStringToDecimal4(const char* s, int len, int type_precision,
What happens if you just return a Decimal4Value?

I'm not sure exactly what will happen but I suspect Clang internally just lowers the type to a plain integer type. 

There's already logic to deal with lowered and unlowered types in codegen-anyval but I don't think it applies to this code, since it operates on the TimestampVal/StringVal/DecimalVal classes used by expresssions.


http://gerrit.cloudera.org:8080/#/c/7683/2/be/src/exec/text-converter.cc
File be/src/exec/text-converter.cc:

Line 234:           case 12:
case 12 seems unnecessary given we have a default case already.


PS2, Line 298:  be casted before storing
Maybe: "be cast to avoid an LLVM type error"


http://gerrit.cloudera.org:8080/#/c/7683/2/be/src/util/string-parser.h
File be/src/util/string-parser.h:

PS2, Line 600: 
             : /// StringToDecimal<T> is too big to be called from IR. These are proxy functions
             : /// linked into util library
I would reword this to something like:

// These functions are too large to benefit from inlining.


-- 
To view, visit http://gerrit.cloudera.org:8080/7683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5573: Add decimal codegen in text scanner

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has posted comments on this change.

Change subject: IMPALA-5573: Add decimal codegen in text scanner
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7683/7/be/src/runtime/decimal-value.h
File be/src/runtime/decimal-value.h:

Line 44:   using value_type = T;
> This came back from the dead.
Sorry. It won't happen again


-- 
To view, visit http://gerrit.cloudera.org:8080/7683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5573: Add decimal codegen in text scanner

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5573: Add decimal codegen in text scanner
......................................................................


Patch Set 10: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5573: Add decimal codegen in text scanner

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has uploaded a new patch set (#2).

Change subject: IMPALA-5573: Add decimal codegen in text scanner
......................................................................

IMPALA-5573: Add decimal codegen in text scanner

This patch adds decimal type codegen support in text scanner. Currently
codegen would be disabled if there is a decimal column. With this patch
StringParser::StringToDecimal will be called in generated code. A new
file util/string-parser.cc is created and linked into libUtil. This file
contains proxy functions to StringToDecimal in ordered to keep
StringToDecimal out of LLVM IR.
Simple performance tests show that putting the parser in libUtil.so
instead of impala-sse.bc would reduce codegen time by 2/3 in cases
where only one decimal column is parsed while the scanning time is
nearly the same.

Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/llvm-codegen.cc
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/text-converter.cc
M be/src/runtime/decimal-value.h
M be/src/util/CMakeLists.txt
A be/src/util/string-parser.cc
M be/src/util/string-parser.h
9 files changed, 146 insertions(+), 27 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/7683/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>

[Impala-ASF-CR] IMPALA-5573: Add decimal codegen in text scanner

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5573: Add decimal codegen in text scanner
......................................................................


Patch Set 10: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5573: Add decimal codegen in text scanner

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has uploaded a new patch set (#6).

Change subject: IMPALA-5573: Add decimal codegen in text scanner
......................................................................

IMPALA-5573: Add decimal codegen in text scanner

This patch adds decimal type codegen support in text scanner. Currently
codegen would be disabled if there is a decimal column. With this patch
StringParser::StringToDecimal will be called in generated code. A new
file util/string-parser.cc is created and linked into libUtil. This file
contains proxy functions to StringToDecimal in ordered to keep
StringToDecimal out of LLVM IR.

In a benchmark query:
> select l_quantity, l_extendedprice, l_discount, l_tax from biglineitem where l_quantity > 100.0;
where biglineitem is tpch.lineitem repeated 6 times, the codegen version
is 19% faster than non-codgen version in scanning, and 8% faster in
query time. Codegen time in this simple case is 69ms.

Simple performance tests show that putting the parser in libUtil instead
of impala-sse.bc would reduce codegen time by 2/3 in cases where only
one decimal column is parsed while the scanning time is nearly the same.

Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/text-converter.cc
M be/src/util/CMakeLists.txt
A be/src/util/string-parser.cc
M be/src/util/string-parser.h
7 files changed, 129 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/7683/6
-- 
To view, visit http://gerrit.cloudera.org:8080/7683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-5573: Add decimal codegen in text scanner

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has uploaded a new patch set (#5).

Change subject: IMPALA-5573: Add decimal codegen in text scanner
......................................................................

IMPALA-5573: Add decimal codegen in text scanner

This patch adds decimal type codegen support in text scanner. Currently
codegen would be disabled if there is a decimal column. With this patch
StringParser::StringToDecimal will be called in generated code. A new
file util/string-parser.cc is created and linked into libUtil. This file
contains proxy functions to StringToDecimal in ordered to keep
StringToDecimal out of LLVM IR.

In a benchmark query:
> select l_quantity, l_extendedprice, l_discount, l_tax from biglineitem where l_quantity > 100.0;
where biglineitem is tpch.lineitem repeated 6 times, the codegen version
is 19% faster than non-codgen version in scanning, and 8% faster in
query time. Codegen time in this simple case is 69ms.

Simple performance tests show that putting the parser in libUtil instead
of impala-sse.bc would reduce codegen time by 2/3 in cases where only
one decimal column is parsed while the scanning time is nearly the same.

Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/text-converter.cc
M be/src/runtime/decimal-value.h
M be/src/util/CMakeLists.txt
A be/src/util/string-parser.cc
M be/src/util/string-parser.h
8 files changed, 130 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/7683/5
-- 
To view, visit http://gerrit.cloudera.org:8080/7683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-5573: Add decimal codegen in text scanner

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has posted comments on this change.

Change subject: IMPALA-5573: Add decimal codegen in text scanner
......................................................................


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/7683/2/be/src/codegen/llvm-codegen.cc
File be/src/codegen/llvm-codegen.cc:

Line 134:   string path;
> Without this it crashes with "Program used external function ... which coul
I'm sorry it works without this. I have no idea how I came to that conclusion and accidentally "fixed" it.


http://gerrit.cloudera.org:8080/#/c/7683/2/be/src/exec/hdfs-scanner-ir.cc
File be/src/exec/hdfs-scanner-ir.cc:

PS2, Line 84: FlowAsFailure
> IMO it would be clearer to inline this function at the 3 callsites.
Done


http://gerrit.cloudera.org:8080/#/c/7683/2/be/src/exec/text-converter.cc
File be/src/exec/text-converter.cc:

Line 234:           case 12:
> case 12 seems unnecessary given we have a default case already.
Done. This is copied from TextConverter::WriteSlot and there is an unnecessary case there by the way.


http://gerrit.cloudera.org:8080/#/c/7683/2/be/src/util/string-parser.h
File be/src/util/string-parser.h:

PS2, Line 600: 
             : /// StringToDecimal<T> is too big to be called from IR. These are proxy functions
             : /// linked into util library
> I would reword this to something like:
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5573: Add decimal codegen in text scanner

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has posted comments on this change.

Change subject: IMPALA-5573: Add decimal codegen in text scanner
......................................................................


Patch Set 9:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7683/9/be/src/util/parse-util.cc
File be/src/util/parse-util.cc:

Line 99: void StringToDecimalSymbolDummy() {
> Can we move this to /exec/hdfs-scanner-ir.cc since that is the place that d
Done.


PS9, Line 100: emit
> "to link the object file". The compiler emits the .o but then the linker do
Done. Thanks for the explanation.


-- 
To view, visit http://gerrit.cloudera.org:8080/7683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5573: Add decimal codegen in text scanner

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has posted comments on this change.

Change subject: IMPALA-5573: Add decimal codegen in text scanner
......................................................................


Patch Set 7:

GCC implicitly converted nullptr to string then to Status.

-- 
To view, visit http://gerrit.cloudera.org:8080/7683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5573: Add decimal codegen in text scanner

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has posted comments on this change.

Change subject: IMPALA-5573: Add decimal codegen in text scanner
......................................................................


Patch Set 9:

> Were you able to figure out the problem with static linking?

Done. The tests pass after rebasing to master and reload data.

-- 
To view, visit http://gerrit.cloudera.org:8080/7683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5573: Add decimal codegen in text scanner

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5573: Add decimal codegen in text scanner
......................................................................


Patch Set 10:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1131/

-- 
To view, visit http://gerrit.cloudera.org:8080/7683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5573: Add decimal codegen in text scanner

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5573: Add decimal codegen in text scanner
......................................................................


Patch Set 8:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1081/

-- 
To view, visit http://gerrit.cloudera.org:8080/7683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5573: Add decimal codegen in text scanner

Posted by "Tianyi Wang (Code Review)" <ge...@cloudera.org>.
Tianyi Wang has uploaded a new patch set (#4).

Change subject: IMPALA-5573: Add decimal codegen in text scanner
......................................................................

IMPALA-5573: Add decimal codegen in text scanner

This patch adds decimal type codegen support in text scanner. Currently
codegen would be disabled if there is a decimal column. With this patch
StringParser::StringToDecimal will be called in generated code. A new
file util/string-parser.cc is created and linked into libUtil. This file
contains proxy functions to StringToDecimal in ordered to keep
StringToDecimal out of LLVM IR.

In a benchmark query:
> select l_quantity, l_extendedprice, l_discount, l_tax from biglineitem where l_quantity > 100.0;
where biglineitem is tpch.lineitem repeated 6 times, the codegen version
is 19% faster than non-codgen version in scanning, and 8% faster in
query time. Codegen time in this simple case is 69ms.

Simple performance tests show that putting the parser in libUtil instead
of impala-sse.bc would reduce codegen time by 2/3 in cases where only
one decimal column is parsed while the scanning time is nearly the same.

Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/text-converter.cc
M be/src/runtime/decimal-value.h
M be/src/util/CMakeLists.txt
A be/src/util/string-parser.cc
M be/src/util/string-parser.h
8 files changed, 128 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/7683/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-5573: Add decimal codegen in text scanner

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5573: Add decimal codegen in text scanner
......................................................................


Patch Set 6: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5573: Add decimal codegen in text scanner

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-5573: Add decimal codegen in text scanner
......................................................................


Patch Set 9: Code-Review+2

(2 comments)

One minor request then I'll merge.

http://gerrit.cloudera.org:8080/#/c/7683/9/be/src/util/parse-util.cc
File be/src/util/parse-util.cc:

Line 99: void StringToDecimalSymbolDummy() {
Can we move this to /exec/hdfs-scanner-ir.cc since that is the place that depends on it. Otherwise it will be hard to find.


PS9, Line 100: emit
"to link the object file". The compiler emits the .o but then the linker doesn't think that the .o file is needed by anything else in the impalad binary.


-- 
To view, visit http://gerrit.cloudera.org:8080/7683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia65820e969d59094dc92d912a5279fa90f6b179d
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang <tw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes