You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Michael Ho (Code Review)" <ge...@cloudera.org> on 2017/11/21 19:36:15 UTC

[Impala-ASF-CR] IMPALA-6187: Fix missing conjuncts evaluation with empty projection

Michael Ho has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8623


Change subject: IMPALA-6187: Fix missing conjuncts evaluation with empty projection
......................................................................

IMPALA-6187: Fix missing conjuncts evaluation with empty projection

Previously, scanners will assume that there are no conjuncts associated
with a scan node when there are no materialized slots (e.g. count(*)).
This is not necessarily the case as one can write queries such as
select count(*) from tpch.lineitem where rand() * 10 < 0; In which case,
the conjuncts should still be evaluated once per row.

This change fixes the problem in the short-circuit handling logic for
count(*) to evaluate the conjuncts once per row and only commits a row
to the output row batch if the conjuncts evaluate to true.

Testing done: Added the example above to the scanner test

Change-Id: Ib530f1fdcd2c6de699977db163b3f6eb38481517
---
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M testdata/workloads/functional-query/queries/QueryTest/scanners.test
10 files changed, 74 insertions(+), 18 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/8623/1
-- 
To view, visit http://gerrit.cloudera.org:8080/8623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib530f1fdcd2c6de699977db163b3f6eb38481517
Gerrit-Change-Number: 8623
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho <kw...@cloudera.com>

[Impala-ASF-CR] IMPALA-6187: Fix missing conjuncts evaluation with empty projection

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8623 )

Change subject: IMPALA-6187: Fix missing conjuncts evaluation with empty projection
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8623/2/be/src/exec/hdfs-scanner.cc
File be/src/exec/hdfs-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8623/2/be/src/exec/hdfs-scanner.cc@226
PS2, Line 226:     DCHECK_EQ(scan_node_->tuple_desc()->byte_size(), 0);
> I'm in favor of separating this existing tuple ptr issue into a new JIRA. M
You guys are way ahead of me. Thanks for filing IMPALA-6258. This patch looks good to me.



-- 
To view, visit http://gerrit.cloudera.org:8080/8623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib530f1fdcd2c6de699977db163b3f6eb38481517
Gerrit-Change-Number: 8623
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 29 Nov 2017 01:39:52 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6187: Fix missing conjuncts evaluation with empty projection

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8623 )

Change subject: IMPALA-6187: Fix missing conjuncts evaluation with empty projection
......................................................................


Patch Set 4: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/8623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib530f1fdcd2c6de699977db163b3f6eb38481517
Gerrit-Change-Number: 8623
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 29 Nov 2017 01:40:00 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6187: Fix missing conjuncts evaluation with empty projection

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8623 )

Change subject: IMPALA-6187: Fix missing conjuncts evaluation with empty projection
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8623/2/be/src/exec/hdfs-scanner.cc
File be/src/exec/hdfs-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8623/2/be/src/exec/hdfs-scanner.cc@226
PS2, Line 226:     DCHECK_EQ(scan_node_->tuple_desc()->byte_size(), 0);
> I was able to construct a query (based on an existing test) that had a Tupl
For code simplicity the planner sometimes adds TupleIsNull() where not strictly necessary. A TupleIsNull() can only evaluate to true if all the checked tuples are nullable at that point in the plan, so the TupleIsNull() in the example are not really necessary and will not even inspect the tuple pointers during evaluation.

Writing a dummy non-NULL ptr seems most correct because a NULL tuple ptr has the specific meaning of "outer join non-match".

Pretty sure I can construct a query that will break if we init tuple ptrs to NULL. Stay tuned.



-- 
To view, visit http://gerrit.cloudera.org:8080/8623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib530f1fdcd2c6de699977db163b3f6eb38481517
Gerrit-Change-Number: 8623
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 29 Nov 2017 01:26:10 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6187: Fix missing conjuncts evaluation with empty projection

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/8623 )

Change subject: IMPALA-6187: Fix missing conjuncts evaluation with empty projection
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8623/2/be/src/exec/hdfs-scanner.cc
File be/src/exec/hdfs-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8623/2/be/src/exec/hdfs-scanner.cc@226
PS2, Line 226:     DCHECK_EQ(scan_node_->tuple_desc()->byte_size(), 0);
> I tried malloc vs. calloc in row-batch.cc on the following query:
Thanks for confirming. That's indeed a bit subtle and warrants extra documentation in the RowBatch() constructor. FWIW, I filed IMPALA-6258 to track the uninitialized pointer issue.



-- 
To view, visit http://gerrit.cloudera.org:8080/8623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib530f1fdcd2c6de699977db163b3f6eb38481517
Gerrit-Change-Number: 8623
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 29 Nov 2017 01:36:55 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6187: Fix missing conjuncts evaluation with empty projection

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8623 )

Change subject: IMPALA-6187: Fix missing conjuncts evaluation with empty projection
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8623/1/be/src/exec/hdfs-scanner.cc
File be/src/exec/hdfs-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8623/1/be/src/exec/hdfs-scanner.cc@215
PS1, Line 215: should
> must, or "which don't reference any slots"
I'm wondering whether this assertion and the code is correct. What if we have a non-deterministic predicate that references a partition column? Examples:

where year > rand()

where rand(year) > 0



-- 
To view, visit http://gerrit.cloudera.org:8080/8623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib530f1fdcd2c6de699977db163b3f6eb38481517
Gerrit-Change-Number: 8623
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 27 Nov 2017 18:15:13 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6187: Fix missing conjuncts evaluation with empty projection

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8623 )

Change subject: IMPALA-6187: Fix missing conjuncts evaluation with empty projection
......................................................................


Patch Set 4: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/8623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib530f1fdcd2c6de699977db163b3f6eb38481517
Gerrit-Change-Number: 8623
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 29 Nov 2017 01:45:24 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6187: Fix missing conjuncts evaluation with empty projection

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8623 )

Change subject: IMPALA-6187: Fix missing conjuncts evaluation with empty projection
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8623/2/be/src/exec/hdfs-scanner.cc
File be/src/exec/hdfs-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8623/2/be/src/exec/hdfs-scanner.cc@226
PS2, Line 226:     DCHECK_EQ(scan_node_->tuple_desc()->byte_size(), 0);
> For code simplicity the planner sometimes adds TupleIsNull() where not stri
I tried malloc vs. calloc in row-batch.cc on the following query:

select count(v.x) from functional.alltypestiny t3 left outer join (select true as x from functional.alltypestiny t1 left outer join functional.alltypestiny t2 on (true)) v on (v.x = t3.bool_col) where t3.bool_col = true;

With malloc the results are correct and with calloc the results are wrong.



-- 
To view, visit http://gerrit.cloudera.org:8080/8623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib530f1fdcd2c6de699977db163b3f6eb38481517
Gerrit-Change-Number: 8623
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 29 Nov 2017 01:32:05 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6187: Fix missing conjuncts evaluation with empty projection

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8623 )

Change subject: IMPALA-6187: Fix missing conjuncts evaluation with empty projection
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8623/2/be/src/exec/hdfs-scanner.cc
File be/src/exec/hdfs-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8623/2/be/src/exec/hdfs-scanner.cc@226
PS2, Line 226:     DCHECK_EQ(scan_node_->tuple_desc()->byte_size(), 0);
> Actually with malloc the results are non-deterministic :D
I'm in favor of separating this existing tuple ptr issue into a new JIRA. Michael can you file one with the example above? The results for the above query are non-deterministic without this patch.



-- 
To view, visit http://gerrit.cloudera.org:8080/8623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib530f1fdcd2c6de699977db163b3f6eb38481517
Gerrit-Change-Number: 8623
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 29 Nov 2017 01:38:41 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6187: Fix missing conjuncts evaluation with empty projection

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Hello Tim Armstrong, Alex Behm, Dan Hecht, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8623

to look at the new patch set (#2).

Change subject: IMPALA-6187: Fix missing conjuncts evaluation with empty projection
......................................................................

IMPALA-6187: Fix missing conjuncts evaluation with empty projection

Previously, scanners will assume that there are no conjuncts associated
with a scan node for queries with no materialized slots (e.g. count(*)).
This is not necessarily the case as one can write queries such as
select count(*) from tpch.lineitem where rand() * 10 < 0; or
select count(*) from tpch.lineitem where rand() > <a partition column>.
In which case, the conjuncts should still be evaluated once per row.

This change fixes the problem in the short-circuit handling logic for
count(*) to evaluate the conjuncts once per row and only commits a row
to the output row batch if the conjuncts evaluate to true.

Testing done: Added the example above to the scanner test

Change-Id: Ib530f1fdcd2c6de699977db163b3f6eb38481517
---
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/runtime/row-batch.cc
M testdata/workloads/functional-query/queries/QueryTest/kudu-scan-node.test
M testdata/workloads/functional-query/queries/QueryTest/scanners.test
12 files changed, 81 insertions(+), 19 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/8623/2
-- 
To view, visit http://gerrit.cloudera.org:8080/8623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib530f1fdcd2c6de699977db163b3f6eb38481517
Gerrit-Change-Number: 8623
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-6187: Fix missing conjuncts evaluation with empty projection

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8623 )

Change subject: IMPALA-6187: Fix missing conjuncts evaluation with empty projection
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8623/2/be/src/exec/hdfs-scanner.cc
File be/src/exec/hdfs-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8623/2/be/src/exec/hdfs-scanner.cc@226
PS2, Line 226:     DCHECK_EQ(scan_node_->tuple_desc()->byte_size(), 0);
> I tried malloc vs. calloc in row-batch.cc on the following query:
Actually with malloc the results are non-deterministic :D

Here's the plan with num_nodes=1 for the curious:

+-----------------------------------------------------------------------+
| Explain String                                                        |
+-----------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=1.94MB                      |
| Per-Host Resource Estimates: Memory=97.94MB                           |
| Codegen disabled by planner                                           |
|                                                                       |
| F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1                 |
| |  Per-Host Resources: mem-estimate=97.94MB mem-reservation=1.94MB    |
| PLAN-ROOT SINK                                                        |
| |  mem-estimate=0B mem-reservation=0B                                 |
| |                                                                     |
| 05:AGGREGATE [FINALIZE]                                               |
| |  output: count(if(TupleIsNull([1, 2]), NULL, TRUE))                 |
| |  mem-estimate=10.00MB mem-reservation=0B spill-buffer=2.00MB        |
| |  tuple-ids=4 row-size=8B cardinality=1                              |
| |                                                                     |
| 04:HASH JOIN [LEFT OUTER JOIN]                                        |
| |  hash predicates: t3.bool_col = if(TupleIsNull([1, 2]), NULL, TRUE) |
| |  fk/pk conjuncts: assumed fk/pk                                     |
| |  mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB    |
| |  tuple-ids=0,1N,2N row-size=1B cardinality=4                        |
| |                                                                     |
| |--03:NESTED LOOP JOIN [LEFT OUTER JOIN]                              |
| |  |  join predicates: (TRUE)                                         |
| |  |  mem-estimate=0B mem-reservation=0B                              |
| |  |  tuple-ids=1,2N row-size=0B cardinality=8                        |
| |  |                                                                  |
| |  |--02:SCAN HDFS [functional.alltypestiny t2]                       |
| |  |     partitions=4/4 files=4 size=460B                             |
| |  |     stats-rows=8 extrapolated-rows=disabled                      |
| |  |     table stats: rows=8 size=unavailable                         |
| |  |     column stats: all                                            |
| |  |     mem-estimate=32.00MB mem-reservation=0B                      |
| |  |     tuple-ids=2 row-size=0B cardinality=8                        |
| |  |                                                                  |
| |  01:SCAN HDFS [functional.alltypestiny t1]                          |
| |     partitions=4/4 files=4 size=460B                                |
| |     stats-rows=8 extrapolated-rows=disabled                         |
| |     table stats: rows=8 size=unavailable                            |
| |     column stats: all                                               |
| |     mem-estimate=32.00MB mem-reservation=0B                         |
| |     tuple-ids=1 row-size=0B cardinality=8                           |
| |                                                                     |
| 00:SCAN HDFS [functional.alltypestiny t3]                             |
|    partitions=4/4 files=4 size=460B                                   |
|    predicates: t3.bool_col = TRUE                                     |
|    stats-rows=8 extrapolated-rows=disabled                            |
|    table stats: rows=8 size=unavailable                               |
|    column stats: all                                                  |
|    parquet dictionary predicates: t3.bool_col = TRUE                  |
|    mem-estimate=32.00MB mem-reservation=0B                            |
|    tuple-ids=0 row-size=1B cardinality=4                              |
+-----------------------------------------------------------------------+



-- 
To view, visit http://gerrit.cloudera.org:8080/8623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib530f1fdcd2c6de699977db163b3f6eb38481517
Gerrit-Change-Number: 8623
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 29 Nov 2017 01:36:02 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6187: Fix missing conjuncts evaluation with empty projection

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/8623 )

Change subject: IMPALA-6187: Fix missing conjuncts evaluation with empty projection
......................................................................


Patch Set 1:

May be I missed it somehow but Hbase scanner doesn't seem to suffer from this problem as it doesn't seem to implement any short-circuiting logic.


-- 
To view, visit http://gerrit.cloudera.org:8080/8623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib530f1fdcd2c6de699977db163b3f6eb38481517
Gerrit-Change-Number: 8623
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Comment-Date: Tue, 21 Nov 2017 19:59:37 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6187: Fix missing conjuncts evaluation with empty projection

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8623 )

Change subject: IMPALA-6187: Fix missing conjuncts evaluation with empty projection
......................................................................


Patch Set 4: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/8623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib530f1fdcd2c6de699977db163b3f6eb38481517
Gerrit-Change-Number: 8623
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 29 Nov 2017 05:53:13 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6187: Fix missing conjuncts evaluation with empty projection

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/8623 )

Change subject: IMPALA-6187: Fix missing conjuncts evaluation with empty projection
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8623/3/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

http://gerrit.cloudera.org:8080/#/c/8623/3/be/src/exec/hdfs-scanner.h@337
PS3, Line 337: WriteEmptyProjection
> I think this name is still misleading especially compared to Kudu case, sin
lol. We may as well revert to the old name WriteTemplateTuple()


http://gerrit.cloudera.org:8080/#/c/8623/2/be/src/exec/hdfs-scanner.cc
File be/src/exec/hdfs-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8623/2/be/src/exec/hdfs-scanner.cc@226
PS2, Line 226:     DCHECK_EQ(scan_node_->tuple_desc()->byte_size(), 0);
> I was able to construct a query (based on an existing test) that had a Tupl
Interesting. I tried this locally in a mini-cluster and it didn't crash. I have yet to look deeper into the problem. PS3 already reverted to the existing behavior. I filed IMPALA-6258 to track this separate issue here.


http://gerrit.cloudera.org:8080/#/c/8623/3/testdata/workloads/functional-query/queries/QueryTest/scanners.test
File testdata/workloads/functional-query/queries/QueryTest/scanners.test:

http://gerrit.cloudera.org:8080/#/c/8623/3/testdata/workloads/functional-query/queries/QueryTest/scanners.test@93
PS3, Line 93: ---- QUERY
> let's add a comment:
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/8623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib530f1fdcd2c6de699977db163b3f6eb38481517
Gerrit-Change-Number: 8623
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 29 Nov 2017 01:20:13 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6187: Fix missing conjuncts evaluation with empty projection

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/8623 )

Change subject: IMPALA-6187: Fix missing conjuncts evaluation with empty projection
......................................................................


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8623/1/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

http://gerrit.cloudera.org:8080/#/c/8623/1/be/src/exec/hdfs-scanner.h@333
PS1, Line 333: If template tuple is not NULL
> Actually the null case is common for count(*) of course. So, maybe we shoul
PS3 reverted to the existing behavior of leaving those tuple_ptrs uninitialized.


http://gerrit.cloudera.org:8080/#/c/8623/2/be/src/exec/hdfs-scanner.cc
File be/src/exec/hdfs-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8623/2/be/src/exec/hdfs-scanner.cc@226
PS2, Line 226: 
> This seems to violate the invariant that the scan node only produces non-NU
PS3 reverted to the existing behavior for now. I think setting it to some dummy non-NULL pointer is the way to go but I will punt on doing it in this patch until we can come up with an example for the potential problem you describe.


http://gerrit.cloudera.org:8080/#/c/8623/2/be/src/runtime/row-batch.cc
File be/src/runtime/row-batch.cc:

http://gerrit.cloudera.org:8080/#/c/8623/2/be/src/runtime/row-batch.cc@60
PS2, Line 60:   tuple_ptrs_ = reinterpret_cast<Tuple**>(malloc(tuple_ptrs_size_));
> I don't think we should zero-initialise this. We can't depend on the tuple_
Reverted to the existing behavior in PS3.



-- 
To view, visit http://gerrit.cloudera.org:8080/8623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib530f1fdcd2c6de699977db163b3f6eb38481517
Gerrit-Change-Number: 8623
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 28 Nov 2017 20:40:06 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6187: Fix missing conjuncts evaluation with empty projection

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8623 )

Change subject: IMPALA-6187: Fix missing conjuncts evaluation with empty projection
......................................................................


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8623/1/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

http://gerrit.cloudera.org:8080/#/c/8623/1/be/src/exec/hdfs-scanner.h@333
PS1, Line 333: If template tuple is not NULL
what if the template_tuple_ is null?  Doesn't it effectively set it to that as well? i.e. this case doesn't seem special from an interface perspective.


http://gerrit.cloudera.org:8080/#/c/8623/1/be/src/exec/hdfs-scanner.h@337
PS1, Line 337: HandleEmptyProjection
this has the same name as the KuduScanner method, but they are a bit different. it'd be nice to either make them similar in terms of their side effects or give it a different name.  Maybe WriteEmptyProjection()?


http://gerrit.cloudera.org:8080/#/c/8623/1/be/src/exec/hdfs-scanner.cc
File be/src/exec/hdfs-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8623/1/be/src/exec/hdfs-scanner.cc@215
PS1, Line 215: should
must, or "which don't reference any slots"



-- 
To view, visit http://gerrit.cloudera.org:8080/8623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib530f1fdcd2c6de699977db163b3f6eb38481517
Gerrit-Change-Number: 8623
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 27 Nov 2017 17:42:35 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6187: Fix missing conjuncts evaluation with empty projection

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8623 )

Change subject: IMPALA-6187: Fix missing conjuncts evaluation with empty projection
......................................................................


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1540/


-- 
To view, visit http://gerrit.cloudera.org:8080/8623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib530f1fdcd2c6de699977db163b3f6eb38481517
Gerrit-Change-Number: 8623
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 29 Nov 2017 02:21:49 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6187: Fix missing conjuncts evaluation with empty projection

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Hello Tim Armstrong, Alex Behm, Dan Hecht, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8623

to look at the new patch set (#3).

Change subject: IMPALA-6187: Fix missing conjuncts evaluation with empty projection
......................................................................

IMPALA-6187: Fix missing conjuncts evaluation with empty projection

Previously, scanners will assume that there are no conjuncts associated
with a scan node for queries with no materialized slots (e.g. count(*)).
This is not necessarily the case as one can write queries such as
select count(*) from tpch.lineitem where rand() * 10 < 0; or
select count(*) from tpch.lineitem where rand() > <a partition column>.
In which case, the conjuncts should still be evaluated once per row.

This change fixes the problem in the short-circuit handling logic for
count(*) to evaluate the conjuncts once per row and only commits a row
to the output row batch if the conjuncts evaluate to true.

Testing done: Added the example above to the scanner test

Change-Id: Ib530f1fdcd2c6de699977db163b3f6eb38481517
---
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M testdata/workloads/functional-query/queries/QueryTest/kudu-scan-node.test
M testdata/workloads/functional-query/queries/QueryTest/scanners.test
11 files changed, 80 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/8623/3
-- 
To view, visit http://gerrit.cloudera.org:8080/8623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib530f1fdcd2c6de699977db163b3f6eb38481517
Gerrit-Change-Number: 8623
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-6187: Fix missing conjuncts evaluation with empty projection

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8623 )

Change subject: IMPALA-6187: Fix missing conjuncts evaluation with empty projection
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8623/1/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

http://gerrit.cloudera.org:8080/#/c/8623/1/be/src/exec/hdfs-scanner.h@333
PS1, Line 333: If template tuple is not NULL
> Does the calloc hurt performance? Since the common case is not-null, why no
Actually the null case is common for count(*) of course. So, maybe we should just leave the malloc and if-cond there for now until we have a real reason to change it.



-- 
To view, visit http://gerrit.cloudera.org:8080/8623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib530f1fdcd2c6de699977db163b3f6eb38481517
Gerrit-Change-Number: 8623
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 28 Nov 2017 18:58:55 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6187: Fix missing conjuncts evaluation with empty projection

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8623 )

Change subject: IMPALA-6187: Fix missing conjuncts evaluation with empty projection
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8623/3/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

http://gerrit.cloudera.org:8080/#/c/8623/3/be/src/exec/hdfs-scanner.h@337
PS3, Line 337: WriteEmptyProjection
I think this name is still misleading especially compared to Kudu case, since in this case, the partition columns may be projected (so the projection isn't actually empty). 

Maybe MaterializeTemplateTuples()?


http://gerrit.cloudera.org:8080/#/c/8623/2/be/src/exec/hdfs-scanner.cc
File be/src/exec/hdfs-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8623/2/be/src/exec/hdfs-scanner.cc@226
PS2, Line 226:     DCHECK_EQ(scan_node_->tuple_desc()->byte_size(), 0);
> PS3 reverted to the existing behavior for now. I think setting it to some d
I agree writing a dummy non-NULL pointer seems most correct, but have we determined whether it's necessary (and how to express why it's not necessary)?


http://gerrit.cloudera.org:8080/#/c/8623/3/testdata/workloads/functional-query/queries/QueryTest/scanners.test
File testdata/workloads/functional-query/queries/QueryTest/scanners.test:

http://gerrit.cloudera.org:8080/#/c/8623/3/testdata/workloads/functional-query/queries/QueryTest/scanners.test@93
PS3, Line 93: ---- QUERY
let's add a comment:
# 'year' and 'month' are partition columns.
or whatever the case is.



-- 
To view, visit http://gerrit.cloudera.org:8080/8623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib530f1fdcd2c6de699977db163b3f6eb38481517
Gerrit-Change-Number: 8623
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 28 Nov 2017 22:28:44 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6187: Fix missing conjuncts evaluation with empty projection

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8623 )

Change subject: IMPALA-6187: Fix missing conjuncts evaluation with empty projection
......................................................................


Patch Set 3:

> (1 comment)
 > 
 > Would it make sense to leave the existing behaviour and file the
 > potential bug with NULL tuples as a separate JIRA? It sounds like
 > the problem and solution for this bug are clearer but the other
 > problem is very unclear.

Yes, I think separating them makes sense.


-- 
To view, visit http://gerrit.cloudera.org:8080/8623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib530f1fdcd2c6de699977db163b3f6eb38481517
Gerrit-Change-Number: 8623
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 29 Nov 2017 01:06:39 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6187: Fix missing conjuncts evaluation with empty projection

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/8623 )

Change subject: IMPALA-6187: Fix missing conjuncts evaluation with empty projection
......................................................................


Patch Set 2:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/8623/1/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

http://gerrit.cloudera.org:8080/#/c/8623/1/be/src/exec/hdfs-scanner.h@333
PS1, Line 333: If template tuple is not NULL
> what if the template_tuple_ is null?  Doesn't it effectively set it to that
Actually, reading the old code, we seem to leave those tuple ptrs alone so they are most likely random values as we used malloc() to allocate the tuple_ptrs_ buffer in the RowBatch constructor. May be I misunderstood it. I switched to using calloc() in the constructor. Someone more familiar with the intricacy of null tuple pointer should see if it makes sense or flags it as unnecessary.


http://gerrit.cloudera.org:8080/#/c/8623/1/be/src/exec/hdfs-scanner.h@337
PS1, Line 337: WriteEmptyProjection(
> this has the same name as the KuduScanner method, but they are a bit differ
Renamed the function.


http://gerrit.cloudera.org:8080/#/c/8623/1/be/src/exec/hdfs-scanner.cc
File be/src/exec/hdfs-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8623/1/be/src/exec/hdfs-scanner.cc@215
PS1, Line 215: 
> must, or "which don't reference any slots"
Rephrased to "should reference partition columns only if any."


http://gerrit.cloudera.org:8080/#/c/8623/1/be/src/exec/hdfs-scanner.cc@215
PS1, Line 215: 
> I'm wondering whether this assertion and the code is correct. What if we ha
Good point. Fixed in the new patch. Also added a new test case.


http://gerrit.cloudera.org:8080/#/c/8623/1/testdata/workloads/functional-query/queries/QueryTest/scanners.test
File testdata/workloads/functional-query/queries/QueryTest/scanners.test:

http://gerrit.cloudera.org:8080/#/c/8623/1/testdata/workloads/functional-query/queries/QueryTest/scanners.test@80
PS1, Line 80: select count(*) from alltypes where rand() * 10 >= 0.0;
> I think rand() can return 0.0, so maybe it would be better to make this >= 
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/8623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib530f1fdcd2c6de699977db163b3f6eb38481517
Gerrit-Change-Number: 8623
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 28 Nov 2017 02:31:50 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6187: Fix missing conjuncts evaluation with empty projection

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8623 )

Change subject: IMPALA-6187: Fix missing conjuncts evaluation with empty projection
......................................................................


Patch Set 4:

Nice repro!


-- 
To view, visit http://gerrit.cloudera.org:8080/8623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib530f1fdcd2c6de699977db163b3f6eb38481517
Gerrit-Change-Number: 8623
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 29 Nov 2017 02:11:05 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6187: Fix missing conjuncts evaluation with empty projection

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8623 )

Change subject: IMPALA-6187: Fix missing conjuncts evaluation with empty projection
......................................................................


Patch Set 2:

(1 comment)

Would it make sense to leave the existing behaviour and file the potential bug with NULL tuples as a separate JIRA? It sounds like the problem and solution for this bug are clearer but the other problem is very unclear.

http://gerrit.cloudera.org:8080/#/c/8623/2/be/src/exec/hdfs-scanner.cc
File be/src/exec/hdfs-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8623/2/be/src/exec/hdfs-scanner.cc@226
PS2, Line 226:     DCHECK_EQ(scan_node_->tuple_desc()->byte_size(), 0);
> I agree writing a dummy non-NULL pointer seems most correct, but have we de
I was able to construct a query (based on an existing test) that had a TupleIsNull() predicate referencing the tuple from scan that doesn't materialise anything:

  SELECT /* +straight_join */ COUNT(t1.id)
  FROM functional.alltypessmall t1
  LEFT OUTER JOIN (
    SELECT /* +straight_join */ IFNULL(t2.int_col, 1) AS c
    FROM functional.alltypessmall t2
    LEFT OUTER JOIN functional.alltypestiny t3 ON t2.id < 1000
  ) v ON t1.int_col = v.c;

The relevant part of the plan is:

    | 04:HASH JOIN [LEFT OUTER JOIN, PARTITIONED]                                         |
    | |  hash predicates: t1.int_col = if(TupleIsNull(1, 2), NULL, ifnull(t2.int_col, 1)) |
    | |  fk/pk conjuncts: assumed fk/pk                                                   |
    | |  mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB                  |
    | |  tuple-ids=0,1N,2N row-size=16B cardinality=100                                   |
    | |                                                                                   |
    | |--08:EXCHANGE [HASH(if(TupleIsNull(1, 2), NULL, ifnull(t2.int_col, 1)))]           |
    | |  |  mem-estimate=0B mem-reservation=0B                                            |
    | |  |  tuple-ids=1,2N row-size=8B cardinality=100                                    |
    | |  |                                                                                |
    | |  F01:PLAN FRAGMENT [RANDOM] hosts=3 instances=3                                   |
    | |  Per-Host Resources: mem-estimate=32.00MB mem-reservation=0B                      |
    | |  03:NESTED LOOP JOIN [LEFT OUTER JOIN, BROADCAST]                                 |
    | |  |  join predicates: t2.id < 1000                                                 |
    | |  |  mem-estimate=0B mem-reservation=0B                                            |
    | |  |  tuple-ids=1,2N row-size=8B cardinality=100                                    |
    | |  |                                                                                |
    | |  |--06:EXCHANGE [BROADCAST]                                                       |
    | |  |  |  mem-estimate=0B mem-reservation=0B                                         |
    | |  |  |  tuple-ids=2 row-size=0B cardinality=8                                      |
    | |  |  |                                                                             |
    | |  |  F02:PLAN FRAGMENT [RANDOM] hosts=3 instances=3                                |
    | |  |  Per-Host Resources: mem-estimate=32.00MB mem-reservation=0B                   |
    | |  |  02:SCAN HDFS [functional.alltypestiny t3, RANDOM]                             |
    | |  |     partitions=4/4 files=4 size=460B                                           |
    | |  |     stats-rows=8 extrapolated-rows=disabled                                    |
    | |  |     table stats: rows=8 size=unavailable                                       |
    | |  |     column stats: all                                                          |
    | |  |     mem-estimate=32.00MB mem-reservation=0B                                    |
    | |  |     tuple-ids=2 row-size=0B cardinality=8                                      |
    | |  |                                                                                |
    | |  01:SCAN HDFS [functional.alltypessmall t2, RANDOM]                               |
    | |     partitions=4/4 files=4 size=6.32KB                                            |
    | |     stats-rows=100 extrapolated-rows=disabled                                     |
    | |     table stats: rows=100 size=unavailable                                        |
    | |     column stats: all                                                             |
    | |     mem-estimate=32.00MB mem-reservation=0B                                       |
    | |     tuple-ids=1 row-size=8B cardinality=100                                       |
      

So it does look like the scenario is somehow possible. I don't really understand exactly why it is needed here.



-- 
To view, visit http://gerrit.cloudera.org:8080/8623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib530f1fdcd2c6de699977db163b3f6eb38481517
Gerrit-Change-Number: 8623
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 29 Nov 2017 00:41:47 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6187: Fix missing conjuncts evaluation with empty projection

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8623 )

Change subject: IMPALA-6187: Fix missing conjuncts evaluation with empty projection
......................................................................


Patch Set 1:

What about HBase?


-- 
To view, visit http://gerrit.cloudera.org:8080/8623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib530f1fdcd2c6de699977db163b3f6eb38481517
Gerrit-Change-Number: 8623
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Comment-Date: Tue, 21 Nov 2017 19:46:35 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6187: Fix missing conjuncts evaluation with empty projection

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8623 )

Change subject: IMPALA-6187: Fix missing conjuncts evaluation with empty projection
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8623/2/be/src/exec/hdfs-scanner.cc
File be/src/exec/hdfs-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8623/2/be/src/exec/hdfs-scanner.cc@226
PS2, Line 226:     DCHECK_EQ(scan_node_->tuple_desc()->byte_size(), 0);
This seems to violate the invariant that the scan node only produces non-NULL tuples (the previous version could too). I'm not sure if the planner can produce plans that have a TupleIsNull() predicate on a scan producing zero slots, because as far as I understand, TupleIsNull() predicates are only emitted when a slot from the scan is referenced, but I could be wrong - I feel like there could be some edge case if you get creative with nested loop joins, etc (which pass the tuple pointers through unmodified).

I think we should either write a dummy non-NULL pointer or have a comment explaining why that's not necessary.


http://gerrit.cloudera.org:8080/#/c/8623/2/be/src/runtime/row-batch.cc
File be/src/runtime/row-batch.cc:

http://gerrit.cloudera.org:8080/#/c/8623/2/be/src/runtime/row-batch.cc@60
PS2, Line 60:   tuple_ptrs_ = reinterpret_cast<Tuple**>(calloc(1, tuple_ptrs_size_));
I don't think we should zero-initialise this. We can't depend on the tuple_ptrs_ being zero-initialised since Reset() doesn't zero-initialise them, so it's just unnecessary overhead.



-- 
To view, visit http://gerrit.cloudera.org:8080/8623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib530f1fdcd2c6de699977db163b3f6eb38481517
Gerrit-Change-Number: 8623
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 28 Nov 2017 19:52:25 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6187: Fix missing conjuncts evaluation with empty projection

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Hello Tim Armstrong, Alex Behm, Dan Hecht, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8623

to look at the new patch set (#4).

Change subject: IMPALA-6187: Fix missing conjuncts evaluation with empty projection
......................................................................

IMPALA-6187: Fix missing conjuncts evaluation with empty projection

Previously, scanners will assume that there are no conjuncts associated
with a scan node for queries with no materialized slots (e.g. count(*)).
This is not necessarily the case as one can write queries such as
select count(*) from tpch.lineitem where rand() * 10 < 0; or
select count(*) from tpch.lineitem where rand() > <a partition column>.
In which case, the conjuncts should still be evaluated once per row.

This change fixes the problem in the short-circuit handling logic for
count(*) to evaluate the conjuncts once per row and only commits a row
to the output row batch if the conjuncts evaluate to true.

Testing done: Added the example above to the scanner test

Change-Id: Ib530f1fdcd2c6de699977db163b3f6eb38481517
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M testdata/workloads/functional-query/queries/QueryTest/kudu-scan-node.test
M testdata/workloads/functional-query/queries/QueryTest/scanners.test
8 files changed, 74 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/8623/4
-- 
To view, visit http://gerrit.cloudera.org:8080/8623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib530f1fdcd2c6de699977db163b3f6eb38481517
Gerrit-Change-Number: 8623
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-6187: Fix missing conjuncts evaluation with empty projection

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8623 )

Change subject: IMPALA-6187: Fix missing conjuncts evaluation with empty projection
......................................................................


Patch Set 4: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/8623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib530f1fdcd2c6de699977db163b3f6eb38481517
Gerrit-Change-Number: 8623
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 29 Nov 2017 01:24:52 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6187: Fix missing conjuncts evaluation with empty projection

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8623 )

Change subject: IMPALA-6187: Fix missing conjuncts evaluation with empty projection
......................................................................

IMPALA-6187: Fix missing conjuncts evaluation with empty projection

Previously, scanners will assume that there are no conjuncts associated
with a scan node for queries with no materialized slots (e.g. count(*)).
This is not necessarily the case as one can write queries such as
select count(*) from tpch.lineitem where rand() * 10 < 0; or
select count(*) from tpch.lineitem where rand() > <a partition column>.
In which case, the conjuncts should still be evaluated once per row.

This change fixes the problem in the short-circuit handling logic for
count(*) to evaluate the conjuncts once per row and only commits a row
to the output row batch if the conjuncts evaluate to true.

Testing done: Added the example above to the scanner test

Change-Id: Ib530f1fdcd2c6de699977db163b3f6eb38481517
Reviewed-on: http://gerrit.cloudera.org:8080/8623
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Reviewed-by: Alex Behm <al...@cloudera.com>
Reviewed-by: Dan Hecht <dh...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M testdata/workloads/functional-query/queries/QueryTest/kudu-scan-node.test
M testdata/workloads/functional-query/queries/QueryTest/scanners.test
8 files changed, 74 insertions(+), 11 deletions(-)

Approvals:
  Tim Armstrong: Looks good to me, but someone else must approve
  Alex Behm: Looks good to me, but someone else must approve
  Dan Hecht: Looks good to me, approved
  Impala Public Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/8623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ib530f1fdcd2c6de699977db163b3f6eb38481517
Gerrit-Change-Number: 8623
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-6187: Fix missing conjuncts evaluation with empty projection

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8623 )

Change subject: IMPALA-6187: Fix missing conjuncts evaluation with empty projection
......................................................................


Patch Set 1: Code-Review+1

(1 comment)

I'm comfortable with this. Not sure if anyone else wants to take a look.

http://gerrit.cloudera.org:8080/#/c/8623/1/testdata/workloads/functional-query/queries/QueryTest/scanners.test
File testdata/workloads/functional-query/queries/QueryTest/scanners.test:

http://gerrit.cloudera.org:8080/#/c/8623/1/testdata/workloads/functional-query/queries/QueryTest/scanners.test@80
PS1, Line 80: select count(*) from alltypes where rand() * 10 > 0;
I think rand() can return 0.0, so maybe it would be better to make this >= so it's guaranteed to be always true.



-- 
To view, visit http://gerrit.cloudera.org:8080/8623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib530f1fdcd2c6de699977db163b3f6eb38481517
Gerrit-Change-Number: 8623
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 27 Nov 2017 17:35:52 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6187: Fix missing conjuncts evaluation with empty projection

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8623 )

Change subject: IMPALA-6187: Fix missing conjuncts evaluation with empty projection
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8623/1/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

http://gerrit.cloudera.org:8080/#/c/8623/1/be/src/exec/hdfs-scanner.h@333
PS1, Line 333: If template tuple is not NULL
> Actually, reading the old code, we seem to leave those tuple ptrs alone so 
Does the calloc hurt performance? Since the common case is not-null, why not use malloc and just remove the if (template_tuple_ != nullptr) check instead so we always set the pointers?



-- 
To view, visit http://gerrit.cloudera.org:8080/8623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib530f1fdcd2c6de699977db163b3f6eb38481517
Gerrit-Change-Number: 8623
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 28 Nov 2017 18:14:10 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6187: Fix missing conjuncts evaluation with empty projection

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8623 )

Change subject: IMPALA-6187: Fix missing conjuncts evaluation with empty projection
......................................................................


Patch Set 4:

(1 comment)

Looks okay to me (ignoring IMPALA-6258). Please have Alex take a look though.

http://gerrit.cloudera.org:8080/#/c/8623/2/be/src/exec/hdfs-scanner.cc
File be/src/exec/hdfs-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8623/2/be/src/exec/hdfs-scanner.cc@226
PS2, Line 226:     DCHECK_EQ(scan_node_->tuple_desc()->byte_size(), 0);
> I tried malloc vs. calloc in row-batch.cc on the following query:
Right, we all agree calloc doesn't make sense (the pointers shouldn't be NULL). But we also shouldn't leave it up to chance that they aren't null, hence IMPALA-6258.



-- 
To view, visit http://gerrit.cloudera.org:8080/8623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib530f1fdcd2c6de699977db163b3f6eb38481517
Gerrit-Change-Number: 8623
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 29 Nov 2017 01:37:09 +0000
Gerrit-HasComments: Yes