You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Tim Armstrong (Code Review)" <ge...@cloudera.org> on 2017/09/29 16:40:05 UTC

[Impala-ASF-CR] PREVIEW: IMPALA-5307: Part 4: copy out uncompressed text and seq

Tim Armstrong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8172


Change subject: PREVIEW: IMPALA-5307: Part 4: copy out uncompressed text and seq
......................................................................

PREVIEW: IMPALA-5307: Part 4: copy out uncompressed text and seq

This is the final patch for IMPALA-5307.

Text and Seq scanners are converted to use the same approach
as Avro.

contains_tuple_data is now false so a bunch of dead code in
ScannerContext can be removed. We also no longer attach I/O
buffers to row batches so that logic can be removed.

Testing:
Ran core ASAN tests.

Perf:
TODO - need to run numbers

Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-scan-node-mt.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
20 files changed, 85 insertions(+), 156 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/8172/1
-- 
To view, visit http://gerrit.cloudera.org:8080/8172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
Gerrit-Change-Number: 8172
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-5307: Part 4: copy out uncompressed text and seq

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8172 )

Change subject: IMPALA-5307: Part 4: copy out uncompressed text and seq
......................................................................


Patch Set 9:

(1 comment)

Rebased onto master now that the predecessor patch is merged.

http://gerrit.cloudera.org:8080/#/c/8172/9/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

http://gerrit.cloudera.org:8080/#/c/8172/9/be/src/exec/hdfs-scan-node-base.cc@401
PS9, Line 401:     state->AcquireReaderContext(reader_context_);
It's also possible to remove this logic as a consequence of this patch, but I ended up doing it in a follow-on patch. https://gerrit.cloudera.org/#/c/8408/



-- 
To view, visit http://gerrit.cloudera.org:8080/8172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
Gerrit-Change-Number: 8172
Gerrit-PatchSet: 9
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 03 Nov 2017 18:58:08 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5307: Part 4: copy out uncompressed text and seq

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8172 )

Change subject: IMPALA-5307: Part 4: copy out uncompressed text and seq
......................................................................


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8172/8/be/src/exec/hdfs-text-scanner.cc
File be/src/exec/hdfs-text-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8172/8/be/src/exec/hdfs-text-scanner.cc@704
PS8, Line 704:   bool split_delimiter_possible = context_->partition_descriptor()->line_delim() == '\n'
I think there's a latent bug here where we're touching byte_buffer_ *after* calling CommitRows(). (I hit some ASAN issues in a follow-on patch). This patch increases the chances of hitting this since memory is recycled earlier.



-- 
To view, visit http://gerrit.cloudera.org:8080/8172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
Gerrit-Change-Number: 8172
Gerrit-PatchSet: 8
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 31 Oct 2017 18:20:11 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5307: Part 4: copy out uncompressed text and seq

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8172 )

Change subject: IMPALA-5307: Part 4: copy out uncompressed text and seq
......................................................................


Patch Set 8: Code-Review+1

rebase


-- 
To view, visit http://gerrit.cloudera.org:8080/8172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
Gerrit-Change-Number: 8172
Gerrit-PatchSet: 8
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 31 Oct 2017 04:31:38 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5307: Part 4: copy out uncompressed text and seq

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Hello Michael Ho, Alex Behm, Dan Hecht, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8172

to look at the new patch set (#12).

Change subject: IMPALA-5307: Part 4: copy out uncompressed text and seq
......................................................................

IMPALA-5307: Part 4: copy out uncompressed text and seq

This is the final patch for IMPALA-5307.

Text and Seq scanners are converted to use the same approach
as Avro.

contains_tuple_data is now false so a bunch of dead code in
ScannerContext can be removed. We also no longer attach I/O
buffers to row batches so that logic can be removed.

Testing:
Ran core ASAN tests.

Perf:
I reran the same benchmarks as in Part 2. There was no measurable
difference before and after - for both text and seq processing time
is dominated by text parsing.

Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hdfs-avro-scanner-ir.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node-mt.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
M be/src/exec/text-converter.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
23 files changed, 133 insertions(+), 211 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/8172/12
-- 
To view, visit http://gerrit.cloudera.org:8080/8172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
Gerrit-Change-Number: 8172
Gerrit-PatchSet: 12
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-5307: Part 4: copy out uncompressed text and seq

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8172 )

Change subject: IMPALA-5307: Part 4: copy out uncompressed text and seq
......................................................................


Patch Set 15:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1445/


-- 
To view, visit http://gerrit.cloudera.org:8080/8172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
Gerrit-Change-Number: 8172
Gerrit-PatchSet: 15
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Nov 2017 01:07:47 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5307: Part 4: copy out uncompressed text and seq

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8172 )

Change subject: IMPALA-5307: Part 4: copy out uncompressed text and seq
......................................................................


Patch Set 13:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8172/13/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

http://gerrit.cloudera.org:8080/#/c/8172/13/be/src/exec/hdfs-scanner.h@274
PS13, Line 274: //
> nit: ///
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/8172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
Gerrit-Change-Number: 8172
Gerrit-PatchSet: 13
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Nov 2017 01:06:03 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5307: Part 4: copy out uncompressed text and seq

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8172 )

Change subject: IMPALA-5307: Part 4: copy out uncompressed text and seq
......................................................................


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8172/4/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

http://gerrit.cloudera.org:8080/#/c/8172/4/be/src/exec/hdfs-scanner.h@276
PS4, Line 276:   std::vector<SlotOffsets> string_slot_offsets_;
> What's the reason for keeping a separate vector? We also have these in the 
I introduced this in https://gerrit.cloudera.org/#/c/8172/ .    It comes out of the codegen approach used is to cross-compile the interpreted code and then generate a constant IR version of the descriptor so that LLVM can unroll the loop (if it has few enough iterations). Two reasons:
* It's easier to generate the constant IR version of the descriptors since the data structure is much simpler.
* The interpreted path and the case where the loop isn't unrolled will be more efficient walking the dense array instead of walking the whole slot descriptor structure.


http://gerrit.cloudera.org:8080/#/c/8172/4/be/src/exec/hdfs-scanner.h@341
PS4, Line 341:   /// - 'copy_out_strings': if true, strings in returned tuples that pass conjuncts are
> copy_strings?
Done


http://gerrit.cloudera.org:8080/#/c/8172/4/be/src/exec/scanner-context.cc
File be/src/exec/scanner-context.cc:

http://gerrit.cloudera.org:8080/#/c/8172/4/be/src/exec/scanner-context.cc@110
PS4, Line 110:     parent_->scan_node_->num_owned_io_buffers_.Add(-1);
> Not sure this counter is still useful. Do you think we can get rid of it?
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/8172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
Gerrit-Change-Number: 8172
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 20 Oct 2017 18:19:01 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5307: Part 4: copy out uncompressed text and seq

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8172 )

Change subject: IMPALA-5307: Part 4: copy out uncompressed text and seq
......................................................................

IMPALA-5307: Part 4: copy out uncompressed text and seq

This is the final patch for IMPALA-5307.

Text and Seq scanners are converted to use the same approach
as Avro.

contains_tuple_data is now false so a bunch of dead code in
ScannerContext can be removed. We also no longer attach I/O
buffers to row batches so that logic can be removed.

Testing:
Ran core ASAN tests.

Perf:
I reran the same benchmarks as in Part 2. There was no measurable
difference before and after - for both text and seq processing time
is dominated by text parsing.

Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
Reviewed-on: http://gerrit.cloudera.org:8080/8172
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hdfs-avro-scanner-ir.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node-mt.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
M be/src/exec/text-converter.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
23 files changed, 144 insertions(+), 212 deletions(-)

Approvals:
  Tim Armstrong: Looks good to me, approved
  Impala Public Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/8172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
Gerrit-Change-Number: 8172
Gerrit-PatchSet: 17
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-5307: Part 4: copy out uncompressed text and seq

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8172 )

Change subject: IMPALA-5307: Part 4: copy out uncompressed text and seq
......................................................................


Patch Set 8:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/8172/8/be/src/exec/hdfs-scan-node.cc
File be/src/exec/hdfs-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/8172/8/be/src/exec/hdfs-scan-node.cc@238
PS8, Line 238: 
> nit: a lot of unnecessary blank lines. maybe condense a bit so more code fi
Done


http://gerrit.cloudera.org:8080/#/c/8172/8/be/src/exec/hdfs-sequence-scanner.cc
File be/src/exec/hdfs-sequence-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8172/8/be/src/exec/hdfs-sequence-scanner.cc@329
PS8, Line 329:       add_row = WriteCompleteTuple(row_batch->tuple_data_pool(), field_locations_.data(),
> I missed updating this additional code path, we also need to copy out strin
Done. The below test was previously failing but now succeeds. We should get better coverage of some of these things once we switch the I/O mgr to the buffer pool and have ASAN poisoning enabled for recycled buffers.

  ./buildall.sh -asan -skiptests -noclean -ninja -notests && start-impala-cluster.py --impalad_args=--disable_mem_pools=true  && impala-py.test -n1 --verbose tests/query_test/test_scanners.py tests/query_test/test_aggregation.py --workload_exploration_strategy=functional-query:exhaustive -k seq


http://gerrit.cloudera.org:8080/#/c/8172/8/be/src/exec/hdfs-text-scanner.cc
File be/src/exec/hdfs-text-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8172/8/be/src/exec/hdfs-text-scanner.cc@704
PS8, Line 704:   bool split_delimiter_possible = context_->partition_descriptor()->line_delim() == '\n'
> I think there's a latent bug here where we're touching byte_buffer_ *after*
I filed IMPALA-6137. I think there are pre-existing bugs here so won't tackle them in this patch (this patch may have to wait for those though since it increase the odds of hitting them).



-- 
To view, visit http://gerrit.cloudera.org:8080/8172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
Gerrit-Change-Number: 8172
Gerrit-PatchSet: 8
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 31 Oct 2017 21:48:42 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5307: Part 4: copy out uncompressed text and seq

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8172 )

Change subject: IMPALA-5307: Part 4: copy out uncompressed text and seq
......................................................................


Patch Set 8:

This is a follow-on to the last patch you reviewed. Hopefully should be easier to review since it's repeating the previous pattern and deleting dead code only.


-- 
To view, visit http://gerrit.cloudera.org:8080/8172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
Gerrit-Change-Number: 8172
Gerrit-PatchSet: 8
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 31 Oct 2017 17:43:38 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5307: Part 4: copy out uncompressed text and seq

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8172 )

Change subject: IMPALA-5307: Part 4: copy out uncompressed text and seq
......................................................................


Patch Set 11: Code-Review+1

rebased to pick up IMPALA-6137 fix


-- 
To view, visit http://gerrit.cloudera.org:8080/8172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
Gerrit-Change-Number: 8172
Gerrit-PatchSet: 11
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 06 Nov 2017 17:31:54 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5307: Part 4: copy out uncompressed text and seq

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/8172 )

Change subject: IMPALA-5307: Part 4: copy out uncompressed text and seq
......................................................................


Patch Set 13: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8172/13/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

http://gerrit.cloudera.org:8080/#/c/8172/13/be/src/exec/hdfs-scanner.h@274
PS13, Line 274: //
nit: ///



-- 
To view, visit http://gerrit.cloudera.org:8080/8172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
Gerrit-Change-Number: 8172
Gerrit-PatchSet: 13
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 07 Nov 2017 22:50:05 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] PREVIEW: IMPALA-5307: Part 4: copy out uncompressed text and seq

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/8172 )

Change subject: PREVIEW: IMPALA-5307: Part 4: copy out uncompressed text and seq
......................................................................

PREVIEW: IMPALA-5307: Part 4: copy out uncompressed text and seq

This is the final patch for IMPALA-5307.

Text and Seq scanners are converted to use the same approach
as Avro.

contains_tuple_data is now false so a bunch of dead code in
ScannerContext can be removed. We also no longer attach I/O
buffers to row batches so that logic can be removed.

Testing:
Ran core ASAN tests.

Perf:
TODO - need to run numbers

Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-scan-node-mt.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
20 files changed, 118 insertions(+), 192 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/8172/2
-- 
To view, visit http://gerrit.cloudera.org:8080/8172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
Gerrit-Change-Number: 8172
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-5307: Part 4: copy out uncompressed text and seq

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8172 )

Change subject: IMPALA-5307: Part 4: copy out uncompressed text and seq
......................................................................


Patch Set 5: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8172/4/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

http://gerrit.cloudera.org:8080/#/c/8172/4/be/src/exec/hdfs-scanner.h@276
PS4, Line 276:   std::vector<SlotOffsets> string_slot_offsets_;
> I introduced this in https://gerrit.cloudera.org/#/c/8172/ .    It comes ou
Sounds good, please add a brief comment.



-- 
To view, visit http://gerrit.cloudera.org:8080/8172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
Gerrit-Change-Number: 8172
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 20 Oct 2017 19:03:21 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5307: Part 4: copy out uncompressed text and seq

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/8172 )

Change subject: IMPALA-5307: Part 4: copy out uncompressed text and seq
......................................................................


Patch Set 8:

(3 comments)

Michael, can you do the +2 for this one too?

http://gerrit.cloudera.org:8080/#/c/8172/8/be/src/exec/hdfs-scan-node.cc
File be/src/exec/hdfs-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/8172/8/be/src/exec/hdfs-scan-node.cc@238
PS8, Line 238: 
nit: a lot of unnecessary blank lines. maybe condense a bit so more code fits on a screen.


http://gerrit.cloudera.org:8080/#/c/8172/8/be/src/exec/scanner-context.h
File be/src/exec/scanner-context.h:

http://gerrit.cloudera.org:8080/#/c/8172/8/be/src/exec/scanner-context.h@a184
PS8, Line 184: 
nice to see this get simplified


http://gerrit.cloudera.org:8080/#/c/8172/8/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

http://gerrit.cloudera.org:8080/#/c/8172/8/be/src/runtime/row-batch.h@a215
PS8, Line 215: 
great to see this go



-- 
To view, visit http://gerrit.cloudera.org:8080/8172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
Gerrit-Change-Number: 8172
Gerrit-PatchSet: 8
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 31 Oct 2017 18:21:18 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5307: Part 4: copy out uncompressed text and seq

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8172 )

Change subject: IMPALA-5307: Part 4: copy out uncompressed text and seq
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8172/4/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

http://gerrit.cloudera.org:8080/#/c/8172/4/be/src/exec/hdfs-scanner.h@276
PS4, Line 276:   std::vector<SlotOffsets> string_slot_offsets_;
> Sounds good, please add a brief comment.
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/8172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
Gerrit-Change-Number: 8172
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 20 Oct 2017 23:35:18 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5307: Part 4: copy out uncompressed text and seq

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Hello Michael Ho, Alex Behm, Dan Hecht, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8172

to look at the new patch set (#13).

Change subject: IMPALA-5307: Part 4: copy out uncompressed text and seq
......................................................................

IMPALA-5307: Part 4: copy out uncompressed text and seq

This is the final patch for IMPALA-5307.

Text and Seq scanners are converted to use the same approach
as Avro.

contains_tuple_data is now false so a bunch of dead code in
ScannerContext can be removed. We also no longer attach I/O
buffers to row batches so that logic can be removed.

Testing:
Ran core ASAN tests.

Perf:
I reran the same benchmarks as in Part 2. There was no measurable
difference before and after - for both text and seq processing time
is dominated by text parsing.

Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hdfs-avro-scanner-ir.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node-mt.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
M be/src/exec/text-converter.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
23 files changed, 137 insertions(+), 212 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/8172/13
-- 
To view, visit http://gerrit.cloudera.org:8080/8172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
Gerrit-Change-Number: 8172
Gerrit-PatchSet: 13
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-5307: Part 4: copy out uncompressed text and seq

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8172 )

Change subject: IMPALA-5307: Part 4: copy out uncompressed text and seq
......................................................................


Patch Set 16: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/8172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
Gerrit-Change-Number: 8172
Gerrit-PatchSet: 16
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Nov 2017 10:09:24 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] PREVIEW: IMPALA-5307: Part 4: copy out uncompressed text and seq

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8172 )

Change subject: PREVIEW: IMPALA-5307: Part 4: copy out uncompressed text and seq
......................................................................


Patch Set 1:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/8172/1/be/src/exec/exec-node.h
File be/src/exec/exec-node.h:

http://gerrit.cloudera.org:8080/#/c/8172/1/be/src/exec/exec-node.h@109
PS1, Line 109:   /// Caller must not be holding any io buffers. This will cause deadlock.
> Remove - not relevant even before this patch
Done


http://gerrit.cloudera.org:8080/#/c/8172/1/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

http://gerrit.cloudera.org:8080/#/c/8172/1/be/src/exec/hdfs-scanner.h@99
PS1, Line 99: /// resources (IO buffers and mem pools) to the current row batch, and passing row batches
> Needs update
Done


http://gerrit.cloudera.org:8080/#/c/8172/1/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

http://gerrit.cloudera.org:8080/#/c/8172/1/be/src/runtime/row-batch.h@218
PS1, Line 218:   /// make it consistent between buffers and I/O buffers.
> Needs update
Done


http://gerrit.cloudera.org:8080/#/c/8172/1/be/src/runtime/row-batch.h@254
PS1, Line 254:   /// pool and io buffers.
> needs update
Done


http://gerrit.cloudera.org:8080/#/c/8172/1/be/src/runtime/row-batch.h@275
PS1, Line 275:   /// Acquires state from the 'src' row batch into this row batch. This includes all IO
> Needs update
Done


http://gerrit.cloudera.org:8080/#/c/8172/1/be/src/runtime/row-batch.h@397
PS1, Line 397:   /// Sum of all auxiliary bytes. This includes IoBuffers and memory from
> Old reference to IoBuffers. Can probably rename to total_buffer_bytes_ or s
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/8172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
Gerrit-Change-Number: 8172
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 29 Sep 2017 23:51:31 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5307: Part 4: copy out uncompressed text and seq

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8172 )

Change subject: IMPALA-5307: Part 4: copy out uncompressed text and seq
......................................................................


Patch Set 6: Code-Review+1

carry +1


-- 
To view, visit http://gerrit.cloudera.org:8080/8172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
Gerrit-Change-Number: 8172
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 20 Oct 2017 23:35:32 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5307: Part 4: copy out uncompressed text and seq

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/8172 )

Change subject: IMPALA-5307: Part 4: copy out uncompressed text and seq
......................................................................


Patch Set 12:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/8172/11/be/src/exec/hdfs-scanner-ir.cc
File be/src/exec/hdfs-scanner-ir.cc:

http://gerrit.cloudera.org:8080/#/c/8172/11/be/src/exec/hdfs-scanner-ir.cc@61
PS11, Line 61: return 0;
Is there a reason why we don't treat this as parse error and returns -1 ?


http://gerrit.cloudera.org:8080/#/c/8172/11/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

http://gerrit.cloudera.org:8080/#/c/8172/11/be/src/exec/hdfs-scanner.h@347
PS11, Line 347:   /// Returns -1 if parsing should be aborted due to parse errors.
May help to also add "Returns 0 if copying strings into 'pool' failed'.


http://gerrit.cloudera.org:8080/#/c/8172/11/be/src/exec/hdfs-scanner.cc
File be/src/exec/hdfs-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8172/11/be/src/exec/hdfs-scanner.cc@313
PS11, Line 313: TextConverter::CodegenWriteSlot
> I did think about doing that (and prototyped a similar approach with Avro),
Good point. Thanks for adding the comment.


http://gerrit.cloudera.org:8080/#/c/8172/11/be/src/exec/hdfs-sequence-scanner.cc
File be/src/exec/hdfs-sequence-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8172/11/be/src/exec/hdfs-sequence-scanner.cc@a282
PS11, Line 282: 
It may be trivial but does it make sense to add DCHECK(row_batch->num_tuples_per_row(), 1) somewhere ? That seems to be the assumption WriteAlignedTuple() is making when updating the tuple_row_mem ptr.


http://gerrit.cloudera.org:8080/#/c/8172/11/be/src/exec/hdfs-text-scanner.cc
File be/src/exec/hdfs-text-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8172/11/be/src/exec/hdfs-text-scanner.cc@849
PS11, Line 849: tuples_returned
There is a subtle behavior here if Tuple::CopyStrings() failed in WriteAlignedTuples(). In that case, tuples_returned here is 0 but we will continue to line 865 below although we would still deduct num_fields at line 857 below. It acts as if num_tuples * scan_node_->materialized_slots().size() worth of slots were ignored. Is there a reason why we should continue in this case ? Please also see comments on the return value at WriteAlignedTuples().


http://gerrit.cloudera.org:8080/#/c/8172/11/be/src/exec/hdfs-text-scanner.cc@857
PS11, Line 857: num_tuples
Why is this not tuples_returned here ? Is there any guarantee max_added_tuples >= num_tuples ?



-- 
To view, visit http://gerrit.cloudera.org:8080/8172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
Gerrit-Change-Number: 8172
Gerrit-PatchSet: 12
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 07 Nov 2017 03:19:16 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5307: Part 4: copy out uncompressed text and seq

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Hello Alex Behm, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8172

to look at the new patch set (#6).

Change subject: IMPALA-5307: Part 4: copy out uncompressed text and seq
......................................................................

IMPALA-5307: Part 4: copy out uncompressed text and seq

This is the final patch for IMPALA-5307.

Text and Seq scanners are converted to use the same approach
as Avro.

contains_tuple_data is now false so a bunch of dead code in
ScannerContext can be removed. We also no longer attach I/O
buffers to row batches so that logic can be removed.

Testing:
Ran core ASAN tests.

Perf:
I reran the same benchmarks as in Part 2. There was no measurable
difference before and after - for both text and seq processing time
is dominated by text parsing.

Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hdfs-avro-scanner-ir.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node-mt.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
22 files changed, 121 insertions(+), 204 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/8172/6
-- 
To view, visit http://gerrit.cloudera.org:8080/8172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
Gerrit-Change-Number: 8172
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-5307: Part 4: copy out uncompressed text and seq

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8172 )

Change subject: IMPALA-5307: Part 4: copy out uncompressed text and seq
......................................................................


Patch Set 11:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8172/11/be/src/exec/hdfs-scanner.cc
File be/src/exec/hdfs-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8172/11/be/src/exec/hdfs-scanner.cc@234
PS11, Line 234: false
> Should the callers pass an argument 'copy_strings' ? On the other hand, thi
I think my other comment addresses this - we want to copy the strings only on tuples that survive conjunct evaluation.


http://gerrit.cloudera.org:8080/#/c/8172/11/be/src/exec/hdfs-scanner.cc@313
PS11, Line 313: TextConverter::CodegenWriteSlot
> One thing I noticed is that the TextConverter::WriteSlot() already has logi
I did think about doing that (and prototyped a similar approach with Avro), but the disadvantage of doing the copy there is that it's before conjuncts are evaluated, so we pay the cost of copying even if the row is immediately filtered out.

Selective scans are a case where this patch doesn't take a significant performance hit, but the alternative approach would be significantly slower.

It would be really nice to switch this to cross-compilation since it's so complex. Left a comment.



-- 
To view, visit http://gerrit.cloudera.org:8080/8172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
Gerrit-Change-Number: 8172
Gerrit-PatchSet: 11
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 06 Nov 2017 21:41:09 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5307: Part 4: copy out uncompressed text and seq

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8172 )

Change subject: IMPALA-5307: Part 4: copy out uncompressed text and seq
......................................................................


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8172/8/be/src/exec/hdfs-sequence-scanner.cc
File be/src/exec/hdfs-sequence-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8172/8/be/src/exec/hdfs-sequence-scanner.cc@329
PS8, Line 329:       add_row = WriteCompleteTuple(row_batch->tuple_data_pool(), field_locations_.data(),
I missed updating this additional code path, we also need to copy out strings here.



-- 
To view, visit http://gerrit.cloudera.org:8080/8172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
Gerrit-Change-Number: 8172
Gerrit-PatchSet: 8
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 31 Oct 2017 21:40:26 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5307: Part 4: copy out uncompressed text and seq

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Hello Alex Behm, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8172

to look at the new patch set (#5).

Change subject: IMPALA-5307: Part 4: copy out uncompressed text and seq
......................................................................

IMPALA-5307: Part 4: copy out uncompressed text and seq

This is the final patch for IMPALA-5307.

Text and Seq scanners are converted to use the same approach
as Avro.

contains_tuple_data is now false so a bunch of dead code in
ScannerContext can be removed. We also no longer attach I/O
buffers to row batches so that logic can be removed.

Testing:
Ran core ASAN tests.

Perf:
I reran the same benchmarks as in Part 2. There was no measurable
difference before and after - for both text and seq processing time
is dominated by text parsing.

Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hdfs-avro-scanner-ir.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node-mt.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
22 files changed, 119 insertions(+), 202 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/8172/5
-- 
To view, visit http://gerrit.cloudera.org:8080/8172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
Gerrit-Change-Number: 8172
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-5307: Part 4: copy out uncompressed text and seq

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Hello Alex Behm, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8172

to look at the new patch set (#7).

Change subject: IMPALA-5307: Part 4: copy out uncompressed text and seq
......................................................................

IMPALA-5307: Part 4: copy out uncompressed text and seq

This is the final patch for IMPALA-5307.

Text and Seq scanners are converted to use the same approach
as Avro.

contains_tuple_data is now false so a bunch of dead code in
ScannerContext can be removed. We also no longer attach I/O
buffers to row batches so that logic can be removed.

Testing:
Ran core ASAN tests.

Perf:
I reran the same benchmarks as in Part 2. There was no measurable
difference before and after - for both text and seq processing time
is dominated by text parsing.

Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hdfs-avro-scanner-ir.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node-mt.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
22 files changed, 121 insertions(+), 204 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/8172/7
-- 
To view, visit http://gerrit.cloudera.org:8080/8172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
Gerrit-Change-Number: 8172
Gerrit-PatchSet: 7
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-5307: Part 4: copy out uncompressed text and seq

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8172 )

Change subject: IMPALA-5307: Part 4: copy out uncompressed text and seq
......................................................................


Patch Set 4:

(3 comments)

Nice cleanup! I don't have much to say - patch makes a lot of sense to me.

http://gerrit.cloudera.org:8080/#/c/8172/4/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

http://gerrit.cloudera.org:8080/#/c/8172/4/be/src/exec/hdfs-scanner.h@276
PS4, Line 276:   std::vector<SlotOffsets> string_slot_offsets_;
What's the reason for keeping a separate vector? We also have these in the scan node's tuple descriptor.


http://gerrit.cloudera.org:8080/#/c/8172/4/be/src/exec/hdfs-scanner.h@341
PS4, Line 341:   /// - 'copy_out_strings': if true, strings in returned tuples that pass conjuncts are
copy_strings?


http://gerrit.cloudera.org:8080/#/c/8172/4/be/src/exec/scanner-context.cc
File be/src/exec/scanner-context.cc:

http://gerrit.cloudera.org:8080/#/c/8172/4/be/src/exec/scanner-context.cc@110
PS4, Line 110:     parent_->scan_node_->num_owned_io_buffers_.Add(-1);
Not sure this counter is still useful. Do you think we can get rid of it?



-- 
To view, visit http://gerrit.cloudera.org:8080/8172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
Gerrit-Change-Number: 8172
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 20 Oct 2017 05:37:08 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5307: Part 4: copy out uncompressed text and seq

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Hello Michael Ho, Alex Behm, Dan Hecht, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8172

to look at the new patch set (#14).

Change subject: IMPALA-5307: Part 4: copy out uncompressed text and seq
......................................................................

IMPALA-5307: Part 4: copy out uncompressed text and seq

This is the final patch for IMPALA-5307.

Text and Seq scanners are converted to use the same approach
as Avro.

contains_tuple_data is now false so a bunch of dead code in
ScannerContext can be removed. We also no longer attach I/O
buffers to row batches so that logic can be removed.

Testing:
Ran core ASAN tests.

Perf:
I reran the same benchmarks as in Part 2. There was no measurable
difference before and after - for both text and seq processing time
is dominated by text parsing.

Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hdfs-avro-scanner-ir.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node-mt.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
M be/src/exec/text-converter.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
23 files changed, 137 insertions(+), 212 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/8172/14
-- 
To view, visit http://gerrit.cloudera.org:8080/8172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
Gerrit-Change-Number: 8172
Gerrit-PatchSet: 14
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-5307: Part 4: copy out uncompressed text and seq

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Hello Michael Ho, Alex Behm, Dan Hecht, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8172

to look at the new patch set (#9).

Change subject: IMPALA-5307: Part 4: copy out uncompressed text and seq
......................................................................

IMPALA-5307: Part 4: copy out uncompressed text and seq

This is the final patch for IMPALA-5307.

Text and Seq scanners are converted to use the same approach
as Avro.

contains_tuple_data is now false so a bunch of dead code in
ScannerContext can be removed. We also no longer attach I/O
buffers to row batches so that logic can be removed.

Testing:
Ran core ASAN tests.

Perf:
I reran the same benchmarks as in Part 2. There was no measurable
difference before and after - for both text and seq processing time
is dominated by text parsing.

Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hdfs-avro-scanner-ir.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node-mt.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
22 files changed, 131 insertions(+), 209 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/8172/9
-- 
To view, visit http://gerrit.cloudera.org:8080/8172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
Gerrit-Change-Number: 8172
Gerrit-PatchSet: 9
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-5307: Part 4: copy out uncompressed text and seq

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Hello Alex Behm, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8172

to look at the new patch set (#8).

Change subject: IMPALA-5307: Part 4: copy out uncompressed text and seq
......................................................................

IMPALA-5307: Part 4: copy out uncompressed text and seq

This is the final patch for IMPALA-5307.

Text and Seq scanners are converted to use the same approach
as Avro.

contains_tuple_data is now false so a bunch of dead code in
ScannerContext can be removed. We also no longer attach I/O
buffers to row batches so that logic can be removed.

Testing:
Ran core ASAN tests.

Perf:
I reran the same benchmarks as in Part 2. There was no measurable
difference before and after - for both text and seq processing time
is dominated by text parsing.

Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hdfs-avro-scanner-ir.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node-mt.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
22 files changed, 121 insertions(+), 204 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/8172/8
-- 
To view, visit http://gerrit.cloudera.org:8080/8172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
Gerrit-Change-Number: 8172
Gerrit-PatchSet: 8
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-5307: Part 4: copy out uncompressed text and seq

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8172 )

Change subject: IMPALA-5307: Part 4: copy out uncompressed text and seq
......................................................................


Patch Set 7: Code-Review+1

rebase


-- 
To view, visit http://gerrit.cloudera.org:8080/8172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
Gerrit-Change-Number: 8172
Gerrit-PatchSet: 7
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 30 Oct 2017 17:10:29 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5307: Part 4: copy out uncompressed text and seq

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8172 )

Change subject: IMPALA-5307: Part 4: copy out uncompressed text and seq
......................................................................


Patch Set 15: Code-Review+2

carry


-- 
To view, visit http://gerrit.cloudera.org:8080/8172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
Gerrit-Change-Number: 8172
Gerrit-PatchSet: 15
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Nov 2017 01:06:36 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5307: Part 4: copy out uncompressed text and seq

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Hello Michael Ho, Alex Behm, Dan Hecht, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8172

to look at the new patch set (#16).

Change subject: IMPALA-5307: Part 4: copy out uncompressed text and seq
......................................................................

IMPALA-5307: Part 4: copy out uncompressed text and seq

This is the final patch for IMPALA-5307.

Text and Seq scanners are converted to use the same approach
as Avro.

contains_tuple_data is now false so a bunch of dead code in
ScannerContext can be removed. We also no longer attach I/O
buffers to row batches so that logic can be removed.

Testing:
Ran core ASAN tests.

Perf:
I reran the same benchmarks as in Part 2. There was no measurable
difference before and after - for both text and seq processing time
is dominated by text parsing.

Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hdfs-avro-scanner-ir.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node-mt.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
M be/src/exec/text-converter.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
23 files changed, 144 insertions(+), 212 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/8172/16
-- 
To view, visit http://gerrit.cloudera.org:8080/8172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
Gerrit-Change-Number: 8172
Gerrit-PatchSet: 16
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-5307: Part 4: copy out uncompressed text and seq

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8172 )

Change subject: IMPALA-5307: Part 4: copy out uncompressed text and seq
......................................................................


Patch Set 16:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1447/


-- 
To view, visit http://gerrit.cloudera.org:8080/8172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
Gerrit-Change-Number: 8172
Gerrit-PatchSet: 16
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Nov 2017 06:45:48 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5307: Part 4: copy out uncompressed text and seq

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8172 )

Change subject: IMPALA-5307: Part 4: copy out uncompressed text and seq
......................................................................


Patch Set 16: Code-Review+2

Forgot that this needed a coordinated change with the Impala-lzo plugin. That has one callsite:

 context_->ReleaseCompletedResources(nullptr, false);

So I just added a wrapper function with the old API that we can remove once it's not needed.


-- 
To view, visit http://gerrit.cloudera.org:8080/8172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
Gerrit-Change-Number: 8172
Gerrit-PatchSet: 16
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Nov 2017 06:45:19 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5307: Part 4: copy out uncompressed text and seq

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/8172 )

Change subject: IMPALA-5307: Part 4: copy out uncompressed text and seq
......................................................................


Patch Set 11:

Taking a look now.


-- 
To view, visit http://gerrit.cloudera.org:8080/8172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
Gerrit-Change-Number: 8172
Gerrit-PatchSet: 11
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 06 Nov 2017 19:24:08 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5307: Part 4: copy out uncompressed text and seq

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/8172 )

Change subject: IMPALA-5307: Part 4: copy out uncompressed text and seq
......................................................................

IMPALA-5307: Part 4: copy out uncompressed text and seq

This is the final patch for IMPALA-5307.

Text and Seq scanners are converted to use the same approach
as Avro.

contains_tuple_data is now false so a bunch of dead code in
ScannerContext can be removed. We also no longer attach I/O
buffers to row batches so that logic can be removed.

Testing:
Ran core ASAN tests.

Perf:
I reran the same benchmarks as in Part 2. There was no measurable
difference before and after - for both text and seq processing time
is dominated by text parsing.

Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-scan-node-mt.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
20 files changed, 118 insertions(+), 192 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/8172/4
-- 
To view, visit http://gerrit.cloudera.org:8080/8172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
Gerrit-Change-Number: 8172
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-5307: Part 4: copy out uncompressed text and seq

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8172 )

Change subject: IMPALA-5307: Part 4: copy out uncompressed text and seq
......................................................................


Patch Set 15: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/1445/


-- 
To view, visit http://gerrit.cloudera.org:8080/8172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
Gerrit-Change-Number: 8172
Gerrit-PatchSet: 15
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Nov 2017 02:12:34 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5307: Part 4: copy out uncompressed text and seq

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8172 )

Change subject: IMPALA-5307: Part 4: copy out uncompressed text and seq
......................................................................


Patch Set 11:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/8172/11/be/src/exec/hdfs-scanner-ir.cc
File be/src/exec/hdfs-scanner-ir.cc:

http://gerrit.cloudera.org:8080/#/c/8172/11/be/src/exec/hdfs-scanner-ir.cc@61
PS11, Line 61: return 0;
> Is there a reason why we don't treat this as parse error and returns -1 ?
No that's a good point. I changed it.


http://gerrit.cloudera.org:8080/#/c/8172/11/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

http://gerrit.cloudera.org:8080/#/c/8172/11/be/src/exec/hdfs-scanner.h@347
PS11, Line 347:   /// Returns -1 if parsing should be aborted due to parse errors.
> May help to also add "Returns 0 if copying strings into 'pool' failed'.
Clarified that it could be a memory allocation error. Per the other comment, it probably makes most sense to just return -1.


http://gerrit.cloudera.org:8080/#/c/8172/11/be/src/exec/hdfs-sequence-scanner.cc
File be/src/exec/hdfs-sequence-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8172/11/be/src/exec/hdfs-sequence-scanner.cc@a282
PS11, Line 282: 
> It may be trivial but does it make sense to add DCHECK(row_batch->num_tuple
The assumption of 1 tuple per row for scans is pretty deeply baked into the scanners and planner, so it seemed silly to have this speculative generality here. I added a DCHECK to HdfsScanner to document that the assumption is valid for all HDFS scanners.


http://gerrit.cloudera.org:8080/#/c/8172/11/be/src/exec/hdfs-text-scanner.cc
File be/src/exec/hdfs-text-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8172/11/be/src/exec/hdfs-text-scanner.cc@849
PS11, Line 849: tuples_returned
> There is a subtle behavior here if Tuple::CopyStrings() failed in WriteAlig
Yeah I think I just made a mistake returning 0. Returning -1 makes more sense. I manually tested with this query:
    [localhost:21000] > set num_nodes=1; set disable_codegen=1; set mem_limit=5m; select * from widetable_1000_cols;
    NUM_NODES set to 1
    DISABLE_CODEGEN set to 1
    MEM_LIMIT set to 5m
    Query: select * from widetable_1000_cols
    Query submitted at: 2017-11-07 11:52:01 (Coordinator: http://tarmstrong-box:25000)
    Query progress can be monitored at: http://tarmstrong-box:25000/query_plan?query_id=9948285e1aa8e172:f73fe97100000000
    ERROR: Memory limit exceeded: HdfsScanner::WriteAlignedTuples() failed to allocate 125 bytes for strings.
    HDFS_SCAN_NODE (id=0) could not allocate 125.00 B without exceeding limit.
    Error occurred on backend tarmstrong-box:22000 by fragment 9948285e1aa8e172:f73fe97100000000
    Memory left in process limit: 8.01 GB
    Memory left in query limit: 947.59 KB
    Query(9948285e1aa8e172:f73fe97100000000): Limit=5.00 MB Reservation=0 ReservationLimit=0 OtherMemory=4.07 MB Total=4.07 MB Peak=4.07 MB
      Fragment 9948285e1aa8e172:f73fe97100000000: Reservation=0 OtherMemory=4.07 MB Total=4.07 MB Peak=4.07 MB
        HDFS_SCAN_NODE (id=0): Total=4.07 MB Peak=4.07 MB
        PLAN_ROOT_SINK: Total=0 Peak=0

It would be nice to turn it into an end-to-end test but I cant see how to make it robustly fail in that place.


http://gerrit.cloudera.org:8080/#/c/8172/11/be/src/exec/hdfs-text-scanner.cc@857
PS11, Line 857: num_tuples
> Why is this not tuples_returned here ? Is there any guarantee max_added_tup
I think num_fields is tracking the number of fields consumed from the input.



-- 
To view, visit http://gerrit.cloudera.org:8080/8172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
Gerrit-Change-Number: 8172
Gerrit-PatchSet: 11
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 07 Nov 2017 19:54:01 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5307: Part 4: copy out uncompressed text and seq

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8172 )

Change subject: IMPALA-5307: Part 4: copy out uncompressed text and seq
......................................................................


Patch Set 10: Code-Review+1

Carry Alex's +1


-- 
To view, visit http://gerrit.cloudera.org:8080/8172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
Gerrit-Change-Number: 8172
Gerrit-PatchSet: 10
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 03 Nov 2017 18:59:20 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5307: Part 4: copy out uncompressed text and seq

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/8172 )

Change subject: IMPALA-5307: Part 4: copy out uncompressed text and seq
......................................................................


Patch Set 11:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8172/11/be/src/exec/hdfs-scanner.cc
File be/src/exec/hdfs-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8172/11/be/src/exec/hdfs-scanner.cc@234
PS11, Line 234: false
Should the callers pass an argument 'copy_strings' ? On the other hand, this may diverge from the codegen'd path. May be a TODO for now ?


http://gerrit.cloudera.org:8080/#/c/8172/11/be/src/exec/hdfs-scanner.cc@313
PS11, Line 313: TextConverter::CodegenWriteSlot
One thing I noticed is that the TextConverter::WriteSlot() already has logic to copy the char/varchar/string to the argument 'pool' although the caller below at line 233 always passes 'copy_string' as false. Not sure if you considered refactoring it or it's not worth it ?

Ideally, if we can propagate replace 'slot_desc' with a constant argument, we may be able to re-use the logic in TextConverter::WriteSlot() and avoids the need for hand-crafted IR in TextConverter::CodegenWriteSlot(). This doesn't have to be part of this change as this patch is more about memory transfer.

May be a TODO ?



-- 
To view, visit http://gerrit.cloudera.org:8080/8172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
Gerrit-Change-Number: 8172
Gerrit-PatchSet: 11
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 06 Nov 2017 21:29:31 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] PREVIEW: IMPALA-5307: Part 4: copy out uncompressed text and seq

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8172 )

Change subject: PREVIEW: IMPALA-5307: Part 4: copy out uncompressed text and seq
......................................................................


Patch Set 1:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/8172/1/be/src/exec/exec-node.h
File be/src/exec/exec-node.h:

http://gerrit.cloudera.org:8080/#/c/8172/1/be/src/exec/exec-node.h@109
PS1, Line 109:   /// Caller must not be holding any io buffers. This will cause deadlock.
Remove - not relevant even before this patch


http://gerrit.cloudera.org:8080/#/c/8172/1/be/src/exec/hdfs-scanner.h
File be/src/exec/hdfs-scanner.h:

http://gerrit.cloudera.org:8080/#/c/8172/1/be/src/exec/hdfs-scanner.h@99
PS1, Line 99: /// resources (IO buffers and mem pools) to the current row batch, and passing row batches
Needs update


http://gerrit.cloudera.org:8080/#/c/8172/1/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

http://gerrit.cloudera.org:8080/#/c/8172/1/be/src/runtime/row-batch.h@51
PS1, Line 51: ///   3. Auxiliary tuple memory (e.g. string data) - this can either be stored externally
Needs update


http://gerrit.cloudera.org:8080/#/c/8172/1/be/src/runtime/row-batch.h@65
PS1, Line 65: /// is IoBuffers which are only attached to one row batch. Only when the row batch reaches
Needs update


http://gerrit.cloudera.org:8080/#/c/8172/1/be/src/runtime/row-batch.h@218
PS1, Line 218:   /// make it consistent between buffers and I/O buffers.
Needs update


http://gerrit.cloudera.org:8080/#/c/8172/1/be/src/runtime/row-batch.h@254
PS1, Line 254:   /// pool and io buffers.
needs update


http://gerrit.cloudera.org:8080/#/c/8172/1/be/src/runtime/row-batch.h@275
PS1, Line 275:   /// Acquires state from the 'src' row batch into this row batch. This includes all IO
Needs update


http://gerrit.cloudera.org:8080/#/c/8172/1/be/src/runtime/row-batch.h@397
PS1, Line 397:   /// Sum of all auxiliary bytes. This includes IoBuffers and memory from
Old reference to IoBuffers. Can probably rename to total_buffer_bytes_ or similar.



-- 
To view, visit http://gerrit.cloudera.org:8080/8172
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I304fd002b61bfedf41c8b1405cd7eb7b492bb941
Gerrit-Change-Number: 8172
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Fri, 29 Sep 2017 16:48:26 +0000
Gerrit-HasComments: Yes