You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@impala.apache.org by "Tim Armstrong (Code Review)" <ge...@cloudera.org> on 2016/06/21 00:36:13 UTC

[Impala-CR](cdh5-trunk) IMPALA-3729: batch size=1 coverage for avro scanner

Tim Armstrong has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/3413

Change subject: IMPALA-3729: batch_size=1 coverage for avro scanner
......................................................................

IMPALA-3729: batch_size=1 coverage for avro scanner

Also fix a stale comment in the avro scanner header.

The main work here is to fix the handling of empty result sets in the
test result verify. This because a problem because we wanted to verify
that the results in the test file were a superset of the rows
returned, and this was thrown of by superflous '' rows in the expected
and actual result sets.

The basic problem is that the way test file sections
was parsed conflated an empty result section with non-empty result
section that had a single empty string. I.e.:

---- RESULTS
====

vs
---- RESULTS

====

both got resolved to [''].

Change-Id: Ia007e558d92c7e4ce30be90446fdbb1f50a0ebc4
---
M be/src/exec/hdfs-avro-scanner.h
M bin/load-data.py
M testdata/bin/generate-schema-statements.py
M testdata/workloads/functional-query/queries/DataErrorsTest/avro-errors.test
M testdata/workloads/functional-query/queries/QueryTest/load.test
M testdata/workloads/functional-query/queries/QueryTest/test-unmatched-schema.test
M tests/beeswax/impala_beeswax.py
M tests/common/test_result_verifier.py
M tests/data_errors/test_data_errors.py
M tests/unittests/test_file_parser.py
M tests/util/test_file_parser.py
11 files changed, 74 insertions(+), 39 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/13/3413/1
-- 
To view, visit http://gerrit.cloudera.org:8080/3413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia007e558d92c7e4ce30be90446fdbb1f50a0ebc4
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-3729: batch size=1 coverage for avro scanner

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3729: batch_size=1 coverage for avro scanner
......................................................................


Patch Set 5:

Ping

-- 
To view, visit http://gerrit.cloudera.org:8080/3413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia007e558d92c7e4ce30be90446fdbb1f50a0ebc4
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-3729: batch size=1 coverage for avro scanner

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Hello Taras Bobrovytsky, Dan Hecht,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/3413

to look at the new patch set (#7).

Change subject: IMPALA-3729: batch_size=1 coverage for avro scanner
......................................................................

IMPALA-3729: batch_size=1 coverage for avro scanner

Also fix a stale comment in the avro scanner header.

The main work here is to fix the handling of empty result sets in the
test result verifier. This is a problem because we wanted to verify
that the results in the test file were a superset of the rows
returned, and this was thrown off by superflous '' rows in the expected
and actual result sets.

The basic problem is that the way test file sections
was parsed conflated an empty result section with non-empty result
section that had a single empty string. I.e.:

---- RESULTS
====

vs
---- RESULTS

====

both got resolved to [''].

Change-Id: Ia007e558d92c7e4ce30be90446fdbb1f50a0ebc4
---
M be/src/exec/hdfs-avro-scanner.h
M bin/load-data.py
M testdata/bin/generate-schema-statements.py
M testdata/workloads/functional-query/queries/DataErrorsTest/avro-errors.test
M testdata/workloads/functional-query/queries/QueryTest/load.test
M testdata/workloads/functional-query/queries/QueryTest/test-unmatched-schema.test
M tests/beeswax/impala_beeswax.py
M tests/common/test_result_verifier.py
M tests/data_errors/test_data_errors.py
M tests/unittests/test_file_parser.py
M tests/util/test_file_parser.py
11 files changed, 86 insertions(+), 42 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/13/3413/7
-- 
To view, visit http://gerrit.cloudera.org:8080/3413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia007e558d92c7e4ce30be90446fdbb1f50a0ebc4
Gerrit-PatchSet: 7
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-3729: batch size=1 coverage for avro scanner

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3729: batch_size=1 coverage for avro scanner
......................................................................


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/3413/4//COMMIT_MSG
Commit Message:

PS4, Line 14: of
> nit: off
Done


http://gerrit.cloudera.org:8080/#/c/3413/4/tests/common/test_result_verifier.py
File tests/common/test_result_verifier.py:

PS4, Line 12: remove_comments, split_section_lines,
            :     join_section_lines
> Can you put each of these on a separate line including remove comments. Als
Done


http://gerrit.cloudera.org:8080/#/c/3413/4/tests/data_errors/test_data_errors.py
File tests/data_errors/test_data_errors.py:

PS4, Line 18: from tests.common.test_dimensions import create_exec_option_dimension
            : from tests.common.impala_test_suite import ImpalaTestSuite
            : from tests.common.skip import SkipIfS3, SkipIfLocal
> These should be in alphabetical order. Also the "import x" should be before
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/3413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia007e558d92c7e4ce30be90446fdbb1f50a0ebc4
Gerrit-PatchSet: 4
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-3729: batch size=1 coverage for avro scanner

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has uploaded a new patch set (#4).

Change subject: IMPALA-3729: batch_size=1 coverage for avro scanner
......................................................................

IMPALA-3729: batch_size=1 coverage for avro scanner

Also fix a stale comment in the avro scanner header.

The main work here is to fix the handling of empty result sets in the
test result verifier. This is a problem because we wanted to verify
that the results in the test file were a superset of the rows
returned, and this was thrown of by superflous '' rows in the expected
and actual result sets.

The basic problem is that the way test file sections
was parsed conflated an empty result section with non-empty result
section that had a single empty string. I.e.:

---- RESULTS
====

vs
---- RESULTS

====

both got resolved to [''].

Change-Id: Ia007e558d92c7e4ce30be90446fdbb1f50a0ebc4
---
M be/src/exec/hdfs-avro-scanner.h
M bin/load-data.py
M testdata/bin/generate-schema-statements.py
M testdata/workloads/functional-query/queries/DataErrorsTest/avro-errors.test
M testdata/workloads/functional-query/queries/QueryTest/load.test
M testdata/workloads/functional-query/queries/QueryTest/test-unmatched-schema.test
M tests/beeswax/impala_beeswax.py
M tests/common/test_result_verifier.py
M tests/data_errors/test_data_errors.py
M tests/unittests/test_file_parser.py
M tests/util/test_file_parser.py
11 files changed, 83 insertions(+), 40 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/13/3413/4
-- 
To view, visit http://gerrit.cloudera.org:8080/3413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia007e558d92c7e4ce30be90446fdbb1f50a0ebc4
Gerrit-PatchSet: 4
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-3729: batch size=1 coverage for avro scanner

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3729: batch_size=1 coverage for avro scanner
......................................................................


Patch Set 8: Code-Review+2

Hit some flaky infra issues and a deleted import during rebase.

-- 
To view, visit http://gerrit.cloudera.org:8080/3413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia007e558d92c7e4ce30be90446fdbb1f50a0ebc4
Gerrit-PatchSet: 8
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-3729: batch size=1 coverage for avro scanner

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3729: batch_size=1 coverage for avro scanner
......................................................................


Patch Set 7: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/3413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia007e558d92c7e4ce30be90446fdbb1f50a0ebc4
Gerrit-PatchSet: 7
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-3729: batch size=1 coverage for avro scanner

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.

Change subject: IMPALA-3729: batch_size=1 coverage for avro scanner
......................................................................


Patch Set 5: Code-Review+1

+2 backend change. Please get the review for test/scripts from Taras.

-- 
To view, visit http://gerrit.cloudera.org:8080/3413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia007e558d92c7e4ce30be90446fdbb1f50a0ebc4
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-3729: batch size=1 coverage for avro scanner

Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-3729: batch_size=1 coverage for avro scanner
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3413/5/testdata/workloads/functional-query/queries/DataErrorsTest/avro-errors.test
File testdata/workloads/functional-query/queries/DataErrorsTest/avro-errors.test:

Line 5: ---- RESULTS: VERIFY_IS_SUPERSET
We are not differentiating between the result when batchsize=0 and batchsize=1. So if a non empty result set gets returned when batchsize=0 the test will pass. Is this ok?


-- 
To view, visit http://gerrit.cloudera.org:8080/3413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia007e558d92c7e4ce30be90446fdbb1f50a0ebc4
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-3729: batch size=1 coverage for avro scanner

Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-3729: batch_size=1 coverage for avro scanner
......................................................................


Patch Set 5: Code-Review+1

(1 comment)

The Python changes look good to me.

http://gerrit.cloudera.org:8080/#/c/3413/5/bin/load-data.py
File bin/load-data.py:

Line 14: import traceback
Alphabetical order.


-- 
To view, visit http://gerrit.cloudera.org:8080/3413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia007e558d92c7e4ce30be90446fdbb1f50a0ebc4
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-3729: batch size=1 coverage for avro scanner

Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-3729: batch_size=1 coverage for avro scanner
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3413/2/tests/common/test_result_verifier.py
File tests/common/test_result_verifier.py:

Line 202:     raise Exception("Duplicate rows in expected results: %s" % str(expected_results.rows))
> My thought was to prevent mistakes in writing test files. E.g. if someone w
I agree that it will help with catching mistakes, let's keep it.


-- 
To view, visit http://gerrit.cloudera.org:8080/3413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia007e558d92c7e4ce30be90446fdbb1f50a0ebc4
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-3729: batch size=1 coverage for avro scanner

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has uploaded a new patch set (#5).

Change subject: IMPALA-3729: batch_size=1 coverage for avro scanner
......................................................................

IMPALA-3729: batch_size=1 coverage for avro scanner

Also fix a stale comment in the avro scanner header.

The main work here is to fix the handling of empty result sets in the
test result verifier. This is a problem because we wanted to verify
that the results in the test file were a superset of the rows
returned, and this was thrown off by superflous '' rows in the expected
and actual result sets.

The basic problem is that the way test file sections
was parsed conflated an empty result section with non-empty result
section that had a single empty string. I.e.:

---- RESULTS
====

vs
---- RESULTS

====

both got resolved to [''].

Change-Id: Ia007e558d92c7e4ce30be90446fdbb1f50a0ebc4
---
M be/src/exec/hdfs-avro-scanner.h
M bin/load-data.py
M testdata/bin/generate-schema-statements.py
M testdata/workloads/functional-query/queries/DataErrorsTest/avro-errors.test
M testdata/workloads/functional-query/queries/QueryTest/load.test
M testdata/workloads/functional-query/queries/QueryTest/test-unmatched-schema.test
M tests/beeswax/impala_beeswax.py
M tests/common/test_result_verifier.py
M tests/data_errors/test_data_errors.py
M tests/unittests/test_file_parser.py
M tests/util/test_file_parser.py
11 files changed, 85 insertions(+), 42 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/13/3413/5
-- 
To view, visit http://gerrit.cloudera.org:8080/3413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia007e558d92c7e4ce30be90446fdbb1f50a0ebc4
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-3729: batch size=1 coverage for avro scanner

Posted by "Internal Jenkins (Code Review)" <ge...@cloudera.org>.
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-3729: batch_size=1 coverage for avro scanner
......................................................................


Patch Set 9: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/3413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia007e558d92c7e4ce30be90446fdbb1f50a0ebc4
Gerrit-PatchSet: 9
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-3729: batch size=1 coverage for avro scanner

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Hello Taras Bobrovytsky, Dan Hecht,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/3413

to look at the new patch set (#6).

Change subject: IMPALA-3729: batch_size=1 coverage for avro scanner
......................................................................

IMPALA-3729: batch_size=1 coverage for avro scanner

Also fix a stale comment in the avro scanner header.

The main work here is to fix the handling of empty result sets in the
test result verifier. This is a problem because we wanted to verify
that the results in the test file were a superset of the rows
returned, and this was thrown off by superflous '' rows in the expected
and actual result sets.

The basic problem is that the way test file sections
was parsed conflated an empty result section with non-empty result
section that had a single empty string. I.e.:

---- RESULTS
====

vs
---- RESULTS

====

both got resolved to [''].

Change-Id: Ia007e558d92c7e4ce30be90446fdbb1f50a0ebc4
---
M be/src/exec/hdfs-avro-scanner.h
M bin/load-data.py
M testdata/bin/generate-schema-statements.py
M testdata/workloads/functional-query/queries/DataErrorsTest/avro-errors.test
M testdata/workloads/functional-query/queries/QueryTest/load.test
M testdata/workloads/functional-query/queries/QueryTest/test-unmatched-schema.test
M tests/beeswax/impala_beeswax.py
M tests/common/test_result_verifier.py
M tests/data_errors/test_data_errors.py
M tests/unittests/test_file_parser.py
M tests/util/test_file_parser.py
11 files changed, 86 insertions(+), 43 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/13/3413/6
-- 
To view, visit http://gerrit.cloudera.org:8080/3413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia007e558d92c7e4ce30be90446fdbb1f50a0ebc4
Gerrit-PatchSet: 6
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-3729: batch size=1 coverage for avro scanner

Posted by "Internal Jenkins (Code Review)" <ge...@cloudera.org>.
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-3729: batch_size=1 coverage for avro scanner
......................................................................


Patch Set 7:

Build failed: http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge/2625/

-- 
To view, visit http://gerrit.cloudera.org:8080/3413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia007e558d92c7e4ce30be90446fdbb1f50a0ebc4
Gerrit-PatchSet: 7
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-3729: batch size=1 coverage for avro scanner

Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-3729: batch_size=1 coverage for avro scanner
......................................................................


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/3413/1//COMMIT_MSG
Commit Message:

PS1, Line 12:  
nit: is


http://gerrit.cloudera.org:8080/#/c/3413/1/bin/load-data.py
File bin/load-data.py:

PS1, Line 142:     
inconsistent indentation


http://gerrit.cloudera.org:8080/#/c/3413/1/tests/common/test_result_verifier.py
File tests/common/test_result_verifier.py:

Line 205:   actual_set = set(map(str, actual_results.rows))
Is this the right way to verify if it's superset? What if actual = [1, 1, 1] and expected = [1]. Then can we say that expected is a superset?


http://gerrit.cloudera.org:8080/#/c/3413/1/tests/util/test_file_parser.py
File tests/util/test_file_parser.py:

Line 26:     return query_section_text.rstrip("\n").rstrip(';')
Would it be better to rstrip("\n;")?


-- 
To view, visit http://gerrit.cloudera.org:8080/3413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia007e558d92c7e4ce30be90446fdbb1f50a0ebc4
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-3729: batch size=1 coverage for avro scanner

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3729: batch_size=1 coverage for avro scanner
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/3413/2/tests/common/test_result_verifier.py
File tests/common/test_result_verifier.py:

PS2, Line 12: \
> I don't think this is needed, put these in round brackets. Similar to: http
I didn't know you could do that. I guess it makes sense that it's a tuple.


Line 202:     raise Exception("Duplicate rows in expected results: %s" % str(expected_results.rows))
> If you added the comment saying that duplicates are ignored, does it still 
My thought was to prevent mistakes in writing test files. E.g. if someone writes a RESULTS: VERIFY_IS_SUBSET section with duplicate rows, it's almost certainly a mistake, since the duplicate rows will be ignored.

I'm ok with removing if you think it's unnecessary.


Line 213:   actual_set = set(map(str, actual_results.rows))
> This code looks very similar to verify_query_result_is_subset, only the ass
Good point - done.


-- 
To view, visit http://gerrit.cloudera.org:8080/3413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia007e558d92c7e4ce30be90446fdbb1f50a0ebc4
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-3729: batch size=1 coverage for avro scanner

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3729: batch_size=1 coverage for avro scanner
......................................................................


Patch Set 5:

Did you have any more comments Taras?

-- 
To view, visit http://gerrit.cloudera.org:8080/3413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia007e558d92c7e4ce30be90446fdbb1f50a0ebc4
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-3729: batch size=1 coverage for avro scanner

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3729: batch_size=1 coverage for avro scanner
......................................................................


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/3413/1//COMMIT_MSG
Commit Message:

PS1, Line 12:  
> nit: is
Done


http://gerrit.cloudera.org:8080/#/c/3413/1/bin/load-data.py
File bin/load-data.py:

PS1, Line 142:     
> inconsistent indentation
agh, yeah, I need to fix my vim setup for python


http://gerrit.cloudera.org:8080/#/c/3413/1/tests/common/test_result_verifier.py
File tests/common/test_result_verifier.py:

Line 205:   actual_set = set(map(str, actual_results.rows))
> Is this the right way to verify if it's superset? What if actual = [1, 1, 1
Good point, the intention isn't documented. I think using set semantics (number of duplicates is ignored) makes sense here, but I'll add a comment and a sanity check since it's not at all obvious what is intended. There may be cases where it makes sense to factor in duplicates (bag semantics).


http://gerrit.cloudera.org:8080/#/c/3413/1/tests/util/test_file_parser.py
File tests/util/test_file_parser.py:

Line 26:     return query_section_text.rstrip("\n").rstrip(';')
> Would it be better to rstrip("\n;")?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/3413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia007e558d92c7e4ce30be90446fdbb1f50a0ebc4
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-3729: batch size=1 coverage for avro scanner

Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-3729: batch_size=1 coverage for avro scanner
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/3413/2/tests/common/test_result_verifier.py
File tests/common/test_result_verifier.py:

PS2, Line 12: \
I don't think this is needed, put these in round brackets. Similar to: http://github.mtv.cloudera.com/CDH/Impala/blob/a8f3c56fc6fff973046869d49429f5a6fe76796c/tests/comparison/data_generator.py#L37


Line 202:     raise Exception("Duplicate rows in expected results: %s" % str(expected_results.rows))
If you added the comment saying that duplicates are ignored, does it still make sense to throw this exception?
Another alternative is to modify the comment above and say that any duplicates in the actual results are ignored. Would that still be considered set semantics?


Line 213:   actual_set = set(map(str, actual_results.rows))
This code looks very similar to verify_query_result_is_subset, only the assert at end is different. Do you think it would make sense to combine it somehow?


-- 
To view, visit http://gerrit.cloudera.org:8080/3413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia007e558d92c7e4ce30be90446fdbb1f50a0ebc4
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-3729: batch size=1 coverage for avro scanner

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3729: batch_size=1 coverage for avro scanner
......................................................................


Patch Set 9: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/3413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia007e558d92c7e4ce30be90446fdbb1f50a0ebc4
Gerrit-PatchSet: 9
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-3729: batch size=1 coverage for avro scanner

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3729: batch_size=1 coverage for avro scanner
......................................................................


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3413/8/tests/common/test_result_verifier.py
File tests/common/test_result_verifier.py:

Line 198:     raise Exception("Duplicate rows in expected results: %s" % str(expected_results.rows))
It turns out that a DDL test actually hit this. I'm not sure how I didn't catch this on an earlier private run, but I'm just going to remove this check.


-- 
To view, visit http://gerrit.cloudera.org:8080/3413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia007e558d92c7e4ce30be90446fdbb1f50a0ebc4
Gerrit-PatchSet: 8
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-3729: batch size=1 coverage for avro scanner

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3729: batch_size=1 coverage for avro scanner
......................................................................


Patch Set 5:

Ah sorry, missed that previous comment. My feeling is that, given data corruption, we don't have an exact expected output, just a set of "sane" outputs. The pass criteria should be that we return an error for each corrupt file, plus maybe some valid rows.

batch_size is an implementation detail that helps us exercise some additional code paths but shouldn't affect the pass criteria.

-- 
To view, visit http://gerrit.cloudera.org:8080/3413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia007e558d92c7e4ce30be90446fdbb1f50a0ebc4
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-3729: batch size=1 coverage for avro scanner

Posted by "Internal Jenkins (Code Review)" <ge...@cloudera.org>.
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-3729: batch_size=1 coverage for avro scanner
......................................................................


Patch Set 8: Verified-1

Build failed: http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge/2628/

-- 
To view, visit http://gerrit.cloudera.org:8080/3413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia007e558d92c7e4ce30be90446fdbb1f50a0ebc4
Gerrit-PatchSet: 8
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-3729: batch size=1 coverage for avro scanner

Posted by "Internal Jenkins (Code Review)" <ge...@cloudera.org>.
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-3729: batch_size=1 coverage for avro scanner
......................................................................


Patch Set 7: Verified-1

Build failed: http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge/2624/

-- 
To view, visit http://gerrit.cloudera.org:8080/3413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia007e558d92c7e4ce30be90446fdbb1f50a0ebc4
Gerrit-PatchSet: 7
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-3729: batch size=1 coverage for avro scanner

Posted by "Internal Jenkins (Code Review)" <ge...@cloudera.org>.
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-3729: batch_size=1 coverage for avro scanner
......................................................................


Patch Set 7:

Build failed: http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge/2627/

-- 
To view, visit http://gerrit.cloudera.org:8080/3413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia007e558d92c7e4ce30be90446fdbb1f50a0ebc4
Gerrit-PatchSet: 7
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-3729: batch size=1 coverage for avro scanner

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has submitted this change and it was merged.

Change subject: IMPALA-3729: batch_size=1 coverage for avro scanner
......................................................................


IMPALA-3729: batch_size=1 coverage for avro scanner

Also fix a stale comment in the avro scanner header.

The main work here is to fix the handling of empty result sets in the
test result verifier. This is a problem because we wanted to verify
that the results in the test file were a superset of the rows
returned, and this was thrown off by superflous '' rows in the expected
and actual result sets.

The basic problem is that the way test file sections
was parsed conflated an empty result section with non-empty result
section that had a single empty string. I.e.:

---- RESULTS
====

vs
---- RESULTS

====

both got resolved to [''].

Change-Id: Ia007e558d92c7e4ce30be90446fdbb1f50a0ebc4
Reviewed-on: http://gerrit.cloudera.org:8080/3413
Tested-by: Internal Jenkins
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
---
M be/src/exec/hdfs-avro-scanner.h
M bin/load-data.py
M testdata/bin/generate-schema-statements.py
M testdata/workloads/functional-query/queries/DataErrorsTest/avro-errors.test
M testdata/workloads/functional-query/queries/QueryTest/load.test
M testdata/workloads/functional-query/queries/QueryTest/test-unmatched-schema.test
M tests/beeswax/impala_beeswax.py
M tests/common/test_result_verifier.py
M tests/data_errors/test_data_errors.py
M tests/unittests/test_file_parser.py
M tests/util/test_file_parser.py
11 files changed, 84 insertions(+), 41 deletions(-)

Approvals:
  Internal Jenkins: Verified
  Tim Armstrong: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/3413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ia007e558d92c7e4ce30be90446fdbb1f50a0ebc4
Gerrit-PatchSet: 10
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-3729: batch size=1 coverage for avro scanner

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Hello Taras Bobrovytsky, Internal Jenkins, Dan Hecht,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/3413

to look at the new patch set (#8).

Change subject: IMPALA-3729: batch_size=1 coverage for avro scanner
......................................................................

IMPALA-3729: batch_size=1 coverage for avro scanner

Also fix a stale comment in the avro scanner header.

The main work here is to fix the handling of empty result sets in the
test result verifier. This is a problem because we wanted to verify
that the results in the test file were a superset of the rows
returned, and this was thrown off by superflous '' rows in the expected
and actual result sets.

The basic problem is that the way test file sections
was parsed conflated an empty result section with non-empty result
section that had a single empty string. I.e.:

---- RESULTS
====

vs
---- RESULTS

====

both got resolved to [''].

Change-Id: Ia007e558d92c7e4ce30be90446fdbb1f50a0ebc4
---
M be/src/exec/hdfs-avro-scanner.h
M bin/load-data.py
M testdata/bin/generate-schema-statements.py
M testdata/workloads/functional-query/queries/DataErrorsTest/avro-errors.test
M testdata/workloads/functional-query/queries/QueryTest/load.test
M testdata/workloads/functional-query/queries/QueryTest/test-unmatched-schema.test
M tests/beeswax/impala_beeswax.py
M tests/common/test_result_verifier.py
M tests/data_errors/test_data_errors.py
M tests/unittests/test_file_parser.py
M tests/util/test_file_parser.py
11 files changed, 86 insertions(+), 41 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/13/3413/8
-- 
To view, visit http://gerrit.cloudera.org:8080/3413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia007e558d92c7e4ce30be90446fdbb1f50a0ebc4
Gerrit-PatchSet: 8
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-3729: batch size=1 coverage for avro scanner

Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-3729: batch_size=1 coverage for avro scanner
......................................................................


Patch Set 4:

(3 comments)

In this patch, we are not differentiating between the result when batchsize=0 and batchsize=1. So if a non empty result set gets returned when batchsize=0 the test will pass. Is this ok?

http://gerrit.cloudera.org:8080/#/c/3413/4//COMMIT_MSG
Commit Message:

PS4, Line 14: of
nit: off


http://gerrit.cloudera.org:8080/#/c/3413/4/tests/common/test_result_verifier.py
File tests/common/test_result_verifier.py:

PS4, Line 12: remove_comments, split_section_lines,
            :     join_section_lines
Can you put each of these on a separate line including remove comments. Also they should be in alphabetical orders.


http://gerrit.cloudera.org:8080/#/c/3413/4/tests/data_errors/test_data_errors.py
File tests/data_errors/test_data_errors.py:

PS4, Line 18: from tests.common.test_dimensions import create_exec_option_dimension
            : from tests.common.impala_test_suite import ImpalaTestSuite
            : from tests.common.skip import SkipIfS3, SkipIfLocal
These should be in alphabetical order. Also the "import x" should be before "from y import z" statements.
btw, vim has a nice :sort function that sorts lines alphabetically.


-- 
To view, visit http://gerrit.cloudera.org:8080/3413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia007e558d92c7e4ce30be90446fdbb1f50a0ebc4
Gerrit-PatchSet: 4
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-3729: batch size=1 coverage for avro scanner

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3729: batch_size=1 coverage for avro scanner
......................................................................


Patch Set 7:

Addressed comment, rebased. Carry +2 for be and test changes

-- 
To view, visit http://gerrit.cloudera.org:8080/3413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia007e558d92c7e4ce30be90446fdbb1f50a0ebc4
Gerrit-PatchSet: 7
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-CR](cdh5-trunk) IMPALA-3729: batch size=1 coverage for avro scanner

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3729: batch_size=1 coverage for avro scanner
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3413/5/bin/load-data.py
File bin/load-data.py:

Line 14: import traceback
> Alphabetical order.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/3413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia007e558d92c7e4ce30be90446fdbb1f50a0ebc4
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-3729: batch size=1 coverage for avro scanner

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Hello Taras Bobrovytsky, Internal Jenkins, Dan Hecht,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/3413

to look at the new patch set (#9).

Change subject: IMPALA-3729: batch_size=1 coverage for avro scanner
......................................................................

IMPALA-3729: batch_size=1 coverage for avro scanner

Also fix a stale comment in the avro scanner header.

The main work here is to fix the handling of empty result sets in the
test result verifier. This is a problem because we wanted to verify
that the results in the test file were a superset of the rows
returned, and this was thrown off by superflous '' rows in the expected
and actual result sets.

The basic problem is that the way test file sections
was parsed conflated an empty result section with non-empty result
section that had a single empty string. I.e.:

---- RESULTS
====

vs
---- RESULTS

====

both got resolved to [''].

Change-Id: Ia007e558d92c7e4ce30be90446fdbb1f50a0ebc4
---
M be/src/exec/hdfs-avro-scanner.h
M bin/load-data.py
M testdata/bin/generate-schema-statements.py
M testdata/workloads/functional-query/queries/DataErrorsTest/avro-errors.test
M testdata/workloads/functional-query/queries/QueryTest/load.test
M testdata/workloads/functional-query/queries/QueryTest/test-unmatched-schema.test
M tests/beeswax/impala_beeswax.py
M tests/common/test_result_verifier.py
M tests/data_errors/test_data_errors.py
M tests/unittests/test_file_parser.py
M tests/util/test_file_parser.py
11 files changed, 84 insertions(+), 41 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/13/3413/9
-- 
To view, visit http://gerrit.cloudera.org:8080/3413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia007e558d92c7e4ce30be90446fdbb1f50a0ebc4
Gerrit-PatchSet: 9
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-CR](cdh5-trunk) IMPALA-3729: batch size=1 coverage for avro scanner

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has uploaded a new patch set (#2).

Change subject: IMPALA-3729: batch_size=1 coverage for avro scanner
......................................................................

IMPALA-3729: batch_size=1 coverage for avro scanner

Also fix a stale comment in the avro scanner header.

The main work here is to fix the handling of empty result sets in the
test result verifier. This is a problem because we wanted to verify
that the results in the test file were a superset of the rows
returned, and this was thrown of by superflous '' rows in the expected
and actual result sets.

The basic problem is that the way test file sections
was parsed conflated an empty result section with non-empty result
section that had a single empty string. I.e.:

---- RESULTS
====

vs
---- RESULTS

====

both got resolved to [''].

Change-Id: Ia007e558d92c7e4ce30be90446fdbb1f50a0ebc4
---
M be/src/exec/hdfs-avro-scanner.h
M bin/load-data.py
M testdata/bin/generate-schema-statements.py
M testdata/workloads/functional-query/queries/DataErrorsTest/avro-errors.test
M testdata/workloads/functional-query/queries/QueryTest/load.test
M testdata/workloads/functional-query/queries/QueryTest/test-unmatched-schema.test
M tests/beeswax/impala_beeswax.py
M tests/common/test_result_verifier.py
M tests/data_errors/test_data_errors.py
M tests/unittests/test_file_parser.py
M tests/util/test_file_parser.py
11 files changed, 82 insertions(+), 39 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/13/3413/2
-- 
To view, visit http://gerrit.cloudera.org:8080/3413
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia007e558d92c7e4ce30be90446fdbb1f50a0ebc4
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>