You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org> on 2017/08/16 19:16:52 UTC

[Impala-ASF-CR] IMPALA-5799: Kudu DML can crash if schema has changed

Thomas Tauber-Marshall has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7688

Change subject: IMPALA-5799: Kudu DML can crash if schema has changed
......................................................................

IMPALA-5799: Kudu DML can crash if schema has changed

We check that the number/types of columns in a Kudu DML match the
underlying table during analysis. However, its possible that the
schema may be modified between analysis and execution, and if its
modified in incompatible ways it can cause Impala to crash.

Once the KuduTable object has been opened by the KuduTableSink, its
schema will remain the same, so we can check in Open() that the schema
is what we're expecting.

If the schema changes between Open() and when the DML is sent to Kudu,
Kudu will send back an error and we already handle this gracefully.

Testing:
- Added an e2e test that concurrently inserts into a Kudu table while
  dropping and then adding a column. It relies on timing, but running
  in a loop locally it caused Impala to crash every time without this
  change.

Change-Id: I9fd6bf164310df0041144f75f5ee722665e9f587
---
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-util.cc
M be/src/exec/kudu-util.h
M tests/query_test/test_kudu.py
4 files changed, 77 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/88/7688/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I9fd6bf164310df0041144f75f5ee722665e9f587
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>

[Impala-ASF-CR] IMPALA-5799: Kudu DML can crash if schema has changed

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-5799: Kudu DML can crash if schema has changed
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7688/3/be/src/exec/kudu-table-sink.cc
File be/src/exec/kudu-table-sink.cc:

PS3, Line 147: <=
> '<' ?
Suppose you have a table with '3' columns. The highest col_idx that would be valid would be '2' (0-indexed), so a col_idx of '3' (which is = to the number of columns) would be invalid.

Its equivalent to:
if (!(col_idx < table_->schema().num_columns())) {...}


-- 
To view, visit http://gerrit.cloudera.org:8080/7688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9fd6bf164310df0041144f75f5ee722665e9f587
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5799: Kudu DML can crash if schema has changed

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5799: Kudu DML can crash if schema has changed
......................................................................


IMPALA-5799: Kudu DML can crash if schema has changed

We check that the number/types of columns in a Kudu DML match the
underlying table during analysis. However, it's possible that the
schema may be modified between analysis and execution, and if it's
modified in incompatible ways it can cause Impala to crash.

Once the KuduTable object has been opened by the KuduTableSink, its
schema will remain the same, so we can check in Open() that the schema
is what we're expecting.

If the schema changes between Open() and when the WriteOp is sent to Kudu,
Kudu will send back an error and we already handle this gracefully.

Testing:
- Added an e2e test that concurrently inserts into a Kudu table while
  dropping and then adding a column. It relies on timing, but running
  in a loop locally it caused Impala to crash every time without this
  change.

Change-Id: I9fd6bf164310df0041144f75f5ee722665e9f587
Reviewed-on: http://gerrit.cloudera.org:8080/7688
Reviewed-by: Matthew Jacobs <mj...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-util.cc
M be/src/exec/kudu-util.h
M tests/query_test/test_kudu.py
4 files changed, 85 insertions(+), 0 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Matthew Jacobs: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/7688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I9fd6bf164310df0041144f75f5ee722665e9f587
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>

[Impala-ASF-CR] IMPALA-5799: Kudu DML can crash if schema has changed

Posted by "Matthew Jacobs (Code Review)" <ge...@cloudera.org>.
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5799: Kudu DML can crash if schema has changed
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7688/3/be/src/exec/kudu-table-sink.cc
File be/src/exec/kudu-table-sink.cc:

PS3, Line 147: <=
> Suppose you have a table with '3' columns. The highest col_idx that would b
Yes you're right, I misread it. I think it'll be more obvious if you flip the inequality.


-- 
To view, visit http://gerrit.cloudera.org:8080/7688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9fd6bf164310df0041144f75f5ee722665e9f587
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5799: Kudu DML can crash if schema has changed

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5799: Kudu DML can crash if schema has changed
......................................................................


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1148/

-- 
To view, visit http://gerrit.cloudera.org:8080/7688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9fd6bf164310df0041144f75f5ee722665e9f587
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5799: Kudu DML can crash if schema has changed

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-5799: Kudu DML can crash if schema has changed
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7688/3/be/src/exec/kudu-table-sink.cc
File be/src/exec/kudu-table-sink.cc:

PS3, Line 147: _c
> Yes you're right, I misread it. I think it'll be more obvious if you flip t
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9fd6bf164310df0041144f75f5ee722665e9f587
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5799: Kudu DML can crash if schema has changed

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5799: Kudu DML can crash if schema has changed
......................................................................


Patch Set 4: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9fd6bf164310df0041144f75f5ee722665e9f587
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5799: Kudu DML can crash if schema has changed

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-5799: Kudu DML can crash if schema has changed
......................................................................


Patch Set 3:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/7688/2//COMMIT_MSG
Commit Message:

PS2, Line 11: '
> this one should be removed
No, this is right now:
'if it is modified'


http://gerrit.cloudera.org:8080/#/c/7688/2/be/src/exec/kudu-table-sink.cc
File be/src/exec/kudu-table-sink.cc:

PS2, Line 146: olumns[i];
> wouldn't this not work when the insert specifies all columns, i.e. referenc
Done


PS2, Line 146:         i : kudu_table_sink_.referenced_columns[i];
             :     if (table_->schema().num_columns() <= col_idx) {
             :       return Status(strings::Substitute(
             :    
> this makes me a bit nervous because it seems like expecting the list is in 
Done


PS2, Line 151: mnT
> nit: col_idx maybe
Done


http://gerrit.cloudera.org:8080/#/c/7688/2/tests/query_test/test_kudu.py
File tests/query_test/test_kudu.py:

Line 412:       for i in range(0, iters):
> comment that this sleeps for up to a second
Done


PS2, Line 414:         try:
> nit: move this outside of the loop
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9fd6bf164310df0041144f75f5ee722665e9f587
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5799: Kudu DML can crash if schema has changed

Posted by "Matthew Jacobs (Code Review)" <ge...@cloudera.org>.
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5799: Kudu DML can crash if schema has changed
......................................................................


Patch Set 4: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9fd6bf164310df0041144f75f5ee722665e9f587
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5799: Kudu DML can crash if schema has changed

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-5799: Kudu DML can crash if schema has changed
......................................................................


Patch Set 2:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/7688/1//COMMIT_MSG
Commit Message:

PS1, Line 14: its
> it's
Done


PS1, Line 18: Wri
> DML is really a SQL concept, technically this should be 'WriteOp'
Done


http://gerrit.cloudera.org:8080/#/c/7688/1/be/src/exec/kudu-table-sink.cc
File be/src/exec/kudu-table-sink.cc:

PS1, Line 144: g order of column positio
> We can't assume this is true. E.g. see the comment in Send().  We actually 
Done


http://gerrit.cloudera.org:8080/#/c/7688/1/be/src/exec/kudu-util.cc
File be/src/exec/kudu-util.cc:

PS1, Line 37: DataType = 
> why can't you just have
Weird quirk of C++. Doing that gives the compiler message:
'error: ‘kudu::client::KuduColumnSchema’ is not a namespace'


http://gerrit.cloudera.org:8080/#/c/7688/1/be/src/exec/kudu-util.h
File be/src/exec/kudu-util.h:

Line 81: /// Takes a Kudu client DataType and returns the corresponding Impala ColumnType.
> comment
Done


http://gerrit.cloudera.org:8080/#/c/7688/1/tests/query_test/test_kudu.py
File tests/query_test/test_kudu.py:

Line 399:     print cursor.fetchall() == [(i, )]
> is this test reproducing a crash before your change?
Yes, I ran it 20+ times and it repro-ed the crash every time. I has also originally thought that it would need a loop, but then decided not to add one based on those results.

That may just be particular to my machine, and the timing may not work out quite so nicely in other environments, in which case a loop would be good to be safe.

Adding a loop though increases the number of legitimate ways the insert may fail, eg. we may get analysis errors.


Line 410:       threading.current_thread().errors = []
> I was thinking we could do these operations in a loop to get more opportuni
Done


PS1, Line 416:         except Exception as e:
             :           threading.current_thread().errors.append(e)
             : 
             :     insert_thread = threading.Thread(target=insert_values)
             :     insert_thread.start()
             : 
> this you can do on the main thread while the insert thread is inserting
Done


PS1, Line 432: rror in insert_thread.er
> this is the error in the case we sent a write op to Kudu that was w/ the ol
Yes. I was trying to avoid depending too much on the specific text of the error, esp. in this case where we don't control it, but its probably better to look for something more specific.


-- 
To view, visit http://gerrit.cloudera.org:8080/7688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9fd6bf164310df0041144f75f5ee722665e9f587
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5799: Kudu DML can crash if schema has changed

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has uploaded a new patch set (#3).

Change subject: IMPALA-5799: Kudu DML can crash if schema has changed
......................................................................

IMPALA-5799: Kudu DML can crash if schema has changed

We check that the number/types of columns in a Kudu DML match the
underlying table during analysis. However, it's possible that the
schema may be modified between analysis and execution, and if it's
modified in incompatible ways it can cause Impala to crash.

Once the KuduTable object has been opened by the KuduTableSink, its
schema will remain the same, so we can check in Open() that the schema
is what we're expecting.

If the schema changes between Open() and when the WriteOp is sent to Kudu,
Kudu will send back an error and we already handle this gracefully.

Testing:
- Added an e2e test that concurrently inserts into a Kudu table while
  dropping and then adding a column. It relies on timing, but running
  in a loop locally it caused Impala to crash every time without this
  change.

Change-Id: I9fd6bf164310df0041144f75f5ee722665e9f587
---
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-util.cc
M be/src/exec/kudu-util.h
M tests/query_test/test_kudu.py
4 files changed, 85 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/88/7688/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9fd6bf164310df0041144f75f5ee722665e9f587
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>

[Impala-ASF-CR] IMPALA-5799: Kudu DML can crash if schema has changed

Posted by "Matthew Jacobs (Code Review)" <ge...@cloudera.org>.
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5799: Kudu DML can crash if schema has changed
......................................................................


Patch Set 2:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/7688/1//COMMIT_MSG
Commit Message:

PS1, Line 14: its
> Done
my bad, I meant for this comment to be on l10 which you changed


http://gerrit.cloudera.org:8080/#/c/7688/2//COMMIT_MSG
Commit Message:

PS2, Line 11: '
this one should be removed


http://gerrit.cloudera.org:8080/#/c/7688/2/be/src/exec/kudu-table-sink.cc
File be/src/exec/kudu-table-sink.cc:

PS2, Line 146: udu_table_sink_.referenced_columns.back()
wouldn't this not work when the insert specifies all columns, i.e. referenced_cols is empty() ?


PS2, Line 146:   if (table_->schema().num_columns() <= kudu_table_sink_.referenced_columns.back()) {
             :     return Status(strings::Substitute(
             :         "Table $0 has fewer columns than expected.", table_desc_->name()));
             :   }
this makes me a bit nervous because it seems like expecting the list is in ascending order of col position an implementation detail that could change/break, and I don't think we really rely on it elsewhere or test it explicitly. I think you can just skip this check here, and instead just check right before l153 that table_->schema().num_columns() > col. We don't have to worry about doing a bit more work since this only happens on Open()


PS2, Line 151: col
nit: col_idx maybe


http://gerrit.cloudera.org:8080/#/c/7688/2/tests/query_test/test_kudu.py
File tests/query_test/test_kudu.py:

Line 412:         time.sleep(random.random())
comment that this sleeps for up to a second


PS2, Line 414:           client = self.create_impala_client()
nit: move this outside of the loop


-- 
To view, visit http://gerrit.cloudera.org:8080/7688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9fd6bf164310df0041144f75f5ee722665e9f587
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5799: Kudu DML can crash if schema has changed

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has uploaded a new patch set (#2).

Change subject: IMPALA-5799: Kudu DML can crash if schema has changed
......................................................................

IMPALA-5799: Kudu DML can crash if schema has changed

We check that the number/types of columns in a Kudu DML match the
underlying table during analysis. However, it's possible that the
schema may be modified between analysis and execution, and if it's
modified in incompatible ways it can cause Impala to crash.

Once the KuduTable object has been opened by the KuduTableSink, its
schema will remain the same, so we can check in Open() that the schema
is what we're expecting.

If the schema changes between Open() and when the WriteOp is sent to Kudu,
Kudu will send back an error and we already handle this gracefully.

Testing:
- Added an e2e test that concurrently inserts into a Kudu table while
  dropping and then adding a column. It relies on timing, but running
  in a loop locally it caused Impala to crash every time without this
  change.

Change-Id: I9fd6bf164310df0041144f75f5ee722665e9f587
---
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-util.cc
M be/src/exec/kudu-util.h
M tests/query_test/test_kudu.py
4 files changed, 87 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/88/7688/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9fd6bf164310df0041144f75f5ee722665e9f587
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>

[Impala-ASF-CR] IMPALA-5799: Kudu DML can crash if schema has changed

Posted by "Matthew Jacobs (Code Review)" <ge...@cloudera.org>.
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5799: Kudu DML can crash if schema has changed
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7688/3/be/src/exec/kudu-table-sink.cc
File be/src/exec/kudu-table-sink.cc:

PS3, Line 147: <=
'<' ?


PS3, Line 147:     if (table_->schema().num_columns() <= col_idx) {
             :       return Status(strings::Substitute(
             :           "Table $0 has fewer columns than expected.", table_desc_->name()));
             :     }
while the new test is really great new coverage, I think there are some functional test cases we aren't covering with that test. e.g. I don't think we could have caught the off-by-one error in the comment on l147. Obviously getting good coverage of these cases is kinda tricky. I'm not sure yet if there's some reasonable way to do it. E.g. maybe we can inject a delay between planning and the BE exec by making it queue in admission control.


-- 
To view, visit http://gerrit.cloudera.org:8080/7688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9fd6bf164310df0041144f75f5ee722665e9f587
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5799: Kudu DML can crash if schema has changed

Posted by "Matthew Jacobs (Code Review)" <ge...@cloudera.org>.
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5799: Kudu DML can crash if schema has changed
......................................................................


Patch Set 1:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/7688/1//COMMIT_MSG
Commit Message:

PS1, Line 14: its
it's


PS1, Line 18: DML
DML is really a SQL concept, technically this should be 'WriteOp'


http://gerrit.cloudera.org:8080/#/c/7688/1/be/src/exec/kudu-table-sink.cc
File be/src/exec/kudu-table-sink.cc:

PS1, Line 144: output_expr_evals_.size()
We can't assume this is true. E.g. see the comment in Send().  We actually don't necessarily have the full Table descriptor in the BE (sorry I didn't think of that when we first started talking about this JIRA), so we could either (a) ship that from the FE to the BE or (b) just check the cols that are referenced. I think the latter makes sense, and that you can do something like this:

      // output_expr_evals_ only contains the columns that the op
      // applies to, i.e. columns explicitly mentioned in the query, and
      // referenced_columns is then used to map to actual column positions.
      int col = kudu_table_sink_.referenced_columns.empty() ?
          j : kudu_table_sink_.referenced_columns[j];


http://gerrit.cloudera.org:8080/#/c/7688/1/be/src/exec/kudu-util.cc
File be/src/exec/kudu-util.cc:

PS1, Line 37: DataType = 
why can't you just have
using kudu::client::...::DataType; as we do above?


http://gerrit.cloudera.org:8080/#/c/7688/1/be/src/exec/kudu-util.h
File be/src/exec/kudu-util.h:

Line 81: ColumnType KuduDataTypeToColumnType(kudu::client::KuduColumnSchema::DataType type);
comment


http://gerrit.cloudera.org:8080/#/c/7688/1/tests/query_test/test_kudu.py
File tests/query_test/test_kudu.py:

Line 399:   def test_concurrent_schema_change(self, cursor, unique_database):
is this test reproducing a crash before your change?


Line 410:         client.execute("insert into %s values (0, 0), (1, 1)" % table_name)
I was thinking we could do these operations in a loop to get more opportunities to hit issues.


PS1, Line 416:       try:
             :         client = self.create_impala_client()
             :         client.execute("alter table %s drop column col1" % table_name)
             :         client.execute("alter table %s add columns (col1 string)" % table_name)
             :       except Exception as e:
             :         threading.current_thread().error = e
this you can do on the main thread while the insert thread is inserting


PS1, Line 432: 'Kudu error(s) reported'
this is the error in the case we sent a write op to Kudu that was w/ the old schema? Is there any more of the error message we can match? This looks very generic.


-- 
To view, visit http://gerrit.cloudera.org:8080/7688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9fd6bf164310df0041144f75f5ee722665e9f587
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5799: Kudu DML can crash if schema has changed

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has uploaded a new patch set (#4).

Change subject: IMPALA-5799: Kudu DML can crash if schema has changed
......................................................................

IMPALA-5799: Kudu DML can crash if schema has changed

We check that the number/types of columns in a Kudu DML match the
underlying table during analysis. However, it's possible that the
schema may be modified between analysis and execution, and if it's
modified in incompatible ways it can cause Impala to crash.

Once the KuduTable object has been opened by the KuduTableSink, its
schema will remain the same, so we can check in Open() that the schema
is what we're expecting.

If the schema changes between Open() and when the WriteOp is sent to Kudu,
Kudu will send back an error and we already handle this gracefully.

Testing:
- Added an e2e test that concurrently inserts into a Kudu table while
  dropping and then adding a column. It relies on timing, but running
  in a loop locally it caused Impala to crash every time without this
  change.

Change-Id: I9fd6bf164310df0041144f75f5ee722665e9f587
---
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-util.cc
M be/src/exec/kudu-util.h
M tests/query_test/test_kudu.py
4 files changed, 85 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/88/7688/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9fd6bf164310df0041144f75f5ee722665e9f587
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>