Posted to reviews@impala.apache.org by "Qifan Chen (Code Review)" <ge...@cloudera.org> on 2021/09/27 20:34:29 UTC

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever.

Qifan Chen has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17872 )


Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever.
......................................................................

IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever.

This patch addresses an Impala client hang caused by an AWS Network Load
Balancer (NLB) timeout. The fix applies to the following Impala
clients.

  1. HS2
  2. Beeswax
  3. Impyla
  4. HUE

These clients issue the Thrift RPC ExecuteStatement() followed by repeated
calls to GetOperationStatus() (HS2, Impyla and HUE) or a variant of it
(Beeswax) to the Impala backend.

With the fix, the backend operation for ExecuteStatement() runs
asynchronously, and the client checks its completion status repeatedly
via GetOperationStatus() or its equivalent.
A new execution state, CATALOG_OP_RUNNING, is added to represent
a catalog operation that is still running.
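
The submit-then-poll pattern described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patch itself: FakeOperation, OpState and SubmitAndPoll are hypothetical names, and the sleep stands in for a slow backend operation.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical states, loosely modeled on an operation lifecycle.
enum class OpState { INITIALIZED, RUNNING, FINISHED };

// Hypothetical stand-in for a server-side operation; not Impala's classes.
class FakeOperation {
 public:
  // Returns immediately; the slow work continues on a background thread.
  void ExecuteStatement(std::chrono::milliseconds work) {
    state_.store(OpState::RUNNING);
    worker_ = std::thread([this, work] {
      std::this_thread::sleep_for(work);  // Simulates a long-running operation.
      state_.store(OpState::FINISHED);
    });
  }

  // Cheap status check, analogous in spirit to GetOperationStatus().
  OpState GetOperationStatus() const { return state_.load(); }

  ~FakeOperation() {
    if (worker_.joinable()) worker_.join();
  }

 private:
  std::atomic<OpState> state_{OpState::INITIALIZED};
  std::thread worker_;
};

// Client-side loop: submit, then poll until the operation finishes.
// Returns the number of status checks that were made.
int SubmitAndPoll(std::chrono::milliseconds work,
                  std::chrono::milliseconds interval) {
  FakeOperation op;
  op.ExecuteStatement(work);
  int polls = 1;  // Counts the first check in the loop condition below.
  while (op.GetOperationStatus() != OpState::FINISHED) {
    std::this_thread::sleep_for(interval);
    ++polls;
  }
  return polls;
}
```

Because each call returns quickly, no single RPC stays open long enough to hit a load balancer idle timeout; the waiting is spread across many short status calls.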

Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
---
M be/src/exec/catalog-op-executor.cc
M be/src/exec/catalog-op-executor.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-hs2-server.cc
6 files changed, 157 insertions(+), 6 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/17872/1
-- 
To view, visit http://gerrit.cloudera.org:8080/17872
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 1
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 14:

(4 comments)

SessionState is not one-to-one with a SQL statement execution. A SessionState corresponds to multiple SQL statements, both sequentially and simultaneously. I think we're a level too high. We're a level above ClientRequestState, which is one-to-one with SQL statement execution. The lifecycle of the SessionState object is not a good fit for per-statement things.

I think the approach that ClientRequestState::ExecAsyncQueryOrDmlRequest() takes is the right level. It keeps the thread on the ClientRequestState, which makes queries independent from each other.

http://gerrit.cloudera.org:8080/#/c/17872/14/be/src/service/impala-hs2-server.cc
File be/src/service/impala-hs2-server.cc:

http://gerrit.cloudera.org:8080/#/c/17872/14/be/src/service/impala-hs2-server.cc@600
PS14, Line 600:   {
              :      unique_lock<mutex> unique_lock(session->lock);
              :      session->execute_statement_info.return_val = return_val;
              :      session->execute_statement_info.state =
              :          SessionState::ExecuteStatementInfo::ExecutionState::DONE;
              :   }
              :   session->execute_statement_info.cv.NotifyAll();
ExecuteStatementCommonInternal() is used in both the synchronous and the asynchronous case. One issue is that the synchronous case does not use the ExecuteStatementInfo, yet it still executes these statements that manipulate it. I think child queries could cause problems for their parent queries.

The HS2_NOTIFY_AND_RETURN_IF_ERROR macro also manipulates the ExecuteStatementInfo.


http://gerrit.cloudera.org:8080/#/c/17872/14/be/src/service/impala-hs2-server.cc@669
PS14, Line 669:   if (session->execute_statement_info.state < target_state) {
              :     session->execute_statement_info.cv.Wait(unique_lock);
              :   }
When waiting for a particular state, you want to double-check that you actually ended up in that state.

If you know you can't be woken up without going to the right state, then assert it.
wait()
assert(state == desired_state)

If you think you could be woken up without a state transition, do a while loop:
while (state != desired_state) {
   wait()
}
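
The while-loop form can be sketched with the standard library as follows. This is an illustration only: it uses std::condition_variable rather than the ConditionVariable class in the patch, and StatementInfo, WaitForState, SetState and RunDemo are hypothetical names.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

enum class ExecutionState { INITIALIZED, RUNNING, DONE };

// Hypothetical per-statement record guarded by its own lock.
struct StatementInfo {
  std::mutex lock;
  std::condition_variable cv;
  ExecutionState state = ExecutionState::INITIALIZED;
};

// Blocks until 'info' reaches 'target' and returns the state seen on wake-up.
// The predicate overload of wait() re-checks the condition after every
// wake-up, so a spurious wake-up or a notification for a different state
// cannot end the wait early.
ExecutionState WaitForState(StatementInfo& info, ExecutionState target) {
  std::unique_lock<std::mutex> l(info.lock);
  info.cv.wait(l, [&] { return info.state == target; });
  return info.state;
}

// Changes the state under the lock, then notifies outside of it.
void SetState(StatementInfo& info, ExecutionState next) {
  {
    std::lock_guard<std::mutex> l(info.lock);
    info.state = next;
  }
  info.cv.notify_all();
}

// Drives the state machine from a second thread and reports what the
// waiter saw when it returned.
ExecutionState RunDemo() {
  StatementInfo info;
  std::thread worker([&info] {
    // This notification may wake the waiter before the target is reached.
    SetState(info, ExecutionState::RUNNING);
    SetState(info, ExecutionState::DONE);
  });
  ExecutionState seen = WaitForState(info, ExecutionState::DONE);
  worker.join();
  return seen;
}
```

The lambda passed to wait() is equivalent to writing the while loop by hand; a plain if followed by a single wait would return on the RUNNING notification.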


http://gerrit.cloudera.org:8080/#/c/17872/14/be/src/service/impala-hs2-server.cc@682
PS14, Line 682: session->execute_statement_info.thread.release();
unique_ptr::release() won't delete the object that the unique_ptr is currently pointing to. The caller is responsible for freeing the object returned by release().

https://en.cppreference.com/w/cpp/memory/unique_ptr/release

What you want is unique_ptr::reset().
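
The difference can be seen in a small sketch. Tracked is a hypothetical type that counts live instances so that destruction is observable; it is not part of the patch.

```cpp
#include <memory>

// Hypothetical resource that counts live instances.
struct Tracked {
  static int live;
  Tracked() { ++live; }
  ~Tracked() { --live; }
};
int Tracked::live = 0;

// Returns the number of live objects right after release(). The object is
// still alive because release() only gives up ownership.
int LiveAfterRelease() {
  std::unique_ptr<Tracked> p(new Tracked());
  Tracked* raw = p.release();  // p is now empty; nothing was deleted.
  int live = Tracked::live;
  delete raw;  // The caller must free it, otherwise it leaks.
  return live;
}

// Returns the number of live objects right after reset(). reset() deletes
// the owned object immediately.
int LiveAfterReset() {
  std::unique_ptr<Tracked> p(new Tracked());
  p.reset();
  return Tracked::live;
}
```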


http://gerrit.cloudera.org:8080/#/c/17872/14/be/src/service/impala-server.h
File be/src/service/impala-server.h:

http://gerrit.cloudera.org:8080/#/c/17872/14/be/src/service/impala-server.h@663
PS14, Line 663:     ExecuteStatementInfo execute_statement_info;
I think we have a cardinality problem here. A SessionState does not have a one-to-one relationship with a SQL statement execution. A session can have multiple SQL statements in flight simultaneously. That's already a problem with child queries, but it is also true in general, apart from child queries.

ClientRequestState is one-to-one with a SQL statement execution.
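
A rough sketch of the cardinality point, under the assumption that per-statement data lives in a per-statement record rather than in a single slot on the session. Session and StatementState here are hypothetical simplifications, not Impala's SessionState or ClientRequestState.

```cpp
#include <map>
#include <string>

// Hypothetical per-statement record: one instance per SQL statement execution.
struct StatementState {
  std::string sql;
  bool done = false;
};

// Hypothetical session: owns many statements, keyed by a statement id, so
// completing one statement cannot overwrite the status of another.
class Session {
 public:
  // Registers a new in-flight statement and returns its id.
  int Submit(const std::string& sql) {
    int id = next_id_++;
    StatementState s;
    s.sql = sql;
    statements_[id] = s;
    return id;
  }

  void MarkDone(int id) { statements_.at(id).done = true; }
  bool IsDone(int id) const { return statements_.at(id).done; }

 private:
  int next_id_ = 0;
  std::map<int, StatementState> statements_;
};
```

With a single status field on the session, a child query finishing would mark its parent as finished too; with one record per statement the two stay independent.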




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 14
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Thu, 07 Oct 2021 05:03:16 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 14:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9570/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 14
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Wed, 06 Oct 2021 20:13:07 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 22:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9597/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 22
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Tue, 12 Oct 2021 21:01:51 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 33:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9635/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 33
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Thu, 21 Oct 2021 16:04:50 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 37:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9644/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 37
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Fri, 22 Oct 2021 17:42:14 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 31:

(3 comments)

Ok, this is getting close. I have a couple small code comments.

I wrote some additional tests here: http://gerrit.cloudera.org:8080/17959
You are welcome to incorporate them into your change, or I can merge them separately.

http://gerrit.cloudera.org:8080/#/c/17872/31/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/17872/31/be/src/service/client-request-state.cc@695
PS31, Line 695: Async
Nit: What do you think about dropping Async from this name?


http://gerrit.cloudera.org:8080/#/c/17872/31/be/src/service/client-request-state.cc@702
PS31, Line 702:   // Indirectly check if running in thread async_exec_thread_.
              :   if (!exec_dml_async) {
When I was reading this code, I got confused by the variable name. The code inside this if is for when we are doing async DML. Should this variable be called "exec_dml_sync"?


http://gerrit.cloudera.org:8080/#/c/17872/31/be/src/service/client-request-state.cc@706
PS31, Line 706:     DebugActionNoFail(
              :         exec_request_->query_options, "CRS_DELAY_BEFORE_CATALOG_OP_EXEC");
When adding a delay, it usually lines up with some piece of code that can take a long time. This pause is standing on its own a bit away from the actual catalog op that could take a long time, but Impala won't actually pause in this location. Can we put it right next to the catalog op?




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 31
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Thu, 21 Oct 2021 02:47:36 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 30:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9628/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 30
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Tue, 19 Oct 2021 21:12:40 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 33:

(1 comment)

Fix a formatting error.

http://gerrit.cloudera.org:8080/#/c/17872/32/tests/metadata/test_ddl.py
File tests/metadata/test_ddl.py:

http://gerrit.cloudera.org:8080/#/c/17872/32/tests/metadata/test_ddl.py@997
PS32, Line 997: 
> flake8: E302 expected 2 blank lines, found 1
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 33
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Thu, 21 Oct 2021 15:43:43 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 26:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17872/26/tests/metadata/test_metadata_query_statements.py
File tests/metadata/test_metadata_query_statements.py:

http://gerrit.cloudera.org:8080/#/c/17872/26/tests/metadata/test_metadata_query_statements.py@154
PS26, Line 154: d
flake8: E303 too many blank lines (2)


http://gerrit.cloudera.org:8080/#/c/17872/26/tests/metadata/test_metadata_query_statements.py@156
PS26, Line 156: \
flake8: E502 the backslash is redundant between brackets


http://gerrit.cloudera.org:8080/#/c/17872/26/tests/metadata/test_metadata_query_statements.py@161
PS26, Line 161: \
flake8: E502 the backslash is redundant between brackets




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 26
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Mon, 18 Oct 2021 16:33:27 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 19:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/17872/15/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/17872/15/be/src/service/client-request-state.cc@720
PS15, Line 720: 
              :   // Set the results to be reported to the client.
              :   SetResultSet(catalog_op_executor_->ddl_exec_response());
              :   return Status::OK();
              : }
              : 
              : Status ClientRequestState::ExecAsyncDdlRequest() {
              :   string op_type = catalog_op_type() == TCatalogOpType::DDL ?
              :    
> Decided to keep this particular case (CREATE_TABLE_AS_SELECT) to run (the e
I hacked a bit on this, and the modifications are not that big. This is what that would look like:

http://gerrit.cloudera.org:8080/17916


http://gerrit.cloudera.org:8080/#/c/17872/17/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/17872/17/be/src/service/client-request-state.cc@643
PS17, Line 643: SyncDdl
Nit: We have an unrelated feature called "sync_ddl". If we can avoid similar names, we should.

One naming scheme would be:
ExecDdlRequest() - entrypoint that picks between sync/async
ShouldRunDdlAsync() - decision to go sync/async
ExecDdlRequestInt() - synchronous version
ExecDdlRequestIntAsync() - asynchronous version


http://gerrit.cloudera.org:8080/#/c/17872/17/be/src/service/client-request-state.cc@688
PS17, Line 688:   Status status = catalog_op_executor_->Exec(exec_request_->catalog_op_request);
              :   {
Same as other DebugAction location. Same fixes.


http://gerrit.cloudera.org:8080/#/c/17872/17/be/src/service/client-request-state.cc@728
PS17, Line 728:       
For this async code, I think we are better off with a void return value, because nothing is actually consuming the return value. Status would be conveyed through UpdateQueryStatus().


http://gerrit.cloudera.org:8080/#/c/17872/17/be/src/service/client-request-state.cc@735
PS17, Line 735:   Status status = catalog_op_executor_->Exec(exec_request_->catalog_op_request);
              :   {
For here, I think you can use DebugActionNoFail(), since we are using this to add sleeps rather than actually failing.

Separately, a small nit: let's make the location strings similar to our other location strings in this file. Something like "CRS_BEFORE_CATALOG_OP_EXEC"


http://gerrit.cloudera.org:8080/#/c/17872/17/be/src/service/client-request-state.cc@744
PS17, Line 744:       exec_request_->query_options.sync_ddl));
              : 
              :   // Set the results to be reported to the cli
An important thing here is that nothing is actually using the return value of this function. So, I think we need to change this to use UpdateQueryStatus()

Status catalog_update_status = parent_server_->ProcessCatalogUpdateResult(...)
{
  lock_guard<mutex> l(lock_);
  RETURN_IF_ERROR(UpdateQueryStatus(catalog_update_status));
}
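
The idea of a void thread body that reports its outcome through shared state can be sketched like this. Status and RequestState are simplified, hypothetical stand-ins; the first-error-wins rule is an assumption of this sketch, not a statement about Impala's UpdateQueryStatus().

```cpp
#include <mutex>
#include <string>
#include <thread>

// Hypothetical minimal status type; an empty message means success.
struct Status {
  std::string msg;
  bool ok() const { return msg.empty(); }
};

// Hypothetical request state that records the outcome of async work.
class RequestState {
 public:
  // Records a status under the lock. In this sketch the first error wins:
  // a later status does not overwrite an earlier failure.
  void UpdateQueryStatus(const Status& s) {
    std::lock_guard<std::mutex> l(lock_);
    if (query_status_.ok()) query_status_ = s;
  }

  Status query_status() {
    std::lock_guard<std::mutex> l(lock_);
    return query_status_;
  }

  // Thread body: returns void because no caller could consume a return
  // value. The result is stored where a status poll can find it.
  void ExecAsync(bool fail) {
    Status result = fail ? Status{"catalog update failed"} : Status{};
    UpdateQueryStatus(result);
  }

 private:
  std::mutex lock_;
  Status query_status_;
};

// Runs the worker on its own thread and reads the outcome from the state.
Status RunAndCollect(bool fail) {
  RequestState state;
  std::thread t(&RequestState::ExecAsync, &state, fail);
  t.join();
  return state.query_status();
}
```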


http://gerrit.cloudera.org:8080/#/c/17872/17/be/src/service/client-request-state.cc@778
PS17, Line 778:     }
              :     UpdateNonErrorExecState(ExecState::RUNNING);
Please go look at the comment that I made on the previous upload about ABORT_IF_ERROR.

Separately, we don't need to be holding the lock when executing this.


http://gerrit.cloudera.org:8080/#/c/17872/17/be/src/service/client-request-state.cc@781
PS17, Line 781:   }
Do this before spawning the thread (similar to how ExecAsyncQueryOrDmlRequest() does).


http://gerrit.cloudera.org:8080/#/c/17872/19/testdata/workloads/functional-query/queries/QueryTest/alter-table-recover.test
File testdata/workloads/functional-query/queries/QueryTest/alter-table-recover.test:

http://gerrit.cloudera.org:8080/#/c/17872/19/testdata/workloads/functional-query/queries/QueryTest/alter-table-recover.test@7
PS19, Line 7: drop table if exists alltypes_clone;
            : create external table alltypes_clone like functional_parquet.alltypes
            : location '$FILESYSTEM_PREFIX/test-warehouse/alltypes_parquet';
            : set debug_action="TIMED_WAIT_BEFORE_CATALOG_OP_EXEC:SLEEP@15000";
            : alter table alltypes_clone recover partitions;
Ideally, a test would fail if you ran it without the corresponding code changes. Timing changes are harder to test, because we aren't really changing the behavior of a SQL statement.

If you were doing the individual exec / getoperationstatus calls, you could verify that the exec is fast even with TIMED_WAIT_BEFORE_CATALOG_OP_EXEC:SLEEP@15000.

I think the HS2 tests can test this. This test seems to have useful pieces:
https://github.com/apache/impala/blob/master/tests/hs2/test_hs2.py#L282
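
The shape of such a check, independent of any test framework, might look like the sketch below. SubmitAsync and SubmitIsFasterThanDelay are hypothetical helpers, and the sleep stands in for the injected debug delay.

```cpp
#include <chrono>
#include <thread>

// Hypothetical submit call: starts the slow work on a thread and returns
// right away.
std::thread SubmitAsync(std::chrono::milliseconds injected_delay) {
  return std::thread(
      [injected_delay] { std::this_thread::sleep_for(injected_delay); });
}

// Returns true if submitting took less time than the injected delay, which
// is only possible if the delay ran off the caller's thread.
bool SubmitIsFasterThanDelay(std::chrono::milliseconds injected_delay) {
  auto start = std::chrono::steady_clock::now();
  std::thread worker = SubmitAsync(injected_delay);
  auto submit_time = std::chrono::steady_clock::now() - start;
  worker.join();
  return submit_time < injected_delay;
}
```

A synchronous implementation would sleep inside the submit call and fail this check, so the test depends on the code change rather than only on the SQL result.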


http://gerrit.cloudera.org:8080/#/c/17872/19/tests/metadata/test_ddl.py
File tests/metadata/test_ddl.py:

http://gerrit.cloudera.org:8080/#/c/17872/19/tests/metadata/test_ddl.py@913
PS19, Line 913:     cls.ImpalaTestMatrix.add_dimension(create_client_protocol_dimension())
              :     cls.ImpalaTestMatrix.add_constraint(lambda v:
              :         v.get_value('protocol') == 'hs2')
Test dimensions run tests multiple times with different configurations. If you remove the add_constraint(), it would run this test with each of hs2, beeswax, and hs2-http. That seems reasonable, and then you don't need TestAlterTableRecoverWithBeeswax.


http://gerrit.cloudera.org:8080/#/c/17872/19/tests/metadata/test_ddl.py@921
PS19, Line 921:  multiple_impalad=self._use_multiple_impalad(vector)
You can omit multiple_impalad.

When sync_ddl=0, _use_multiple_impalad is false, and that is the default value.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 19
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Tue, 12 Oct 2021 02:23:08 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 21:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17872/21/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/17872/21/be/src/service/client-request-state.cc@713
PS21, Line 713:   // Transition the exec state to RUNNING for any non-CTAS DDLs. For the later, the
              :   // state is set from PENDING to RUNNING inside ExecQueryOrDmlRequest().
              :   if (catalog_op_type() != TCatalogOpType::DDL
              :       || ddl_type() != TDdlType::CREATE_TABLE_AS_SELECT) {
              :     UpdateNonErrorExecState(ExecState::RUNNING);
              :   }
> I do not think it will work, since Exec() sets the state to RUNNING (at the
Let me expand this code example:

# This is pseudocode
Status ExecDdlRequest() {
  etc etc shared code between async/sync etc etc
  if (async) {
     if (ctas) {
       set the state to ExecState::PENDING
     } else {
       set the state to ExecState::RUNNING
     }
     spawn async thread to do the rest
     return Status::OK();
  } else {
     /// this case is not impacted and doesn't set state
     run synchronously, etc.
  }
}

It is better for it to be in ExecDdlRequest() prior to spawning the async thread, because that guarantees that the state is set prior to returning from Exec().

Having the ctas/non-ctas states set in adjacent code makes it very clear that they are behaving differently. That's one reason I didn't like having one thing go to PENDING in ExecDdlRequest() and then other things go to RUNNING here.
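
A compilable version of the pseudocode's async branch, as a sketch only. Request, Finish() and the release_ gate are hypothetical; the gate simply holds the worker back so that the state seen right after ExecDdlRequest() returns is deterministic.

```cpp
#include <atomic>
#include <thread>

enum class ExecState { INITIALIZED, PENDING, RUNNING, FINISHED };

// Hypothetical request; names echo the discussion but this is not Impala code.
class Request {
 public:
  explicit Request(bool is_ctas) : is_ctas_(is_ctas) {}
  ~Request() { Finish(); }

  // Picks the state before the thread exists, so a caller can never observe
  // INITIALIZED once this function has returned. The CTAS and non-CTAS
  // choices sit next to each other.
  void ExecDdlRequest() {
    state_.store(is_ctas_ ? ExecState::PENDING : ExecState::RUNNING);
    worker_ = std::thread([this] {
      // Simulated slow work: spin until Finish() opens the gate.
      while (!release_.load()) std::this_thread::yield();
      state_.store(ExecState::FINISHED);
    });
  }

  ExecState state() const { return state_.load(); }

  // Lets the simulated work complete and waits for the thread to exit.
  void Finish() {
    release_.store(true);
    if (worker_.joinable()) worker_.join();
  }

 private:
  const bool is_ctas_;
  std::atomic<ExecState> state_{ExecState::INITIALIZED};
  std::atomic<bool> release_{false};
  std::thread worker_;
};
```

If the state were set inside the thread body instead, a status poll arriving right after ExecDdlRequest() returned could still see INITIALIZED.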




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 21
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Wed, 13 Oct 2021 23:49:25 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 24:

(13 comments)

Many thanks to Joe and Amogh for the review comments!

http://gerrit.cloudera.org:8080/#/c/17872/21/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/17872/21/be/src/service/client-request-state.cc@652
PS21, Line 652: 
              :   if (catalog_op_type() != TCatalogOpType::DDL &&
              :       catalog_op_type() != TCatalogOpType::RESET_METADA
> Nit: This is identical for sync/async. To avoid duplication, let's put this
Done


http://gerrit.cloudera.org:8080/#/c/17872/21/be/src/service/client-request-state.cc@694
PS21, Line 694: 
              : void ClientRequestState::ExecDdlRequestImplAsync() {
              :   // Transition the exec state to RUNNING for any non-CTAS DDLs. For the later, the
              :   // state is set to RUNNING inside FinishExecQueryOrDmlRequest().
              :   if (catalog_op_type() != TCatalogOpType::DDL
              :       || ddl_type() != TDdlType::CREATE_TABLE_AS_SELECT) {
              :     UpdateNonErrorExecState(ExecState::RUNNING);
              :   }
              : 
              :   catalog_op_executor_.reset(
              :       new CatalogOpExecutor(ExecEnv::GetInstance(), frontend_, server_profile_));
              :   DebugActionNoFail(
              :       exec_request_->query_options, "CRS_DELAY_BEFORE_CATALOG_OP_EXEC");
              :   Status status = catalog_op_executor_->Exec(exec_request_->catalog_op_request);
              :   {
              :     lock_guard<mutex> 
> Nit: This code should be unreachable, so replace this with a DCHECK(false);
Done


http://gerrit.cloudera.org:8080/#/c/17872/21/be/src/service/client-request-state.cc@713
PS21, Line 713:   // If this is a CTAS request, there will usually be more work to do
              :   // after executing the CREATE TABLE statement (the INSERT portion of the operation).
              :   // The exception is if the user specified IF NOT EXISTS and the table already
              :   // existed, in which case we do not execute the INSERT.
              :   if (catalog_op_type() == TCatalogOpType::DDL &&
              :    
> Nit: I would rather go directly to RUNNING without passing through PENDING 
Yes, the code to enter into the PENDING state is removed.

This block of code is kept because otherwise nothing sets the state to RUNNING for non-CTAS cases, and we would hit the illegal transition INITIALIZED -> FINISHED.


http://gerrit.cloudera.org:8080/#/c/17872/21/be/src/service/client-request-state.cc@719
PS21, Line 719:       !catalog_op_executor_->ddl_exec_response()->new_table_created) {
              :     DCHECK(exec_request_->catalog_op_request.
              :         ddl_params.create_table_params.if_not_exists);
> Nit: See other comment about this code.
Done


http://gerrit.cloudera.org:8080/#/c/17872/22/tests/hs2/test_hs2.py
File tests/hs2/test_hs2.py:

http://gerrit.cloudera.org:8080/#/c/17872/22/tests/hs2/test_hs2.py@356
PS22, Line 356:   def test_get_operation_status_for_async_ddl(self, 
> Use unique_database, as that will avoid needing to think about cleanup of t
Done


http://gerrit.cloudera.org:8080/#/c/17872/22/tests/hs2/test_hs2.py@368
PS22, Line 368:     #  "create table alltypes_clone as select * from functiona
> What I would do is:
Did some debugging today and found that the idea of counting the number of times we observe each state is okay. In particular, the number of times we observe the initialized state is a function of the length of the delay.

Before the change, we would not be in the initialized state at all in the GetOperationStatus() loop.


http://gerrit.cloudera.org:8080/#/c/17872/22/tests/hs2/test_hs2.py@370
PS22, Line 370: y
> flake8: E703 statement ends with a semicolon
Done


http://gerrit.cloudera.org:8080/#/c/17872/22/tests/hs2/test_hs2.py@371
PS22, Line 371: t
> flake8: E703 statement ends with a semicolon
Done


http://gerrit.cloudera.org:8080/#/c/17872/22/tests/hs2/test_hs2.py@379
PS22, Line 379: )
> flake8: E502 the backslash is redundant between brackets
Done


http://gerrit.cloudera.org:8080/#/c/17872/22/tests/hs2/test_hs2.py@383
PS22, Line 383: 
> flake8: E502 the backslash is redundant between brackets
Done


http://gerrit.cloudera.org:8080/#/c/17872/22/tests/hs2/test_hs2.py@384
PS22, Line 384: 
> flake8: E129 visually indented line with same indent as next logical line
Done


http://gerrit.cloudera.org:8080/#/c/17872/22/tests/hs2/test_hs2.py@387
PS22, Line 387: 
> flake8: E502 the backslash is redundant between brackets
Done


http://gerrit.cloudera.org:8080/#/c/17872/22/tests/hs2/test_hs2.py@388
PS22, Line 388: 
> flake8: E129 visually indented line with same indent as next logical line
Done



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 24
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Wed, 13 Oct 2021 21:11:44 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 34:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7553/ DRY_RUN=true


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 34
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Thu, 21 Oct 2021 18:56:03 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 35:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9640/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 35
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Fri, 22 Oct 2021 02:08:41 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#37). ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................

IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

This patch addresses an Impala client hang caused by the AWS network
load balancer timeout, which is fixed at 350s. When a long DDL
operation is executing and the timeout expires, AWS silently drops the
connection and the Impala client hangs.

The fix keeps the current TCLIService protocol between the client and
the Impala server and applies to the following Impala clients, which
issue the thrift RPC ExecuteStatement() followed by repeated calls to
GetOperationStatus() (HS2, Impyla and HUE) or a variant of it (Beeswax)
to the Impala backend.

  1. HS2
  2. Beeswax
  3. Impyla
  4. HUE

In the fix, the backend method ClientRequestState::ExecDdlRequest()
can start a new thread, held in 'async_exec_thread_', to run
ExecDdlRequestImpl(), which executes most DDLs asynchronously. The
wait thread 'wait_thread_' waits for this thread. Since the wait thread
also runs asynchronously, executing the DDL does not make the Impala
client wait. The client can thus keep checking the execution status
via GetOperationStatus() without any long wait, say more than 350s.
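
The threading arrangement described above can be sketched roughly as
follows. This is a toy Python model under stated assumptions, not the
C++ code in the patch: the class `RequestState`, its method names and
the state strings are hypothetical, error handling is omitted, and the
point at which the state leaves INITIALIZED is a choice made for this
sketch only.

```python
import threading


class RequestState:
    """Toy stand-in for a request state that runs DDL work asynchronously."""

    def __init__(self, ddl_work):
        self._ddl_work = ddl_work        # callable that performs the (slow) DDL
        self._lock = threading.Lock()    # guards _state
        self._state = "INITIALIZED"
        self._async_exec_thread = None
        self._wait_thread = None

    def _set_state(self, state):
        with self._lock:
            self._state = state

    def exec_ddl_request(self):
        # Returns right away; the DDL itself runs on the exec thread.
        self._set_state("RUNNING")       # assumption of this sketch only
        self._async_exec_thread = threading.Thread(target=self._ddl_work)
        self._async_exec_thread.start()
        self._wait_thread = threading.Thread(target=self._wait)
        self._wait_thread.start()

    def _wait(self):
        # The wait thread, not the caller, blocks until the DDL completes.
        self._async_exec_thread.join()
        self._set_state("FINISHED")

    def get_operation_status(self):
        # Cheap, non-blocking status check for the polling client.
        with self._lock:
            return self._state

    def join(self):
        self._wait_thread.join()


gate = threading.Event()
request = RequestState(ddl_work=gate.wait)  # the "DDL" blocks until the gate opens
request.exec_ddl_request()
status_while_running = request.get_operation_status()
gate.set()                                  # let the "DDL" finish
request.join()
final_status = request.get_operation_status()
```

Each status check returns immediately regardless of how long the
simulated DDL takes, which is the property the paragraph above relies
on.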

As an optimization, the asynchronous mode above is not applied to
certain DDLs that run a very low risk of long execution.

  1. Operations that do not access catalog service;
  2. COMPUTE STATS as the stats computation queries already run
     asynchronously.

External behavior changes:
  1. A new field named "DDL execution mode:" is added to the summary
     section of the runtime profile, next to "DDL Type". Its value is
     either 'asynchronous' or 'synchronous'.
  2. A new query option 'enable_async_ddl_execution', defaulting to
     true, is added. Setting it to false turns off asynchronous DDL
     execution.

Limitations:
  This patch does not handle potential AWS NLB-type timeouts for LOAD
  DATA (IMPALA-10967).

Testing:
  1. Added new async. DDL unit tests with HS2, HS2-HTTP, Beeswax and
     JDBC clients.
  2. Ran core tests successfully.

Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
---
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
A testdata/workloads/functional-query/queries/QueryTest/async_ddl.test
M tests/common/impala_test_suite.py
M tests/metadata/test_ddl.py
9 files changed, 386 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/17872/37
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 37
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 22:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/17872/22/tests/hs2/test_hs2.py
File tests/hs2/test_hs2.py:

http://gerrit.cloudera.org:8080/#/c/17872/22/tests/hs2/test_hs2.py@370
PS22, Line 370: ;
flake8: E703 statement ends with a semicolon


http://gerrit.cloudera.org:8080/#/c/17872/22/tests/hs2/test_hs2.py@371
PS22, Line 371: ;
flake8: E703 statement ends with a semicolon


http://gerrit.cloudera.org:8080/#/c/17872/22/tests/hs2/test_hs2.py@379
PS22, Line 379: \
flake8: E502 the backslash is redundant between brackets


http://gerrit.cloudera.org:8080/#/c/17872/22/tests/hs2/test_hs2.py@383
PS22, Line 383: \
flake8: E502 the backslash is redundant between brackets


http://gerrit.cloudera.org:8080/#/c/17872/22/tests/hs2/test_hs2.py@384
PS22, Line 384: T
flake8: E129 visually indented line with same indent as next logical line


http://gerrit.cloudera.org:8080/#/c/17872/22/tests/hs2/test_hs2.py@387
PS22, Line 387: \
flake8: E502 the backslash is redundant between brackets


http://gerrit.cloudera.org:8080/#/c/17872/22/tests/hs2/test_hs2.py@388
PS22, Line 388: T
flake8: E129 visually indented line with same indent as next logical line



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 22
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Tue, 12 Oct 2021 20:40:09 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#18). ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................

IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

This patch addresses an Impala client hang caused by the AWS network
load balancer timeout, which is fixed at 350s. When a long DDL
operation is in progress and the timeout expires, AWS silently drops
the connection and the Impala client hangs.

The fix keeps the current TCLIService protocol between the client and
the Impala server and applies to the following Impala clients, which
issue the thrift RPC ExecuteStatement() followed by repeated calls to
GetOperationStatus() (HS2, Impyla and HUE) or a variant of it (Beeswax)
to the Impala backend.

  1. HS2
  2. Beeswax
  3. Impyla
  4. HUE

In the fix, the backend method ClientRequestState::ExecDdlRequest()
can start a new thread, held in 'async_exec_thread_', to run
ExecAsyncDdlRequest(), which executes most DDLs asynchronously. The
wait thread 'wait_thread_' waits for this thread. Since the wait thread
also runs asynchronously, executing the DDL does not make the Impala
client wait. The client can thus keep checking the execution status
via GetOperationStatus() without any long wait, say more than 350s.

As an optimization, the above asynchronous mode is not applied to the
execution of certain DDLs that run no risk of long execution.

  1. Operations that do not access catalog service;
  2. COMPUTE STATS as the stats computation queries already run
     asynchronously;
  3. CREATE TABLE AS SELECT as the SELECT part already runs
     asynchronously.

Testing:
  1. Unit tests with HS2 and Beeswax client
  2. Core tests

Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
---
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
A testdata/workloads/functional-query/queries/QueryTest/alter-table-recover.test
M tests/metadata/test_ddl.py
4 files changed, 129 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/17872/18
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 18
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 25:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9605/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 25
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Wed, 13 Oct 2021 21:49:18 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] [WIP] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: [WIP] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 5:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/9541/ : Initial code review checks failed. See linked job for details on the failure.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 5
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Fri, 01 Oct 2021 14:45:52 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] [WIP] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/17872 )

Change subject: [WIP] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................

[WIP] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

This patch addresses an Impala client hang caused by the AWS network
load balancer timeout. The fix applies to the following Impala clients.

  1. HS2
  2. Beeswax
  3. Impyla
  4. HUE

These clients issue the thrift RPC ExecuteStatement() followed by
repeated calls to GetOperationStatus() (HS2, Impyla and HUE) or a
variant of it (Beeswax) to the Impala backend.

In the fix, the backend operation for ExecuteStatement() runs
asynchronously in a new thread, and the client checks its completion
status periodically via the equivalent of GetOperationStatus().
Specifically, the method ImpalaServer::ExecuteStatement() starts a new
thread for ImpalaServer::ExecuteStatementCommon().

The TCLIService protocol between the client and Impala server
is unchanged.

Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
---
M be/src/service/client-request-state.cc
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-hs2-server.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
5 files changed, 188 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/17872/6
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 6
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#12). ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................

IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

This patch addresses an Impala client hang caused by the AWS network
load balancer timeout, which is fixed at 350s. When a long DDL
operation is in progress and the timeout expires, AWS silently drops
the connection and the Impala client hangs.

The fix keeps the TCLIService protocol between the client and the
Impala server and applies to the following Impala clients, which issue
the thrift RPC ExecuteStatement() followed by repeated calls to
GetOperationStatus() (HS2, Impyla and HUE) or a variant of it (Beeswax)
to the Impala backend.

  1. HS2
  2. Beeswax
  3. Impyla
  4. HUE

In the fix, the backend method ImpalaServer::ExecuteStatementCommon()
starts a new thread for ImpalaServer::ExecuteStatementCommonInternal(),
which can reach two states: COMPILED and DONE. COMPILED means the
front end has successfully compiled the query; DONE means execution of
the query plan has either finished successfully or hit an error. The
main thread, which starts the new thread, waits for the COMPILED state
and then waits a further short period for the DONE state. If DONE is
not reached, control returns to the client, which issues
GetOperationStatus() repeatedly to check whether execution has reached
DONE. When the Impala server detects the FINISHED execution state, or
an error occurs while servicing GetOperationStatus(), the new thread
is joined and released. Thus for a long DDL query, the execution part
runs in the new thread and the Impala client keeps checking its status
via GetOperationStatus() without waiting more than 350s.
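
The two-phase wait described above (wait for COMPILED, then a short
wait for DONE, then hand control back to the polling client) can be
sketched in Python as below. This is an illustrative model only, not
the C++ in this patch set: `execute_statement`, `get_operation_status`,
`slow_ddl` and the 0.05s wait are hypothetical choices for the sketch.

```python
import threading


def execute_statement(work, done_wait_s=0.05):
    """Start `work` on a new thread; return (worker, done_event, finished_early)."""
    compiled = threading.Event()
    done = threading.Event()

    def run():
        try:
            work(compiled)   # `work` calls compiled.set() once "compiled"
        finally:
            compiled.set()   # never leave the caller blocked after an error
            done.set()

    worker = threading.Thread(target=run)
    worker.start()
    compiled.wait()                          # phase 1: wait for COMPILED
    finished_early = done.wait(done_wait_s)  # phase 2: short wait for DONE
    return worker, done, finished_early


def get_operation_status(worker, done):
    """Non-blocking status check; joins the worker once it is done."""
    if done.is_set():
        worker.join()
        return "FINISHED"
    return "RUNNING"


gate = threading.Event()


def slow_ddl(compiled):
    compiled.set()
    gate.wait()              # stands in for a long-running DDL


worker, done, finished_early = execute_statement(slow_ddl)
status_before = get_operation_status(worker, done)
gate.set()                   # let the "DDL" complete
done.wait()
status_after = get_operation_status(worker, done)
```

A statement that completes within the short second wait returns as
finished straight away; only long-running statements fall through to
the polling path.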

In addition, a child query, which the Impala server submits as an
Impala client for a COMPUTE STATS statement, runs synchronously in
the same child query thread.

The communication area between the new thread and the host thread
is per session.

Testing: TBD

Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
---
M be/src/service/child-query.cc
M be/src/service/client-request-state.cc
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-hs2-server.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/util/thread.h
7 files changed, 316 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/17872/12
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 12
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#33). ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................

IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

This patch addresses an Impala client hang caused by the AWS network
load balancer timeout, which is fixed at 350s. When a long DDL
operation is executing and the timeout expires, AWS silently drops the
connection and the Impala client hangs.

The fix keeps the current TCLIService protocol between the client and
the Impala server and applies to the following Impala clients, which
issue the thrift RPC ExecuteStatement() followed by repeated calls to
GetOperationStatus() (HS2, Impyla and HUE) or a variant of it (Beeswax)
to the Impala backend.

  1. HS2
  2. Beeswax
  3. Impyla
  4. HUE

In the fix, the backend method ClientRequestState::ExecDdlRequest()
can start a new thread, held in 'async_exec_thread_', to run
ExecDdlRequestImpl(), which executes most DDLs asynchronously. The
wait thread 'wait_thread_' waits for this thread. Since the wait thread
also runs asynchronously, executing the DDL does not make the Impala
client wait. The client can thus keep checking the execution status
via GetOperationStatus() without any long wait, say more than 350s.

As an optimization, the asynchronous mode above is not applied to
certain DDLs that run a very low risk of long execution.

  1. Operations that do not access catalog service;
  2. COMPUTE STATS as the stats computation queries already run
     asynchronously.

External behavior changes:
  1. A new field named "DDL execution mode:" is added to the summary
     section of the runtime profile, next to "DDL Type". Its value is
     either 'asynchronous' or 'synchronous'.
  2. A new query option 'enable_async_ddl_execution', defaulting to
     true, is added. Setting it to false turns off asynchronous DDL
     execution.

Limitations:
  This patch does not handle potential AWS NLB-type timeouts for LOAD
  DATA (IMPALA-10967).

Testing:
  1. Added new async. DDL unit tests with HS2, HS2-HTTP, Beeswax and
     JDBC clients.
  2. Ran core tests successfully.

Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
---
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
A testdata/workloads/functional-query/queries/QueryTest/async_ddl.test
M tests/common/impala_test_suite.py
M tests/metadata/test_ddl.py
9 files changed, 387 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/17872/33
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 33
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 27:

Fix Python format error.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 27
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Mon, 18 Oct 2021 16:41:56 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 29:

(1 comment)

Small changes to make the state transition for async. exec of DDL more specific.

http://gerrit.cloudera.org:8080/#/c/17872/21/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/17872/21/be/src/service/client-request-state.cc@713
PS21, Line 713: 
              :   // If this is a CTAS request, there will usually be more work to do
              :   // after executing the CREATE TABLE statement (the INSERT portion of the operation).
              :   // The exception is if the user specified IF NOT EXISTS and the table already
              :   // existed, in which case we do not execute the INSERT.
              :   i
> When a statement is in the INITIALIZED state, its runtime profile is unavai
I see. Thanks a lot for the description. 

It sounds like transitioning away from the INITIALIZED state is a good idea in general, since it makes the runtime profile available sooner rather than later.

For the race condition, my impression is that if the exec state is set in the worker thread, it truly reflects the state of execution. Since a lock is used, there is no ambiguity. It is probably the better model, as the code then never lies about the state :-).



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 29
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Mon, 18 Oct 2021 21:13:08 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 30:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17872/29/tests/hs2/test_hs2.py
File tests/hs2/test_hs2.py:

http://gerrit.cloudera.org:8080/#/c/17872/29/tests/hs2/test_hs2.py@368
PS29, Line 368:     # and measure the time spent. Should be less than 3s.
              :     start = time.time()
              :     execute_statement_resp = self.execute_statement(
> Take a timestamp before this statement and after, then take the diff and ma
Done
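
For reference, the timestamp-diff pattern discussed in this comment can be sketched as below. `timed_call` is a hypothetical helper and the lambda is a stand-in for the RPC; the 3s bound is the one mentioned in the quoted test comment.

```python
import time


def timed_call(fn, *args, **kwargs):
    """Call fn and return (result, elapsed seconds), using a monotonic clock."""
    start = time.monotonic()
    result = fn(*args, **kwargs)
    return result, time.monotonic() - start


# Stand-in for self.execute_statement(...): with async DDL the call should
# return quickly even though the statement keeps running on the server.
response, elapsed = timed_call(lambda: "response")
returned_quickly = elapsed < 3
```

time.monotonic() is used here instead of time.time() so that the measurement is not affected by wall-clock adjustments.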


http://gerrit.cloudera.org:8080/#/c/17872/29/tests/metadata/test_ddl.py
File tests/metadata/test_ddl.py:

http://gerrit.cloudera.org:8080/#/c/17872/29/tests/metadata/test_ddl.py@903
PS29, Line 903: # IMPALA-10811: RPC to submit query getting stuck for AWS NLB forever
              : # Test HS2, Beeswax and HS2-HTTP three clients.
              : class TestAsyncDDL(TestDdlBase):
              :   @classmethod
              :   def get_workload(self):
              :     return 'functional-query'
              : 
              :   @classmethod
              :   def add_test_dimensions(cls):
              :     super(TestAsyncDDL, cls).add_test_dimensions()
              :     cls.ImpalaTestMatrix.add_dimension(create_client_protocol_dimension())
              :     cls.ImpalaTestMatrix.add_dimension(create_exec_option_dimension(
              :         sync_ddl=[0]))
              : 
              :   def test_async_ddl(self, vector, unique_database):
              :     self.run_test_case('QueryTest/async_ddl', vector, use_db=unique_database)
> Create an test in tests/hs2/test_hs2.py that does the equivalent of test_ge
The purpose of this test is to exercise three different clients.

The tests in test_hs2.py exercise HS2 only:

class TestHS2(HS2TestSuite):

class HS2TestSuite(ImpalaTestSuite):
  HS2_V6_COLUMN_TYPES = ['boolVal', 'stringVal', 'byteVal', 'i16Val', 'i32Val', 'i64Val',
                         'doubleVal', 'binaryVal']

  def setup(self):
    self.socket, self.hs2_client = self._open_hs2_connection()

  def teardown(self):
    if self.socket:
      self.socket.close()


http://gerrit.cloudera.org:8080/#/c/17872/29/tests/metadata/test_metadata_query_statements.py
File tests/metadata/test_metadata_query_statements.py:

http://gerrit.cloudera.org:8080/#/c/17872/29/tests/metadata/test_metadata_query_statements.py@153
PS29, Line 153:   def test_async_ddl_with_JDBC(self, vector, unique_database):
              :     self.exec_with_jdbc("drop table if exists {0}.test_table".format(unique_database))
              :     self.exec_with_jdbc_and_compare_result(
              :         "create table {0}.test_table(a int)".format(unique_database),
              :         "'Table has been created.'")
              : 
              :     self.exec_with_jdbc("drop table if exists {0}.alltypes_clone".format(unique_database))
              :     self.exec_with_jdbc_and_compare_result(
              :         "create table {0}.alltypes_clone as select * from\
              :         functional_parquet.alltypes".format(unique_database),
              :         "'Inserted 7300 row(s)'")
> Metadata query statements are things like "describe {table_name}" and "desc
It looks like the compiled Java code for JDBC is based on the standard JDBC implementation:

https://github.com/apache/impala/blob/master/fe/src/test/java/org/apache/impala/testutil/ImpalaJdbcClient.java#L26

The core command used under the hood is

('cmd=', '/home/qchen/Impala.07202021/bin/run-jdbc-client.sh -i "localhost:21050" -t NOSASL -q "create table test_async_ddl_with_JDBC_bc95b

run-jdbc-client.sh is the following.

. ${IMPALA_HOME}/bin/set-classpath.sh test
CLASSPATH=${IMPALA_JDBC_DRIVER_CLASSPATH}:${CLASSPATH}
"$JAVA" -cp $CLASSPATH org.apache.impala.testutil.ImpalaJdbcClient "$@"



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 30
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Tue, 19 Oct 2021 20:50:40 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 29:

(5 comments)

I want this code review to converge and get merged. What I'm looking for is for the next upload to be a "clean" upload. Address every code review comment completely. No additional changes.

http://gerrit.cloudera.org:8080/#/c/17872/21/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/17872/21/be/src/service/client-request-state.cc@713
PS21, Line 713: 
              :   // If this is a CTAS request, there will usually be more work to do
              :   // after executing the CREATE TABLE statement (the INSERT portion of the operation).
              :   // The exception is if the user specified IF NOT EXISTS and the table already
              :   // existed, in which case we do not execute the INSERT.
              :   i
> I see. Thanks a lot for the description. 
I'm glad that you changed out of the INITIALIZED state before the end of Exec(). Now, the only thing remaining is to solve the race condition. :-)

I understand your argument completely, and I still disagree with you. My code review comments stand. I realize that this is not the answer that you want. Nonetheless, this is a code review, and I need you to respect my decision.


http://gerrit.cloudera.org:8080/#/c/17872/29/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/17872/29/be/src/service/client-request-state.cc@700
PS29, Line 700:   // For any non-CTAS DDLs, transition to RUNNING. For CTAS DDLs, transition
              :   // to RUNNING during FinishExecQueryOrDmlRequest() called by ExecQueryOrDmlRequest().
              :   if (!is_CTAS) UpdateNonErrorExecState(ExecState::RUNNING);
I am going to write this comment once more:

The state transition for non-CTAS should go directly from INITIALIZED to RUNNING with no time spent in PENDING. The transition should occur on the main thread prior to spawning the async thread. See my previous comment for the if/then/else structure that I'm looking for.
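
One way to picture the requested structure is the toy Python model below, which is not the actual C++ in client-request-state.cc. The class RequestModel, its method names, and the handling of the CTAS branch (leaving INITIALIZED via PENDING on the main thread) are assumptions made only for illustration. Only the non-CTAS path, INITIALIZED straight to RUNNING before the thread is spawned, comes from the comment above.

```python
import enum
import threading


class ExecState(enum.Enum):
    INITIALIZED = 0
    PENDING = 1
    RUNNING = 2
    FINISHED = 3


class RequestModel:
    """Toy stand-in for ClientRequestState (hypothetical, not Impala code)."""

    def __init__(self, is_ctas):
        self.is_ctas = is_ctas
        self.history = [ExecState.INITIALIZED]
        self.worker = None

    def _update_state(self, new_state):
        self.history.append(new_state)

    def _async_ddl(self):
        # Runs on the spawned thread; the catalog operation would happen here.
        if self.is_ctas:
            # Assumption: the INSERT portion moves CTAS to RUNNING later.
            self._update_state(ExecState.RUNNING)
        self._update_state(ExecState.FINISHED)

    def exec_ddl_request(self):
        # Runs on the main thread, before any async thread exists.
        if self.is_ctas:
            # Assumption: CTAS leaves INITIALIZED via PENDING.
            self._update_state(ExecState.PENDING)
        else:
            # Non-CTAS: INITIALIZED -> RUNNING, no time spent in PENDING.
            self._update_state(ExecState.RUNNING)
        self.worker = threading.Thread(target=self._async_ddl)
        self.worker.start()


ddl = RequestModel(is_ctas=False)
ddl.exec_ddl_request()
ddl.worker.join()
print([s.name for s in ddl.history])  # ['INITIALIZED', 'RUNNING', 'FINISHED']
```

Because the state is updated before the thread is started, a status check that arrives right after Exec() returns can never observe INITIALIZED, whichever thread wins the scheduling race.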


http://gerrit.cloudera.org:8080/#/c/17872/29/tests/hs2/test_hs2.py
File tests/hs2/test_hs2.py:

http://gerrit.cloudera.org:8080/#/c/17872/29/tests/hs2/test_hs2.py@368
PS29, Line 368:     execute_statement_resp = self.execute_statement(
              :         "create table {0}.alltypes_clone as select * from \
              :         functional_parquet.alltypes".format(unique_database))
Take a timestamp before this statement and after, then take the diff and make sure it is less than 5 seconds (or 3 seconds or something).
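
A minimal sketch of that timing check, assuming a helper named timed_call (hypothetical) and a stand-in fake_execute_statement in place of the real self.execute_statement() from test_hs2.py. The 5 second bound is the one suggested above.

```python
import time


def timed_call(fn, *args, **kwargs):
    """Call fn and return (result, elapsed seconds on a monotonic clock)."""
    start = time.monotonic()
    result = fn(*args, **kwargs)
    elapsed = time.monotonic() - start
    return result, elapsed


def fake_execute_statement(statement):
    # Hypothetical stand-in for self.execute_statement(); returns at once.
    return {"statement": statement, "status": "SUCCESS"}


resp, elapsed = timed_call(
    fake_execute_statement,
    "create table db.alltypes_clone as select * from functional_parquet.alltypes")
# With async DDL, the submitting RPC should return well before the work is done.
assert elapsed < 5.0
```

time.monotonic() is used rather than time.time() so that a wall-clock adjustment during the test cannot distort the measured duration.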


http://gerrit.cloudera.org:8080/#/c/17872/29/tests/metadata/test_ddl.py
File tests/metadata/test_ddl.py:

http://gerrit.cloudera.org:8080/#/c/17872/29/tests/metadata/test_ddl.py@903
PS29, Line 903: # IMPALA-10811: RPC to submit query getting stuck for AWS NLB forever
              : # Test HS2, Beeswax and HS2-HTTP three clients.
              : class TestAsyncDDL(TestDdlBase):
              :   @classmethod
              :   def get_workload(self):
              :     return 'functional-query'
              : 
              :   @classmethod
              :   def add_test_dimensions(cls):
              :     super(TestAsyncDDL, cls).add_test_dimensions()
              :     cls.ImpalaTestMatrix.add_dimension(create_client_protocol_dimension())
              :     cls.ImpalaTestMatrix.add_dimension(create_exec_option_dimension(
              :         sync_ddl=[0]))
              : 
              :   def test_async_ddl(self, vector, unique_database):
              :     self.run_test_case('QueryTest/async_ddl', vector, use_db=unique_database)
Create a test in tests/hs2/test_hs2.py that does the equivalent of test_get_operation_status_for_async_ddl() for the alter table case we are doing here.

The test_hs2.py tests are more powerful, because they can test the state transitions directly (and the time it takes to do Exec()). If we test CTAS and non-CTAS in that way, then this test is no longer necessary. Remove this test.


http://gerrit.cloudera.org:8080/#/c/17872/29/tests/metadata/test_metadata_query_statements.py
File tests/metadata/test_metadata_query_statements.py:

http://gerrit.cloudera.org:8080/#/c/17872/29/tests/metadata/test_metadata_query_statements.py@153
PS29, Line 153:   def test_async_ddl_with_JDBC(self, vector, unique_database):
              :     self.exec_with_jdbc("drop table if exists {0}.test_table".format(unique_database))
              :     self.exec_with_jdbc_and_compare_result(
              :         "create table {0}.test_table(a int)".format(unique_database),
              :         "'Table has been created.'")
              : 
              :     self.exec_with_jdbc("drop table if exists {0}.alltypes_clone".format(unique_database))
              :     self.exec_with_jdbc_and_compare_result(
              :         "create table {0}.alltypes_clone as select * from\
              :         functional_parquet.alltypes".format(unique_database),
              :         "'Inserted 7300 row(s)'")
Metadata query statements are things like "describe {table_name}" and "describe database {database_name}". This is not doing metadata query statements, so this is not the place for this test.

I don't think this provides additional value beyond the other tests that we have. JDBC is using HS2 under the covers just like Impyla. Remove this test and the corresponding changes in tests/common/impala_test_suite.py.



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 29
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Tue, 19 Oct 2021 18:59:43 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 11:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9567/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 11
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Wed, 06 Oct 2021 17:27:04 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 10:

fix several bugs in the 1st version that bring the info exchange between the new thread and the parent thread into SessionState.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 10
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Wed, 06 Oct 2021 17:05:23 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 8:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9554/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 8
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Mon, 04 Oct 2021 17:09:47 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] [WIP] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/17872 )

Change subject: [WIP] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................

[WIP] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

This patch addresses an Impala client hang caused by the AWS network
load balancer timeout. The fix applies to the following Impala clients.

  1. HS2
  2. Beeswax
  3. Impyla
  4. HUE

These clients issue the thrift RPC ExecuteStatement() followed by
repeated calls to GetOperationStatus() (HS2, Impyla and HUE) or a
variant of it (Beeswax) to the Impala backend.

In the fix, the backend operation for ExecuteStatement() runs
asynchronously in a new thread and its completion status is checked
periodically via the equivalent of the GetOperationStatus() from the
client. Specifically, the method ImpalaServer::ExecuteStatement()
starts a new thread for ImpalaServer::ExecuteStatementCommon().

The TCLIService protocol between the client and Impala server
is unchanged.

Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
---
M be/src/service/client-request-state.cc
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-hs2-server.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
5 files changed, 178 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/17872/5
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 5
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 19:

(2 comments)

Fix flake8 style issues in test_ddl.py.

http://gerrit.cloudera.org:8080/#/c/17872/18/tests/metadata/test_ddl.py
File tests/metadata/test_ddl.py:

http://gerrit.cloudera.org:8080/#/c/17872/18/tests/metadata/test_ddl.py@904
PS18, Line 904: # Test HS2
> flake8: E302 expected 2 blank lines, found 1
Done


http://gerrit.cloudera.org:8080/#/c/17872/18/tests/metadata/test_ddl.py@924
PS18, Line 924: # IMPALA-10811: RPC to submit query getting stuck for AWS NLB forever
> flake8: E302 expected 2 blank lines, found 1
Done



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 19
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Mon, 11 Oct 2021 20:38:08 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#25). ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................

IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

This patch addresses an Impala client hang caused by the AWS network
load balancer timeout, which is fixed at 350s. When a long DDL
operation is in progress and the timeout fires, AWS silently drops the
connection and the Impala client hangs.

The fix keeps the current TCLIService protocol between the client and
the Impala server. It applies to the following Impala clients, which
issue the thrift RPC ExecuteStatement() followed by repeated calls to
GetOperationStatus() (HS2, Impyla and HUE) or a variant of it
(Beeswax) to the Impala backend.

  1. HS2
  2. Beeswax
  3. Impyla
  4. HUE

In the fix, the backend method ClientRequestState::ExecDdlRequest()
can start a new thread in 'async_exec_thread_' for ExecAsyncDdlRequest(),
which executes most DDLs asynchronously. The wait thread 'wait_thread_'
waits for this thread. Since the wait thread also runs asynchronously,
executing the DDL does not block the Impala client, which can keep
checking the execution status via GetOperationStatus() without any
single call waiting for long (say, more than 350s).
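
The client-side effect described above can be sketched with a toy Python model. FakeServer, run_and_poll, the handle string, and the status strings are all hypothetical names for illustration; this is not Impala or Thrift code. The point is only that the submitting call returns at once and each status call is short, so no single RPC stays open long enough to hit a load balancer idle timeout.

```python
import threading
import time


class FakeServer:
    """Hypothetical server: runs the DDL on a background thread."""

    def __init__(self, ddl_seconds):
        self._ddl_seconds = ddl_seconds
        self._done = threading.Event()

    def execute_statement(self):
        # Returns immediately; the slow work continues in the background.
        threading.Thread(target=self._run_ddl, daemon=True).start()
        return "handle-1"

    def _run_ddl(self):
        time.sleep(self._ddl_seconds)  # stands in for a long catalog operation
        self._done.set()

    def get_operation_status(self, handle):
        # Short, non-blocking status check.
        return "FINISHED" if self._done.is_set() else "RUNNING"


def run_and_poll(server, poll_interval=0.01):
    """Submit the statement, then poll until it finishes; return poll count."""
    handle = server.execute_statement()
    polls = 0
    while server.get_operation_status(handle) != "FINISHED":
        polls += 1
        time.sleep(poll_interval)
    return polls


print(run_and_poll(FakeServer(ddl_seconds=0.05)) >= 1)
```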

Externally, a new field named "DDL execution mode:" has been added
to the summary section of the runtime profile, next to "DDL Type". Its
value is either asynchronous or synchronous.

As an optimization, the asynchronous mode is not applied to certain
DDLs that are very unlikely to run for a long time.

  1. Operations that do not access catalog service;
  2. COMPUTE STATS as the stats computation queries already run
     asynchronously.
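
The two exclusions above amount to a small predicate, sketched here in Python. The function name should_run_async and the string constants are hypothetical. The sketch also assumes that "operations that do not access catalog service" means every catalog op type other than DDL and RESET_METADATA, which is a reading of the reviewed code rather than something this commit message states.

```python
def should_run_async(catalog_op_type, ddl_type=None):
    """Return True if the operation would take the asynchronous path."""
    # Assumption: only DDL and RESET_METADATA reach the catalog service;
    # other op types (for example USE) are handled locally and stay sync.
    if catalog_op_type not in ("DDL", "RESET_METADATA"):
        return False
    # COMPUTE STATS already runs its child queries asynchronously.
    if catalog_op_type == "DDL" and ddl_type == "COMPUTE_STATS":
        return False
    return True


print(should_run_async("USE"))                   # False
print(should_run_async("DDL", "COMPUTE_STATS"))  # False
print(should_run_async("DDL", "ALTER_TABLE"))    # True
```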

Limitations:
  This patch does not handle potential AWS NLB-type timeouts for LOAD
  DATA (IMPALA-10967).

Testing:
  1. New async. DDL unit tests with HS2, HS2-HTTP and Beeswax clients
  2. Core tests

Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
---
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
A testdata/workloads/functional-query/queries/QueryTest/async_ddl.test
M tests/hs2/test_hs2.py
M tests/metadata/test_ddl.py
5 files changed, 208 insertions(+), 23 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/17872/25
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 25
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] [WIP] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: [WIP] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17872/5/be/src/service/impala-hs2-server.cc
File be/src/service/impala-hs2-server.cc:

http://gerrit.cloudera.org:8080/#/c/17872/5/be/src/service/impala-hs2-server.cc@497
PS5, Line 497:   HS2_NOTIFY_AND_RETURN_IF_ERROR(return_val, CheckNotShuttingDown(), SQLSTATE_GENERAL_ERROR);
> line too long (93 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17872/5/be/src/service/impala-hs2-server.cc@517
PS5, Line 517:     HS2_NOTIFY_AND_RETURN_IF_ERROR(return_val, Status::Expected(err_msg), SQLSTATE_GENERAL_ERROR);
> line too long (98 > 90)
Done



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 5
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Fri, 01 Oct 2021 14:26:11 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................

IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

This patch addresses an Impala client hang caused by the AWS network
load balancer timeout. The fix applies to the following Impala clients.

  1. HS2
  2. Beeswax
  3. Impyla
  4. HUE

These clients issue the thrift RPC ExecuteStatement() followed by
repeated calls to GetOperationStatus() (HS2, Impyla and HUE) or a
variant of it (Beeswax) to the Impala backend.

In the fix, the backend method ImpalaServer::ExecuteStatement()
starts a new thread for ImpalaServer::ExecuteStatementCommon(), which
can reach two stages: COMPILED and DONE. COMPILED means the query has
been compiled successfully; DONE means execution has either finished
successfully or hit an error. The main thread, which starts the new
thread, waits for the COMPILED stage and then waits a short period
for the DONE stage. If DONE is not reached in that period, control
returns to the client, which issues GetOperationStatus() repeatedly
to check whether execution has reached DONE.
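
The two-stage wait described above can be sketched with two events in Python. The function execute_statement and its parameters are hypothetical names for illustration only; the real code is C++ in impala-server.cc and is not reproduced here.

```python
import threading
import time


def execute_statement(work_seconds, short_wait_seconds):
    """Model of the main thread: wait for COMPILED, then briefly for DONE.

    Returns (reached_done_in_time, done_event).
    """
    compiled = threading.Event()
    done = threading.Event()

    def worker():
        compiled.set()            # compilation finished
        time.sleep(work_seconds)  # stands in for the execution itself
        done.set()                # finished, successfully or with an error

    threading.Thread(target=worker, daemon=True).start()
    compiled.wait()                                   # unbounded: always needed
    finished = done.wait(timeout=short_wait_seconds)  # bounded short wait
    return finished, done


fast, _ = execute_statement(work_seconds=0.0, short_wait_seconds=1.0)
slow, slow_done = execute_statement(work_seconds=0.3, short_wait_seconds=0.01)
print(fast, slow)  # True False: the slow case returns control to the client
```

In the slow case the caller gets control back after the short wait, and the client is then expected to poll for completion.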

The TCLIService protocol between the client and Impala server
is unchanged.

Testing: TBD

Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
---
M be/src/service/client-request-state.cc
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-hs2-server.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
5 files changed, 176 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/17872/7
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 7
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 17:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9585/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 17
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Mon, 11 Oct 2021 16:37:36 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 15:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/17872/15/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/17872/15/be/src/service/client-request-state.cc@256
PS15, Line 256:       // Specifically run most DDLs asynchronously in async_exec_thread_ except
              :       // for CREATE_TABLE_AS_SELECT which runs the execution step asynchronously.
              :       if (catalog_op_type() == TCatalogOpType::DDL &&
              :           ddl_type() == TDdlType::CREATE_TABLE_AS_SELECT) {
              :         LOG_AND_RETURN_IF_ERROR(ExecDdlRequest());
              :       } else {
              :         RunExecDdlRequestAsynchronously();
              :       }
Style point:
Elsewhere I'm going to be making the decision to be async more complicated, so I'm thinking we can push the decision to go async into ExecDdlRequest() and the call would be like it was. Inside ExecDdlRequest() we'll decide whether to go async.


http://gerrit.cloudera.org:8080/#/c/17872/15/be/src/service/client-request-state.cc@366
PS15, Line 366:     case TCatalogOpType::USE: {
              :       lock_guard<mutex> l(session_->lock);
              :       session_->database = exec_request_->catalog_op_request.use_db_params.db;
              :       return Status::OK();
              :     }
Most of the actions in this function could require metadata operations, but this one is purely a session state update. It would be nice not to use the async logic for this.


http://gerrit.cloudera.org:8080/#/c/17872/15/be/src/service/client-request-state.cc@655
PS15, Line 655:   if (catalog_op_type() != TCatalogOpType::DDL &&
              :       catalog_op_type() != TCatalogOpType::RESET_METADATA) {
              :     Status status = ExecLocalCatalogOp(exec_request_->catalog_op_request);
              :     lock_guard<mutex> l(lock_);
              :     return UpdateQueryStatus(status);
              :   }
See my comment in ExecLocalCatalogOp() about USE. It would be nice to avoid async for that case.


http://gerrit.cloudera.org:8080/#/c/17872/15/be/src/service/client-request-state.cc@662
PS15, Line 662:   if (ddl_type() == TDdlType::COMPUTE_STATS) {
              :     TComputeStatsParams& compute_stats_params =
              :         exec_request_->catalog_op_request.ddl_params.compute_stats_params;
              :     RuntimeProfile* child_profile =
              :         RuntimeProfile::Create(&profile_pool_, "Child Queries");
              :     profile_->AddChild(child_profile);
              :     // Add child queries for computing table and column stats.
              :     vector<ChildQuery> child_queries;
              :     if (compute_stats_params.__isset.tbl_stats_query) {
              :       RuntimeProfile* profile =
              :           RuntimeProfile::Create(&profile_pool_, "Table Stats Query");
              :       child_profile->AddChild(profile);
              :       child_queries.emplace_back(compute_stats_params.tbl_stats_query, this,
              :           parent_server_, profile, &profile_pool_);
              :     }
              :     if (compute_stats_params.__isset.col_stats_query) {
              :       RuntimeProfile* profile =
              :           RuntimeProfile::Create(&profile_pool_, "Column Stats Query");
              :       child_profile->AddChild(profile);
              :       child_queries.emplace_back(compute_stats_params.col_stats_query, this,
              :           parent_server_, profile, &profile_pool_);
              :     }
              : 
              :     if (child_queries.size() > 0) {
              :       RETURN_IF_ERROR(child_query_executor_->ExecAsync(move(child_queries)));
              :     } else {
              :       SetResultSet({"No partitions selected for incremental stats update."});
              :     }
              :     return Status::OK();
              :   }
I think we can skip going async for the compute stats case. It is starting a couple child queries and returning quickly.


http://gerrit.cloudera.org:8080/#/c/17872/15/be/src/service/client-request-state.cc@720
PS15, Line 720:   if (catalog_op_type() == TCatalogOpType::DDL &&
              :       ddl_type() == TDdlType::CREATE_TABLE_AS_SELECT) {
              :     // At this point, the remainder of the CTAS request executes
              :     // like a normal DML request. As with other DML requests, it will
              :     // wait for another catalog update if any partitions were altered as a result
              :     // of the operation.
              :     DCHECK(exec_request_->__isset.query_exec_request);
              :     RETURN_IF_ERROR(ExecAsyncQueryOrDmlRequest(exec_request_->query_exec_request));
              :   }
To handle CTAS, one option is to have ExecAsyncQueryOrDmlRequest() have a mode that stays executing in the same thread. In other words, ExecDdlRequest() spawns a thread for the part of this function that does the catalog op + everything after, and that thread ends up calling ExecAsyncQueryOrDmlRequest() with a parameter telling it to stay in this thread. (Naming of these functions would start to be an issue.)

I think that would be doable, but we need to think through the state transitions. ExecAsyncQueryOrDmlRequest() currently starts with ExecState::INITIALIZED and transitions to ExecState::PENDING. We'd have to decide how to handle that. Either ExecDdlRequest() would return with the status still at INITIALIZED and then ExecAsyncQueryOrDmlRequest() would behave like normal, or ExecDdlRequest() would return with status PENDING and ExecAsyncQueryOrDmlRequest() would need to be able to handle that case.


http://gerrit.cloudera.org:8080/#/c/17872/15/be/src/service/client-request-state.cc@738
PS15, Line 738:     ABORT_IF_ERROR(
ABORT_IF_ERROR will crash Impala if the Status is not OK. A non-OK status for a query should be handled and returned to the client. This should be RETURN_IF_ERROR, and the function around it should have a return type of Status.


http://gerrit.cloudera.org:8080/#/c/17872/15/be/src/service/client-request-state.cc@743
PS15, Line 743:   UpdateNonErrorExecState(ExecState::PENDING);
For DDLs, today we skip the PENDING state, so I think it makes sense to go directly to RUNNING. I don't think clients make a major distinction. We use PENDING in the query case to correspond to waiting for admission control, which doesn't apply here.



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 15
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Fri, 08 Oct 2021 19:44:54 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 15:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9576/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 15
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Fri, 08 Oct 2021 16:16:03 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#15). ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................

IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

This patch addresses an Impala client hang caused by the AWS network
load balancer timeout, which is fixed at 350s. When a long DDL
operation is in progress and the timeout fires, AWS silently drops the
connection and the Impala client hangs.

The fix keeps the TCLIService protocol between the client and the
Impala server. It applies to the following Impala clients, which issue
the thrift RPC ExecuteStatement() followed by repeated calls to
GetOperationStatus() (HS2, Impyla and HUE) or a variant of it
(Beeswax) to the Impala backend.

  1. HS2
  2. Beeswax
  3. Impyla
  4. HUE

In the fix, the backend method ClientRequestState::Exec()
starts a new thread for ClientRequestState::ExecDdlRequest() which
executes most of the DDLs asynchronously. The wait thread waits for
this thread. Since the wait thread also runs asynchronously, executing
the DDL does not block the Impala client. The client can therefore keep
checking the execution status via GetOperationStatus(), with no single
call waiting for long (say, more than 350s).

This new way of executing DDLs asynchronously is not done for the
special CREATE TABLE AS SELECT DDL since it is already executed
asynchronously.
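
The threading arrangement described above can be sketched roughly as follows. This is a minimal Python illustration, not Impala's C++ code: the class RequestState, its method names, and the status strings are hypothetical stand-ins, and a sleep stands in for the catalog operation.

```python
import threading
import time


class RequestState:
    """Hypothetical stand-in for ClientRequestState; not Impala's actual code."""

    def __init__(self, ddl_seconds):
        self.ddl_seconds = ddl_seconds
        self.status = "RUNNING"
        self.async_exec_thread = None
        self.wait_thread = None

    def _exec_ddl_request(self):
        # Stands in for the long-running catalog operation.
        time.sleep(self.ddl_seconds)

    def _wait(self):
        # The wait thread joins the DDL thread, then publishes completion.
        self.async_exec_thread.join()
        self.status = "FINISHED"

    def execute(self):
        # Start the DDL in its own thread and return to the caller right away.
        self.async_exec_thread = threading.Thread(target=self._exec_ddl_request)
        self.async_exec_thread.start()
        self.wait_thread = threading.Thread(target=self._wait)
        self.wait_thread.start()

    def get_operation_status(self):
        return self.status


state = RequestState(ddl_seconds=0.3)
start = time.time()
state.execute()
submit_seconds = time.time() - start

# The caller polls with short calls instead of blocking on the DDL itself.
polls = 0
while state.get_operation_status() != "FINISHED":
    polls += 1
    time.sleep(0.05)
```

In this sketch execute() returns almost immediately, and the DDL's duration only shows up as additional short status polls.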

Testing: TBD

Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
---
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
2 files changed, 25 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/17872/15

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 15
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 21:

Classify and handle CREATE TABLE AS SELECT as async DDL.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 21
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Tue, 12 Oct 2021 18:02:00 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 30:

After thinking about it, I agree with you that we may want a query option to control this behavior. That will make it easier to ship this without worrying about unknown client behavior. If it would help, I can give a quick idea of how I would implement it.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 30
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Wed, 20 Oct 2021 16:52:17 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 32:

(3 comments)

Thanks a lot Joe!

Your timing tests are great and are now included.

http://gerrit.cloudera.org:8080/#/c/17872/31/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/17872/31/be/src/service/client-request-state.cc@695
PS31, Line 695: (bool
> Nit: What do you think about dropping Async from this name?
Done


http://gerrit.cloudera.org:8080/#/c/17872/31/be/src/service/client-request-state.cc@702
PS31, Line 702:   // Indirectly check if running in thread async_exec_thread_.
              :   if (exec_dml_sync) {
> When I was reading this code, I got confused by the variable name. The code
Done


http://gerrit.cloudera.org:8080/#/c/17872/31/be/src/service/client-request-state.cc@706
PS31, Line 706:     // 1. For any non-CTAS DDLs, transition to RUNNING
              :     // 2. For CTAS DDLs, transition to RUNNING during FinishExecQueryOrDml
> When adding a delay, it usually lines up with some piece of code that can t
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 32
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Thu, 21 Oct 2021 15:40:09 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#31). ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................

IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

This patch addresses an Impala client hang caused by the AWS network
load balancer timeout, which is fixed at 350s. When a long DDL operation
is executing and the timeout expires, AWS silently drops the connection
and the Impala client hangs.

The fix maintains the current TCLIService protocol between the client
and Impala server and is applicable to the following Impala clients
which issue thrift RPC ExecuteStatement() followed by repeated call to
GetOperationStatus() (HS2, Impyla and HUE) or a variant of it (Beeswax)
to Impala backend.

  1. HS2
  2. Beeswax
  3. Impyla
  4. HUE
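
The client-side pattern shared by these clients (submit, then poll) can be illustrated with a small Python sketch. FakeServer and run_statement are hypothetical names used only for illustration; a real client would issue thrift RPCs to Impala rather than call an in-memory object.

```python
import time


class FakeServer:
    """Hypothetical in-memory server; real clients talk thrift to Impala."""

    def __init__(self, polls_until_done):
        self.remaining = polls_until_done

    def execute_statement(self, sql):
        # Returns an operation handle immediately instead of blocking.
        return {"sql": sql}

    def get_operation_status(self, handle):
        # Each call is a short request, so the connection does not sit idle
        # for the whole duration of the DDL.
        if self.remaining > 0:
            self.remaining -= 1
            return "RUNNING"
        return "FINISHED"


def run_statement(server, sql, poll_interval_s=0.01):
    """Submit a statement, then poll its status until it finishes."""
    handle = server.execute_statement(sql)
    statuses = []
    while True:
        status = server.get_operation_status(handle)
        statuses.append(status)
        if status == "FINISHED":
            return statuses
        time.sleep(poll_interval_s)


observed = run_statement(FakeServer(polls_until_done=3),
                         "alter table t recover partitions")
```

With this shape, the time a DDL takes translates into more polls rather than one long-lived blocked call.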

In the fix, the backend method ClientRequestState::ExecDdlRequest()
can start a new thread in 'async_exec_thread_' for ExecAsyncDdlRequest()
which executes most of the DDLs asynchronously. The wait thread
'wait_thread_' waits for this thread. Since the wait thread also runs
asynchronously, executing the DDL does not block the Impala client.
The client can therefore keep checking the execution status via
GetOperationStatus(), with no single call waiting for long (say, more
than 350s).

As an optimization, the asynchronous mode above is not applied to
certain DDLs that carry very little risk of running long.

  1. Operations that do not access catalog service;
  2. COMPUTE STATS as the stats computation queries already run
     asynchronously.

External behavior changes:
  1. A new field with name "DDL execution mode:" is added to the
     summary section in the runtime profile, next to "DDL Type". This
     field takes either 'asynchronous' or 'synchronous' as value.
  2. A new query option 'enable_async_ddl_execution', default to true,
     is added. It can be set to false to turn off the patch.
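
One way to read the rules above is as a small decision function. The Python sketch below is only an illustration under that reading: the function name and its boolean parameters are hypothetical, and the actual C++ logic in ClientRequestState may be organized differently.

```python
def ddl_execution_mode(enable_async_ddl_execution,
                       accesses_catalog_service,
                       is_compute_stats):
    """Return 'asynchronous' or 'synchronous' under this sketch's assumptions."""
    if not enable_async_ddl_execution:
        # The query option turns the new behavior off entirely.
        return "synchronous"
    if not accesses_catalog_service:
        # Operations that never reach the catalog service stay synchronous.
        return "synchronous"
    if is_compute_stats:
        # COMPUTE STATS already runs its child queries asynchronously.
        return "synchronous"
    return "asynchronous"


default_mode = ddl_execution_mode(True, True, False)
disabled_mode = ddl_execution_mode(False, True, False)
```

The returned strings mirror the two values that the commit message lists for the "DDL execution mode:" profile field.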

Limitations:
  This patch does not handle potential AWS NLB-type timeouts for LOAD
  DATA (IMPALA-10967).

Testing:
  1. Added new async. DDL unit tests with HS2, HS2-HTTP, Beeswax and
     JDBC clients
  2. Ran core tests successfully

Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
---
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
A testdata/workloads/functional-query/queries/QueryTest/async_ddl.test
M tests/common/impala_test_suite.py
M tests/metadata/test_ddl.py
9 files changed, 255 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/17872/31

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 31
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 31:

(2 comments)

Address all review comments and add the query option enable_async_ddl_execution.

http://gerrit.cloudera.org:8080/#/c/17872/29/tests/metadata/test_ddl.py
File tests/metadata/test_ddl.py:

http://gerrit.cloudera.org:8080/#/c/17872/29/tests/metadata/test_ddl.py@903
PS29, Line 903: 
              : # IMPALA-10811: RPC to submit query getting stuck for AWS NLB forever
              : # Test HS2, Beeswax and HS2-HTTP three clients.
              : class TestAsyncDDL(TestDdlBase):
              :   @classmethod
              :   def get_workload(self):
              :     return 'functional-query'
              : 
              :   @classmethod
              :   def add_test_dimensions(cls):
              :     super(TestAsyncDDL, cls).add_test_dimensions()
              :     cls.ImpalaTestMatrix.add_dimension(create_client_protocol_dimension())
              :     cls.ImpalaTestMatrix.add_dimension(create_exec_option_dimension(
              :         sync_ddl=[0], disable_codegen_options=[False]))
              : 
              :   def test_async_ddl(self, vector, unique_database):
> As with the JDBC test, the way I'm thinking about this test is whether this
Removed the CTAS test from async_ddl and replaced it with the state transition test for the three clients here.

Only the alter table recover partitions test is retained in async_ddl.

Done.


http://gerrit.cloudera.org:8080/#/c/17872/29/tests/metadata/test_metadata_query_statements.py
File tests/metadata/test_metadata_query_statements.py:

http://gerrit.cloudera.org:8080/#/c/17872/29/tests/metadata/test_metadata_query_statements.py@153
PS29, Line 153:   @pytest.mark.execute_serially # due to data src setup/teardown
              :   @SkipIfCatalogV2.data_sources_unsupported()
              :   def test_show_data_sources(self, vector):
              :     try:
              :       self.__create_data_sources()
              :       self.run_test_case('QueryTest/show-data-sources', vector)
              :     finally:
              :       self.__drop_data_sources()
              : 
              :   def __drop_data_sources(self):
              :     for name in self.TEST_DATA_SR
> The 21050 port is the HS2 port.
Since we claim that the protocol does not change with the patch, this is a functional test for JDBC client. 

We have similar tests for HS2, BW and HS2-http.

By looking at JDBC driver setup for other vendors, it seems one can set the port in the URL connection string. See one example here https://www.baeldung.com/java-jdbc-url-format. 

In the invocation of the Impala JDBC client, the port number is set to 21050. So it is very likely (without real code to look at) that the Impala JDBC client is capable of talking to the Impala HS2 server.

('cmd=', '/home/qchen/Impala.07202021/bin/run-jdbc-client.sh -i "localhost:21050" -t NOSASL -q "drop table if exists test_async_ddl_with_JDBC_bc95ba92.foo;"')




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 31
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Wed, 20 Oct 2021 21:44:26 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 26:

Address review comment and add JDBC tests for async. DDLs.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 26
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Mon, 18 Oct 2021 16:33:05 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 37:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7557/ DRY_RUN=true



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 37
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Fri, 22 Oct 2021 17:21:35 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] [WIP] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: [WIP] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9508/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 1
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Mon, 27 Sep 2021 20:55:04 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 10:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/17872/10/be/src/service/impala-server.h
File be/src/service/impala-server.h:

http://gerrit.cloudera.org:8080/#/c/17872/10/be/src/service/impala-server.h@647
PS10, Line 647:     
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17872/10/be/src/service/impala-server.h@654
PS10, Line 654:   
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17872/10/be/src/service/impala-server.h@658
PS10, Line 658:   
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17872/10/be/src/service/impala-server.h@661
PS10, Line 661:   
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17872/10/be/src/service/impala-server.h@664
PS10, Line 664:   
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17872/10/be/src/service/impala-server.h@667
PS10, Line 667:   
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17872/10/be/src/service/impala-server.h@751
PS10, Line 751:       const TExecRequest* external_exec_request = nullptr, 
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17872/10/be/src/service/impala-server.h@1253
PS10, Line 1253:     apache::hive::service::cli::thrift::TExecuteStatementResp& return_val, 
line has trailing whitespace




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 10
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Tue, 05 Oct 2021 19:51:07 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 25:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17872/25/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/17872/25/be/src/service/client-request-state.cc@773
PS25, Line 773:       unique_lock<mutex> unique_lock(lock_);
You don't need this lock, so let's remove it.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 25
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Wed, 13 Oct 2021 23:58:13 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 25:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17872/21/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/17872/21/be/src/service/client-request-state.cc@713
PS21, Line 713:   // If this is a CTAS request, there will usually be more work to do
              :   // after executing the CREATE TABLE statement (the INSERT portion of the operation).
              :   // The exception is if the user specified IF NOT EXISTS and the table already
              :   // existed, in which case we do not execute the INSERT.
              :   if (catalog_op_type() == TCatalogOpType::DDL &&
              :    
> Let me expand this code example:
For CTAS, the state in the main thread, before it is overwritten by the child thread, differs as follows.

1. Previous code: INITIALIZED 
2. Your proposal: PENDING

In either version, the state is set prior to returning from Exec(). 

For non-CTAS DDLs, the state in the main thread, before it is overwritten by the child thread, is as follows.

1. Previous code: INITIALIZED
2. Your proposal: RUNNING

It seems we disagree on when to set the state and which thread should set it. My thinking is that it should be the responsibility of the thread that performs the state change (i.e., ExecDdlRequestImplAsync() when running in async_exec_thread_). In the current code, the transition is set at the beginning; a better place is after the delay via CRS_DELAY_BEFORE_CATALOG_OP_EXEC.

Please let me know. I probably will not be able to work on it until next week.
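
To make the ownership question concrete, here is a minimal Python sketch of the variant where the worker thread performs the transition after the delay. All names are hypothetical, the sleep stands in for the CRS_DELAY_BEFORE_CATALOG_OP_EXEC debug action, and the FINISHED state is an assumption added only to complete the example.

```python
import threading
import time


class OperationState:
    """Hypothetical state holder; records every transition for inspection."""

    def __init__(self):
        self.lock = threading.Lock()
        self.history = ["INITIALIZED"]

    def update(self, new_state):
        with self.lock:
            self.history.append(new_state)

    def current(self):
        with self.lock:
            return self.history[-1]


def exec_ddl_async(state, delay_s):
    # Worker body: the sleep stands in for the debug-action delay, and the
    # transition to RUNNING happens only after it, in the worker itself.
    time.sleep(delay_s)
    state.update("RUNNING")
    state.update("FINISHED")  # assumed terminal state for this sketch


state = OperationState()
worker = threading.Thread(target=exec_ddl_async, args=(state, 0.2))
worker.start()
# What the submitting thread observes right after starting the worker.
state_seen_by_submitter = state.current()
worker.join()
```

In this variant the submitting thread still sees INITIALIZED when it returns; in the alternative proposal the main thread would set PENDING or RUNNING itself before returning.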


http://gerrit.cloudera.org:8080/#/c/17872/25/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/17872/25/be/src/service/client-request-state.cc@773
PS25, Line 773:       unique_lock<mutex> unique_lock(lock_);
> You don't need this lock, so let's remove it.
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 25
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Thu, 14 Oct 2021 02:00:07 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 22:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17872/22/tests/hs2/test_hs2.py
File tests/hs2/test_hs2.py:

http://gerrit.cloudera.org:8080/#/c/17872/22/tests/hs2/test_hs2.py@356
PS22, Line 356:   def test_get_operation_status_for_async_ddl(self):
Use unique_database, as that will avoid needing to think about cleanup of the table.


http://gerrit.cloudera.org:8080/#/c/17872/22/tests/hs2/test_hs2.py@368
PS22, Line 368:     execute_statement_resp = self.execute_statement(statement)
What I would do is:
select count(*) from functional_parquet.alltypes;
(which means that the metadata is already loaded and a subsequent exec referencing that table is fast)
then
set debug_action to delay at the right place for 5+ seconds
start_time = time.time()
create table alltypes_clone as select * from functional_parquet.alltypes;
end_time = time.time()
# Should take less than one second
assert end_time - start_time < 1.0

I think this is a pretty clear test for whether this is async.
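
The shape of that timing check can be sketched in plain Python. This is only an illustration: submit_async is a hypothetical stand-in for the client's execute call, a sleep in a thread stands in for the debug_action delay, and the delay is shortened to 1.5 seconds so the sketch runs quickly.

```python
import threading
import time


def submit_async(statement, injected_delay_s):
    """Hypothetical submit: starts the work in a thread, returns a handle at once."""
    done = threading.Event()

    def work():
        time.sleep(injected_delay_s)  # stands in for the debug_action delay
        done.set()

    threading.Thread(target=work).start()
    return done


statement = ("create table alltypes_clone as "
             "select * from functional_parquet.alltypes")
start_time = time.time()
handle = submit_async(statement, injected_delay_s=1.5)
end_time = time.time()
submit_elapsed = end_time - start_time

# The submit call itself should return well before the injected delay ends.
assert submit_elapsed < 1.0

# The operation still completes later; wait for it with a generous timeout.
finished_in_time = handle.wait(timeout=10)
```

If the submit were synchronous, the elapsed time would include the full injected delay and the assertion would fail.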




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 22
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Tue, 12 Oct 2021 21:08:37 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] [WIP] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: [WIP] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 4:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/9540/ : Initial code review checks failed. See linked job for details on the failure.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 4
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Fri, 01 Oct 2021 14:45:19 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] [WIP] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/17872 )

Change subject: [WIP] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................

[WIP] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

This patch addresses Impala client hang due to AWS network load balancer
timeout. The scope of the fix applies to the following Impala clients.

  1. HS2
  2. Beeswax
  3. Impyla
  4. HUE

These clients issue thrift RPC ExecuteStatement() followed by repeated
call to GetOperationStatus() (HS2, Impyla and HUE) or a variant of it
(Beeswax) to Impala backend.

In the fix, the backend operation for ExecuteStatement() runs
asynchronously in a new thread and its completion status is checked
periodically via the equivalent of the GetOperationStatus() from the
client. A new execution state CATALOG_OP_RUNNING is added to represent
the new execution state.

Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
---
M be/src/exec/catalog-op-executor.cc
M be/src/exec/catalog-op-executor.h
A be/src/service/catalog-op-util.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-hs2-server.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
9 files changed, 366 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/17872/4

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 4
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 31:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9630/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 31
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Wed, 20 Oct 2021 22:05:38 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................

IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

This patch addresses Impala client hang due to AWS network load balancer
timeout which is fixed at 350s. When some long DDL operations are
executing and the timeout happens, AWS silently drops the connection and
the Impala client enters the hang state.

The fix maintains the current TCLIService protocol between the client
and Impala server and is applicable to the following Impala clients
which issue thrift RPC ExecuteStatement() followed by repeated call to
GetOperationStatus() (HS2, Impyla and HUE) or a variant of it (Beeswax)
to Impala backend.

  1. HS2
  2. Beeswax
  3. Impyla
  4. HUE

In the fix, the backend method ClientRequestState::ExecDdlRequest()
can start a new thread in 'async_exec_thread_' for ExecDdlRequestImpl()
which executes most of the DDLs asynchronously. The wait thread
'wait_thread_' waits for this thread. Since the wait thread also runs
asynchronously, executing the DDL does not block the Impala client.
The client can therefore keep checking the execution status via
GetOperationStatus(), with no single call waiting for long (say, more
than 350s).

As an optimization, the asynchronous mode above is not applied to
certain DDLs that carry very little risk of running long.

  1. Operations that do not access catalog service;
  2. COMPUTE STATS as the stats computation queries already run
     asynchronously.

External behavior change:
  1. A new field named "DDL execution mode:" is added to the summary
     section of the runtime profile, next to "DDL Type". Its value is
     either 'asynchronous' or 'synchronous'.
  2. A new query option 'enable_async_ddl_execution', which defaults to
     true, is added. Setting it to false turns off asynchronous DDL
     execution.
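
A test or script that wants to check which mode a DDL ran in could read
the new profile field. The helper below is a sketch only: the function
name is made up, and the exact layout of the profile text around the
field is an assumption. The field name and its two possible values are
the only details taken from the description above.

```python
import re


def ddl_execution_mode(profile_text):
    """Returns 'asynchronous', 'synchronous', or None if the field is absent."""
    match = re.search(
        r"DDL execution mode:\s*(asynchronous|synchronous)\b", profile_text)
    return match.group(1) if match else None
```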

Limitations:
  This patch does not handle a potential AWS NLB-type timeout for LOAD
  DATA (IMPALA-10967).

Testing:
  1. Added new async. DDL unit tests with HS2, HS2-HTTP, Beeswax and
     JDBC clients.
  2. Ran core tests successfully.

Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Reviewed-on: http://gerrit.cloudera.org:8080/17872
Reviewed-by: Joe McDonnell <jo...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
A testdata/workloads/functional-query/queries/QueryTest/async_ddl.test
M tests/common/impala_test_suite.py
M tests/metadata/test_ddl.py
9 files changed, 386 insertions(+), 26 deletions(-)

Approvals:
  Joe McDonnell: Looks good to me, approved
  Impala Public Jenkins: Verified


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 38
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 22:

Added a new unit test in test_hs2.py to measure the number of calls to get_operation_status().

Also renamed test_alter_cover to test_aync_ddl.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 22
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Tue, 12 Oct 2021 20:40:13 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 19:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9588/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 19
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Mon, 11 Oct 2021 20:24:55 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] [WIP] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: [WIP] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9509/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 2
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Mon, 27 Sep 2021 21:02:43 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 13:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9569/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 13
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Wed, 06 Oct 2021 17:42:58 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 11:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/17872/11/be/src/service/impala-server.h
File be/src/service/impala-server.h:

http://gerrit.cloudera.org:8080/#/c/17872/11/be/src/service/impala-server.h@645
PS11, Line 645:     
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17872/11/be/src/service/impala-server.h@655
PS11, Line 655:   
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17872/11/be/src/service/impala-server.h@658
PS11, Line 658:   
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17872/11/be/src/service/impala-server.h@661
PS11, Line 661:   
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17872/11/be/src/service/impala-server.h@664
PS11, Line 664:   
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17872/11/be/src/service/impala-server.h@748
PS11, Line 748:       const TExecRequest* external_exec_request = nullptr, 
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/17872/11/be/src/service/impala-server.h@1238
PS11, Line 1238:     apache::hive::service::cli::thrift::TExecuteStatementResp& return_val, 
line has trailing whitespace




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 11
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Wed, 06 Oct 2021 17:06:44 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 7:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9545/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 7
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Fri, 01 Oct 2021 16:44:24 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] [WIP] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/17872 )

Change subject: [WIP] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................

[WIP] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

This patch addresses an Impala client hang caused by the AWS network
load balancer timeout. The fix applies to the following Impala clients.

  1. HS2
  2. Beeswax
  3. Impyla
  4. HUE

These clients issue the thrift RPC ExecuteStatement() followed by
repeated calls to GetOperationStatus() (HS2, Impyla and HUE) or a
variant of it (Beeswax) to the Impala backend.

In the fix, the backend operation for ExecuteStatement() runs
asynchronously in a new thread, and the client checks its completion
status periodically via the equivalent of GetOperationStatus(). A new
execution state, CATALOG_OP_RUNNING, is added to represent a catalog
operation that is still running.

Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
---
M be/src/exec/catalog-op-executor.cc
M be/src/exec/catalog-op-executor.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-hs2-server.cc
6 files changed, 194 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/17872/3

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 3
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#23). ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................

IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

This patch addresses an Impala client hang caused by the AWS network
load balancer timeout, which is fixed at 350s. When a long DDL
operation is still running at the time the timeout expires, AWS
silently drops the connection and the Impala client hangs.

The fix keeps the current TCLIService protocol between the client and
the Impala server unchanged. It applies to the following Impala
clients, which issue the thrift RPC ExecuteStatement() followed by
repeated calls to GetOperationStatus() (HS2, Impyla and HUE) or a
variant of it (Beeswax) to the Impala backend.

  1. HS2
  2. Beeswax
  3. Impyla
  4. HUE

In the fix, the backend method ClientRequestState::ExecDdlRequest()
can start a new thread, 'async_exec_thread_', to run
ExecAsyncDdlRequest(), which executes most DDLs asynchronously. The
wait thread 'wait_thread_' waits for this thread. Since the wait thread
also runs asynchronously, executing the DDL does not block the Impala
client. The client can therefore keep checking the execution status via
GetOperationStatus() without any single call waiting for a long time,
say more than 350s.

As an optimization, the asynchronous mode above is not applied to
DDLs that run a very low risk of long execution:

  1. Operations that do not access the catalog service;
  2. COMPUTE STATS, as the stats computation queries already run
     asynchronously.

Limitations:
  This patch does not handle a potential AWS NLB-type timeout for LOAD
  DATA (IMPALA-10967).

Testing:
  1. New async. DDL unit tests with HS2, HS2-HTTP and Beeswax clients
  2. Core tests

Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
---
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
A testdata/workloads/functional-query/queries/QueryTest/async_ddl.test
M tests/hs2/test_hs2.py
M tests/metadata/test_ddl.py
5 files changed, 211 insertions(+), 23 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/17872/23

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 23
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 25:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17872/21/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/17872/21/be/src/service/client-request-state.cc@713
PS21, Line 713:   // If this is a CTAS request, there will usually be more work to do
              :   // after executing the CREATE TABLE statement (the INSERT portion of the operation).
              :   // The exception is if the user specified IF NOT EXISTS and the table already
              :   // existed, in which case we do not execute the INSERT.
              :   if (catalog_op_type() == TCatalogOpType::DDL &&
              :    
> I should have been more explicit. I want CTAS to go to PENDING and non-CTAS
I do not think that will work: Exec() sets the state to RUNNING (at the end of the method) for the sync code path, and RUNNING->RUNNING is not allowed.

So we should set the state for the async path only, which is exactly what is done here, to avoid another illegal transition, INITIALIZED->FINISHED.

I can add a comment in ExecDdlRequest() to explain this.
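
To illustrate the constraint, here is a minimal sketch of a transition
check. The ALLOWED table is an assumption made for this example and is
not copied from client-request-state.cc; the only facts taken from the
discussion are that RUNNING->RUNNING and INITIALIZED->FINISHED are
rejected.

```python
# Hypothetical transition table; only the two rejected transitions
# discussed above are grounded in the review thread.
ALLOWED = {
    "INITIALIZED": {"PENDING", "RUNNING", "ERROR"},
    "PENDING": {"RUNNING", "ERROR"},
    "RUNNING": {"FINISHED", "ERROR"},
    "FINISHED": set(),
    "ERROR": set(),
}


class IllegalTransition(Exception):
    """Raised when a state change is not in the ALLOWED table."""


def is_legal(current, new):
    """Returns True if moving from 'current' to 'new' is allowed."""
    return new in ALLOWED.get(current, set())


def transition(current, new):
    """Returns the new state, or raises IllegalTransition."""
    if not is_legal(current, new):
        raise IllegalTransition("%s -> %s" % (current, new))
    return new
```

Under this table, if the async path has already moved the state to
RUNNING, a second move to RUNNING at the end of Exec() would be
rejected, which is why the state is set on one path only.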




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 25
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Wed, 13 Oct 2021 23:26:33 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#29). ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................

IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

This patch addresses an Impala client hang caused by the AWS network
load balancer timeout, which is fixed at 350s. When a long DDL
operation is still executing at the time the timeout expires, AWS
silently drops the connection and the Impala client hangs.

The fix keeps the current TCLIService protocol between the client and
the Impala server unchanged. It applies to the following Impala
clients, which issue the thrift RPC ExecuteStatement() followed by
repeated calls to GetOperationStatus() (HS2, Impyla and HUE) or a
variant of it (Beeswax) to the Impala backend.

  1. HS2
  2. Beeswax
  3. Impyla
  4. HUE

In the fix, the backend method ClientRequestState::ExecDdlRequest()
can start a new thread, 'async_exec_thread_', to run
ExecAsyncDdlRequest(), which executes most DDLs asynchronously. The
wait thread 'wait_thread_' waits for this thread. Since the wait thread
also runs asynchronously, executing the DDL does not block the Impala
client. The client can therefore keep checking the execution status via
GetOperationStatus() without any single call waiting for a long time,
say more than 350s.

Externally, a new field named "DDL execution mode:" has been added to
the summary section of the runtime profile, next to "DDL Type". Its
value is either 'asynchronous' or 'synchronous'.

As an optimization, the asynchronous mode above is not applied to
DDLs that run a very low risk of long execution:

  1. Operations that do not access the catalog service;
  2. COMPUTE STATS, as the stats computation queries already run
     asynchronously.

Limitations:
  This patch does not handle a potential AWS NLB-type timeout for LOAD
  DATA (IMPALA-10967).

Testing:
  1. Added new async. DDL unit tests with HS2, HS2-HTTP, Beeswax and
     JDBC clients
  2. Ran core tests successfully

Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
---
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
A testdata/workloads/functional-query/queries/QueryTest/async_ddl.test
M tests/common/impala_test_suite.py
M tests/hs2/test_hs2.py
M tests/metadata/test_ddl.py
M tests/metadata/test_metadata_query_statements.py
7 files changed, 241 insertions(+), 25 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/17872/29

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 29
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 33:

(2 comments)

Thanks for incorporating the test, just some last notes

http://gerrit.cloudera.org:8080/#/c/17872/33/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/17872/33/be/src/service/client-request-state.cc@703
PS33, Line 703:   if (exec_dml_sync) {
What I meant to say is that "exec_dml_sync" is a name that is the opposite of its boolean value. "exec_dml_sync" is currently true when we are running asynchronously in the worker thread and false when we are running synchronously in the main thread. I just want the name to align with the condition. i.e. either "if (!exec_ddl_sync)" or "if (exec_ddl_async)".

I have a bit of brain fuzz, and I also missed that this should be "ddl" rather than "dml" (here and elsewhere).


http://gerrit.cloudera.org:8080/#/c/17872/33/be/src/service/client-request-state.cc@710
PS33, Line 710: 
              :     // Optionally wait with a debug action before Exec() below.
              :     DebugActionNoFail(
              :         exec_request_->query_options, "CRS_DELAY_BEFORE_CATALOG_OP_EXEC");
I'm sorry, I should have mentioned that my extra test will need this to be outside the if statement, because it is testing delays for both the sync and async cases.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 33
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Thu, 21 Oct 2021 17:06:14 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 34: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7553/



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 34
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Fri, 22 Oct 2021 01:08:15 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................

IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

This patch addresses an Impala client hang caused by the AWS network
load balancer timeout, which is fixed at 350s. Once the timeout
expires, for example during a long query compilation, the connection is
silently dropped and the Impala client hangs.

The fix applies to the following Impala clients, which issue the
thrift RPC ExecuteStatement() followed by repeated calls to
GetOperationStatus() (HS2, Impyla and HUE) or a variant of it
(Beeswax) to the Impala backend.

  1. HS2
  2. Beeswax
  3. Impyla
  4. HUE

The TCLIService protocol between the client and Impala server remains
unchanged.

In the fix, the backend method ImpalaServer::ExecuteStatement()
starts a new thread for ImpalaServer::ExecuteStatementCommon(), which
can reach two states: COMPILED and DONE. COMPILED means the query has
been compiled successfully; DONE means the execution has either
finished successfully or encountered an error. The main thread, which
starts the new thread, waits for the COMPILED state and then waits a
short period for the DONE state. If the DONE state is not reached,
control returns to the client, which issues GetOperationStatus()
repeatedly to check whether the execution has reached the DONE state.
When the Impala server detects the FINISHED execution state, or an
error occurs while servicing GetOperationStatus(), the new thread is
joined and released.
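
The COMPILED/DONE handshake in this patch set can be sketched with two
events. This is a toy Python model under stated assumptions: the
function names, the sleep calls standing in for compilation and
execution, and the returned tuple are all hypothetical and do not
mirror the ImpalaServer code.

```python
import threading
import time


def execute_statement(compile_seconds, exec_seconds, short_wait_seconds):
    """Starts the worker, waits for COMPILED, then waits briefly for DONE."""
    compiled = threading.Event()
    done = threading.Event()

    def worker():
        time.sleep(compile_seconds)  # stand-in for query compilation
        compiled.set()
        time.sleep(exec_seconds)     # stand-in for query execution
        done.set()

    thread = threading.Thread(target=worker)
    thread.start()
    compiled.wait()
    # Event.wait() returns False if the timeout expires before DONE is set.
    finished_in_time = done.wait(timeout=short_wait_seconds)
    return thread, done, finished_in_time


def get_operation_status(thread, done):
    """Joins the worker once it is done; otherwise reports RUNNING."""
    if done.is_set():
        thread.join()
        return "FINISHED"
    return "RUNNING"
```

If the short wait expires first, execute_statement() returns with the
worker still running, and later status checks join the thread once the
DONE event is set.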

In addition, a child query, which the Impala server submits as an
Impala client for a compute stats statement, runs synchronously in the
same child query thread.

Testing: TBD

Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
---
M be/src/service/child-query.cc
M be/src/service/client-request-state.cc
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-hs2-server.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
6 files changed, 251 insertions(+), 25 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/17872/9

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 9
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] [WIP] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: [WIP] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17872/5/be/src/service/impala-hs2-server.cc
File be/src/service/impala-hs2-server.cc:

http://gerrit.cloudera.org:8080/#/c/17872/5/be/src/service/impala-hs2-server.cc@497
PS5, Line 497:   HS2_NOTIFY_AND_RETURN_IF_ERROR(return_val, CheckNotShuttingDown(), SQLSTATE_GENERAL_ERROR);
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/17872/5/be/src/service/impala-hs2-server.cc@517
PS5, Line 517:     HS2_NOTIFY_AND_RETURN_IF_ERROR(return_val, Status::Expected(err_msg), SQLSTATE_GENERAL_ERROR);
line too long (98 > 90)




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 5
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 01 Oct 2021 14:23:58 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 18:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17872/18/tests/metadata/test_ddl.py
File tests/metadata/test_ddl.py:

http://gerrit.cloudera.org:8080/#/c/17872/18/tests/metadata/test_ddl.py@904
PS18, Line 904: class TestAlterTableRecover(TestDdlBase):
flake8: E302 expected 2 blank lines, found 1


http://gerrit.cloudera.org:8080/#/c/17872/18/tests/metadata/test_ddl.py@924
PS18, Line 924: class TestAlterTableRecoverWithBeeswax(TestDdlBase):
flake8: E302 expected 2 blank lines, found 1




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 18
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Mon, 11 Oct 2021 19:23:12 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#11). ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................

IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

This patch addresses an Impala client hang caused by the AWS network
load balancer timeout, which is fixed at 350s. When a long DDL
operation is still running at the time the timeout expires, AWS
silently drops the connection and the Impala client hangs.

The fix keeps the TCLIService protocol between the client and the
Impala server unchanged and applies to the following Impala clients,
which issue the thrift RPC ExecuteStatement() followed by repeated
calls to GetOperationStatus() (HS2, Impyla and HUE) or a variant of it
(Beeswax) to the Impala backend.

  1. HS2
  2. Beeswax
  3. Impyla
  4. HUE

In the fix, the backend method ImpalaServer::ExecuteStatementCommon()
starts a new thread for ImpalaServer::ExecuteStatementCommonInternal(),
which can reach two states: COMPILED and DONE. COMPILED means the
front end has compiled the query successfully; DONE means the execution
of the query plan has either finished successfully or encountered an
error. The main thread, which starts the new thread, waits for the
COMPILED state and then waits a short period for the DONE state. If
the DONE state is not reached, control returns to the client, which
issues GetOperationStatus() repeatedly to check whether the execution
has reached the DONE state. When the Impala server detects the FINISHED
execution state, or an error occurs while servicing
GetOperationStatus(), the new thread is joined and released. Thus, for
a long DDL query, the execution runs in the new thread and the Impala
client keeps checking its status via GetOperationStatus() without
waiting more than 350s.

In addition, a child query, which the Impala server submits as an
Impala client for a compute stats statement, runs synchronously in the
same child query thread.

The communication area between the new thread and the host thread
is per session.

Testing: TBD

Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
---
M be/src/service/child-query.cc
M be/src/service/client-request-state.cc
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-hs2-server.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/util/thread.h
7 files changed, 322 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/17872/11
-- 
To view, visit http://gerrit.cloudera.org:8080/17872
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 11
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 18:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9587/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 18
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Mon, 11 Oct 2021 19:44:03 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#21). ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................

IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

This patch addresses Impala client hang due to AWS network load balancer
timeout which is fixed at 350s. When some long DDL operations are going
on and the timeout happens, AWS silently drops the connection and the
Impala client enters the hang state.

The fix maintains the current TCLIService protocol between the client
and Impala server and is applicable to the following Impala clients
which issue thrift RPC ExecuteStatement() followed by repeated calls to
GetOperationStatus() (HS2, Impyla and HUE) or a variant of it (Beeswax)
to Impala backend.

  1. HS2
  2. Beeswax
  3. Impyla
  4. HUE

In the fix, the backend method ClientRequestState::ExecDdlRequest()
can start a new thread in 'async_exec_thread_' for ExecAsyncDdlRequest()
which executes most of the DDLs asynchronously. This thread is waited
for in the wait thread 'wait_thread_'. Since the wait thread also runs
asynchronously, the execution of the DDLs will not cause a wait on the
Impala client. Thus the Impala client can keep checking its execution
status via GetOperationStatus() without long waiting, say more than
350s.
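
The two-thread arrangement described above can be sketched roughly as
follows. This is a Python model, not the C++ in client-request-state.cc;
the class name DdlRequestState, the state strings and the join() helper
are assumptions introduced for illustration.

```python
import threading


class DdlRequestState:
    """Hypothetical model of a client request executing one DDL."""

    def __init__(self, ddl_fn):
        self._ddl_fn = ddl_fn
        self._lock = threading.Lock()
        self._state = "INITIALIZED"
        self._error = None
        self._async_exec_thread = None
        self._wait_thread = None

    def exec_ddl_request(self):
        """Start the DDL and the waiter, then return immediately."""
        with self._lock:
            self._state = "RUNNING"
        self._async_exec_thread = threading.Thread(target=self._exec_impl)
        self._async_exec_thread.start()
        self._wait_thread = threading.Thread(target=self._wait)
        self._wait_thread.start()

    def _exec_impl(self):
        # Runs the potentially long catalog operation off the RPC thread.
        try:
            self._ddl_fn()
        except Exception as e:
            self._error = e

    def _wait(self):
        # The wait thread, not the client RPC, blocks on the DDL.
        self._async_exec_thread.join()
        with self._lock:
            self._state = "ERROR" if self._error is not None else "FINISHED"

    def get_operation_status(self):
        """Cheap status check that the client can poll."""
        with self._lock:
            return self._state

    def join(self):
        """Test helper: block until the wait thread has finished."""
        self._wait_thread.join()
```

Each poll returns quickly, so no single client RPC stays open for the
duration of the DDL.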

As an optimization, the above asynchronous mode is not applied to the
execution of certain DDLs that carry very little risk of long execution.

  1. Operations that do not access catalog service;
  2. COMPUTE STATS as the stats computation queries already run
     asynchronously.

Limitations:
  This patch does not handle potential AWS NLB-type timeout for LOAD
  DATA (IMPALA-10967).

Testing:
  1. Unit tests with HS2, HS2-HTTP and Beeswax clients
  2. Core tests

Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
---
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
A testdata/workloads/functional-query/queries/QueryTest/alter-table-recover.test
M tests/metadata/test_ddl.py
4 files changed, 142 insertions(+), 19 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/17872/21

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 21
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 23:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9602/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 23
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Wed, 13 Oct 2021 21:32:01 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 34:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17872/33/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/17872/33/be/src/service/client-request-state.cc@703
PS33, Line 703:   if (exec_in_worker_t
> What I meant to say is that "exec_dml_sync" is a name that is the opposite 
Renamed to exec_in_worker_thread. I was having a hard time interpreting the previously named argument too. This name should help a lot.

Done


http://gerrit.cloudera.org:8080/#/c/17872/33/be/src/service/client-request-state.cc@710
PS33, Line 710:   }
              : 
              :   // Optionally wait with a debug action before Exec() below.
              :   DebugActionNoFail(exec_request_->query_options, "CRS_DELAY_BEFORE_CATALO
> I'm sorry, I should have mentioned that my extra test will need this to be 
The change will make the execution time (exec_time) longer than 10s, for the sync mode in general. Modified your test cases accordingly.  

Done.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 34
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Thu, 21 Oct 2021 17:52:43 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 34:

(2 comments)

Ok, this is looking good to me. I'm ready to approve this. I had only one nit.

http://gerrit.cloudera.org:8080/#/c/17872/33/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/17872/33/be/src/service/client-request-state.cc@703
PS33, Line 703:   if (exec_in_worker_t
> Renamed to exec_in_worker_thread. I was having hard time interpreting the p
Thanks, this makes sense.


http://gerrit.cloudera.org:8080/#/c/17872/34/tests/metadata/test_ddl.py
File tests/metadata/test_ddl.py:

http://gerrit.cloudera.org:8080/#/c/17872/34/tests/metadata/test_ddl.py@1066
PS34, Line 1066:         assert(exec_time >= 4)
Nit: This can be "assert(exec_time >= 10)"




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 34
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Thu, 21 Oct 2021 18:10:24 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#35). ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................

IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

This patch addresses Impala client hang due to AWS network load balancer
timeout which is fixed at 350s. When some long DDL operations are
executing and the timeout happens, AWS silently drops the connection and
the Impala client enters the hang state.

The fix maintains the current TCLIService protocol between the client
and Impala server and is applicable to the following Impala clients
which issue thrift RPC ExecuteStatement() followed by repeated calls to
GetOperationStatus() (HS2, Impyla and HUE) or a variant of it (Beeswax)
to Impala backend.

  1. HS2
  2. Beeswax
  3. Impyla
  4. HUE

In the fix, the backend method ClientRequestState::ExecDdlRequest()
can start a new thread in 'async_exec_thread_' for ExecDdlRequestImpl()
which executes most of the DDLs asynchronously. This thread is waited
for in the wait thread 'wait_thread_'. Since the wait thread also runs
asynchronously, the execution of the DDLs will not cause a wait on the
Impala client. Thus the Impala client can keep checking its execution
status via GetOperationStatus() without long waiting, say more than
350s.

As an optimization, the above asynchronous mode is not applied to the
execution of certain DDLs that carry very little risk of long execution.

  1. Operations that do not access catalog service;
  2. COMPUTE STATS as the stats computation queries already run
     asynchronously.

External behavior change:
  1. A new field with name "DDL execution mode:" is added to the
     summary section in the runtime profile, next to "DDL Type". This
     field takes either 'asynchronous' or 'synchronous' as value.
  2. A new query option 'enable_async_ddl_execution', default to true,
     is added. It can be set to false to turn off the patch.
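
Taken together, the two exceptions and the query option in this commit
message suggest a decision along the following lines. This is only a
sketch: the function name ddl_execution_mode and its boolean parameters
are hypothetical, and the actual checks in ClientRequestState may differ.

```python
def ddl_execution_mode(accesses_catalog, is_compute_stats,
                       enable_async_ddl_execution=True):
    """Return the value shown in the "DDL execution mode" profile field."""
    if not enable_async_ddl_execution:
        # The query option turns the asynchronous path off entirely.
        return "synchronous"
    if not accesses_catalog:
        # No catalog service round trip, so little risk of a long run.
        return "synchronous"
    if is_compute_stats:
        # The stats computation queries already run asynchronously.
        return "synchronous"
    return "asynchronous"
```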

Limitations:
  This patch does not handle potential AWS NLB-type timeout for LOAD
  DATA (IMPALA-10967).

Testing:
  1. Added new async. DDL unit tests with HS2, HS2-HTTP, Beeswax and
     JDBC clients.
  2. Ran core tests successfully.

Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
---
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
A testdata/workloads/functional-query/queries/QueryTest/async_ddl.test
M tests/common/impala_test_suite.py
M tests/metadata/test_ddl.py
9 files changed, 386 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/17872/35

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 35
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 26:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9618/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 26
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Mon, 18 Oct 2021 16:57:13 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#30). ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................

IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

This patch addresses Impala client hang due to AWS network load balancer
timeout which is fixed at 350s. When some long DDL operations are
executing and the timeout happens, AWS silently drops the connection and
the Impala client enters the hang state.

The fix maintains the current TCLIService protocol between the client
and Impala server and is applicable to the following Impala clients
which issue thrift RPC ExecuteStatement() followed by repeated calls to
GetOperationStatus() (HS2, Impyla and HUE) or a variant of it (Beeswax)
to Impala backend.

  1. HS2
  2. Beeswax
  3. Impyla
  4. HUE

In the fix, the backend method ClientRequestState::ExecDdlRequest()
can start a new thread in 'async_exec_thread_' for ExecAsyncDdlRequest()
which executes most of the DDLs asynchronously. This thread is waited
for in the wait thread 'wait_thread_'. Since the wait thread also runs
asynchronously, the execution of the DDLs will not cause a wait on the
Impala client. Thus the Impala client can keep checking its execution
status via GetOperationStatus() without long waiting, say more than
350s.

Externally, a new field with name "DDL execution mode:" has been added
to the summary section in the runtime profile, next to "DDL Type". This
field takes either 'asynchronous' or 'synchronous' as value.

As an optimization, the above asynchronous mode is not applied to the
execution of certain DDLs that carry very little risk of long execution.

  1. Operations that do not access catalog service;
  2. COMPUTE STATS as the stats computation queries already run
     asynchronously.

Limitations:
  This patch does not handle potential AWS NLB-type timeout for LOAD
  DATA (IMPALA-10967).

Testing:
  1. Added new async. DDL unit tests with HS2, HS2-HTTP, Beeswax and
     JDBC clients
  2. Ran core tests successfully

Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
---
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
A testdata/workloads/functional-query/queries/QueryTest/async_ddl.test
M tests/common/impala_test_suite.py
M tests/hs2/test_hs2.py
M tests/metadata/test_ddl.py
M tests/metadata/test_metadata_query_statements.py
7 files changed, 245 insertions(+), 25 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/17872/30

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 30
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 32:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17872/32/tests/metadata/test_ddl.py
File tests/metadata/test_ddl.py:

http://gerrit.cloudera.org:8080/#/c/17872/32/tests/metadata/test_ddl.py@997
PS32, Line 997: class TestAsyncDDLTiming(TestDdlBase):
flake8: E302 expected 2 blank lines, found 1




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 32
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Thu, 21 Oct 2021 15:40:04 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 34:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9637/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 34
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Thu, 21 Oct 2021 18:14:17 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#34). ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................

IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

This patch addresses Impala client hang due to AWS network load balancer
timeout which is fixed at 350s. When some long DDL operations are
executing and the timeout happens, AWS silently drops the connection and
the Impala client enters the hang state.

The fix maintains the current TCLIService protocol between the client
and Impala server and is applicable to the following Impala clients
which issue thrift RPC ExecuteStatement() followed by repeated calls to
GetOperationStatus() (HS2, Impyla and HUE) or a variant of it (Beeswax)
to Impala backend.

  1. HS2
  2. Beeswax
  3. Impyla
  4. HUE

In the fix, the backend method ClientRequestState::ExecDdlRequest()
can start a new thread in 'async_exec_thread_' for ExecDdlRequestImpl()
which executes most of the DDLs asynchronously. This thread is waited
for in the wait thread 'wait_thread_'. Since the wait thread also runs
asynchronously, the execution of the DDLs will not cause a wait on the
Impala client. Thus the Impala client can keep checking its execution
status via GetOperationStatus() without long waiting, say more than
350s.

As an optimization, the above asynchronous mode is not applied to the
execution of certain DDLs that carry very little risk of long execution.

  1. Operations that do not access catalog service;
  2. COMPUTE STATS as the stats computation queries already run
     asynchronously.

External behavior changes:
  1. A new field with name "DDL execution mode:" is added to the
     summary section in the runtime profile, next to "DDL Type". This
     field takes either 'asynchronous' or 'synchronous' as value.
  2. A new query option 'enable_async_ddl_execution', default to true,
     is added. It can be set to false to turn off the patch.

Limitations:
  This patch does not handle potential AWS NLB-type timeout for LOAD
  DATA (IMPALA-10967).

Testing:
  1. Added new async. DDL unit tests with HS2, HS2-HTTP, Beeswax and
     JDBC clients.
  2. Ran core tests successfully.

Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
---
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
A testdata/workloads/functional-query/queries/QueryTest/async_ddl.test
M tests/common/impala_test_suite.py
M tests/metadata/test_ddl.py
9 files changed, 386 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/17872/34

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 34
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 35:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17872/34/tests/metadata/test_ddl.py
File tests/metadata/test_ddl.py:

http://gerrit.cloudera.org:8080/#/c/17872/34/tests/metadata/test_ddl.py@1066
PS34, Line 1066:         assert(exec_time >= 10
> Nit: This can be "assert(exec_time >= 10)"
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 35
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Fri, 22 Oct 2021 01:45:59 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 29:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17872/29/tests/metadata/test_ddl.py
File tests/metadata/test_ddl.py:

http://gerrit.cloudera.org:8080/#/c/17872/29/tests/metadata/test_ddl.py@903
PS29, Line 903: # IMPALA-10811: RPC to submit query getting stuck for AWS NLB forever
              : # Test HS2, Beeswax and HS2-HTTP three clients.
              : class TestAsyncDDL(TestDdlBase):
              :   @classmethod
              :   def get_workload(self):
              :     return 'functional-query'
              : 
              :   @classmethod
              :   def add_test_dimensions(cls):
              :     super(TestAsyncDDL, cls).add_test_dimensions()
              :     cls.ImpalaTestMatrix.add_dimension(create_client_protocol_dimension())
              :     cls.ImpalaTestMatrix.add_dimension(create_exec_option_dimension(
              :         sync_ddl=[0]))
              : 
              :   def test_async_ddl(self, vector, unique_database):
              :     self.run_test_case('QueryTest/async_ddl', vector, use_db=unique_database)
> The purpose of this test is to check out 3 different clients.
As with the JDBC test, the way I'm thinking about this test is whether this detects any failure cases. If we ran this test without this patch, it would succeed.

What defect does this test catch? What would cause this test to fail?

I consider the current test_hs2.py test better than this test, because it can definitely detect defects and would fail if this patch were not applied. I think the way forward is for that type of test to apply to all the cases that we care about. That may mean moving that test out of test_hs2.py to a location that runs for all clients. For example, that could move here and use ImpalaConnection's execute_async and get_state. This is implemented for beeswax and hs2.

https://github.com/apache/impala/blob/master/tests/common/impala_connection.py#L214
https://github.com/apache/impala/blob/master/tests/common/impala_connection.py#L224
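
To make the idea concrete, here is a self-contained sketch of a timing
check of that kind. It does not use the real test framework:
FakeConnection is a hypothetical stand-in that only mimics the
execute_async/get_state shape, and the state strings and thresholds are
assumptions for the example.

```python
import threading
import time


class FakeConnection:
    """Hypothetical stand-in for a client connection under test."""

    def __init__(self, ddl_seconds, async_ddl):
        self._ddl_seconds = ddl_seconds
        self._async_ddl = async_ddl

    def execute_async(self, stmt):
        finished = threading.Event()

        def run():
            time.sleep(self._ddl_seconds)  # simulated slow DDL
            finished.set()

        if self._async_ddl:
            threading.Thread(target=run).start()
        else:
            run()  # models the old behavior: submission blocks
        return finished

    def get_state(self, handle):
        return "FINISHED" if handle.is_set() else "RUNNING"


def time_submit_and_finish(conn, stmt):
    """Return (time to submit, time until FINISHED) in seconds."""
    start = time.monotonic()
    handle = conn.execute_async(stmt)
    submit_time = time.monotonic() - start
    while conn.get_state(handle) != "FINISHED":
        time.sleep(0.01)
    return submit_time, time.monotonic() - start
```

A test built this way asserts that submission returns well before the
DDL finishes, which fails when the DDL runs synchronously inside the
submit call.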


http://gerrit.cloudera.org:8080/#/c/17872/29/tests/metadata/test_metadata_query_statements.py
File tests/metadata/test_metadata_query_statements.py:

http://gerrit.cloudera.org:8080/#/c/17872/29/tests/metadata/test_metadata_query_statements.py@153
PS29, Line 153:   def test_async_ddl_with_JDBC(self, vector, unique_database):
              :     self.exec_with_jdbc("drop table if exists {0}.test_table".format(unique_database))
              :     self.exec_with_jdbc_and_compare_result(
              :         "create table {0}.test_table(a int)".format(unique_database),
              :         "'Table has been created.'")
              : 
              :     self.exec_with_jdbc("drop table if exists {0}.alltypes_clone".format(unique_database))
              :     self.exec_with_jdbc_and_compare_result(
              :         "create table {0}.alltypes_clone as select * from\
              :         functional_parquet.alltypes".format(unique_database),
              :         "'Inserted 7300 row(s)'")
> It looks like the compiled java code for JDBC is based on the standard JDBC
The 21050 port is the HS2 port.
https://github.com/apache/impala/blob/master/be/src/service/impala-server.cc#L153-L154

My motivation here is that I don't want to duplicate existing tests if we can avoid it. From what I'm seeing, this is using HS2, and it is running a simple create table and a CTAS.

What type of defect can this test catch? Walk me through a time when this test can fail and why this test is unique.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 29
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Tue, 19 Oct 2021 23:14:07 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 37: Verified+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 37
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Fri, 22 Oct 2021 23:42:37 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 25:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/17872/24/tests/hs2/test_hs2.py
File tests/hs2/test_hs2.py:

http://gerrit.cloudera.org:8080/#/c/17872/24/tests/hs2/test_hs2.py@361
PS24, Line 361: p
> flake8: E126 continuation line over-indented for hanging indent
Done


http://gerrit.cloudera.org:8080/#/c/17872/24/tests/hs2/test_hs2.py@367
PS24, Line 367: e
> flake8: E265 block comment should start with '# '
Done


http://gerrit.cloudera.org:8080/#/c/17872/24/tests/hs2/test_hs2.py@369
PS24, Line 369: t
> flake8: E502 the backslash is redundant between brackets
Done


http://gerrit.cloudera.org:8080/#/c/17872/24/tests/hs2/test_hs2.py@370
PS24, Line 370: 
> flake8: E126 continuation line over-indented for hanging indent
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 25
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Wed, 13 Oct 2021 21:30:00 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 14:

I'm starting to look at this. I thought we were going to have ClientRequestState::ExecDdlRequest() become async similar to ClientRequestState::ExecAsyncQueryOrDmlRequest(). Was there some problem with that approach?



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 14
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Wed, 06 Oct 2021 20:31:04 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 21:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/17872/21/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/17872/21/be/src/service/client-request-state.cc@652
PS21, Line 652:   string op_type = catalog_op_type() == TCatalogOpType::DDL ?
              :       PrintThriftEnum(ddl_type()) : PrintThriftEnum(catalog_op_type());
              :   summary_profile_->AddInfoString("DDL Type", op_type);
Nit: This is identical for sync/async. To avoid duplication, let's put this in ExecDdlRequest() prior to deciding whether to do async/sync.


http://gerrit.cloudera.org:8080/#/c/17872/21/be/src/service/client-request-state.cc@694
PS21, Line 694:   catalog_op_executor_.reset(
              :       new CatalogOpExecutor(ExecEnv::GetInstance(), frontend_, server_profile_));
              :   Status status = catalog_op_executor_->Exec(exec_request_->catalog_op_request);
              :   {
              :     lock_guard<mutex> l(lock_);
              :     RETURN_IF_ERROR(UpdateQueryStatus(status));
              :   }
              : 
              :   // Add newly created table to catalog cache.
              :   RETURN_IF_ERROR(parent_server_->ProcessCatalogUpdateResult(
              :       *catalog_op_executor_->update_catalog_result(),
              :       exec_request_->query_options.sync_ddl));
              : 
              :   // Set the results to be reported to the client.
              :   SetResultSet(catalog_op_executor_->ddl_exec_response());
              :   return Status::OK();
Nit: This code should be unreachable, so replace this with a DCHECK(false);


http://gerrit.cloudera.org:8080/#/c/17872/21/be/src/service/client-request-state.cc@713
PS21, Line 713:   // Transition the exec state to RUNNING for any non-CTAS DDLs. For the later, the
              :   // state is set from PENDING to RUNNING inside ExecQueryOrDmlRequest().
              :   if (catalog_op_type() != TCatalogOpType::DDL
              :       || ddl_type() != TDdlType::CREATE_TABLE_AS_SELECT) {
              :     UpdateNonErrorExecState(ExecState::RUNNING);
              :   }
Nit: I would rather go directly to RUNNING without passing through PENDING for non-CTAS. Having the state transitions for both in the same place makes it easy to understand that there is a distinction.
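
A minimal sketch of what keeping both transitions in one place could look like, written in Python rather than the actual C++. The helper name `initial_exec_state` and the enum values are hypothetical; only the PENDING/RUNNING states and the CTAS distinction come from the code quoted above.

```python
from enum import Enum


class ExecState(Enum):
    INITIALIZED = 0
    PENDING = 1
    RUNNING = 2


def initial_exec_state(is_ddl: bool, is_ctas: bool) -> ExecState:
    """Pick the first state a catalog operation moves to (hypothetical helper).

    CTAS goes to PENDING because its query part later moves PENDING -> RUNNING;
    every other catalog operation goes straight to RUNNING.
    """
    if is_ddl and is_ctas:
        return ExecState.PENDING
    return ExecState.RUNNING


print(initial_exec_state(True, False))  # ExecState.RUNNING
```

With the decision in a single function, the CTAS versus non-CTAS distinction is visible at a glance.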


http://gerrit.cloudera.org:8080/#/c/17872/21/be/src/service/client-request-state.cc@719
PS21, Line 719:   string op_type = catalog_op_type() == TCatalogOpType::DDL ?
              :       PrintThriftEnum(ddl_type()) : PrintThriftEnum(catalog_op_type());
              :   summary_profile_->AddInfoString("DDL Type", op_type);
Nit: See other comment about this code.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 21
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Tue, 12 Oct 2021 20:49:35 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 20:

(13 comments)

Addressed all comments except the one about running CREATE TABLE AS SELECT asynchronously.

http://gerrit.cloudera.org:8080/#/c/17872/15/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/17872/15/be/src/service/client-request-state.cc@720
PS15, Line 720: 
              :   // Set the results to be reported to the client.
              :   SetResultSet(catalog_op_executor_->ddl_exec_response());
              :   return Status::OK();
              : }
              : 
              : void ClientRequestState::ExecDdlRequestImplAsync() {
              :   string op_type = catalog_op_type() == TCatalogOpType::DDL ?
              :    
> I hacked a bit on this, and the modifications are not that big. This is wha
Thanks a lot Joe for this. 

I'll integrate your change in a separate patch set for this JIRA.


http://gerrit.cloudera.org:8080/#/c/17872/17/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/17872/17/be/src/service/client-request-state.cc@643
PS17, Line 643: DdlRequ
> Nit: We have an unrelated feature called "sync_ddl". If we can avoid simila
Done. Changed the naming slightly to make it clearer ("Impl" stands for implementation):

ExecDdlRequestImplAsync()
ExecDdlRequestImplSync()
ShouldRunExecDdlAsync()


http://gerrit.cloudera.org:8080/#/c/17872/17/be/src/service/client-request-state.cc@688
PS17, Line 688:   Status status = catalog_op_executor_->Exec(exec_request_->catalog_op_request);
              :   {
> Same as other DebugAction location. Same fixes.
This debug action was removed since it is in the sync path.


http://gerrit.cloudera.org:8080/#/c/17872/17/be/src/service/client-request-state.cc@728
PS17, Line 728:       
> For this async code, I think we are better off with a void return value, be
Done


http://gerrit.cloudera.org:8080/#/c/17872/17/be/src/service/client-request-state.cc@735
PS17, Line 735:   Status status = catalog_op_executor_->Exec(exec_request_->catalog_op_request);
              :   {
> For here, I think you can use DebugActionNoFail(), since we are using this 
Done


http://gerrit.cloudera.org:8080/#/c/17872/17/be/src/service/client-request-state.cc@744
PS17, Line 744:       exec_request_->query_options.sync_ddl);
              :   {
              :     lock_guard<mutex> l(lock_);
> An important thing here is that nothing is actually using the return value 
Good point. Done


http://gerrit.cloudera.org:8080/#/c/17872/17/be/src/service/client-request-state.cc@778
PS17, Line 778:       RETURN_IF_ERROR(Thread::Create("impala-server", "async_exec_thread_",
              :           &ClientRequestState::ExecDdlRequestImplAsync, this, &async_exec_thread
> Please go look at the comment that I made on the previous upload about ABOR
Done


http://gerrit.cloudera.org:8080/#/c/17872/17/be/src/service/client-request-state.cc@781
PS17, Line 781:     return Status::OK();
> Do this before spawning the thread (similar to how ExecAsyncQueryOrDmlReque
Okay. Changed the order and set the state to PENDING to account for the possibility of failure to fork the thread.
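
The ordering concern can be sketched with a toy example, assuming a plain Python thread in place of Impala's Thread::Create(). The `Request` class and its state strings are illustrative stand-ins, not the real ClientRequestState.

```python
import threading


class Request:
    """Toy stand-in for a request object; not Impala's ClientRequestState."""

    def __init__(self):
        self.lock = threading.Lock()
        self.state = "INITIALIZED"
        self.worker = None

    def _run_ddl(self):
        # Worker thread: may finish before start_async() returns.
        with self.lock:
            self.state = "FINISHED"

    def start_async(self):
        # Publish the state first, then spawn. If the order were reversed, a
        # fast worker could set FINISHED and then be overwritten by PENDING.
        with self.lock:
            self.state = "PENDING"
        self.worker = threading.Thread(target=self._run_ddl)
        self.worker.start()


req = Request()
req.start_async()
req.worker.join()
print(req.state)  # FINISHED
```

If creating the thread fails in this arrangement, the request is left in PENDING rather than in a state that claims work is running.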


http://gerrit.cloudera.org:8080/#/c/17872/19/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/17872/19/be/src/service/client-request-state.cc@643
PS19, Line 643: Status ClientRequestState::ExecDdlRequestImplSync() {
> LOAD DATA might also reset metadata through CatalogOpExecutor::Exec which m
Interesting. 

Looking at the code, the actual data load appears to be done in Java (https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/service/Frontend.java#L748).

After that, the metadata refresh is done through a CatalogOpExecutor::Exec() call (https://github.com/apache/impala/blob/master/be/src/service/client-request-state.cc#L259).

More research is needed to handle AWS NLB-type timeouts correctly for data loading in general. My guess is that frontend_->LoadData() should probably run in a separate thread too.

For this JIRA, I will document the limitation in the commit message and file a new JIRA.


http://gerrit.cloudera.org:8080/#/c/17872/19/be/src/service/client-request-state.cc@776
PS19, Line 776:     {
> I think earlier this was not being counted for 'EXEC_TIME_LIMIT_S', but aft
Did a test with CREATE TABLE and an execution time limit, and found that the limit is tracked by admission control regardless of how the DDL is executed. See https://github.com/apache/impala/blob/master/be/src/service/impala-server.cc#L1336, where queries_by_timestamp_ is updated with the DDL query.

Thus I think we are OK here.


http://gerrit.cloudera.org:8080/#/c/17872/19/be/src/service/client-request-state.cc@779
PS19, Line 779:           &ClientRequestState::ExecDdlRequestImplAsync, this, &async_exec_thread_));
> Thread spawned just above (Line 776) might finish off before the execution 
Done. See my reply to Joe's comment in this area.


http://gerrit.cloudera.org:8080/#/c/17872/19/tests/metadata/test_ddl.py
File tests/metadata/test_ddl.py:

http://gerrit.cloudera.org:8080/#/c/17872/19/tests/metadata/test_ddl.py@913
PS19, Line 913:     cls.ImpalaTestMatrix.add_dimension(create_client_protocol_dimension())
              :     cls.ImpalaTestMatrix.add_dimension(create_exec_option_dimension(
              :         sync_ddl=[0]))
> Test dimensions run tests multiple times with different configurations. If 
Good to know! 

Done.
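
To illustrate why each added dimension multiplies the number of test runs, here is a small sketch using only the standard library. The dimension names and values are illustrative, and `expand` is a hypothetical helper, not part of the Impala test framework.

```python
from itertools import product

# Hypothetical dimensions: each maps a name to the values it may take.
dimensions = {
    "protocol": ["beeswax", "hs2", "hs2-http"],
    "sync_ddl": [0],
}


def expand(dims):
    """Return one dict per combination of dimension values."""
    names = list(dims)
    return [dict(zip(names, combo)) for combo in product(*dims.values())]


vectors = expand(dimensions)
print(len(vectors))  # 3: one run per protocol, since sync_ddl has one value
```

Adding a second value to `sync_ddl` would double the count to 6, which is why a dimension should only list values the test actually needs.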


http://gerrit.cloudera.org:8080/#/c/17872/19/tests/metadata/test_ddl.py@921
PS19, Line 921: dding/dropping of .jar and .so in the lib cache.
> You can omit multiple_impalad.
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 20
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Tue, 12 Oct 2021 15:49:48 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 20:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9594/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 20
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Tue, 12 Oct 2021 16:10:08 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#22). ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................

IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

This patch addresses an Impala client hang caused by the AWS network load
balancer timeout, which is fixed at 350s. When a long DDL operation is
running and the timeout expires, AWS silently drops the connection and
the Impala client hangs.

The fix maintains the current TCLIService protocol between the client
and Impala server and is applicable to the following Impala clients
which issue thrift RPC ExecuteStatement() followed by repeated call to
GetOperationStatus() (HS2, Impyla and HUE) or a variant of it (Beeswax)
to Impala backend.

  1. HS2
  2. Beeswax
  3. Impyla
  4. HUE

In the fix, the backend method ClientRequestState::ExecDdlRequest()
can start a new thread in 'async_exec_thread_' for ExecAsyncDdlRequest()
which executes most of the DDLs asynchronously. This thread is waited
for in the wait thread 'wait_thread_'. Since the wait thread also runs
asynchronously, the execution of the DDLs will not cause a wait on the
Impala client. Thus the Impala client can keep checking its execution
status via GetOperationStatus() without long waiting, say more than
350s.
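
The submit-then-poll pattern described above can be sketched as follows. This is a toy model using Python threads, not the Impala implementation: the `Operation` class, its methods and `run_and_poll` are illustrative names, and only the 350s figure and the ExecuteStatement()/GetOperationStatus() roles come from the text.

```python
import threading
import time

NLB_IDLE_TIMEOUT_S = 350  # idle timeout quoted in the commit message


class Operation:
    """Toy operation handle; class and method names are illustrative only."""

    def __init__(self, work):
        self._done = threading.Event()
        self._thread = threading.Thread(target=self._run, args=(work,))

    def _run(self, work):
        work()
        self._done.set()

    def start(self):
        # Analogue of ExecuteStatement(): returns without waiting for the DDL.
        self._thread.start()

    def status(self):
        # Analogue of GetOperationStatus(): a short call that never blocks.
        return "FINISHED" if self._done.is_set() else "RUNNING"


def run_and_poll(work, poll_interval_s=0.01):
    """Submit work, poll until it finishes, and return the number of polls."""
    # No single wait between RPCs should approach the idle timeout.
    assert poll_interval_s < NLB_IDLE_TIMEOUT_S
    op = Operation(work)
    op.start()
    polls = 0
    while op.status() != "FINISHED":
        polls += 1
        time.sleep(poll_interval_s)
    return polls


print("polls:", run_and_poll(lambda: time.sleep(0.05)))
```

However long the work takes, each individual call made by the client returns quickly, so no single RPC has to outlast the load balancer's timeout.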

As an optimization, the above asynchronous mode is not applied to the
execution of certain DDLs that run very low risks of long execution.

  1. Operations that do not access catalog service;
  2. COMPUTE STATS as the stats computation queries already run
     asynchronously.

Limitations:
  This patch does not handle a potential AWS NLB-type timeout for LOAD
  DATA (IMPALA-10967).

Testing:
  1. New async. DDL unit tests with HS2, HS2-HTTP and Beeswax clients
  2. Core tests

Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
---
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
A testdata/workloads/functional-query/queries/QueryTest/async_ddl.test
M tests/hs2/test_hs2.py
M tests/metadata/test_ddl.py
5 files changed, 208 insertions(+), 19 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/17872/22

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 22
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 9:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9555/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 9
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Mon, 04 Oct 2021 17:47:38 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] [WIP] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: [WIP] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 6:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/9542/ : Initial code review checks failed. See linked job for details on the failure.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 6
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Fri, 01 Oct 2021 16:22:33 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#27). ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................

IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

This patch addresses an Impala client hang caused by the AWS network load
balancer timeout, which is fixed at 350s. When a long DDL operation is
running and the timeout expires, AWS silently drops the connection and
the Impala client hangs.

The fix maintains the current TCLIService protocol between the client
and Impala server and is applicable to the following Impala clients
which issue thrift RPC ExecuteStatement() followed by repeated call to
GetOperationStatus() (HS2, Impyla and HUE) or a variant of it (Beeswax)
to Impala backend.

  1. HS2
  2. Beeswax
  3. Impyla
  4. HUE

In the fix, the backend method ClientRequestState::ExecDdlRequest()
can start a new thread in 'async_exec_thread_' for ExecAsyncDdlRequest()
which executes most of the DDLs asynchronously. This thread is waited
for in the wait thread 'wait_thread_'. Since the wait thread also runs
asynchronously, the execution of the DDLs will not cause a wait on the
Impala client. Thus the Impala client can keep checking its execution
status via GetOperationStatus() without long waiting, say more than
350s.

Externally, a new field with name "DDL execution mode:" has been added
to the summary section in the runtime profile, next to "DDL Type". This
field takes either asynchronous or synchronous.
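
As an illustration of how a script might read that field back, here is a small sketch. The profile excerpt is made up for the example and the real profile layout may differ; `ddl_execution_mode` is a hypothetical helper.

```python
import re

# Hypothetical excerpt of a runtime profile summary; real profiles may differ.
profile_text = """\
Summary:
  DDL Type: CREATE_TABLE
  DDL execution mode: asynchronous
"""


def ddl_execution_mode(profile):
    """Return the value of the 'DDL execution mode' field, or None if absent."""
    match = re.search(
        r"^\s*DDL execution mode:\s*(\w+)\s*$", profile, re.MULTILINE)
    return match.group(1) if match else None


print(ddl_execution_mode(profile_text))  # asynchronous
```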

As an optimization, the above asynchronous mode is not applied to the
execution of certain DDLs that run very low risks of long execution.

  1. Operations that do not access catalog service;
  2. COMPUTE STATS as the stats computation queries already run
     asynchronously.

Limitations:
  This patch does not handle a potential AWS NLB-type timeout for LOAD
  DATA (IMPALA-10967).

Testing:
  1. Added new async. DDL unit tests with HS2, HS2-HTTP, Beeswax and
     JDBC clients
  2. Ran core tests

Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
---
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
A testdata/workloads/functional-query/queries/QueryTest/async_ddl.test
M tests/common/impala_test_suite.py
M tests/hs2/test_hs2.py
M tests/metadata/test_ddl.py
M tests/metadata/test_metadata_query_statements.py
7 files changed, 233 insertions(+), 23 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/17872/27

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 27
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#26). ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................

IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

This patch addresses an Impala client hang caused by the AWS network load
balancer timeout, which is fixed at 350s. When a long DDL operation is
running and the timeout expires, AWS silently drops the connection and
the Impala client hangs.

The fix maintains the current TCLIService protocol between the client
and Impala server and is applicable to the following Impala clients
which issue thrift RPC ExecuteStatement() followed by repeated call to
GetOperationStatus() (HS2, Impyla and HUE) or a variant of it (Beeswax)
to Impala backend.

  1. HS2
  2. Beeswax
  3. Impyla
  4. HUE

In the fix, the backend method ClientRequestState::ExecDdlRequest()
can start a new thread in 'async_exec_thread_' for ExecAsyncDdlRequest()
which executes most of the DDLs asynchronously. This thread is waited
for in the wait thread 'wait_thread_'. Since the wait thread also runs
asynchronously, the execution of the DDLs will not cause a wait on the
Impala client. Thus the Impala client can keep checking its execution
status via GetOperationStatus() without long waiting, say more than
350s.

Externally, a new field with name "DDL execution mode:" has been added
to the summary section in the runtime profile, next to "DDL Type". This
field takes either asynchronous or synchronous.

As an optimization, the above asynchronous mode is not applied to the
execution of certain DDLs that run very low risks of long execution.

  1. Operations that do not access catalog service;
  2. COMPUTE STATS as the stats computation queries already run
     asynchronously.

Limitations:
  This patch does not handle a potential AWS NLB-type timeout for LOAD
  DATA (IMPALA-10967).

Testing:
  1. Added new async. DDL unit tests with HS2, HS2-HTTP, Beeswax and
     JDBC clients
  2. Ran core tests

Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
---
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
A testdata/workloads/functional-query/queries/QueryTest/async_ddl.test
M tests/common/impala_test_suite.py
M tests/hs2/test_hs2.py
M tests/metadata/test_ddl.py
M tests/metadata/test_metadata_query_statements.py
7 files changed, 236 insertions(+), 23 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/17872/26

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 26
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 29:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9620/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 29
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Mon, 18 Oct 2021 21:33:09 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 21:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9596/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 21
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Tue, 12 Oct 2021 18:24:27 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] [WIP] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: [WIP] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17872/2/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/17872/2/be/src/service/client-request-state.cc@221
PS2, Line 221: std::string GetDebugString(const TExecRequest& exec_request, const TUniqueId& query_id = TUniqueId()) {
line too long (103 > 90)




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 2
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Mon, 27 Sep 2021 20:40:43 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 36:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7554/ DRY_RUN=false



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 36
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Fri, 22 Oct 2021 01:47:54 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#32). ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................

IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

This patch addresses an Impala client hang caused by the AWS network load
balancer timeout, which is fixed at 350s. When a long DDL operation is
running and the timeout expires, AWS silently drops the connection and
the Impala client hangs.

The fix maintains the current TCLIService protocol between the client
and Impala server and is applicable to the following Impala clients
which issue thrift RPC ExecuteStatement() followed by repeated call to
GetOperationStatus() (HS2, Impyla and HUE) or a variant of it (Beeswax)
to Impala backend.

  1. HS2
  2. Beeswax
  3. Impyla
  4. HUE

In the fix, the backend method ClientRequestState::ExecDdlRequest()
can start a new thread, 'async_exec_thread_', that runs
ExecDdlRequestImpl() and executes most DDLs asynchronously. The wait
thread 'wait_thread_' waits for this thread. Since the wait thread also
runs asynchronously, executing the DDL does not block the Impala
client. The client can therefore keep checking the execution status via
GetOperationStatus() without any single call waiting for a long time,
say more than 350s.
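
The threading arrangement described above can be sketched roughly as
follows. This is a minimal illustration under stated assumptions, not the
code in the patch: AsyncDdlRequest, OpState, ExecDdl(), GetStatus() and
PollUntilFinished() are hypothetical names, the DDL is simulated with a
sleep, and std::thread stands in for Impala's own thread utilities.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical states, loosely modeled on what a polling client would see.
enum class OpState { RUNNING, FINISHED };

// Hypothetical, simplified request object; not the real ClientRequestState.
class AsyncDdlRequest {
 public:
  ~AsyncDdlRequest() {
    // Joining the wait thread also guarantees the exec thread was joined.
    if (wait_thread_.joinable()) wait_thread_.join();
  }

  // Returns right away. The DDL work runs in async_exec_thread_, and
  // wait_thread_ joins it and then publishes the final state.
  void ExecDdl(std::chrono::milliseconds simulated_ddl_time) {
    assert(!wait_thread_.joinable());  // this sketch supports one call only
    async_exec_thread_ = std::thread([simulated_ddl_time]() {
      // Stand-in for the long-running catalog operation.
      std::this_thread::sleep_for(simulated_ddl_time);
    });
    wait_thread_ = std::thread([this]() {
      async_exec_thread_.join();
      state_.store(OpState::FINISHED);
    });
  }

  // Cheap, non-blocking status check, analogous to one polling RPC.
  OpState GetStatus() const { return state_.load(); }

 private:
  std::atomic<OpState> state_{OpState::RUNNING};
  std::thread async_exec_thread_;
  std::thread wait_thread_;
};

// Client-side loop: polls until FINISHED and returns the number of polls
// that observed RUNNING.
int PollUntilFinished(const AsyncDdlRequest& request,
                      std::chrono::milliseconds poll_interval) {
  int running_polls = 0;
  while (request.GetStatus() != OpState::FINISHED) {
    ++running_polls;
    std::this_thread::sleep_for(poll_interval);
  }
  return running_polls;
}
```

In this sketch no single call blocks for the duration of the DDL, so each
client round trip stays short even when the operation itself is long.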

As an optimization, the above asynchronous mode is not applied to
certain DDLs that carry very little risk of long execution.

  1. Operations that do not access catalog service;
  2. COMPUTE STATS as the stats computation queries already run
     asynchronously.

External behavior changes:
  1. A new field named "DDL execution mode:" is added to the summary
     section of the runtime profile, next to "DDL Type". Its value is
     either 'asynchronous' or 'synchronous'.
  2. A new query option 'enable_async_ddl_execution', which defaults to
     true, is added. Setting it to false turns off asynchronous DDL
     execution.
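
The exemptions and the query option described above amount to a small
decision function. The following is only a sketch of that decision: the
DdlKind enum and the helpers ShouldRunAsync() and DdlExecutionMode() are
hypothetical and are not taken from the patch; only the option name and
the two profile values come from the description.

```cpp
#include <cassert>
#include <string>

// Hypothetical classification of DDLs, for this sketch only.
enum class DdlKind {
  NO_CATALOG_ACCESS,  // operations that do not access the catalog service
  COMPUTE_STATS,      // its stats queries already run asynchronously
  CATALOG_OP          // everything else that goes through the catalog service
};

// Hypothetical helper: decides whether a DDL should run asynchronously.
bool ShouldRunAsync(DdlKind kind, bool enable_async_ddl_execution) {
  if (!enable_async_ddl_execution) return false;  // option turns the mode off
  switch (kind) {
    case DdlKind::NO_CATALOG_ACCESS:
    case DdlKind::COMPUTE_STATS:
      return false;
    case DdlKind::CATALOG_OP:
      return true;
  }
  assert(false && "unhandled DdlKind");
  return false;
}

// Value that would be reported in the profile field "DDL execution mode:".
std::string DdlExecutionMode(DdlKind kind, bool enable_async_ddl_execution) {
  return ShouldRunAsync(kind, enable_async_ddl_execution) ? "asynchronous"
                                                          : "synchronous";
}
```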

Limitations:
  This patch does not handle a potential AWS NLB-type timeout for LOAD
  DATA (IMPALA-10967).

Testing:
  1. Added new async. DDL unit tests with HS2, HS2-HTTP, Beeswax and
     JDBC clients.
  2. Ran core tests successfully.

Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
---
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
A testdata/workloads/functional-query/queries/QueryTest/async_ddl.test
M tests/common/impala_test_suite.py
M tests/metadata/test_ddl.py
9 files changed, 386 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/17872/32

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 32
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#20). ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................

IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

This patch addresses an Impala client hang caused by the AWS network load
balancer timeout, which is fixed at 350s. When a long DDL operation is
in progress and the timeout expires, AWS silently drops the connection
and the Impala client hangs.

The fix keeps the current TCLIService protocol between the client and
the Impala server unchanged. It applies to the following Impala clients,
which issue the thrift RPC ExecuteStatement() followed by repeated calls
to GetOperationStatus() (HS2, Impyla and HUE) or a variant of it
(Beeswax) to the Impala backend.

  1. HS2
  2. Beeswax
  3. Impyla
  4. HUE

In the fix, the backend method ClientRequestState::ExecDdlRequest()
can start a new thread, 'async_exec_thread_', that runs
ExecAsyncDdlRequest() and executes most DDLs asynchronously. The wait
thread 'wait_thread_' waits for this thread. Since the wait thread also
runs asynchronously, executing the DDL does not block the Impala
client. The client can therefore keep checking the execution status via
GetOperationStatus() without any single call waiting for a long time,
say more than 350s.

As an optimization, the above asynchronous mode is not applied to
certain DDLs that carry very little risk of long execution.

  1. Operations that do not access catalog service;
  2. COMPUTE STATS as the stats computation queries already run
     asynchronously;
  3. CREATE TABLE AS SELECT as the SELECT part already runs
     asynchronously;

Limitations:
  This patch does not handle a potential AWS NLB-type timeout for LOAD
  DATA (IMPALA-10967).

Testing:
  1. Unit tests with HS2 and Beeswax client
  2. Core tests

Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
---
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
A testdata/workloads/functional-query/queries/QueryTest/alter-table-recover.test
M tests/metadata/test_ddl.py
4 files changed, 114 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/17872/20

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 20
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 18:

Added unit tests for the HS2 and Beeswax clients to demonstrate that Alter Table Recover works with a simulated delay of 15s.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 18
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Mon, 11 Oct 2021 19:23:37 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................

IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

This patch addresses an Impala client hang caused by the AWS network load
balancer timeout, which is fixed at 350s. When the timeout expires, for
example during a long query compilation, the connection is silently
dropped and the Impala client hangs.

The fix applies to the following Impala clients, which issue the thrift
RPC ExecuteStatement() followed by repeated calls to
GetOperationStatus() (HS2, Impyla and HUE) or a variant of it (Beeswax)
to the Impala backend.

  1. HS2
  2. Beeswax
  3. Impyla
  4. HUE

The TCLIService protocol between the client and Impala server remains
unchanged.

In the fix, the backend method ImpalaServer::ExecuteStatement()
starts a new thread for ImpalaServer::ExecuteStatementCommon(), which
can reach two stages: COMPILED and DONE. COMPILED means the query has
been compiled successfully; DONE means the execution has either
finished successfully or encountered an error. The main thread, which
starts the new thread, waits for the COMPILED stage and then waits a
further short period for the DONE stage. If the DONE stage is not
reached in that period, control returns to the client, which issues
GetOperationStatus() repeatedly to check whether the execution has
reached the DONE stage. When the client detects the FINISHED state via
GetOperationStatus(), the new thread is joined and released.
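
The two-stage wait described above can be illustrated with a condition
variable. This is a rough sketch under assumptions, not the patch itself:
StageTracker, Stage and ExecuteStatementSketch() are hypothetical names,
compilation and execution are simulated with sleeps, and the length of
the short wait (grace_period) is an arbitrary parameter.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stages of the worker thread, in the order they are reached.
enum class Stage { STARTED, COMPILED, DONE };

// Hypothetical helper that lets one thread wait for another to reach a stage.
class StageTracker {
 public:
  void Advance(Stage next) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      assert(next >= stage_);  // stages only move forward
      stage_ = next;
    }
    cv_.notify_all();
  }

  // Blocks until the given stage (or a later one) is reached.
  void WaitFor(Stage target) {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [&] { return stage_ >= target; });
  }

  // Waits up to 'timeout'; returns true if the stage was reached in time.
  bool WaitForWithTimeout(Stage target, std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mu_);
    return cv_.wait_for(lock, timeout, [&] { return stage_ >= target; });
  }

  Stage stage() {
    std::lock_guard<std::mutex> lock(mu_);
    return stage_;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  Stage stage_ = Stage::STARTED;
};

// Starts the worker, waits for COMPILED, then waits briefly for DONE.
// Returns true if DONE was reached before control goes back to the client;
// otherwise the client is expected to poll. The caller joins 'worker'.
bool ExecuteStatementSketch(StageTracker& tracker, std::thread& worker,
                            std::chrono::milliseconds compile_time,
                            std::chrono::milliseconds run_time,
                            std::chrono::milliseconds grace_period) {
  worker = std::thread([&tracker, compile_time, run_time]() {
    std::this_thread::sleep_for(compile_time);  // simulated compilation
    tracker.Advance(Stage::COMPILED);
    std::this_thread::sleep_for(run_time);      // simulated execution
    tracker.Advance(Stage::DONE);
  });
  tracker.WaitFor(Stage::COMPILED);
  return tracker.WaitForWithTimeout(Stage::DONE, grace_period);
}
```

Note that in this sketch the wait for COMPILED is unbounded, so a very
long compilation would still hold up the caller.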

In addition, a child query, which the Impala server submits as an
Impala client for a compute stats statement, runs synchronously in the
same child query thread.

Testing: TBD

Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
---
M be/src/service/child-query.cc
M be/src/service/client-request-state.cc
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-hs2-server.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
6 files changed, 226 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/17872/8

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 8
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#13). ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................

IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

This patch addresses an Impala client hang caused by the AWS network load
balancer timeout, which is fixed at 350s. When a long DDL operation is
in progress and the timeout expires, AWS silently drops the connection
and the Impala client hangs.

The fix keeps the TCLIService protocol between the client and the Impala
server unchanged. It applies to the following Impala clients, which
issue the thrift RPC ExecuteStatement() followed by repeated calls to
GetOperationStatus() (HS2, Impyla and HUE) or a variant of it (Beeswax)
to the Impala backend.

  1. HS2
  2. Beeswax
  3. Impyla
  4. HUE

In the fix, the backend method ImpalaServer::ExecuteStatementCommon()
starts a new thread for ImpalaServer::ExecuteStatementCommonInternal(),
which can reach two states: COMPILED and DONE. COMPILED means the front
end has compiled the query successfully; DONE means the execution of
the query plan has either finished successfully or encountered an
error. The main thread, which starts the new thread, waits for the
COMPILED state and then waits a further short period for the DONE
state. If the DONE state is not reached in that period, control returns
to the client, which issues GetOperationStatus() repeatedly to check
whether the execution has reached the DONE state. When the Impala server
detects the FINISHED execution state, or an error occurs while
servicing GetOperationStatus(), the new thread is joined and released.
Thus, for a long DDL query, the execution runs in the new thread and
the Impala client keeps checking its status via GetOperationStatus()
without waiting more than 350s.

In addition, a child query, which the Impala server submits as an
Impala client for a compute stats statement, runs synchronously in the
same child query thread.

The communication area between the new thread and the host thread
is per session.

Testing: TBD

Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
---
M be/src/service/child-query.cc
M be/src/service/client-request-state.cc
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-hs2-server.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/util/thread.h
7 files changed, 316 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/17872/13

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 13
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 12:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9568/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 12
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Wed, 06 Oct 2021 17:33:47 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#17). ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................

IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

This patch addresses an Impala client hang caused by the AWS network load
balancer timeout, which is fixed at 350s. When a long DDL operation is
in progress and the timeout expires, AWS silently drops the connection
and the Impala client hangs.

The fix keeps the current TCLIService protocol between the client and
the Impala server unchanged. It applies to the following Impala clients,
which issue the thrift RPC ExecuteStatement() followed by repeated calls
to GetOperationStatus() (HS2, Impyla and HUE) or a variant of it
(Beeswax) to the Impala backend.

  1. HS2
  2. Beeswax
  3. Impyla
  4. HUE

In the fix, the backend method ClientRequestState::ExecDdlRequest()
can start a new thread, 'async_exec_thread_', that runs
ExecAsyncDdlRequest() and executes most DDLs asynchronously. The wait
thread 'wait_thread_' waits for this thread. Since the wait thread also
runs asynchronously, executing the DDL does not block the Impala
client. The client can therefore keep checking the execution status via
GetOperationStatus() without any single call waiting for a long time,
say more than 350s.

As an optimization, the above asynchronous mode is not applied to
certain DDLs that run no risk of long execution.

  1. Operations that do not access catalog service;
  2. COMPUTE STATS as the stats computation queries already run
     asynchronously;
  3. CREATE TABLE AS SELECT as the SELECT part already runs
     asynchronously;

Testing:
  1. Unit tests (TBD)
  2. Core tests

Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
---
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
2 files changed, 73 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/17872/17

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 17
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#24). ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................

IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

This patch addresses an Impala client hang caused by the AWS network load
balancer timeout, which is fixed at 350s. When a long DDL operation is
in progress and the timeout expires, AWS silently drops the connection
and the Impala client hangs.

The fix keeps the current TCLIService protocol between the client and
the Impala server unchanged. It applies to the following Impala clients,
which issue the thrift RPC ExecuteStatement() followed by repeated calls
to GetOperationStatus() (HS2, Impyla and HUE) or a variant of it
(Beeswax) to the Impala backend.

  1. HS2
  2. Beeswax
  3. Impyla
  4. HUE

In the fix, the backend method ClientRequestState::ExecDdlRequest()
can start a new thread, 'async_exec_thread_', that runs
ExecAsyncDdlRequest() and executes most DDLs asynchronously. The wait
thread 'wait_thread_' waits for this thread. Since the wait thread also
runs asynchronously, executing the DDL does not block the Impala
client. The client can therefore keep checking the execution status via
GetOperationStatus() without any single call waiting for a long time,
say more than 350s.

As an optimization, the above asynchronous mode is not applied to
certain DDLs that carry very little risk of long execution.

  1. Operations that do not access catalog service;
  2. COMPUTE STATS as the stats computation queries already run
     asynchronously.

Limitations:
  This patch does not handle a potential AWS NLB-type timeout for LOAD
  DATA (IMPALA-10967).

Testing:
  1. New async. DDL unit tests with HS2, HS2-HTTP and Beeswax clients
  2. Core tests

Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
---
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
A testdata/workloads/functional-query/queries/QueryTest/async_ddl.test
M tests/hs2/test_hs2.py
M tests/metadata/test_ddl.py
5 files changed, 210 insertions(+), 23 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/17872/24

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 24
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 24:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/17872/24/tests/hs2/test_hs2.py
File tests/hs2/test_hs2.py:

http://gerrit.cloudera.org:8080/#/c/17872/24/tests/hs2/test_hs2.py@361
PS24, Line 361: "
flake8: E126 continuation line over-indented for hanging indent


http://gerrit.cloudera.org:8080/#/c/17872/24/tests/hs2/test_hs2.py@367
PS24, Line 367: #
flake8: E265 block comment should start with '# '


http://gerrit.cloudera.org:8080/#/c/17872/24/tests/hs2/test_hs2.py@369
PS24, Line 369: \
flake8: E502 the backslash is redundant between brackets


http://gerrit.cloudera.org:8080/#/c/17872/24/tests/hs2/test_hs2.py@370
PS24, Line 370: "
flake8: E126 continuation line over-indented for hanging indent




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 24
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Wed, 13 Oct 2021 21:12:15 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 24:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9603/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 24
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Wed, 13 Oct 2021 21:31:59 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 23:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/17872/23/tests/hs2/test_hs2.py
File tests/hs2/test_hs2.py:

http://gerrit.cloudera.org:8080/#/c/17872/23/tests/hs2/test_hs2.py@361
PS23, Line 361: "
flake8: E126 continuation line over-indented for hanging indent


http://gerrit.cloudera.org:8080/#/c/17872/23/tests/hs2/test_hs2.py@367
PS23, Line 367: #
flake8: E265 block comment should start with '# '


http://gerrit.cloudera.org:8080/#/c/17872/23/tests/hs2/test_hs2.py@369
PS23, Line 369: \
flake8: E502 the backslash is redundant between brackets


http://gerrit.cloudera.org:8080/#/c/17872/23/tests/hs2/test_hs2.py@370
PS23, Line 370: "
flake8: E126 continuation line over-indented for hanging indent


http://gerrit.cloudera.org:8080/#/c/17872/23/tests/hs2/test_hs2.py@381
PS23, Line 381: \
flake8: E502 the backslash is redundant between brackets


http://gerrit.cloudera.org:8080/#/c/17872/23/tests/hs2/test_hs2.py@385
PS23, Line 385: \
flake8: E502 the backslash is redundant between brackets


http://gerrit.cloudera.org:8080/#/c/17872/23/tests/hs2/test_hs2.py@386
PS23, Line 386: T
flake8: E129 visually indented line with same indent as next logical line


http://gerrit.cloudera.org:8080/#/c/17872/23/tests/hs2/test_hs2.py@389
PS23, Line 389: \
flake8: E502 the backslash is redundant between brackets


http://gerrit.cloudera.org:8080/#/c/17872/23/tests/hs2/test_hs2.py@390
PS23, Line 390: T
flake8: E129 visually indented line with same indent as next logical line


http://gerrit.cloudera.org:8080/#/c/17872/23/tests/hs2/test_hs2.py@393
PS23, Line 393: \
flake8: E502 the backslash is redundant between brackets


http://gerrit.cloudera.org:8080/#/c/17872/23/tests/hs2/test_hs2.py@394
PS23, Line 394: T
flake8: E129 visually indented line with same indent as next logical line




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 23
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Wed, 13 Oct 2021 21:11:33 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 25:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17872/21/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/17872/21/be/src/service/client-request-state.cc@713
PS21, Line 713:   // If this is a CTAS request, there will usually be more work to do
              :   // after executing the CREATE TABLE statement (the INSERT portion of the operation).
              :   // The exception is if the user specified IF NOT EXISTS and the table already
              :   // existed, in which case we do not execute the INSERT.
              :   if (catalog_op_type() == TCatalogOpType::DDL &&
              :    
> Yes, the code to enter into the PENDING state is removed.
I should have been more explicit. I want CTAS to go to PENDING and non-CTAS to go to RUNNING. I want the state transition to happen in ExecDdlRequest() in the ShouldRunExecDdlAsync()==true case prior to spawning the async thread. I want them to be right next to each other so that the distinction is very clear:

// Comment about why CTAS goes to PENDING rather than running.
if (ctas)
  UpdateNonErrorExecState(ExecState::PENDING);
else
  UpdateNonErrorExecState(ExecState::RUNNING);
RETURN_IF_ERROR(Thread::Create(...))
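
A self-contained illustration of that ordering, with the state set before
the thread is spawned, might look like the sketch below. It is not Impala
code: RequestSketch, StartAsyncDdl() and the seen_by_worker field are
hypothetical, std::thread stands in for Thread::Create(), and the worker
only records the state it observes when it starts.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical subset of execution states, for this sketch only.
enum class ExecState { INITIALIZED, PENDING, RUNNING };

// Hypothetical stand-in for the request object.
struct RequestSketch {
  std::atomic<ExecState> state{ExecState::INITIALIZED};
  std::atomic<ExecState> seen_by_worker{ExecState::INITIALIZED};
  std::thread async_thread;
};

// Picks the state first, then spawns the worker, so the worker (and any
// poller) never observes the stale INITIALIZED state. The caller joins
// request.async_thread.
void StartAsyncDdl(RequestSketch& request, bool is_ctas) {
  assert(request.state.load() == ExecState::INITIALIZED);
  // CTAS goes to PENDING; every other DDL goes straight to RUNNING.
  request.state.store(is_ctas ? ExecState::PENDING : ExecState::RUNNING);
  request.async_thread = std::thread([&request]() {
    // Stand-in for the DDL work: record the state visible at thread start.
    request.seen_by_worker.store(request.state.load());
  });
}
```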




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 25
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Wed, 13 Oct 2021 22:05:20 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 17:

(7 comments)

Address review comments.

http://gerrit.cloudera.org:8080/#/c/17872/15/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/17872/15/be/src/service/client-request-state.cc@256
PS15, Line 256:       LOG_AND_RETURN_IF_ERROR(ExecDdlRequest());
              :       break;
              :     }
              :     case TStmtType::LOAD: {
              :       DCHECK(exec_request_->__isset.load_data_request);
              :       TLoadDataResp response;
              :       RETURN_IF_ERROR(
              :        
> Style point:
Done. Moved the decision to run the logic synchronously or asynchronously into the method ExecDdlRequest().


http://gerrit.cloudera.org:8080/#/c/17872/15/be/src/service/client-request-state.cc@366
PS15, Line 366:       // A NULL pattern means match all tables. However, Thrift string types can't
              :       // be NULL in C++, so we have to test if it's set rather than just blindly
              :       // using the value.
              :       const string* table_name =
              :      
> Most of the actions in this function could require metadata operations, but
Done


http://gerrit.cloudera.org:8080/#/c/17872/15/be/src/service/client-request-state.cc@655
PS15, Line 655:   if (ddl_type() == TDdlType::COMPUTE_STATS) {
              :     TComputeStatsParams& compute_stats_params =
              :         exec_request_->catalog_op_request.ddl_params.compute_stats_params;
              :     RuntimeProfile* child_profile =
              :         RuntimeProfile::Create(&profile_pool_, "Child Queries");
              :    
> See my comment in ExecLocalCatalogOp() about USE. It would be nice to avoid
Done.


http://gerrit.cloudera.org:8080/#/c/17872/15/be/src/service/client-request-state.cc@662
PS15, Line 662:     vector<ChildQuery> child_queries;
              :     if (compute_stats_params.__isset.tbl_stats_query) {
              :       RuntimeProfile* profile =
              :           RuntimeProfile::Create(&profile_pool_, "Table Stats Query");
              :       child_profile->AddChild(profile);
              :       child_queries.emplace_back(compute_stats_params.tbl_stats_query, this,
              :           parent_server_, profile, &profile_pool_);
              :     }
              :     if (compute_stats_params.__isset.col_stats_query) {
              :       RuntimeProfile* profile =
              :           RuntimeProfile::Create(&profile_pool_, "Column Stats Query");
              :       child_profile->AddChild(profile);
              :       child_queries.emplace_back(compute_stats_params.col_stats_query, this,
              :           parent_server_, profile, &profile_pool_);
              :     }
              : 
              :     if (child_queries.size() > 0) {
              :       RETURN_IF_ERROR(child_query_executor_->ExecAsync(move(child_queries)));
              :     } else {
              :       SetResultSet({"No partitions selected for incremental stats update."});
              :     }
              :     return Status::OK();
              :   }
              : 
              :   catalog_op_executor_.reset(
              :       new CatalogOpExecutor(ExecEnv::GetInstance(), frontend_, server_profile_));
              :   RETURN_IF_ERROR(
              :       DebugAction(exec_request_->query_options, "TIMED_WAIT_BEFORE_CATALOG_OP_EXEC"));
              :   Status status = catalog_op_executor_->Exec(exec_request_->catalog_op_request);
              :   {
> I think we can skip going async for the compute stats case. It is starting 
Done


http://gerrit.cloudera.org:8080/#/c/17872/15/be/src/service/client-request-state.cc@720
PS15, Line 720:     RETURN_IF_ERROR(ExecAsyncQueryOrDmlRequest(exec_request_->query_exec_request));
              :   }
              : 
              :   // Set the results to be reported to the client.
              :   SetResultSet(catalog_op_executor_->ddl_exec_response());
              :   return Status::OK();
              : }
              : 
              : Sta
> To handle CTAS, one option is to have ExecAsyncQueryOrDmlRequest() have a m
Decided to keep this particular case (CREATE_TABLE_AS_SELECT) running (the exec DDL part) in the same thread as the caller. Modifying ExecAsyncQueryOrDmlRequest() can be complex, and in IMPALA-10811 the slow DDLs are those like Rename and Alter Table Recover Partitions, which can take a long time on existing tables with many partitions.

We can file a new JIRA if there is a need for it.


http://gerrit.cloudera.org:8080/#/c/17872/15/be/src/service/client-request-state.cc@738
PS15, Line 738:   {
> ABORT_IF_ERROR will crash Impala if the Status is not OK. not-OK status for
Done
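
For readers following the thread, the difference between aborting and propagating an error can be sketched as below. This is only an illustration under stated assumptions: the Status class, the SKETCH_* macros, CreateWorker() and StartAsyncDdl() are hypothetical stand-ins, not Impala's actual definitions, and thread creation is just an assumed example of a step that can fail at runtime.

```cpp
#include <cstdlib>
#include <iostream>
#include <string>
#include <utility>

// Hypothetical minimal Status type; Impala's real Status class is richer.
class Status {
 public:
  static Status OK() { return Status(true, ""); }
  static Status Error(std::string msg) { return Status(false, std::move(msg)); }
  bool ok() const { return ok_; }
  const std::string& msg() const { return msg_; }

 private:
  Status(bool ok, std::string msg) : ok_(ok), msg_(std::move(msg)) {}
  bool ok_;
  std::string msg_;
};

// Propagates a non-OK status to the caller, which can report it to the client.
#define SKETCH_RETURN_IF_ERROR(expr)   \
  do {                                 \
    Status status_ = (expr);           \
    if (!status_.ok()) return status_; \
  } while (false)

// Terminates the whole process on a non-OK status.
#define SKETCH_ABORT_IF_ERROR(expr)            \
  do {                                         \
    Status status_ = (expr);                   \
    if (!status_.ok()) {                       \
      std::cerr << status_.msg() << std::endl; \
      std::abort();                            \
    }                                          \
  } while (false)

// Stand-in for a step that can fail at runtime (assumed: thread creation).
Status CreateWorker(bool simulate_failure) {
  if (simulate_failure) return Status::Error("could not create thread");
  return Status::OK();
}

// Fails only this request instead of taking the whole server down.
Status StartAsyncDdl(bool simulate_failure) {
  SKETCH_RETURN_IF_ERROR(CreateWorker(simulate_failure));
  return Status::OK();
}
```

With the returning variant, a failure surfaces as an error on one query. With the aborting variant, the same failure would end the process for every client.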


http://gerrit.cloudera.org:8080/#/c/17872/15/be/src/service/client-request-state.cc@743
PS15, Line 743:   // Add newly created table to catalog cache.
> For DDLs, today we skip the PENDING state, so I think it makes sense to go 
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 17
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Mon, 11 Oct 2021 16:16:30 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever.
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17872/1/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/17872/1/be/src/service/client-request-state.cc@221
PS1, Line 221: std::string GetDebugString(const TExecRequest& exec_request, const TUniqueId& query_id = TUniqueId()) {
line too long (103 > 90)




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 1
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Mon, 27 Sep 2021 20:35:19 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] [WIP] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: [WIP] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9521/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 3
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 28 Sep 2021 20:09:28 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 10:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/9561/ : Initial code review checks failed. See linked job for details on the failure.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 10
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Tue, 05 Oct 2021 20:00:15 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................

IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

This patch addresses an Impala client hang caused by the AWS network
load balancer timeout, which is fixed at 350s. When the timeout expires,
for example during a long query compilation, the connection is silently
dropped and the Impala client hangs.

The scope of the fix applies to the following Impala clients, which
issue the thrift RPC ExecuteStatement() followed by repeated calls to
GetOperationStatus() (HS2, Impyla and HUE) or a variant of it
(Beeswax) to the Impala backend.

  1. HS2
  2. Beeswax
  3. Impyla
  4. HUE

The TCLIService protocol between the client and Impala server remains
unchanged.

In the fix, the backend method ImpalaServer::ExecuteStatement()
starts a new thread for ImpalaServer::ExecuteStatementCommon() which
can reach two states: COMPILED and DONE. COMPILED means the query has
been successfully compiled; DONE means the execution has either reached
the end successfully or encountered an error. The main thread, which
starts the new thread, waits for the COMPILED state before advancing to
another short wait period for the DONE state. If the DONE state is not
reached, control is returned to the client, and the client issues
GetOperationStatus() repeatedly to check whether the execution has
reached the DONE state. When the Impala server detects the FINISHED
execution state, or there is an error in servicing
GetOperationStatus(), the new thread is joined and released.

In addition, a child query, which the Impala server submits as an
Impala client for a COMPUTE STATS statement, runs synchronously in
the same child query thread.

The communication area between the new thread and the host thread
is per session.

Testing: TBD

Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
---
M be/src/service/child-query.cc
M be/src/service/client-request-state.cc
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-hs2-server.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/util/thread.h
7 files changed, 365 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/17872/10

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 10
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#14). ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................

IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

This patch addresses an Impala client hang caused by the AWS network
load balancer timeout, which is fixed at 350s. When a long DDL operation
is in progress and the timeout expires, AWS silently drops the
connection and the Impala client hangs.

The fix maintains the TCLIService protocol between the client and Impala
server and applies to the following Impala clients, which issue the
thrift RPC ExecuteStatement() followed by repeated calls to
GetOperationStatus() (HS2, Impyla and HUE) or a variant of it (Beeswax)
to the Impala backend.

  1. HS2
  2. Beeswax
  3. Impyla
  4. HUE

In the fix, the backend method ImpalaServer::ExecuteStatementCommon()
starts a new thread for ImpalaServer::ExecuteStatementCommonInternal()
which can reach two states: COMPILED and DONE. COMPILED means the front
end has successfully compiled the query; DONE means the execution of
the query plan has either reached the end successfully or encountered
an error. The main thread, which starts the new thread, waits for the
COMPILED state before advancing to another short wait period for the
DONE state. If the DONE state is not reached, control is returned to
the client, and the client issues GetOperationStatus() repeatedly to
check whether the execution has reached the DONE state. When the Impala
server detects the FINISHED execution state, or there is an error in
servicing GetOperationStatus(), the new thread is joined and released.
The communication between the new thread and the host thread is per
session.

Thus for a long DDL query, its execution part is done in the new thread
and the Impala client keeps checking its status via GetOperationStatus()
without waiting more than 350s.
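
From the client's side, the effect is that no single RPC stays open for long. A rough sketch of such a polling loop follows. It is illustrative only: OpState and PollUntilDone() are hypothetical names, and 'get_status' stands in for a real GetOperationStatus() call over Thrift.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical, simplified operation states as seen by the client.
enum class OpState { RUNNING, FINISHED, ERROR };

// Calls 'get_status' until it reports a terminal state or 'max_polls' is
// exhausted. Each call is short, so the connection never sits idle for the
// whole duration of the DDL. Returns the last state observed.
OpState PollUntilDone(const std::function<OpState()>& get_status,
                      std::chrono::milliseconds interval, int max_polls) {
  OpState state = OpState::RUNNING;
  for (int i = 0; i < max_polls; ++i) {
    state = get_status();
    if (state != OpState::RUNNING) break;
    std::this_thread::sleep_for(interval);
  }
  return state;
}
```

The polling interval only needs to be comfortably below the load balancer timeout. The values a real client uses are not stated in this thread.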

In addition, a child query, which the Impala server submits as an
Impala client for a COMPUTE STATS statement, runs synchronously in
the same child query thread.

Testing: TBD

Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
---
M be/src/service/child-query.cc
M be/src/service/client-request-state.cc
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-hs2-server.cc
M be/src/service/impala-server.cc
M be/src/service/impala-server.h
M be/src/util/thread.h
7 files changed, 318 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/17872/14

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 14
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 15:

Made ClientRequestState::ExecDdlRequest() async, utilizing the existing wait and async_exec threads. The patch prevents the long timeouts seen with DDL execution.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 15
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Fri, 08 Oct 2021 16:09:37 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] [WIP] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/17872 )

Change subject: [WIP] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................

[WIP] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

This patch addresses Impala client hang due to AWS network load balancer
timeout. The scope of the fix applies to the following Impala clients.

  1. HS2
  2. Beeswax
  3. Impyla
  4. HUE

These clients issue the thrift RPC ExecuteStatement() followed by
repeated calls to GetOperationStatus() (HS2, Impyla and HUE) or a
variant of it (Beeswax) to the Impala backend.

In the fix, the backend operation for ExecuteStatement() runs
asynchronously in a new thread and its completion status is checked
periodically via the equivalent of the GetOperationStatus() from the
client. A new execution state CATALOG_OP_RUNNING is added to represent
the new execution state.

Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
---
M be/src/exec/catalog-op-executor.cc
M be/src/exec/catalog-op-executor.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/impala-beeswax-server.cc
M be/src/service/impala-hs2-server.cc
6 files changed, 157 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/17872/2

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 2
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 27:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9619/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 27
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Mon, 18 Oct 2021 17:05:54 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 21:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17872/21/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/17872/21/be/src/service/client-request-state.cc@713
PS21, Line 713:   // Transition the exec state to RUNNING for any non-CTAS DDLs. For the later, the
              :   // state is set from PENDING to RUNNING inside ExecQueryOrDmlRequest().
              :   if (catalog_op_type() != TCatalogOpType::DDL
              :       || ddl_type() != TDdlType::CREATE_TABLE_AS_SELECT) {
              :     UpdateNonErrorExecState(ExecState::RUNNING);
              :   }
> The difference with the state in main thread for CTAS before overwritten by
When a statement is in the INITIALIZED state, its runtime profile is unavailable. This limitation is because we currently don't handle generating a runtime profile when the query is in planning, but this restriction doesn't apply after the planning is over. This is why it is useful to transition out of the INITIALIZED state as soon as planning is over. You can search the code for ExecState::INITIALIZED to see these locations. So, my reason for wanting to get out of INITIALIZED state is so that these DDLs have runtime profiles while they are doing the catalog ops.

PENDING corresponds to admission control, and the profile contains extra information while in this state (e.g. the queued reason and other state). CTAS could spend time in admission control after the create table completes, so we want that admission control information to be accessible. Choice 1 for CTAS is to stay in INITIALIZED during the create table, then transition to PENDING when in admission control. Choice 2 for CTAS is to go directly to PENDING before the create table, then stay in PENDING when in admission control. I prefer Choice 2, because it means the profile is available during create table.

Other DDLs don't go through admission control, so PENDING doesn't provide anything extra. RUNNING fits for those statements.

As far as which thread should set it, we should avoid race conditions when we can. If we set the state in the async thread, then Exec() could return INITIALIZED or RUNNING. If we set it prior to spawning the async thread, then there is no race condition. This is what ExecAsyncQueryOrDmlRequest() does today.

If there was a meaningful difference in the time when the state would transition, then we would put it directly in that location, but for the transition to RUNNING here, we were doing it immediately after spawn, so there is no reason not to do it in the main thread.
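
The ordering argument above can be sketched as follows. This is a simplified illustration under stated assumptions: the Request class and its members are hypothetical, and a plain atomic stands in for whatever locking the real ClientRequestState uses.

```cpp
#include <atomic>
#include <functional>
#include <thread>

// Hypothetical, simplified execution states.
enum class ExecState { INITIALIZED, PENDING, RUNNING, FINISHED };

class Request {
 public:
  ~Request() { Wait(); }

  // Sets RUNNING in the calling thread before spawning the worker. Once this
  // returns, an observer sees RUNNING or FINISHED, never INITIALIZED, and the
  // worker's FINISHED can never be overwritten by a late RUNNING.
  void StartAsync(std::function<void()> work) {
    state_.store(ExecState::RUNNING);
    worker_ = std::thread([this, work]() {
      work();
      state_.store(ExecState::FINISHED);
    });
  }

  ExecState state() const { return state_.load(); }

  // Joins the worker; safe to call more than once.
  void Wait() {
    if (worker_.joinable()) worker_.join();
  }

 private:
  std::atomic<ExecState> state_{ExecState::INITIALIZED};
  std::thread worker_;
};
```

If the store to RUNNING were moved inside the worker lambda, the state visible right after StartAsync() returns would depend on thread scheduling.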




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 21
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Mon, 18 Oct 2021 17:37:31 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#19). ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................

IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

This patch addresses an Impala client hang caused by the AWS network
load balancer timeout, which is fixed at 350s. When a long DDL operation
is in progress and the timeout expires, AWS silently drops the
connection and the Impala client hangs.

The fix maintains the current TCLIService protocol between the client
and Impala server and is applicable to the following Impala clients,
which issue the thrift RPC ExecuteStatement() followed by repeated
calls to GetOperationStatus() (HS2, Impyla and HUE) or a variant of it
(Beeswax) to the Impala backend.

  1. HS2
  2. Beeswax
  3. Impyla
  4. HUE

In the fix, the backend method ClientRequestState::ExecDdlRequest()
can start a new thread in 'async_exec_thread_' for ExecAsyncDdlRequest()
which executes most of the DDLs asynchronously. This thread is waited
for in the wait thread 'wait_thread_'. Since the wait thread also runs
asynchronously, the execution of the DDLs does not block the Impala
client. Thus the Impala client can keep checking the execution status
via GetOperationStatus() without a long wait (say, more than 350s).

As an optimization, the above asynchronous mode is not applied to the
execution of certain DDLs that run no risk of long execution.

  1. Operations that do not access catalog service;
  2. COMPUTE STATS as the stats computation queries already run
     asynchronously;
  3. CREATE TABLE AS SELECT as the SELECT part already runs
     asynchronously;
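
The three exceptions listed above amount to a small dispatch decision, sketched below. The enums are hypothetical, reduced stand-ins for Impala's Thrift types, and treating USE and SHOW_TABLES as the operations that do not access the catalog service is an assumption made for the sake of the example.

```cpp
// Hypothetical, reduced versions of the catalog operation and DDL types.
enum class CatalogOpType { USE, SHOW_TABLES, DDL, RESET_METADATA };
enum class DdlType {
  NONE,
  ALTER_TABLE,
  COMPUTE_STATS,
  CREATE_TABLE_AS_SELECT,
  DROP_TABLE
};

// Returns true if the operation should run in the async exec thread.
bool ShouldRunDdlAsync(CatalogOpType op, DdlType ddl) {
  // 1. Operations assumed not to access the catalog service stay synchronous.
  if (op == CatalogOpType::USE || op == CatalogOpType::SHOW_TABLES) {
    return false;
  }
  if (op == CatalogOpType::DDL) {
    // 2. COMPUTE STATS already runs its child queries asynchronously.
    if (ddl == DdlType::COMPUTE_STATS) return false;
    // 3. The SELECT part of CTAS already runs asynchronously.
    if (ddl == DdlType::CREATE_TABLE_AS_SELECT) return false;
  }
  // Everything else may be slow in the catalog, so it goes async.
  return true;
}
```

Keeping the decision in one predicate makes it easy to extend if further cases, such as the LOAD DATA path raised in review, need the same treatment.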

Testing:
  1. Unit tests with HS2 and Beeswax client
  2. Core tests

Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
---
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
A testdata/workloads/functional-query/queries/QueryTest/alter-table-recover.test
M tests/metadata/test_ddl.py
4 files changed, 132 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/72/17872/19

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 19
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Amogh Margoor (Code Review)" <ge...@cloudera.org>.
Amogh Margoor has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 19:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17872/19/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/17872/19/be/src/service/client-request-state.cc@643
PS19, Line 643: Status ClientRequestState::ExecSyncDdlRequest() {
LOAD DATA might also reset metadata through CatalogOpExecutor::Exec, which may not go through ExecDdlRequest. That case probably needs to be handled as well.


http://gerrit.cloudera.org:8080/#/c/17872/19/be/src/service/client-request-state.cc@776
PS19, Line 776:       ABORT_IF_ERROR(Thread::Create("impala-server", "async_exec_thread_",
I think this was previously not counted toward 'EXEC_TIME_LIMIT_S', but after making it async it will be counted and can breach the time limit if it is set low. This should be added to the release notes to avoid surprises.


http://gerrit.cloudera.org:8080/#/c/17872/19/be/src/service/client-request-state.cc@779
PS19, Line 779:     UpdateNonErrorExecState(ExecState::RUNNING);
The thread spawned just above (Line 776) might finish before execution reaches here, in which case we might end up setting the wrong state.
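
One generic way to guard against this kind of late update is to refuse transitions that move the state backwards. A minimal sketch follows, assuming hypothetical names (StateHolder, UpdateIfForward) and a simple linear ordering of states. It does not describe how UpdateNonErrorExecState() actually behaves.

```cpp
#include <mutex>

// Hypothetical, simplified execution states in their expected order.
enum class ExecState { INITIALIZED = 0, PENDING = 1, RUNNING = 2, FINISHED = 3 };

class StateHolder {
 public:
  // Applies the transition only if it moves the state forward.
  // Returns true if the state was changed.
  bool UpdateIfForward(ExecState next) {
    std::lock_guard<std::mutex> l(lock_);
    if (next <= state_) return false;
    state_ = next;
    return true;
  }

  ExecState state() const {
    std::lock_guard<std::mutex> l(lock_);
    return state_;
  }

 private:
  mutable std::mutex lock_;
  ExecState state_ = ExecState::INITIALIZED;
};
```

With such a guard, a spawning thread that calls UpdateIfForward(ExecState::RUNNING) after the worker has already reached FINISHED leaves the state untouched. Setting the state before spawning the thread avoids the race altogether.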




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 19
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Tue, 12 Oct 2021 10:57:22 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 32:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9634/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 32
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Thu, 21 Oct 2021 16:00:28 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 36: Code-Review+2

Looks good! Thanks!



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 36
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Fri, 22 Oct 2021 01:56:21 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17872 )

Change subject: IMPALA-10811 RPC to submit query getting stuck for AWS NLB forever
......................................................................


Patch Set 36: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7554/


-- 
To view, visit http://gerrit.cloudera.org:8080/17872
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib57e86926a233ef13d27a9ec8d9c36d33a88a44e
Gerrit-Change-Number: 17872
Gerrit-PatchSet: 36
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Fri, 22 Oct 2021 08:06:28 +0000
Gerrit-HasComments: No