Posted to issues-all@impala.apache.org by "Fang-Yu Rao (Jira)" <ji...@apache.org> on 2022/11/01 19:01:00 UTC

[jira] [Commented] (IMPALA-11653) Identify and time out connections that are not from a supported Impala client more eagerly

    [ https://issues.apache.org/jira/browse/IMPALA-11653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627298#comment-17627298 ] 

Fang-Yu Rao commented on IMPALA-11653:
--------------------------------------

Thanks [~wzhou]!

Wenzhe and I also briefly discussed the alternative approach of sending heartbeats. We could possibly reduce the timeout value once the changes to Impala shell and the Impala server are done.

However, one concern is that a client (e.g., a JDBC or ODBC driver) that does not support sending heartbeats before its SASL initial message is ready could have its connection closed prematurely by the Impala server.

Maybe [~thundergun] and [~kdeschle] could provide some feedback on this.

On my side, I can start investigating how to revise [handleSaslStartMessage()|https://github.com/apache/impala/blob/17ec3a85c7e3733dacb08a9fcca83fff5ec75102/be/src/transport/TSaslServerTransport.cpp#L98-L106] on the server side so that the server is aware of the new message type (heartbeat). As a first step, we could keep the timeout value unchanged after the server-side changes are complete, so that clients that do not send heartbeats are not affected.
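
To make the heartbeat idea concrete, here is a minimal Python sketch of how the server side could tell a heartbeat apart from the SASL start message. It assumes the usual Thrift SASL framing of a 5-byte header (1 status byte followed by a 4-byte big-endian payload length, which matches the len=5 read in the stack trace quoted below). The status value TSASL_HEARTBEAT, the helper names, and the max_heartbeats limit are all hypothetical; no heartbeat status exists in the protocol today, and this is not the actual handleSaslStartMessage() logic.

```python
import struct

# Negotiation status codes used by Thrift's SASL framing.
TSASL_START = 1
TSASL_OK = 2
TSASL_BAD = 3
TSASL_ERROR = 4
TSASL_COMPLETE = 5
# Hypothetical: a new status a client could send while it is still
# preparing its SASL initial message. Not part of Thrift today.
TSASL_HEARTBEAT = 6

HEADER_LEN = 5  # 1 status byte + 4-byte big-endian payload length


def parse_sasl_header(header):
    """Split a 5-byte SASL frame header into (status, payload_length)."""
    if len(header) != HEADER_LEN:
        raise ValueError("expected a %d-byte header" % HEADER_LEN)
    return struct.unpack(">BI", header)


def handle_start_messages(headers, max_heartbeats=3):
    """Skip heartbeat headers until the START header arrives.

    Returns (start_payload_length, heartbeats_seen). Raises TimeoutError
    if the client only ever sends heartbeats, so that a client cannot
    hold a connection setup thread forever.
    """
    heartbeats = 0
    for header in headers:
        status, length = parse_sasl_header(header)
        if status == TSASL_HEARTBEAT:
            if length != 0:
                raise ValueError("heartbeat must not carry a payload")
            heartbeats += 1
            if heartbeats > max_heartbeats:
                raise TimeoutError("too many heartbeats without a start message")
            continue
        if status != TSASL_START:
            raise ValueError("expected START, got status %d" % status)
        return length, heartbeats
    raise TimeoutError("connection ended before a start message arrived")
```

A server that does not know the heartbeat status would reject such a frame as an unexpected status, which is why the client and server changes would need to be rolled out together.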

> Identify and time out connections that are not from a supported Impala client more eagerly
> ------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-11653
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11653
>             Project: IMPALA
>          Issue Type: Improvement
>    Affects Versions: Impala 4.1.0
>            Reporter: Vincent Tran
>            Assignee: Fang-Yu Rao
>            Priority: Major
>         Attachments: simple_tcp_client.py
>
>
> When a TCP client opens a connection to an Impala client interface (hs2 or beeswax), the connection is accepted immediately after the 3-way handshake (SYN, SYN-ACK, ACK) and is queued for *TAcceptQueueServer::SetupConnection()*. However, if the client sends nothing else, the ImpalaServer will block in *apache::thrift::transport::TSocket::read()* until the client sends a RST/FIN or until *sasl_connect_tcp_timeout_ms* elapses (5 minutes by default).
> The connection setup thread stack trace can be observed below during this period.
> {noformat}
> (gdb) bt
> #0  0x00007f3b972ee20d in poll () from ./lib64/libc.so.6
> #1  0x0000000002dcd5bc in apache::thrift::transport::TSocket::read(unsigned char*, unsigned int) ()
> #2  0x0000000002dd1803 in unsigned int apache::thrift::transport::readAll<apache::thrift::transport::TSocket>(apache::thrift::transport::TSocket&, unsigned char*, unsigned int) ()
> #3  0x0000000001330cc9 in readAll (len=5, buf=0x7f3277ea4f8b "", this=<optimized out>) at ../../../toolchain/toolchain-packages-gcc7.5.0/thrift-0.9.3-p8/include/thrift/transport/TTransport.h:121
> #4  apache::thrift::transport::TSaslTransport::receiveSaslMessage (this=this@entry=0x278a96b0, status=status@entry=0x7f3277ea500c, length=length@entry=0x7f3277ea5008) at TSaslTransport.cpp:259
> #5  0x000000000132db14 in apache::thrift::transport::TSaslServerTransport::handleSaslStartMessage (this=0x278a96b0) at TSaslServerTransport.cpp:95
> #6  0x0000000001330e33 in apache::thrift::transport::TSaslTransport::doSaslNegotiation (this=0x278a96b0) at TSaslTransport.cpp:81
> #7  0x000000000132e723 in open (this=0x12e29750) at ../../../toolchain/toolchain-packages-gcc7.5.0/thrift-0.9.3-p8/include/thrift/transport/TBufferTransports.h:218
> #8  apache::thrift::transport::TSaslServerTransport::Factory::getTransport (this=0xf825a70, trans=...) at TSaslServerTransport.cpp:173
> #9  0x00000000010cd49d in apache::thrift::server::TAcceptQueueServer::SetupConnection (this=0x174270c0, entry=...) at TAcceptQueueServer.cpp:233
> #10 0x00000000010cef4d in operator() (tid=<optimized out>, item=..., __closure=<optimized out>) at TAcceptQueueServer.cpp:323
> #11 boost::detail::function::void_function_obj_invoker2<apache::thrift::server::TAcceptQueueServer::serve()::<lambda(int, const boost::shared_ptr<apache::thrift::server::TAcceptQueueEntry>&)>, void, int, const boost::shared_ptr<apache::thrift::server::TAcceptQueueEntry>&>::invoke(boost::detail::function::function_buffer &, int, const boost::shared_ptr<apache::thrift::server::TAcceptQueueEntry> &) (function_obj_ptr=..., a0=<optimized out>, a1=...)
>     at ../../../toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:159
> #12 0x00000000010d3e59 in operator() (a1=..., a0=1, this=0x7f3279ea9510) at ../../../toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:770
> #13 impala::ThreadPool<boost::shared_ptr<apache::thrift::server::TAcceptQueueEntry> >::WorkerThread (this=0x7f3279ea94c0, thread_id=1) at ../util/thread-pool.h:166
> #14 0x000000000144f8f2 in operator() (this=0x7f3277ea5b40) at ../../../toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/function/function_template.hpp:770
> #15 impala::Thread::SuperviseThread(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, boost::function<void ()>, impala::ThreadDebugInfo const*, impala::Promise<long, (impala::PromiseMode)0>*) (name=..., category=..., functor=..., parent_thread_info=<optimized out>, thread_started=0x7f3279ea9110) at thread.cc:360
> #16 0x0000000001450d6b in operator()<void (*)(const std::__cxx11::basic_string<char>&, const std::__cxx11::basic_string<char>&, boost::function<void()>, const impala::ThreadDebugInfo*, impala::Promise<long int>*), boost::_bi::list0> (a=<synthetic pointer>,
>     f=@0x1417ccf8: 0x144f5f0 <impala::Thread::SuperviseThread(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, boost::function<void ()>, impala::ThreadDebugInfo const*, impala::Promise<long, (impala::PromiseMode)0>*)>, this=0x1417cd00) at ../../../toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/bind.hpp:531
> #17 operator() (this=0x1417ccf8) at ../../../toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/bind/bind.hpp:1222
> #18 boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, boost::function<void ()>, impala::ThreadDebugInfo const*, impala::Promise<long, (impala::PromiseMode)0>*), boost::_bi::list5<boost::_bi::value<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<boost::function<void ()> >, boost::_bi::value<impala::ThreadDebugInfo*>, boost::_bi::value<impala::Promise<long, (impala::PromiseMode)0>*> > > >::run() (this=0x1417cb40)
>     at ../../../toolchain/toolchain-packages-gcc7.5.0/boost-1.61.0-p2/include/boost/thread/detail/thread.hpp:116
> #19 0x0000000001ca17c2 in thread_proxy ()
> #20 0x00007f3b9a7f2dd5 in start_thread () from ./lib64/libpthread.so.0
> #21 0x00007f3b972f8ead in clone () from ./lib64/libc.so.6
> {noformat}
> As discussed in IMPALA-7638, we need to distinguish a client that is slow to finish SASL negotiation (due to slow Kerberos negotiation) from a client that is never going to do anything beyond the TCP handshake, and time out the latter much sooner than the *sasl_connect_tcp_timeout_ms* duration.
> The logging pattern below captures some instances of these abnormal connections:
> {noformat}
> I1007 12:16:07.636166 185038 TAcceptQueueServer.cpp:227] TAcceptQueueServer: hiveserver2-frontend started connection setup for client <Host: x.x.x.x Port: 32878>
> I1007 12:21:06.634896 185038 thrift-util.cc:96] TAcceptQueueServer: hiveserver2-frontend connection setup failed for client <Host: x.x.x.x Port: 32878>. Caught TException: No more data to read.
> ====
> I1007 12:16:23.488011 185039 TAcceptQueueServer.cpp:227] TAcceptQueueServer: hiveserver2-frontend started connection setup for client <Host: x.x.x.x Port: 33934>
> I1007 12:21:22.488610 185039 thrift-util.cc:96] TAcceptQueueServer: hiveserver2-frontend connection setup failed for client <Host: x.x.x.x Port: 33934>. Caught TException: No more data to read.
> {noformat}
> This instance of ImpalaServer was running with *accepted_cnxn_setup_thread_pool_size*=2. That means that both of these threads (185038 and 185039) were tied up during this 5-minute window. Subsequent incoming client connections to the HS2 interface will wait in the accept queue until one of these threads frees up. If more of those accepted connections do not start SASL negotiation, the problem will snowball.
> Attached is a simple TCP client [^simple_tcp_client.py] that, when run more than once in quick succession, will block port 21050 and cause otherwise legitimate connections from Impala-supported clients to wait in the "accept" queue for at least 5 minutes.
> {noformat}
> # python simple_tcp_client.py &
> [1] 19986
> 2022-10-12 11:28:16 INFO     Created a tcp client
> 2022-10-12 11:28:16 INFO     Connecting to: c908086-2.vpc.cloudera.com:21050
> 2022-10-12 11:28:16 INFO     Client1 connected
> 2022-10-12 11:28:16 INFO     Sleeping for 5 minutes
> # python simple_tcp_client.py &
> [2] 19989
> 2022-10-12 11:28:20 INFO     Created a tcp client
> 2022-10-12 11:28:20 INFO     Connecting to: c908086-2.vpc.cloudera.com:21050
> 2022-10-12 11:28:20 INFO     Client1 connected
> 2022-10-12 11:28:20 INFO     Sleeping for 5 minutes
> {noformat}
> {noformat}
> # impala-shell -i c908086-2.vpc.cloudera.com -d default -k --protocol=hs2
> Starting Impala Shell using Kerberos authentication
> Using service name 'impala'
> Socket error None: timed out
> ***********************************************************************************
> Welcome to the Impala shell.
> (Impala Shell v3.4.0-SNAPSHOT (a1dfdfd) built on Sun Aug 21 10:10:08 UTC 2022)
> When pretty-printing is disabled, you can use the '--output_delimiter' flag to set
> the delimiter for fields in the same row. The default is '\t'.
> ***********************************************************************************
> [Not connected] >
> {noformat}
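
The issue description above asks for clients that never send anything after the TCP handshake to be timed out sooner than clients that are merely slow in SASL negotiation. One possible shape for that, sketched in Python for illustration only, is a two-stage read of the 5-byte SASL header: a short timeout for the first byte and a longer overall deadline for the rest. The function name and both timeout parameters are hypothetical; only the longer deadline corresponds to the existing *sasl_connect_tcp_timeout_ms* setting. As noted in the comment, a short first-byte timeout could also cut off legitimate clients that are slow to produce their SASL initial message, so the value would need care.

```python
import socket
import time

HEADER_LEN = 5  # 1 status byte + 4-byte big-endian payload length


def read_first_header(conn, first_byte_timeout_s, total_timeout_s):
    """Read the first SASL frame header with a two-stage timeout.

    Returns the 5 header bytes, or None if the peer stays silent past
    first_byte_timeout_s, is too slow overall, or closes the connection.
    """
    deadline = time.monotonic() + total_timeout_s
    # Stage 1: an idle peer is given only the short timeout.
    conn.settimeout(first_byte_timeout_s)
    buf = b""
    try:
        while len(buf) < HEADER_LEN:
            chunk = conn.recv(HEADER_LEN - len(buf))
            if not chunk:
                return None  # peer closed the connection
            buf += chunk
            if len(buf) < HEADER_LEN:
                # Stage 2: the peer has started talking, so allow the
                # remainder of the longer overall deadline.
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    return None
                conn.settimeout(remaining)
    except socket.timeout:
        return None
    return buf
```

In a server, a None result would lead to closing the connection so that the setup thread is freed for the next entry in the accept queue.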


