You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "John Sherman (Code Review)" <ge...@cloudera.org> on 2017/06/02 18:03:41 UTC

[Impala-ASF-CR] IMPALA-5394: Set socket timeouts while opening TSaslTransport

John Sherman has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7061

Change subject: IMPALA-5394: Set socket timeouts while opening TSaslTransport
......................................................................

IMPALA-5394: Set socket timeouts while opening TSaslTransport

- TThreadPoolServer calls getTransport() on a client from the Server
  thread (the thread that does the accepts).
  - TSaslServerTransport->getTransport() calls TSaslTransport->open()
  - TSaslServerTransport->open() tries to negotiate SASL which calls
    read/write
    - If read/write blocks, the TThreadPoolServer cannot accept
      connections
- This patch set the underlying TSocket's recvTimeout and sendTimeout
  before the TSaslServerTransport->open() and reset them to 0 after
  open() completes.
- Added sasl_connect_tcp_timeout_seconds flag that defaults to 5

Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
---
M be/src/transport/TSaslServerTransport.cpp
1 file changed, 12 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/61/7061/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>

[Impala-ASF-CR] IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
John Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/7061 )

Change subject: IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer
......................................................................


Patch Set 8:

No problem, I just got back from a month long wedding/honeymoon myself.

I'll try to address your comments in the next few days.

 > (3 comments)
 > 
 > Sorry for the slow response and thanks for your patience, John.
 > Henry is away on holiday, but let's try to get this in in the
 > meantime. The patch looks good to me. I just have a couple small
 > comments. And rebasing the patch before posting the next patch set
 > would be a good idea.


-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-Change-Number: 7061
Gerrit-PatchSet: 8
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@apache.org>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Fri, 29 Sep 2017 23:48:59 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5394: Set socket timeouts while opening TSaslTransport

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5394: Set socket timeouts while opening TSaslTransport
......................................................................


Patch Set 2:

What are the next steps here?

-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5394: Set socket timeouts while opening TSaslTransport

Posted by "Matthew Jacobs (Code Review)" <ge...@cloudera.org>.
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-5394: Set socket timeouts while opening TSaslTransport
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7061/2/be/src/transport/TSaslServerTransport.cpp
File be/src/transport/TSaslServerTransport.cpp:

PS2, Line 158: socket->setRecvTimeout(0);
             :     socket->setSendTimeout(0);
> Do the timeouts in client-cache.cc lines 120-121 not get applied to TSSLSoc
perhaps a safer fix is to get the values before overwriting them on l150, then set them back rather than setting to 0


-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
John Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/7061 )

Change subject: IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer
......................................................................


Patch Set 9:

I've done local testing which caught a regression on my patch due to a change in how thread pools are initialized now. Should I run this through the new gerrit-verify-dryrun-external Jenkins job?


-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-Change-Number: 7061
Gerrit-PatchSet: 9
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@apache.org>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Mon, 02 Oct 2017 23:23:33 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
John Sherman has uploaded a new patch set (#7).

Change subject: IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer
......................................................................

IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

- Previously TThreadPoolServer called getTransport() on a client from
  the Server thread (the thread that did the accepts).
  - TSaslServerTransport->getTransport() called TSaslTransport->open()
  - TSaslServerTransport->open() tried to negotiate SASL which calls
    read/write
    - If read/write blocks indefinitely, the TThreadPoolServer could
      not accept connections until tcp_keepalive kicked in.
- Set the underlying TSocket's recvTimeout and sendTimeout before the
  TSaslServerTransport->open() and reset them to 0 after open()
  completes.
- Added sasl_connect_tcp_timeout_ms flag that defaults to 300000
  milliseconds (5 minutes)
- Add the ability for the TAcceptQueueServer to limit the maximum
  number of tasks at a time
- Changed the remaining Thrift servers to use TAcceptQueueServer.
  (hs2/beeswax/network-perf-benchmark)
  - The timeout is still needed in TAcceptQueueServer since
    SetupConnection follows a similar pattern that TThreadPoolServer
    does.
- Removed support for TThreadPool from ThriftServer() since it is
  no longer used anywhere. ThriftServer() now always uses
  TAcceptQueueServer.
- Deprecated enable_accept_queue_server flag and removed supporting
  code.

Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
---
M be/src/benchmarks/network-perf-benchmark.cc
M be/src/common/global-flags.cc
M be/src/rpc/TAcceptQueueServer.cpp
M be/src/rpc/TAcceptQueueServer.h
M be/src/rpc/thrift-server.cc
M be/src/rpc/thrift-server.h
M be/src/service/impala-server.cc
M be/src/transport/TSaslServerTransport.cpp
M common/thrift/metrics.json
9 files changed, 108 insertions(+), 112 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/61/7061/7
-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>

[Impala-ASF-CR] IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
John Sherman has posted comments on this change.

Change subject: IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer
......................................................................


Patch Set 5:

Currently running my merged changes through the run-all test suite. Hit some probably unrelated test failures, going to rerun the run-all and hopefully get a clean run. Will post updated patch after.

-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5394: Set socket timeouts while opening TSaslTransport

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
John Sherman has posted comments on this change.

Change subject: IMPALA-5394: Set socket timeouts while opening TSaslTransport
......................................................................


Patch Set 1:

> Thanks for submitting this John. Was there any testing you did to
 > validate this patch? Also, how did you decide on the 5 second
 > value?

I did manual testing on my dev machine. I'm in the process of getting my specific build on a cluster with kerberos enabled to be able to test the specific timeout case. The patch was originally verified against an older version of the code base in a kerberized environment. I'm also attempting to get the impala regression testdata loaded onto my test machine and having very little luck, so I can run the full impala regression suite. Is there a way for me to remotely run this against the impala test suite?

As for the value of 5, I honestly just picked a number that seemed safe. I think perhaps it is a bit high. I think the timeout for internal connections is 3 seconds.

-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
John Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/7061 )

Change subject: IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer
......................................................................


Patch Set 11:

Thanks, I added my comments to IMPALA-6009. I don't think the fix for IMPALA-6009 is correct or I'm missing something.


-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-Change-Number: 7061
Gerrit-PatchSet: 11
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@apache.org>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Comment-Date: Tue, 03 Oct 2017 23:26:51 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5394: Handle blocked HS2 connections

Posted by "Henry Robinson (Code Review)" <ge...@cloudera.org>.
Henry Robinson has posted comments on this change.

Change subject: IMPALA-5394: Handle blocked HS2 connections
......................................................................


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7061/4/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

PS4, Line 1963: exec_env->metrics(), FLAGS_fe_service_threads, ThriftServer::ThreadPool);
> Running this change through run-all-tests and changing the comment at 1954 
Yes, let's remove it for network-perf-benchmark.cc.


PS4, Line 1985: Threaded
> Does it matter that FLAG_enable_accept_queue_server can be turned to false 
I think the accept queue is stable enough at this point - it's been running in production for us for quite a long time. We can probably deprecate the flag (and hide it using DEFINE_bool_hidden).


-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
John Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/7061 )

Change subject: IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer
......................................................................


Patch Set 13:

Thanks! Also thanks to you and Henry for taking the time to review. (technically my 2nd patch, I removed an invalid DCHECK a few months ago, but it is my first patch of substance)
I'll probably start working in my spare time on IMPALA-5393 for my next contribution. 

> > It looks like it passed gerri-verify-external - https://jenkins.impala.io/job/gerrit-verify-dryrun-external/13/
 > 
 > Great! I'll rebase and merge this.
 > Congratulations on getting your first patch in!


-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-Change-Number: 7061
Gerrit-PatchSet: 13
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@apache.org>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Comment-Date: Wed, 04 Oct 2017 23:26:13 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5394: Set socket timeouts while opening TSaslTransport

Posted by "Sailesh Mukil (Code Review)" <ge...@cloudera.org>.
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-5394: Set socket timeouts while opening TSaslTransport
......................................................................


Patch Set 2: Code-Review+1

Thanks John! This LGTM. I would still like someone else to have a quick look before they sign off.

-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
John Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/7061 )

Change subject: IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer
......................................................................


Patch Set 10:

Thanks for the analysis, I'll fix up the issues (and start running clang tidy for my patches).


-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-Change-Number: 7061
Gerrit-PatchSet: 10
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@apache.org>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Tue, 03 Oct 2017 17:16:44 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5394: Set socket timeouts while opening TSaslTransport

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
John Sherman has posted comments on this change.

Change subject: IMPALA-5394: Set socket timeouts while opening TSaslTransport
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7061/2/be/src/transport/TSaslServerTransport.cpp
File be/src/transport/TSaslServerTransport.cpp:

PS2, Line 158: socket->setRecvTimeout(0);
             :     socket->setSendTimeout(0);
> Ah, that's what I was missing, thanks John. Still trying to make sure I und
Sorry, reposting this comment so it's not all on one line: If I'm reading the thrift code correctly, fixing the lock doesn't really help the specific issue I've seen. The C++ thrift server implementation ends up calling getTransport on the same thread that does the accept of the connection. So if the getTransport implementation does reads/writes that ends up blocking, it prevents new connections from being accepted until the read/write completes or times out. We ran into this issue when a client connecting to the hs2_port started the SASL handshake then for some reason quits communicating with the server (the packets from the client to impalad started to not make it to impalad even the RST packet that the client sends after retrying). In this case all other connection requests to the hs2_port end up being queued until I think the tcp keepalive kicks in (I think the default time is 2 hours) or impalad is restarted. (The Java implementation of TThreadPoolServer puts the accept!
 ed connection on a different thread before calling getTransport https://github.com/apache/thrift/blob/master/lib/java/src/org/apache/thrift/server/TThreadPoolServer.java) If the C++ thrift implementation put the connection on a different thread before calling getTransport, fixing the locking would be helpful.

C++ implementation: https://github.com/apache/thrift/blob/0.10.0/lib/cpp/src/thrift/server/TServerFramework.cpp lines 147-157
 
Backtrace ends up lookings something like this (from IMPALA-5268):
#6  0x0000000001b394d2 in apache::thrift::transport::TSSLSocket::read(unsigned char*, unsigned int) ()
No symbol table info available.
#7  0x0000000001b36003 in unsigned int apache::thrift::transport::readAll<apache::thrift::transport::TSocket>(apache::thrift::transport::TSocket&, unsigned char*, unsigned int) ()
No symbol table info available.
#8  0x0000000000b52e15 in apache::thrift::transport::TSaslTransport::receiveSaslMessage(apache::thrift::transport::NegotiationStatus*, unsigned int*) ()
No symbol table info available.
#9  0x0000000000b50733 in apache::thrift::transport::TSaslServerTransport::handleSaslStartMessage() ()
No symbol table info available.
#10 0x0000000000b52f9e in apache::thrift::transport::TSaslTransport::open() ()
No symbol table info available.
#11 0x0000000000b51431 in apache::thrift::transport::TSaslServerTransport::Factory::getTransport(boost::shared_ptr<apache::thrift::transport::TTransport>) ()
No symbol table info available.
#12 0x0000000001b4066d in apache::thrift::server::TThreadPoolServer::serve() ()
No symbol table info available.
#13 0x00000000009f2e60 in impala::ThriftServer::ThriftServerEventProcessor::Supervise() ()
No symbol table info available.


-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5394: Handle blocked HS2 connections

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
John Sherman has posted comments on this change.

Change subject: IMPALA-5394: Handle blocked HS2 connections
......................................................................


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7061/4/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

PS4, Line 1963: exec_env->metrics(), FLAGS_fe_service_threads, ThriftServer::ThreadPool);
> Let's make this change for beeswax as well.
Running this change through run-all-tests and changing the comment at 1954 to be:
"ODBC and Hue drivers do not support non-blocking servers." similar to the comment at line 1976. I'm guessing at the intent of the old comment.

After this change the only thing using ThreadPool server implementation is network-perf-benchmark.cc

Not sure what the network-perf-benchmark.cc is used for, but if nothing else is using ThreadedPool implementation should I just remove support for it from ThriftServer() to prevent bit rot in the unused code paths? Or just leave it be for future use (or just for the network-perf-benchmark to use?)


PS4, Line 1985: Threaded
> Maybe we can just remove Threaded and ThreadPool and just pass in the numbe
Does it matter that FLAG_enable_accept_queue_server can be turned to false and that these thread limits will not be enforced? Or at this point is there enough confidence that the accept_queue_server is good/stable enough and won't be disabled? If that is the case, should the flag be deprecated and the special case code be removed for disabling it?


-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
John Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/7061 )

Change subject: IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer
......................................................................


Patch Set 12:

My local build issue related to IMPALA-6009, seemed to go away. So I'll go ahead and kick off the Jenkins job again.


-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-Change-Number: 7061
Gerrit-PatchSet: 12
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@apache.org>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Comment-Date: Wed, 04 Oct 2017 02:40:25 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5394: Handle blocked HS2 connections

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
John Sherman has posted comments on this change.

Change subject: IMPALA-5394: Handle blocked HS2 connections
......................................................................


Patch Set 3:

Ran the latest patch through be/fe/e2e testing.
be/fe ran clean.
end to end mostly ran clean. I had one failure I couldn't explain (could be related to me using Java 8/Ubuntu 16 or the limited amount of memory in my development environment):
custom_cluster/test_hdfs_fd_caching.py::TestHdfsFdCaching::test_caching_enabled
custom_cluster/test_hdfs_fd_caching.py:125: in test_caching_enabled
    self.run_fd_caching_test(vector, caching_expected, cache_capacity)
custom_cluster/test_hdfs_fd_caching.py:85: in run_fd_caching_test
    assert num_handles_after == (num_handles_start + 1)
E   assert 0 == (0 + 1)

-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

Posted by "Sailesh Mukil (Code Review)" <ge...@cloudera.org>.
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/7061 )

Change subject: IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer
......................................................................


Patch Set 8:

(3 comments)

Sorry for the slow response and thanks for your patience, John. Henry is away on holiday, but let's try to get this in in the meantime. The patch looks good to me. I just have a couple small comments. And rebasing the patch before posting the next patch set would be a good idea.

http://gerrit.cloudera.org:8080/#/c/7061/8/be/src/rpc/thrift-server-test.cc
File be/src/rpc/thrift-server-test.cc:

http://gerrit.cloudera.org:8080/#/c/7061/8/be/src/rpc/thrift-server-test.cc@403
PS8, Line 403: TEST(ConcurrencyTest, MaxConcurrentConnections) {
Could you write a brief comment about what this test is trying to achieve?


http://gerrit.cloudera.org:8080/#/c/7061/8/be/src/rpc/thrift-server-test.cc@439
PS8, Line 439:   EXPECT_TRUE(did_reach_max);
> Not sure if this will be effective. Could end up being racy? If so probably
It looks like it could be racy, but very rarely. We have no control over how the OS would schedule these threads. We can revisit this test later if it shows up to be a problem, but for now, I'm okay with it.


http://gerrit.cloudera.org:8080/#/c/7061/3/be/src/transport/TSaslServerTransport.cpp
File be/src/transport/TSaslServerTransport.cpp:

http://gerrit.cloudera.org:8080/#/c/7061/3/be/src/transport/TSaslServerTransport.cpp@164
PS3, Line 164:       new TSaslServerTransport(serverDefinitionMap_, trans));
It would be good to add the comment here acknowledging the "logical" race, i.e. we know a logical race exists but it won't ever happen because getTransport() won't be called concurrently from different threads with the same 'trans' object.



-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-Change-Number: 7061
Gerrit-PatchSet: 8
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@apache.org>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Fri, 29 Sep 2017 23:45:03 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
John Sherman has posted comments on this change.

Change subject: IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer
......................................................................


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7061/6/be/src/rpc/thrift-server.h
File be/src/rpc/thrift-server.h:

Line 190:   /// Maximum number of concurrent connections (connections will
Will fix not using full line length limit.


-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
John Sherman has uploaded a new patch set (#6).

Change subject: IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer
......................................................................

IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

- Previously TThreadPoolServer called getTransport() on a client from
  the Server thread (the thread that did the accepts).
  - TSaslServerTransport->getTransport() called TSaslTransport->open()
  - TSaslServerTransport->open() tried to negotiate SASL which calls
    read/write
    - If read/write blocks indefinitely, the TThreadPoolServer could
      not accept connections until tcp_keepalive kicked in.
- Set the underlying TSocket's recvTimeout and sendTimeout before the
  TSaslServerTransport->open() and reset them to 0 after open()
  completes.
- Added sasl_connect_tcp_timeout_ms flag that defaults to 300000
  milliseconds (5 minutes)
- Add the ability for the TAcceptQueueServer to limit the maximum
  number of tasks at a time
- Changed the remaining Thrift servers to use TAcceptQueueServer.
  (hs2/beeswax/network-perf-benchmark)
  - The timeout is still needed in TAcceptQueueServer since
    SetupConnection follows a similar pattern that TThreadPoolServer
    does.
- Removed support for TThreadPool from ThriftServer() since it is
  no longer used anywhere. ThriftServer() now always uses
  TAcceptQueueServer.
- Deprecated enable_accept_queue_server flag and removed supporting
  code.

Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
---
M be/src/benchmarks/network-perf-benchmark.cc
M be/src/common/global-flags.cc
M be/src/rpc/TAcceptQueueServer.cpp
M be/src/rpc/TAcceptQueueServer.h
M be/src/rpc/thrift-server.cc
M be/src/rpc/thrift-server.h
M be/src/service/impala-server.cc
M be/src/transport/TSaslServerTransport.cpp
M common/thrift/metrics.json
9 files changed, 108 insertions(+), 112 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/61/7061/6
-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>

[Impala-ASF-CR] IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

Posted by "Henry Robinson (Code Review)" <ge...@cloudera.org>.
Henry Robinson has posted comments on this change.

Change subject: IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer
......................................................................


Patch Set 7:

I wonder if you could write a test in thrift-server-test, that did something like the following:

1. Create a server with max N concurrent cnxns (for a low N - 1 or 2)
2. Start M >> N threads, each of which creates a connection, increments a shared counter, sleeps for a few ms, decrements the shared counter, and then closes the connection.
3. Have the test assert if the shared counter ever goes above N. 

It's not quite perfect (because you can't guarantee that the thread scheduler isn't serializing the connection requests, not the thrift server), but it would give some confidence. You could also record the max concurrent connections in the thrift server and check that after the test has finished.

-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
John Sherman has posted comments on this change.

Change subject: IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer
......................................................................


Patch Set 5:

I don't quite know the correct etiquette, should I fix the merge conflict with recent changes or wait for review comments on my current change set?

-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5394: Handle blocked HS2 connections

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
John Sherman has uploaded a new patch set (#3).

Change subject: IMPALA-5394: Handle blocked HS2 connections
......................................................................

IMPALA-5394: Handle blocked HS2 connections

- TThreadPoolServer calls getTransport() on a client from the Server
  thread (the thread that does the accepts).
  - TSaslServerTransport->getTransport() calls TSaslTransport->open()
  - TSaslServerTransport->open() tries to negotiate SASL which calls
    read/write
    - If read/write blocks, the TThreadPoolServer cannot accept
      connections
- Set the underlying TSocket's recvTimeout and sendTimeout before the
  TSaslServerTransport->open() and reset them to 0 after open()
  completes.
- Added sasl_connect_tcp_timeout_ms flag that defaults to 300000
  milliseconds (5 minutes)
- Changed the Thrift server type for hs2 connections from ThreadPool
  to Threaded to take advantage of the AcceptQueueServer
  implementation.
- Increased the AcceptQueueServer CONNECTION_SETUP_POOL_SIZE from
  1 to 2, so new connections can continue to be accepted if for
  some reason a connection needs to use the timeout period.

Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
---
M be/src/rpc/TAcceptQueueServer.cpp
M be/src/service/impala-server.cc
M be/src/transport/TSaslServerTransport.cpp
M common/thrift/metrics.json
4 files changed, 49 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/61/7061/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>

[Impala-ASF-CR] IMPALA-5394: Set socket timeouts while opening TSaslTransport

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5394: Set socket timeouts while opening TSaslTransport
......................................................................


Patch Set 2:

Cool, thanks.

-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5394: Handle blocked HS2 connections

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
John Sherman has posted comments on this change.

Change subject: IMPALA-5394: Handle blocked HS2 connections
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7061/4/be/src/catalog/catalogd-main.cc
File be/src/catalog/catalogd-main.cc:

Line 92:       FLAGS_catalog_service_port, NULL, metrics.get());
I removed the num_worker_threads passed in for a few services, because it was never enforced before. I did not want to suddenly start enforcing and break things that were previously working.


-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5394: Set socket timeouts while opening TSaslTransport

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
John Sherman has posted comments on this change.

Change subject: IMPALA-5394: Set socket timeouts while opening TSaslTransport
......................................................................


Patch Set 2:

I'm going to test the requested changes on a local kerberos cluster tomorrow and either upload a new patch or justify why the proposed alternative approach doesn't work. Sorry for the delay, I had some unrelated work pop up.

-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

Posted by "Henry Robinson (Code Review)" <ge...@cloudera.org>.
Henry Robinson has posted comments on this change.

Change subject: IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer
......................................................................


Patch Set 5:

Sorry for the delay, John! I'll try to have comments later today. There's no particular etiquette - you're free to post a new patch if the reviewers (i.e. me!) are taking their time over comments...

-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
John Sherman has uploaded a new patch set (#8).

Change subject: IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer
......................................................................

IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

- Previously TThreadPoolServer called getTransport() on a client from
  the Server thread (the thread that did the accepts).
  - TSaslServerTransport->getTransport() called TSaslTransport->open()
  - TSaslServerTransport->open() tried to negotiate SASL which calls
    read/write
    - If read/write blocks indefinitely, the TThreadPoolServer could
      not accept connections until tcp_keepalive kicked in.
- Set the underlying TSocket's recvTimeout and sendTimeout before the
  TSaslServerTransport->open() and reset them to 0 after open()
  completes.
- Added sasl_connect_tcp_timeout_ms flag that defaults to 300000
  milliseconds (5 minutes)
- Add the ability for TAcceptQueueServer to limit the maximum
  number of concurrent tasks
- Added a test case to thrift-server-test to test
  max_concurrent_connections enforcement
- Changed the remaining Thrift servers to use TAcceptQueueServer.
  (hs2/beeswax/network-perf-benchmark)
  - The timeout is still needed in TAcceptQueueServer since
    SetupConnection follows a similar pattern that TThreadPoolServer
    does.
- Removed support for TThreadPool from ThriftServer() since it is
  no longer used anywhere. ThriftServer() now always uses
  TAcceptQueueServer.
- Deprecated enable_accept_queue_server flag and removed supporting
  code.

Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
---
M be/src/benchmarks/network-perf-benchmark.cc
M be/src/common/global-flags.cc
M be/src/rpc/TAcceptQueueServer.cpp
M be/src/rpc/TAcceptQueueServer.h
M be/src/rpc/thrift-server-test.cc
M be/src/rpc/thrift-server.cc
M be/src/rpc/thrift-server.h
M be/src/service/impala-server.cc
M be/src/transport/TSaslServerTransport.cpp
M common/thrift/metrics.json
10 files changed, 148 insertions(+), 112 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/61/7061/8
-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>

[Impala-ASF-CR] IMPALA-5394: Handle blocked HS2 connections

Posted by "Henry Robinson (Code Review)" <ge...@cloudera.org>.
Henry Robinson has posted comments on this change.

Change subject: IMPALA-5394: Handle blocked HS2 connections
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7061/4/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

PS4, Line 1963: exec_env->metrics(), FLAGS_fe_service_threads, ThriftServer::ThreadPool);
> Sorry, should that be "let's remove it from network-perf-benchmark"
I mean let's move network-perf-benchmark to a threaded server as well, with 0 as the number of max tasks. Yes, let's remove the server type from ThriftServer.


-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5394: Handle blocked HS2 connections

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
John Sherman has posted comments on this change.

Change subject: IMPALA-5394: Handle blocked HS2 connections
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7061/3/be/src/rpc/TAcceptQueueServer.cpp
File be/src/rpc/TAcceptQueueServer.cpp:

Line 210
The prior comment mentioned potential thread safety issues, but I didn't see any thread safety issues in the current usage. If someone has experience in this area, it's probably worth it to make sure I didn't miss something.


-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5394: Set socket timeouts while opening TSaslTransport

Posted by "Sailesh Mukil (Code Review)" <ge...@cloudera.org>.
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-5394: Set socket timeouts while opening TSaslTransport
......................................................................


Patch Set 1:

Thanks for submitting this John. Was there any testing you did to validate this patch? Also, how did you decide on the 5 second value?

-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
John Sherman has posted comments on this change.

Change subject: IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer
......................................................................


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7061/5//COMMIT_MSG
Commit Message:

Line 7: IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer
Not sure what a good commit title would be, this patch has evolved quite a bit.


http://gerrit.cloudera.org:8080/#/c/7061/5/be/src/benchmarks/network-perf-benchmark.cc
File be/src/benchmarks/network-perf-benchmark.cc:

Line 206:   impala::InitCommonRuntime(argc, argv, false, impala::TestInfo::BE_TEST);
I don't think network-perf-benchmark gets used much because it had bit rotted to the point of not working. This change seems to fix it by initializing various things (like InternalAuthProvider and the thread manager). I kept the ParseCommandLineFlags above so the if (argc != -1) code below properly worked. I only did basic local node testing and tested changing the default port.


-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5394: Handle blocked HS2 connections

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
John Sherman has posted comments on this change.

Change subject: IMPALA-5394: Handle blocked HS2 connections
......................................................................


Patch Set 3:

I'd like some advice/feedback on the approach for enforcing fe_service_threads. In my local diff, I've added a maxTasks, numTasks and a taskLimitMonitor to the TAcceptQueueServer class.
From there it seems like I have two choices:
1)
a) inside the Task constructor, synchronize with the taskLimitMonitor, check the passed in server's numTasks against maxTasks and wait if we've reached the limit or increment the server's numTasks.
b) inside the Task destructor synchronize with the taskLimitMonitor, decrement the server's numTasks and notifyAll if I'm transistioning from maxTasks to < maxTasks.
-- I like this because the code is somewhat symmetrical, but I do not know if it is a good idea to do these blocking operations inside a constructor/destructor.

2)
a) inside the SetupConnection function before I create a new task, synchronize with the tasklimitMonitor and check/block/increment numTasks.
b) inside the Tasks run method, decrement/notify where the Task removes itself from the TAcceptQueueServer's tasks set.

My local diff uses approach 1 currently but I'm a bit unsure about the best practices around that approach.

-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5394: Set socket timeouts while opening TSaslTransport

Posted by "Henry Robinson (Code Review)" <ge...@cloudera.org>.
Henry Robinson has posted comments on this change.

Change subject: IMPALA-5394: Set socket timeouts while opening TSaslTransport
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7061/2/be/src/transport/TSaslServerTransport.cpp
File be/src/transport/TSaslServerTransport.cpp:

PS2, Line 158: socket->setRecvTimeout(0);
             :     socket->setSendTimeout(0);
> I initially planned to do that, but TSocket doesn't provide a way to get th
Ah, that's what I was missing, thanks John. Still trying to make sure I understand the issue - maybe it would better to give up transportMap_mutex_ if the if(..) branch is taken? So you'd have something like:
  
  {
    lock_guard<mutex> l(transportMap_mutex_);
    // ....
    if (trans_map != transportMap_.end()) {
      // return transport and erase entry in map
    }
  }

  boost::shared_ptr<TTransport> wrapped(
        new TSaslServerTransport(serverDefinitionMap_, trans));
    ret_transport.reset(new TBufferedTransport(wrapped, impala::ThriftServer::BufferedTransportFactory::DEFAULT_BUFFER_SIZE_BYTES));
  ret_transport.get()->open();
    
  {
    lock_guard<mutex> l(transportMap_mutex_);
    transportMap_[trans] = ret_transport;   
  }

The most obvious problem with that is that concurrent calls to getTransport() with the same 'trans' object would race. There are ways to address that but it looks like it's assumed the callers only use 'trans' consecutively, not concurrently.


-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5394: Set socket timeouts while opening TSaslTransport

Posted by "Henry Robinson (Code Review)" <ge...@cloudera.org>.
Henry Robinson has posted comments on this change.

Change subject: IMPALA-5394: Set socket timeouts while opening TSaslTransport
......................................................................


Patch Set 2:

Thanks for the write-up, very helpful!

For backend services we have TAcceptQueueServer which moves accept()'ed sockets onto a thread pool for calling open(). See IMPALA-4135, and https://github.com/apache/incubator-impala/commit/a9c40595549bfb74d2b9796a0a7098361793492e. It looks like from your stack trace 

You might look into that as a drop-in replacement (I think you need to change the HS2 server to Threaded, from ThreadPool) to see if that shows any improvement. You might need to increase the size of the thread pool in TAcceptQueueServer::serve to >1, and implement the locking fix, to see an improvement.

If that doesn't seem to work, then setting the timeouts here is ok - just want to make sure we've considered everything first.

-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5394: Handle blocked HS2 connections

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
John Sherman has posted comments on this change.

Change subject: IMPALA-5394: Handle blocked HS2 connections
......................................................................


Patch Set 3:

It looks like it'll be next week (or this weekend) before I'm able to make some time to write/test the code to honor the fe_service_threads flag.

-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5394: Handle blocked HS2 connections

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
John Sherman has posted comments on this change.

Change subject: IMPALA-5394: Handle blocked HS2 connections
......................................................................


Patch Set 3:

Giving this a bump, since my updated patch might have slipped under the radar during the holiday.

-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

Posted by "Michael Brown (Code Review)" <ge...@cloudera.org>.
Michael Brown has posted comments on this change. ( http://gerrit.cloudera.org:8080/7061 )

Change subject: IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer
......................................................................


Patch Set 10:

(1 comment)

Thanks for giving gerrit-verify-dryrun-external a try! It caught something (see inline comment).

Looks like this patch now can't merge due to conflicts in impala-server.cc, likely due to the very recent https://gerrit.cloudera.org/#/c/8076/.

http://gerrit.cloudera.org:8080/#/c/7061/10/be/src/rpc/TAcceptQueueServer.h
File be/src/rpc/TAcceptQueueServer.h:

http://gerrit.cloudera.org:8080/#/c/7061/10/be/src/rpc/TAcceptQueueServer.h@135
PS10, Line 135: int32_t maxTasks = 0
Here and elsewhere, clang-tidy and clang compilations are unhappy with the following:

  be/src/rpc/TAcceptQueueServer.h:135:13: error: default arguments cannot be added to a function template that has already been declared
      int32_t maxTasks = 0,
              ^          ~
  be/src/rpc/TAcceptQueueServer.h:58:3: note: previous template declaration is here
    TAcceptQueueServer(const boost::shared_ptr<ProcessorFactory>& processorFactory,

https://jenkins.impala.io/job/clang-tidy/1601/
https://jenkins.impala.io/job/all-build-options/1600/



-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-Change-Number: 7061
Gerrit-PatchSet: 10
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@apache.org>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Tue, 03 Oct 2017 14:39:36 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
John Sherman has posted comments on this change.

Change subject: IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer
......................................................................


Patch Set 8:

(1 comment)

Thanks for the feedback Henry. I've attempted to create an effective test.

http://gerrit.cloudera.org:8080/#/c/7061/8/be/src/rpc/thrift-server-test.cc
File be/src/rpc/thrift-server-test.cc:

Line 439:   EXPECT_TRUE(did_reach_max);
Not sure if this will be effective. Could end up being racy? If so probably extremely rare?


-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

Posted by "Sailesh Mukil (Code Review)" <ge...@cloudera.org>.
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/7061 )

Change subject: IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer
......................................................................


Patch Set 14: Verified+1

> Thanks! Also thanks to you and Henry for taking the time to review.
 > (technically my 2nd patch, I removed an invalid DCHECK a few months
 > ago, but it is my first patch of substance)
 > I'll probably start working in my spare time on IMPALA-5393 for my
 > next contribution.
 > 
 > > > It looks like it passed gerri-verify-external -
 > https://jenkins.impala.io/job/gerrit-verify-dryrun-external/13/
 > >
 > > Great! I'll rebase and merge this.
 > > Congratulations on getting your first patch in!

Sure thing! Please feel free to reach out at any time. Your commit can be viewed here:
https://github.com/apache/incubator-impala/commit/4dd0f1b3d84f67eb40bf671160b057be9bbdb921


-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-Change-Number: 7061
Gerrit-PatchSet: 14
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@apache.org>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Comment-Date: Thu, 05 Oct 2017 05:41:44 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
Hello Henry Robinson, Matthew Jacobs, Sailesh Mukil, Dan Hecht, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/7061

to look at the new patch set (#9).

Change subject: IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer
......................................................................

IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

- Previously TThreadPoolServer called getTransport() on a client from
  the Server thread (the thread that did the accepts).
  - TSaslServerTransport->getTransport() called TSaslTransport->open()
  - TSaslServerTransport->open() tried to negotiate SASL which calls
    read/write
    - If read/write blocks indefinitely, the TThreadPoolServer could
      not accept connections until tcp_keepalive kicked in.
- Set the underlying TSocket's recvTimeout and sendTimeout before the
  TSaslServerTransport->open() and reset them to 0 after open()
  completes.
- Added sasl_connect_tcp_timeout_ms flag that defaults to 300000
  milliseconds (5 minutes)
- Add the ability for TAcceptQueueServer to limit the maximum
  number of concurrent tasks
- Added a test case to thrift-server-test to test
  max_concurrent_connections enforcement
- Changed the remaining Thrift servers to use TAcceptQueueServer.
  (hs2/beeswax/network-perf-benchmark)
  - The timeout is still needed in TAcceptQueueServer since
    SetupConnection follows a similar pattern that TThreadPoolServer
    does.
- Removed support for TThreadPool from ThriftServer() since it is
  no longer used anywhere. ThriftServer() now always uses
  TAcceptQueueServer.
- Deprecated enable_accept_queue_server flag and removed supporting
  code.

Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
---
M be/src/benchmarks/network-perf-benchmark.cc
M be/src/common/global-flags.cc
M be/src/rpc/TAcceptQueueServer.cpp
M be/src/rpc/TAcceptQueueServer.h
M be/src/rpc/thrift-server-test.cc
M be/src/rpc/thrift-server.cc
M be/src/rpc/thrift-server.h
M be/src/service/impala-server.cc
M be/src/transport/TSaslServerTransport.cpp
M common/thrift/metrics.json
10 files changed, 159 insertions(+), 112 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/61/7061/9
-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-Change-Number: 7061
Gerrit-PatchSet: 9
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@apache.org>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>

[Impala-ASF-CR] IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

Posted by "Sailesh Mukil (Code Review)" <ge...@cloudera.org>.
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/7061 )

Change subject: IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer
......................................................................


Patch Set 10:

(1 comment)

Thanks Mike. @John: I think a rebase and a fix of the default args should suffice before kicking off another run.

http://gerrit.cloudera.org:8080/#/c/7061/10/be/src/rpc/TAcceptQueueServer.h
File be/src/rpc/TAcceptQueueServer.h:

http://gerrit.cloudera.org:8080/#/c/7061/10/be/src/rpc/TAcceptQueueServer.h@135
PS10, Line 135: int32_t maxTasks = 0
> Here and elsewhere, clang-tidy and clang compilations are unhappy with the 
Yes, you would need to move the default arguments to the declarations instead of having them in the definitions here.



-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-Change-Number: 7061
Gerrit-PatchSet: 10
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@apache.org>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Tue, 03 Oct 2017 17:11:28 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5394: Set socket timeouts while opening TSaslTransport

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
John Sherman has uploaded a new patch set (#2).

Change subject: IMPALA-5394: Set socket timeouts while opening TSaslTransport
......................................................................

IMPALA-5394: Set socket timeouts while opening TSaslTransport

- TThreadPoolServer calls getTransport() on a client from the Server
  thread (the thread that does the accepts).
  - TSaslServerTransport->getTransport() calls TSaslTransport->open()
  - TSaslServerTransport->open() tries to negotiate SASL which calls
    read/write
    - If read/write blocks, the TThreadPoolServer cannot accept
      connections
- This patch set the underlying TSocket's recvTimeout and sendTimeout
  before the TSaslServerTransport->open() and reset them to 0 after
  open() completes.
- Added sasl_connect_tcp_timeout_ms flag that defaults to 300000
  milliseconds (5 minutes)

Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
---
M be/src/transport/TSaslServerTransport.cpp
1 file changed, 12 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/61/7061/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>

[Impala-ASF-CR] IMPALA-5394: Set socket timeouts while opening TSaslTransport

Posted by "Henry Robinson (Code Review)" <ge...@cloudera.org>.
Henry Robinson has posted comments on this change.

Change subject: IMPALA-5394: Set socket timeouts while opening TSaslTransport
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7061/2/be/src/transport/TSaslServerTransport.cpp
File be/src/transport/TSaslServerTransport.cpp:

PS2, Line 158: socket->setRecvTimeout(0);
             :     socket->setSendTimeout(0);
> perhaps a safer fix is to get the values before overwriting them on l150, t
Right, but there's no easy way to do that here since TSocket doesn't have a getter for recvTimeout_ as far as I can tell.


-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5394: Set socket timeouts while opening TSaslTransport

Posted by "Sailesh Mukil (Code Review)" <ge...@cloudera.org>.
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-5394: Set socket timeouts while opening TSaslTransport
......................................................................


Patch Set 1:

(1 comment)

Thanks John. The internal connection timeout is actually 5 minutes:
https://github.com/apache/incubator-impala/blob/master/be/src/runtime/exec-env.cc#L118-L123

We would also want the client connection timeout to default to a pretty high number since on large clusters, we've seen Kerberos negotiations take up to a few minutes.
I would prefer keeping the timeout to 5 minutes. It's not ideal, however, we would rather not see queries fail because of timed out negotiations vs. optimize for an even worse case of clients hung for 5 minutes (which is configurable by a flag if the user chooses to do so). This is the same reason we keep the internal connection timeout so high, since we'd rather see progress than a failed query due to one timed out connection.

http://gerrit.cloudera.org:8080/#/c/7061/1/be/src/transport/TSaslServerTransport.cpp
File be/src/transport/TSaslServerTransport.cpp:

PS1, Line 40: sasl_connect_tcp_timeout_seconds
We usually keep our timeouts in milliseconds. it would be good to follow this example:
https://github.com/apache/incubator-impala/blob/master/be/src/runtime/exec-env.cc#L118-L123


-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
John Sherman has posted comments on this change.

Change subject: IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer
......................................................................


Patch Set 6:

(1 comment)

I haven't created the automated test yet, I will try to create one in the next few days. I did manually test the behavior, but it's always good to have regression tests.

My run-all-tests.sh mostly passed, I did have issues with kudu. I think those errors are probably related to the lack of memory on my test machine.

http://gerrit.cloudera.org:8080/#/c/7061/6/be/src/benchmarks/network-perf-benchmark.cc
File be/src/benchmarks/network-perf-benchmark.cc:

Line 227:                      .max_concurrent_connections(100)
I went with max_concurrent_connections vs max_concurrent_cnxns, but my choice might be too verbose. I can change it to the suggested max_concurrent_cnxns if you think that is preferable.


-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5394: Handle blocked HS2 connections

Posted by "Henry Robinson (Code Review)" <ge...@cloudera.org>.
Henry Robinson has posted comments on this change.

Change subject: IMPALA-5394: Handle blocked HS2 connections
......................................................................


Patch Set 3:

(1 comment)

Patch looks pretty good. It would be great to still respect fe_service_threads, if possible, as I know of some users who set that flag to control the concurrency on a particular Impala instance. It shouldn't be too hard to have TAcceptQueueServer::Task limit itself to have only fe_service_threads active at one time with a condition variable and a shared atomic integer. Do you think that's something you could add to this patch?

http://gerrit.cloudera.org:8080/#/c/7061/3/be/src/rpc/TAcceptQueueServer.cpp
File be/src/rpc/TAcceptQueueServer.cpp:

Line 210
> The prior comment mentioned potential thread safety issues, but I didn't se
IIRC, the issue was that we didn't know for sure whether the various transport and protocol factories (particularly with SSL enabled) were thread-safe. Might be worth keeping this at 1 until we know for sure one way or the other.


-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
John Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/7061 )

Change subject: IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer
......................................................................


Patch Set 10:

Ok kicked off the job and used your comment suggestion.


-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-Change-Number: 7061
Gerrit-PatchSet: 10
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@apache.org>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Tue, 03 Oct 2017 00:20:19 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
John Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/7061 )

Change subject: IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer
......................................................................


Patch Set 12:

It looks like it passed gerri-verify-external - https://jenkins.impala.io/job/gerrit-verify-dryrun-external/13/


-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-Change-Number: 7061
Gerrit-PatchSet: 12
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@apache.org>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Comment-Date: Wed, 04 Oct 2017 18:13:11 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
John Sherman has posted comments on this change.

Change subject: IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer
......................................................................


Patch Set 7:

A couple of test questions:
Would the custom cluster test framework be the appropriate framework to use for creating a test for fe_service_threads enforcement? Should I create a new test or is there an appropriate existing test that I could add a new case to (I didn't see an obvious candidate looking through the current custom cluster tests)?

-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5394: Handle blocked HS2 connections

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
John Sherman has posted comments on this change.

Change subject: IMPALA-5394: Handle blocked HS2 connections
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7061/4/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

PS4, Line 1963: exec_env->metrics(), FLAGS_fe_service_threads, ThriftServer::ThreadPool);
> Yes, let's remove it for network-perf-benchmark.cc.
Sorry, should that be "let's remove it from network-perf-benchmark"
or should it be "let's leave it for network-perf-benchmark"?

If I remove it from network-perf-benchmark, should I remove support of the ThreadPool type from ThriftServer.cc?


-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

Posted by "Henry Robinson (Code Review)" <ge...@cloudera.org>.
Henry Robinson has posted comments on this change.

Change subject: IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer
......................................................................


Patch Set 5:

(9 comments)

This looks good! Just needs to merge with the recent ThriftServer changes, and consider adding a test where you set the number of concurrent clients and confirm that only that many can connect at once.

http://gerrit.cloudera.org:8080/#/c/7061/5/be/src/benchmarks/network-perf-benchmark.cc
File be/src/benchmarks/network-perf-benchmark.cc:

Line 206:   impala::InitCommonRuntime(argc, argv, false, impala::TestInfo::BE_TEST);
> I don't think network-perf-benchmark gets used much because it had bit rott
This is fine - it's sufficient to keep it compiling for now.


http://gerrit.cloudera.org:8080/#/c/7061/5/be/src/catalog/catalogd-main.cc
File be/src/catalog/catalogd-main.cc:

Line 91:   ThriftServer* server = new ThriftServer("CatalogService", processor,
As you'll have seen, IMPALA-5696 switched over to a new ThriftServerBuilder idiom for constructing ThriftServer objects. I think what you'll need to do is remove the threaded() and thread_pool() methods, and just add a max_concurrent_cnxns(int) method.


http://gerrit.cloudera.org:8080/#/c/7061/5/be/src/rpc/TAcceptQueueServer.h
File be/src/rpc/TAcceptQueueServer.h:

Line 61:       int32_t maxTasks,
could you add a TODO to this class to suggest we determine which of these c'tors are actually needed?


PS5, Line 114: notifies
notified?


Line 117:   // The maximum number of running tasks allowed at a time.
nit: put a new line before this one.


http://gerrit.cloudera.org:8080/#/c/7061/5/be/src/rpc/thrift-server.cc
File be/src/rpc/thrift-server.cc:

PS5, Line 409: stringstream key_prefix_ss;
             :     key_prefix_ss << "impala.thrift-server." << name_;
while you're here.... use Substitute("impala.thrift-server.$0", name_) ?


http://gerrit.cloudera.org:8080/#/c/7061/5/be/src/rpc/thrift-server.h
File be/src/rpc/thrift-server.h:

PS5, Line 176: conncurrent
typo


PS5, Line 177: until a thread becomes available
nit: this makes it sound like the threads are pooled. Suggest 'connections will block until fewer than num_worker_threads_ threads are concurrently active' or something.


http://gerrit.cloudera.org:8080/#/c/7061/5/be/src/transport/TSaslServerTransport.cpp
File be/src/transport/TSaslServerTransport.cpp:

PS5, Line 144: TThreadedServer and
             :   // TThreadPoolServer
remove these two?


-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

Posted by "Sailesh Mukil (Code Review)" <ge...@cloudera.org>.
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/7061 )

Change subject: IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer
......................................................................


Patch Set 12: Code-Review+2

> It looks like it passed gerri-verify-external - https://jenkins.impala.io/job/gerrit-verify-dryrun-external/13/

Great! I'll rebase and merge this.
Congratulations on getting your first patch in!


-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-Change-Number: 7061
Gerrit-PatchSet: 12
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@apache.org>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Comment-Date: Wed, 04 Oct 2017 22:23:08 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/7061 )

Change subject: IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer
......................................................................


Patch Set 13: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-Change-Number: 7061
Gerrit-PatchSet: 13
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@apache.org>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Comment-Date: Thu, 05 Oct 2017 02:26:00 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5394: Set socket timeouts while opening TSaslTransport

Posted by "Sailesh Mukil (Code Review)" <ge...@cloudera.org>.
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-5394: Set socket timeouts while opening TSaslTransport
......................................................................


Patch Set 2: -Code-Review

> Is the the -kerb option deprecated in the buildall.sh script? If
 > not is there a guide on how to use it? I played around with it
 > trying to get it to work since it would save me a bit of time in
 > testing (being able to test the kerberos stuff in my local dev
 > environment vs deploying to a full kerberos enabled cluster), I ran
 > into a variety of issues that lead me to think that it isn't
 > actively maintained/used.

Unfortunately, we don't yet support kerberos for our minicluster. That script is deprecated. It will be coming up at some point in the near future, but for now, if you want to test with kerberos, the easiest way would be to deploy to a cluster configured with kerberos.

-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

Posted by "Sailesh Mukil (Code Review)" <ge...@cloudera.org>.
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/7061 )

Change subject: IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer
......................................................................


Patch Set 13: Code-Review+2

Rebase, Carry +2.


-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-Change-Number: 7061
Gerrit-PatchSet: 13
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@apache.org>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Comment-Date: Wed, 04 Oct 2017 22:23:21 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

Posted by "Sailesh Mukil (Code Review)" <ge...@cloudera.org>.
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/7061 )

Change subject: IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer
......................................................................


Patch Set 9:

(1 comment)

Yes, a run of the gerrit-verify-dryrun-external is a good idea. If that passes, we can +2 this patch and merge it.

I just have one nit review about a code comment otherwise.

http://gerrit.cloudera.org:8080/#/c/7061/9/be/src/transport/TSaslServerTransport.cpp
File be/src/transport/TSaslServerTransport.cpp:

http://gerrit.cloudera.org:8080/#/c/7061/9/be/src/transport/TSaslServerTransport.cpp@162
PS9, Line 162:     // It would be possible for the transport to be created multiple times if this method
             :     // was called concurrently on the same 'trans' object. The current usage of this
             :     // method is that it is always called serially.
Instead of saying it's always called serially, I think this comment is clearer:

"This method should never be called concurrently with the same 'trans' object. Therefore, it is safe to drop the transportMap_mutex_ here."



-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-Change-Number: 7061
Gerrit-PatchSet: 9
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@apache.org>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Tue, 03 Oct 2017 00:09:49 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5394: Handle blocked HS2 connections

Posted by "Henry Robinson (Code Review)" <ge...@cloudera.org>.
Henry Robinson has posted comments on this change.

Change subject: IMPALA-5394: Handle blocked HS2 connections
......................................................................


Patch Set 3:

Hi John - Thanks for taking the time to make this change! I'll probably do the first review round here, but just got back from vacation, so bear with me while I catch up!

-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5394: Set socket timeouts while opening TSaslTransport

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
John Sherman has posted comments on this change.

Change subject: IMPALA-5394: Set socket timeouts while opening TSaslTransport
......................................................................


Patch Set 2:

I was able to get my test environment properly setup for running the run-all.sh tests over the weekend. I originally ran against patch set 1.
The be tests all passed.
Since I am running Ubuntu 16 with Java 8, I had to cherry-pick the Java 8 FE patch set that is up for review for the fe tests to pass.
I had 2 ee test failures:
query_test/test_hdfs_caching.py::TestHdfsCaching::test_table_is_cached which seems to be documented at IMPALA-5368
query_test/test_insert.py::TestInsertQueries::test_insert_large_string which failed due to a java heap out of memory error, which seems to be related to the amount of memory I have available on my test machine.

 I've only ran fe and be against patch set 2, which passed.

-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5394: Handle blocked HS2 connections

Posted by "Henry Robinson (Code Review)" <ge...@cloudera.org>.
Henry Robinson has posted comments on this change.

Change subject: IMPALA-5394: Handle blocked HS2 connections
......................................................................


Patch Set 3:

My instinct would be just to use the size of tasks_ to control the coordination.

In SetupConnection(), line 168 or so:

  { 
    Synchronized s(tasksMonitor_);
    if (tasks_.size() > max) tasksMonitor_.wait();
    tasks_.insert(task);
  }

-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: No

Re: [Impala-ASF-CR] IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

Posted by Alexander Behm <al...@cloudera.com>.
Great work, John! Thanks for your continued effort to get this important
change merged. Much appreciated.

On Wed, Oct 4, 2017 at 7:26 PM, Impala Public Jenkins (Code Review) <
gerrit@cloudera.org> wrote:

> Impala Public Jenkins *merged* this change.
>
> View Change <http://gerrit.cloudera.org:8080/7061>
> Approvals: Sailesh Mukil: Looks good to me, approved Impala Public
> Jenkins: Verified
>
> IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer
>
> - Previously TThreadPoolServer called getTransport() on a client from
>   the Server thread (the thread that did the accepts).
>   - TSaslServerTransport->getTransport() called TSaslTransport->open()
>   - TSaslServerTransport->open() tried to negotiate SASL which calls
>     read/write
>     - If read/write blocks indefinitely, the TThreadPoolServer could
>       not accept connections until tcp_keepalive kicked in.
> - Set the underlying TSocket's recvTimeout and sendTimeout before the
>   TSaslServerTransport->open() and reset them to 0 after open()
>   completes.
> - Added sasl_connect_tcp_timeout_ms flag that defaults to 300000
>   milliseconds (5 minutes)
> - Add the ability for TAcceptQueueServer to limit the maximum
>   number of concurrent tasks
> - Added a test case to thrift-server-test to test
>   max_concurrent_connections enforcement
> - Changed the remaining Thrift servers to use TAcceptQueueServer.
>   (hs2/beeswax/network-perf-benchmark)
>   - The timeout is still needed in TAcceptQueueServer since
>     SetupConnection follows a similar pattern that TThreadPoolServer
>     does.
> - Removed support for TThreadPool from ThriftServer() since it is
>   no longer used anywhere. ThriftServer() now always uses
>   TAcceptQueueServer.
> - Deprecated enable_accept_queue_server flag and removed supporting
>   code.
>
> Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
> Reviewed-on: http://gerrit.cloudera.org:8080/7061
> Reviewed-by: Sailesh Mukil <sa...@cloudera.com>
> Tested-by: Impala Public Jenkins
> ---
> M be/src/benchmarks/network-perf-benchmark.cc
> M be/src/common/global-flags.cc
> M be/src/rpc/TAcceptQueueServer.cpp
> M be/src/rpc/TAcceptQueueServer.h
> M be/src/rpc/thrift-server-test.cc
> M be/src/rpc/thrift-server.cc
> M be/src/rpc/thrift-server.h
> M be/src/service/impala-server.cc
> M be/src/transport/TSaslServerTransport.cpp
> M common/thrift/metrics.json
> 10 files changed, 157 insertions(+), 110 deletions(-)
>
> To view, visit change 7061 <http://gerrit.cloudera.org:8080/7061>. To
> unsubscribe, visit settings <http://gerrit.cloudera.org:8080/settings>.
> Gerrit-Project: Impala-ASF
> Gerrit-Branch: master
> Gerrit-MessageType: merged
> Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
> Gerrit-Change-Number: 7061
> Gerrit-PatchSet: 14
> Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
> Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
> Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
> Gerrit-Reviewer: Impala Public Jenkins
> Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
> Gerrit-Reviewer: Matthew Jacobs <mj...@apache.org>
> Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
> Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
> Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
>
> --
> You received this message because you are subscribed to the Google Groups
> "impala-cr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to impala-cr+unsubscribe@cloudera.com.
> For more options, visit https://groups.google.com/a/cloudera.com/d/optout.
>

[Impala-ASF-CR] IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/7061 )

Change subject: IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer
......................................................................

IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

- Previously TThreadPoolServer called getTransport() on a client from
  the Server thread (the thread that did the accepts).
  - TSaslServerTransport->getTransport() called TSaslTransport->open()
  - TSaslServerTransport->open() tried to negotiate SASL which calls
    read/write
    - If read/write blocks indefinitely, the TThreadPoolServer could
      not accept connections until tcp_keepalive kicked in.
- Set the underlying TSocket's recvTimeout and sendTimeout before the
  TSaslServerTransport->open() and reset them to 0 after open()
  completes.
- Added sasl_connect_tcp_timeout_ms flag that defaults to 300000
  milliseconds (5 minutes)
- Add the ability for TAcceptQueueServer to limit the maximum
  number of concurrent tasks
- Added a test case to thrift-server-test to test
  max_concurrent_connections enforcement
- Changed the remaining Thrift servers to use TAcceptQueueServer.
  (hs2/beeswax/network-perf-benchmark)
  - The timeout is still needed in TAcceptQueueServer since
    SetupConnection follows a similar pattern that TThreadPoolServer
    does.
- Removed support for TThreadPool from ThriftServer() since it is
  no longer used anywhere. ThriftServer() now always uses
  TAcceptQueueServer.
- Deprecated enable_accept_queue_server flag and removed supporting
  code.

Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Reviewed-on: http://gerrit.cloudera.org:8080/7061
Reviewed-by: Sailesh Mukil <sa...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M be/src/benchmarks/network-perf-benchmark.cc
M be/src/common/global-flags.cc
M be/src/rpc/TAcceptQueueServer.cpp
M be/src/rpc/TAcceptQueueServer.h
M be/src/rpc/thrift-server-test.cc
M be/src/rpc/thrift-server.cc
M be/src/rpc/thrift-server.h
M be/src/service/impala-server.cc
M be/src/transport/TSaslServerTransport.cpp
M common/thrift/metrics.json
10 files changed, 157 insertions(+), 110 deletions(-)

Approvals:
  Sailesh Mukil: Looks good to me, approved
  Impala Public Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-Change-Number: 7061
Gerrit-PatchSet: 14
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@apache.org>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>

[Impala-ASF-CR] IMPALA-5394: Handle blocked HS2 connections

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
John Sherman has posted comments on this change.

Change subject: IMPALA-5394: Handle blocked HS2 connections
......................................................................


Patch Set 3:

Yeah, I'll give it a go.

> Do you think that's something you could add to this patch?

-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5394: Handle blocked HS2 connections

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
John Sherman has posted comments on this change.

Change subject: IMPALA-5394: Handle blocked HS2 connections
......................................................................


Patch Set 3:

Thanks, yeah I overthought it. I'll run this through run-all-tests and post the patch.

 > My instinct would be just to use the size of tasks_ to control the
 > coordination.
 > 
 > In SetupConnection(), line 168 or so:
 > 
 > {
 > Synchronized s(tasksMonitor_);
 > if (tasks_.size() > max) tasksMonitor_.wait();
 > tasks_.insert(task);
 > }

-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5394: Set socket timeouts while opening TSaslTransport

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
John Sherman has posted comments on this change.

Change subject: IMPALA-5394: Set socket timeouts while opening TSaslTransport
......................................................................


Patch Set 2:

Is the the -kerb option deprecated in the buildall.sh script? If not is there a guide on how to use it? I played around with it trying to get it to work since it would save me a bit of time in testing (being able to test the kerberos stuff in my local dev environment vs deploying to a full kerberos enabled cluster), I ran into a variety of issues that lead me to think that it isn't actively maintained/used.

-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
John Sherman has posted comments on this change.

Change subject: IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer
......................................................................


Patch Set 5:

> Hi John - anything I can do to help with this one? Let me know if
 > you need a hand with the rebase.

Sorry for the delay. I'm juggling a couple of projects. Currently in the process of rebasing. Thus far,  It seems relatively straightforward.

-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5394: Set socket timeouts while opening TSaslTransport

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
John Sherman has posted comments on this change.

Change subject: IMPALA-5394: Set socket timeouts while opening TSaslTransport
......................................................................


Patch Set 2:

> (1 comment)

https://github.com/apache/thrift/blob/0.10.0/lib/cpp/src/thrift/server/TServerFramework.cpp lines 147-157
 If I'm reading the thrift code correctly, fixing the lock doesn't really help the specific issue I've seen. The C++ thrift server implementation ends up calling getTransport on the same thread that does the accept of the connection. So if the getTransport implementation does reads/writes that ends up blocking, it prevents new connections from being accepted until the read/write completes or times out. We ran into this issue when a client connecting to the hs2_port started the SASL handshake then for some reason quits communicating with the server (the packets from the client to impalad started to not make it to impalad even the RST packet that the client sends after retrying). In this case all other connection requests to the hs2_port end up being queued until I think the tcp keepalive kicks in (I think the default time is 2 hours) or impalad is restarted. (The Java implementation of TThreadPoolServer puts the accepted connection on a different thread before calling getTran!
 sport https://github.com/apache/thrift/blob/master/lib/java/src/org/apache/thrift/server/TThreadPoolServer.java) If the C++ thrift implementation put the connection on a different thread before calling getTransport, fixing the locking would be helpful.
 
Backtrace ends up lookings something like this (from IMPALA-5268):
#6  0x0000000001b394d2 in apache::thrift::transport::TSSLSocket::read(unsigned char*, unsigned int) ()
No symbol table info available.
#7  0x0000000001b36003 in unsigned int apache::thrift::transport::readAll<apache::thrift::transport::TSocket>(apache::thrift::transport::TSocket&, unsigned char*, unsigned int) ()
No symbol table info available.
#8  0x0000000000b52e15 in apache::thrift::transport::TSaslTransport::receiveSaslMessage(apache::thrift::transport::NegotiationStatus*, unsigned int*) ()
No symbol table info available.
#9  0x0000000000b50733 in apache::thrift::transport::TSaslServerTransport::handleSaslStartMessage() ()
No symbol table info available.
#10 0x0000000000b52f9e in apache::thrift::transport::TSaslTransport::open() ()
No symbol table info available.
#11 0x0000000000b51431 in apache::thrift::transport::TSaslServerTransport::Factory::getTransport(boost::shared_ptr<apache::thrift::transport::TTransport>) ()
No symbol table info available.
#12 0x0000000001b4066d in apache::thrift::server::TThreadPoolServer::serve() ()
No symbol table info available.
#13 0x00000000009f2e60 in impala::ThriftServer::ThriftServerEventProcessor::Supervise() ()
No symbol table info available.

-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

Posted by "Sailesh Mukil (Code Review)" <ge...@cloudera.org>.
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/7061 )

Change subject: IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer
......................................................................


Patch Set 8:

> No problem, I just got back from a month long wedding/honeymoon
 > myself.
 > 
 > I'll try to address your comments in the next few days.
 > 
 > > (3 comments)
 > >
 > > Sorry for the slow response and thanks for your patience, John.
 > > Henry is away on holiday, but let's try to get this in in the
 > > meantime. The patch looks good to me. I just have a couple small
 > > comments. And rebasing the patch before posting the next patch
 > set
 > > would be a good idea.

Sounds good. And congratulations!


-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-Change-Number: 7061
Gerrit-PatchSet: 8
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@apache.org>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Fri, 29 Sep 2017 23:52:04 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5394: Handle blocked HS2 connections

Posted by "Henry Robinson (Code Review)" <ge...@cloudera.org>.
Henry Robinson has posted comments on this change.

Change subject: IMPALA-5394: Handle blocked HS2 connections
......................................................................


Patch Set 4:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/7061/4/be/src/rpc/TAcceptQueueServer.cpp
File be/src/rpc/TAcceptQueueServer.cpp:

PS4, Line 167: maxTasks_
nit: be explicit with maxTasks_ > 0


PS4, Line 211: // Use multiple threads to ensure that other connections can proceed if SetupConnection
             :   // blocks for a prior connection
comment should probably be reverted as well


http://gerrit.cloudera.org:8080/#/c/7061/4/be/src/rpc/thrift-server.h
File be/src/rpc/thrift-server.h:

PS4, Line 104: num_worker_threads: the number of worker threads to use in any thread pool
update comment to say what 0 means


http://gerrit.cloudera.org:8080/#/c/7061/4/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

PS4, Line 1963: exec_env->metrics(), FLAGS_fe_service_threads, ThriftServer::ThreadPool);
Let's make this change for beeswax as well.


PS4, Line 1985: Threaded
Maybe we can just remove Threaded and ThreadPool and just pass in the number of worker threads?


http://gerrit.cloudera.org:8080/#/c/7061/4/be/src/transport/TSaslServerTransport.cpp
File be/src/transport/TSaslServerTransport.cpp:

PS4, Line 158: VLOG_EVERY_N(2, 100) << "getTransport(): transportMap_ size is: "
             :                          << transportMap_.size();
seems chatty, any reason to keep it?


-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/7061 )

Change subject: IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer
......................................................................


Patch Set 13:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1301/


-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-Change-Number: 7061
Gerrit-PatchSet: 13
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@apache.org>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Comment-Date: Wed, 04 Oct 2017 22:23:35 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
Hello Henry Robinson, Matthew Jacobs, Sailesh Mukil, Dan Hecht, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/7061

to look at the new patch set (#10).

Change subject: IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer
......................................................................

IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

- Previously TThreadPoolServer called getTransport() on a client from
  the Server thread (the thread that did the accepts).
  - TSaslServerTransport->getTransport() called TSaslTransport->open()
  - TSaslServerTransport->open() tried to negotiate SASL which calls
    read/write
    - If read/write blocks indefinitely, the TThreadPoolServer could
      not accept connections until tcp_keepalive kicked in.
- Set the underlying TSocket's recvTimeout and sendTimeout before the
  TSaslServerTransport->open() and reset them to 0 after open()
  completes.
- Added sasl_connect_tcp_timeout_ms flag that defaults to 300000
  milliseconds (5 minutes)
- Add the ability for TAcceptQueueServer to limit the maximum
  number of concurrent tasks
- Added a test case to thrift-server-test to test
  max_concurrent_connections enforcement
- Changed the remaining Thrift servers to use TAcceptQueueServer.
  (hs2/beeswax/network-perf-benchmark)
  - The timeout is still needed in TAcceptQueueServer since
    SetupConnection follows a similar pattern that TThreadPoolServer
    does.
- Removed support for TThreadPool from ThriftServer() since it is
  no longer used anywhere. ThriftServer() now always uses
  TAcceptQueueServer.
- Deprecated enable_accept_queue_server flag and removed supporting
  code.

Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
---
M be/src/benchmarks/network-perf-benchmark.cc
M be/src/common/global-flags.cc
M be/src/rpc/TAcceptQueueServer.cpp
M be/src/rpc/TAcceptQueueServer.h
M be/src/rpc/thrift-server-test.cc
M be/src/rpc/thrift-server.cc
M be/src/rpc/thrift-server.h
M be/src/service/impala-server.cc
M be/src/transport/TSaslServerTransport.cpp
M common/thrift/metrics.json
10 files changed, 158 insertions(+), 112 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/61/7061/10
-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-Change-Number: 7061
Gerrit-PatchSet: 10
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@apache.org>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>

[Impala-ASF-CR] IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
John Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/7061 )

Change subject: IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer
......................................................................


Patch Set 11:

Ok, kicked off another run. Hopefully it has a bit more success.


-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-Change-Number: 7061
Gerrit-PatchSet: 11
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@apache.org>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Tue, 03 Oct 2017 19:46:49 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

Posted by "Taras Bobrovytsky (Code Review)" <ge...@cloudera.org>.
Taras Bobrovytsky has posted comments on this change. ( http://gerrit.cloudera.org:8080/7061 )

Change subject: IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer
......................................................................


Patch Set 11:

The run failed due to IMPALA-6009. Rebase and try again.


-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-Change-Number: 7061
Gerrit-PatchSet: 11
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@apache.org>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tb...@cloudera.com>
Gerrit-Comment-Date: Tue, 03 Oct 2017 22:36:56 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5394: Set socket timeouts while opening TSaslTransport

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
John Sherman has posted comments on this change.

Change subject: IMPALA-5394: Set socket timeouts while opening TSaslTransport
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7061/2/be/src/transport/TSaslServerTransport.cpp
File be/src/transport/TSaslServerTransport.cpp:

PS2, Line 158: socket->setRecvTimeout(0);
             :     socket->setSendTimeout(0);
> perhaps a safer fix is to get the values before overwriting them on l150, t
I initially planned to do that, but TSocket doesn't provide a way to get the previous timeout values.


PS2, Line 158: socket->setRecvTimeout(0);
             :     socket->setSendTimeout(0);
> Do the timeouts in client-cache.cc lines 120-121 not get applied to TSSLSoc
Since this is the TSaslServerTransport I do not think they get applied via the client-cache code. I think the client-cache code would set the timeout for the TSaslClientTransport. I did the initial investigation awhile ago on an older build of impala, but will re-investigate to make sure my assumption is still correct.


-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
John Sherman has uploaded a new patch set (#5).

Change subject: IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer
......................................................................

IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

- Previously TThreadPoolServer called getTransport() on a client from
  the Server thread (the thread that did the accepts).
  - TSaslServerTransport->getTransport() called TSaslTransport->open()
  - TSaslServerTransport->open() tried to negotiate SASL which calls
    read/write
    - If read/write blocks indefinitely, the TThreadPoolServer could
      not accept connections until tcp_keepalive kicked in.
- Set the underlying TSocket's recvTimeout and sendTimeout before the
  TSaslServerTransport->open() and reset them to 0 after open()
  completes.
- Added sasl_connect_tcp_timeout_ms flag that defaults to 300000
  milliseconds (5 minutes)
- Add the ability for the TAcceptQueueServer to limit the maximum
  number of tasks at a time
- Removed the previously unenforced thread limit requests from
  StatestoreService, StatestoreSubscriber, CatalogService.
- Changed the remaining Thrift servers to use TAcceptQueueServer.
  (hs2/beeswax/network-perf-benchmark)
  - The timeout is still needed in TAcceptQueueServer since
    SetupConnection follows a similar pattern that TThreadPoolServer
    does.
- Removed support for TThreadPool from ThriftServer() since it is
  no longer used anywhere. ThriftServer() now always uses
  TAcceptQueueServer.
- Deprecated enable_accept_queue_server flag and removed supporting
  code.

Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
---
M be/src/benchmarks/network-perf-benchmark.cc
M be/src/catalog/catalogd-main.cc
M be/src/common/global-flags.cc
M be/src/rpc/TAcceptQueueServer.cpp
M be/src/rpc/TAcceptQueueServer.h
M be/src/rpc/thrift-server.cc
M be/src/rpc/thrift-server.h
M be/src/service/impala-server.cc
M be/src/statestore/statestore-subscriber.cc
M be/src/statestore/statestored-main.cc
M be/src/transport/TSaslServerTransport.cpp
M common/thrift/metrics.json
12 files changed, 103 insertions(+), 101 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/61/7061/5
-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>

[Impala-ASF-CR] IMPALA-5394: Set socket timeouts while opening TSaslTransport

Posted by "Henry Robinson (Code Review)" <ge...@cloudera.org>.
Henry Robinson has posted comments on this change.

Change subject: IMPALA-5394: Set socket timeouts while opening TSaslTransport
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7061/2/be/src/transport/TSaslServerTransport.cpp
File be/src/transport/TSaslServerTransport.cpp:

PS2, Line 158: socket->setRecvTimeout(0);
             :     socket->setSendTimeout(0);
Do the timeouts in client-cache.cc lines 120-121 not get applied to TSSLSockets? If they do, how do we know that 0 is the right value to reset these timeouts to?


-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5394: Handle blocked HS2 connections

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
John Sherman has uploaded a new patch set (#4).

Change subject: IMPALA-5394: Handle blocked HS2 connections
......................................................................

IMPALA-5394: Handle blocked HS2 connections

- TThreadPoolServer calls getTransport() on a client from the Server
  thread (the thread that does the accepts).
  - TSaslServerTransport->getTransport() calls TSaslTransport->open()
  - TSaslServerTransport->open() tries to negotiate SASL which calls
    read/write
    - If read/write blocks, the TThreadPoolServer cannot accept
      connections
- Set the underlying TSocket's recvTimeout and sendTimeout before the
  TSaslServerTransport->open() and reset them to 0 after open()
  completes.
- Added sasl_connect_tcp_timeout_ms flag that defaults to 300000
  milliseconds (5 minutes)
- Changed the Thrift server type for hs2 connections from ThreadPool
  to Threaded to take advantage of the AcceptQueueServer
  implementation.
- Add the ability for the TAcceptQueueServer to limit the maximum
  number of tasks at a time
- Removed the previously unenforced thread limit requests from
  StatestoreService, StatestoreSubscriber, CatalogService.

Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
---
M be/src/catalog/catalogd-main.cc
M be/src/rpc/TAcceptQueueServer.cpp
M be/src/rpc/TAcceptQueueServer.h
M be/src/rpc/thrift-server.cc
M be/src/rpc/thrift-server.h
M be/src/service/impala-server.cc
M be/src/statestore/statestore-subscriber.cc
M be/src/statestore/statestored-main.cc
M be/src/transport/TSaslServerTransport.cpp
M common/thrift/metrics.json
10 files changed, 77 insertions(+), 36 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/61/7061/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>

[Impala-ASF-CR] IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
Hello Henry Robinson, Michael Brown, Matthew Jacobs, Sailesh Mukil, Dan Hecht, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/7061

to look at the new patch set (#11).

Change subject: IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer
......................................................................

IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

- Previously TThreadPoolServer called getTransport() on a client from
  the Server thread (the thread that did the accepts).
  - TSaslServerTransport->getTransport() called TSaslTransport->open()
  - TSaslServerTransport->open() tried to negotiate SASL which calls
    read/write
    - If read/write blocks indefinitely, the TThreadPoolServer could
      not accept connections until tcp_keepalive kicked in.
- Set the underlying TSocket's recvTimeout and sendTimeout before the
  TSaslServerTransport->open() and reset them to 0 after open()
  completes.
- Added sasl_connect_tcp_timeout_ms flag that defaults to 300000
  milliseconds (5 minutes)
- Add the ability for TAcceptQueueServer to limit the maximum
  number of concurrent tasks
- Added a test case to thrift-server-test to test
  max_concurrent_connections enforcement
- Changed the remaining Thrift servers to use TAcceptQueueServer.
  (hs2/beeswax/network-perf-benchmark)
  - The timeout is still needed in TAcceptQueueServer since
    SetupConnection follows a similar pattern that TThreadPoolServer
    does.
- Removed support for TThreadPool from ThriftServer() since it is
  no longer used anywhere. ThriftServer() now always uses
  TAcceptQueueServer.
- Deprecated enable_accept_queue_server flag and removed supporting
  code.

Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
---
M be/src/benchmarks/network-perf-benchmark.cc
M be/src/common/global-flags.cc
M be/src/rpc/TAcceptQueueServer.cpp
M be/src/rpc/TAcceptQueueServer.h
M be/src/rpc/thrift-server-test.cc
M be/src/rpc/thrift-server.cc
M be/src/rpc/thrift-server.h
M be/src/service/impala-server.cc
M be/src/transport/TSaslServerTransport.cpp
M common/thrift/metrics.json
10 files changed, 157 insertions(+), 110 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/61/7061/11
-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-Change-Number: 7061
Gerrit-PatchSet: 11
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@apache.org>
Gerrit-Reviewer: Michael Brown <mi...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>

[Impala-ASF-CR] IMPALA-5394: Set socket timeouts while opening TSaslTransport

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
John Sherman has posted comments on this change.

Change subject: IMPALA-5394: Set socket timeouts while opening TSaslTransport
......................................................................


Patch Set 2:

Ok, my manual testing went well. Took a bit of effort to get it so I could get my bits installed in an environment with kerberos enabled. (I setup a centos dev machine and used the Cloudera Sandbox VM with kerberos enabled). The approach I tested was what Henry suggested (which was to use the AcceptQueueServer implementation, increase the number of  connection setup threads, and fix the locking), I also opted to keep the timeouts. I'm now loading the testdata and running the regression tests locally, once they (hopefully) pass I'll post the updated review (likely tomorrow).

-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5394: Set socket timeouts while opening TSaslTransport

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
John Sherman has posted comments on this change.

Change subject: IMPALA-5394: Set socket timeouts while opening TSaslTransport
......................................................................


Patch Set 2:

> Thanks for the write-up, very helpful!

I'll give that a try. I think the hard part would be in selecting the thread pool size in TAcceptQueue::serve. Because lets say I set it to 5 and the flaky client tries to connect 5 times within the tcp keep alive period, then impalad is back in the situation of not being able to accept connections until the sockets start timing out. I'm tempted to keep the timeouts combined with switching to the TAcceptQueue implementation. I'll read the code a bit more to get a better understanding.

-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer

Posted by "Henry Robinson (Code Review)" <ge...@cloudera.org>.
Henry Robinson has posted comments on this change.

Change subject: IMPALA-5394: Change ThriftServer() to always use TAcceptQueueServer
......................................................................


Patch Set 5:

Hi John - anything I can do to help with this one? Let me know if you need a hand with the rebase.

-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5394: Set socket timeouts while opening TSaslTransport

Posted by "John Sherman (Code Review)" <ge...@cloudera.org>.
John Sherman has posted comments on this change.

Change subject: IMPALA-5394: Set socket timeouts while opening TSaslTransport
......................................................................


Patch Set 1:

> (1 comment)
 > 
 > Thanks John. The internal connection timeout is actually 5 minutes:
 > https://github.com/apache/incubator-impala/blob/master/be/src/runtime/exec-env.cc#L118-L123
 > 
 > We would also want the client connection timeout to default to a
 > pretty high number since on large clusters, we've seen Kerberos
 > negotiations take up to a few minutes.
 > I would prefer keeping the timeout to 5 minutes. It's not ideal,
 > however, we would rather not see queries fail because of timed out
 > negotiations vs. optimize for an even worse case of clients hung
 > for 5 minutes (which is configurable by a flag if the user chooses
 > to do so). This is the same reason we keep the internal connection
 > timeout so high, since we'd rather see progress than a failed query
 > due to one timed out connection.

I will make those changes. I was basing my (incorrect) comment off of the statestore_heartbeat_tcp_timeout_seconds flag (which I modeled the flag off of). I also see that statestore_update_tcp_timeout_seconds is 300 seconds. Thanks for pointing out backend_client_rpc_timeout_ms.

 > (1 comment)
 > 
 > Thanks John. The internal connection timeout is actually 5 minutes:
 > https://github.com/apache/incubator-impala/blob/master/be/src/runtime/exec-env.cc#L118-L123
 > 
 > We would also want the client connection timeout to default to a
 > pretty high number since on large clusters, we've seen Kerberos
 > negotiations take up to a few minutes.
 > I would prefer keeping the timeout to 5 minutes. It's not ideal,
 > however, we would rather not see queries fail because of timed out
 > negotiations vs. optimize for an even worse case of clients hung
 > for 5 minutes (which is configurable by a flag if the user chooses
 > to do so). This is the same reason we keep the internal connection
 > timeout so high, since we'd rather see progress than a failed query
 > due to one timed out connection.

-- 
To view, visit http://gerrit.cloudera.org:8080/7061
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I56a5f3d9cf931cff14eae7f236fea018236a6255
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: John Sherman <jf...@arcadiadata.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: No