You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Sailesh Mukil (Code Review)" <ge...@cloudera.org> on 2017/06/08 17:23:09 UTC

[Impala-ASF-CR] IMPALA-5221: Fix TSaslTransport negotiation order

Sailesh Mukil has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7116

Change subject: IMPALA-5221: Fix TSaslTransport negotiation order
......................................................................

IMPALA-5221: Fix TSaslTransport negotiation order

The TSaslTransport is written as a thrift extension that is a wrapper
around the Cyrus-SASL APIs. This transport is then used by Impala's
RPC layer.

On RHEL7 systems that use newer versions of the Cyrus-SASL library,
we noticed that we sometimes crash inside the Cyrus-SASL thirdparty
while trying to lock an internal mutex. During my investigation, I
found that we needed to fix the order of negotiation that happens in
an edge case.

The steps to use the Cyrus-SASL APIs for SASL negitiation are the
following (Replace '_client_' with '_server_' for server calls):
sasl_client_new()
sasl_client_start()
sasl_client_step()
sasl_client_dispose()   < --- When we're done with the connection.

sasl_client_new() was being called in the constructor TSaslClient()
which is invoked from SaslAuthProvider::WrapClientTransport().

sasl_client_start() and sasl_client_step() were being called under
TSaslTransport::open(). If for some reason we hit an error during
SASL negotiation, the TSaslTransport::open() call would fail. When
we fail, the ThriftClientImpl::OpenWithRetry() does a retry, which
directly retries the negotiation from sasl_client_start(). This
caused the use of already freed resources from the first negotiation
failure, hence causing the crash.

To fix this, we make sure that on a negotiation failure, we dispose
of all the resources properly by calling sasl_dispose() and retry
the negotiation from the start by calling sasl_client_new() first, and
then the remaining steps. This is done by moving the sasl_client_new()
and sasl_server_new() calls out of the TSaslClient/TSaslServer
constructors and into a new call called TSasl*::setupNegotiation(),
which is called under TSaslTransport::open().

The patch is fairly large for the above mentioned change, however,
most of it is just plumbing.

Testing: Tested on systems with older SASL versions to make sure
we don't regress. Also tested on systems with newer SASL versions
where we previously saw the crash and verified that we don't see
them anymore.

Change-Id: I7d509e14dfe46fa28d0626cf84daf6de82955d80
---
M be/src/transport/TSasl.cpp
M be/src/transport/TSasl.h
M be/src/transport/TSaslClientTransport.cpp
M be/src/transport/TSaslClientTransport.h
M be/src/transport/TSaslServerTransport.cpp
M be/src/transport/TSaslServerTransport.h
M be/src/transport/TSaslTransport.cpp
M be/src/transport/TSaslTransport.h
8 files changed, 222 insertions(+), 78 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/7116/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I7d509e14dfe46fa28d0626cf84daf6de82955d80
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil <sa...@cloudera.com>

[Impala-ASF-CR] IMPALA-5221: Avoid re-use of stale SASL contexts.

Posted by "Sailesh Mukil (Code Review)" <ge...@cloudera.org>.
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-5221: Avoid re-use of stale SASL contexts.
......................................................................


Patch Set 4: Code-Review+2

(3 comments)

Thanks for the review!

Carry +2. I'll run it through one last round of testing and merge it after that.

http://gerrit.cloudera.org:8080/#/c/7116/3/be/src/transport/TSasl.cpp
File be/src/transport/TSasl.cpp:

Line 115:       mechList(mechanisms) {
> Is it right to initialise chosenMech to a list of mechanisms? Should it sta
Good point. It shouldn't make a difference, but I've removed it anyway. I'll just run it though one last round of testing to make sure that this doesn't cause a regression.


http://gerrit.cloudera.org:8080/#/c/7116/3/be/src/transport/TSasl.h
File be/src/transport/TSasl.h:

PS3, Line 125: *
> nit: line break
Done


http://gerrit.cloudera.org:8080/#/c/7116/3/be/src/transport/TSaslTransport.h
File be/src/transport/TSaslTransport.h:

PS3, Line 177: /**
> nit: three / (or better to conform to other function comment style in this 
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7d509e14dfe46fa28d0626cf84daf6de82955d80
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5221: Avoid re-use of stale SASL contexts.

Posted by "Sailesh Mukil (Code Review)" <ge...@cloudera.org>.
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-5221: Avoid re-use of stale SASL contexts.
......................................................................


Patch Set 2:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/7116/1//COMMIT_MSG
Commit Message:

PS1, Line 7: Avoid re-use of stale SASL contexts.
> specifically: don't re-use old sasl contexts, right?
Yup, done.


http://gerrit.cloudera.org:8080/#/c/7116/1/be/src/transport/TSasl.cpp
File be/src/transport/TSasl.cpp:

PS1, Line 119: }
             :   /*
             :   if (!authenticationId.empty()) {
             :     // TODO: setup security property
             :     sasl_security_properties_t secprops;
             :     // populate  secprop
> move these initializations to the member initializer list (e.g. service(ser
Whoops, thanks. Done.


PS1, Line 128: 
             : 
> int result = ...
Done


Line 129: 
> IIUC, we should DCHECK(conn == nullptr) here to confirm we're not hitting t
Done


PS1, Line 208:   const string& userRealm, unsigned flags, sasl_callback_t* callbacks)
             :     : TSasl(service, serverFQDN, callbacks),
             :       userRealm(userRealm),
             :       flags(flags),
             :       serverStarted(false) { }
             : 
             : void TSaslServer::setupS
> move to member initializer list etc.
Done


PS1, Line 218:     NULL, NULL, callbacks, flags, &conn);
             :   if (re
> int result = ...
Done


http://gerrit.cloudera.org:8080/#/c/7116/1/be/src/transport/TSasl.h
File be/src/transport/TSasl.h:

PS1, Line 61: setupSaslContext
> I think setupSaslContext is a better name.
Done


PS1, Line 66: dispose
> can't this lead to calling sasl_dispose(&nullptr) if there's a negotiation 
Yes, I completely missed that. I've added a null check in dispose().

It looks like the code would return a bad status if the pointer it receives is invalid and we ignore it anyway because it's the destructor. That probably explains why we've never seen it crash here with my patch.


PS1, Line 69:  * Call
> why is this one virtual? and how about disposeSaslContext()?
Done


Line 71:    * with servers or done with clients.  Internally the library maintains a which
> doesn't this need to be wrapped in "if (conn != nullptr)"? i.e. can't we ge
Yes, I added the check here, it could end up here with conn==nullptr with the case Henry mentioned.

It looks like the code would return a bad status if the pointer it receives is invalid and we ignore it anyway because it's the destructor. That probably explains why we've never seen it crash here with my patch.


PS1, Line 69:  * Called once per application to free resources.`
            :    * Note that there is no distinction in the sasl library between being done
            :    * with servers or done with clients.  Internally the library maintains a which
            :    
> can this be protected?
Done


Line 131:    sasl_conn_t* conn;
> who owns it? and any way to be more descriptive than "registered callbacks"
Done


http://gerrit.cloudera.org:8080/#/c/7116/1/be/src/transport/TSaslClientTransport.h
File be/src/transport/TSaslClientTransport.h:

PS1, Line 51: Set u
> Set up
Done


http://gerrit.cloudera.org:8080/#/c/7116/1/be/src/transport/TSaslServerTransport.h
File be/src/transport/TSaslServerTransport.h:

PS1, Line 44: /* Set up the Sasl server state for a connection.
> If this is a no-op, the comment shold be here.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7d509e14dfe46fa28d0626cf84daf6de82955d80
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5221: Avoid re-use of stale SASL contexts.

Posted by "Sailesh Mukil (Code Review)" <ge...@cloudera.org>.
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-5221: Avoid re-use of stale SASL contexts.
......................................................................


Patch Set 2:

> (11 comments)
 > 
 > Looks pretty good. Have you confirmed this works with LDAP as well
 > as GSSAPI?

Forgot to address the LDAP comment. I just got my hands on a cluster with LDAP enabled, I'll test the patch on it and post an update tomorrow.

-- 
To view, visit http://gerrit.cloudera.org:8080/7116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7d509e14dfe46fa28d0626cf84daf6de82955d80
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5221: Avoid re-use of stale SASL contexts.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5221: Avoid re-use of stale SASL contexts.
......................................................................


Patch Set 4:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/714/

-- 
To view, visit http://gerrit.cloudera.org:8080/7116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7d509e14dfe46fa28d0626cf84daf6de82955d80
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5221: Fix TSaslTransport negotiation order

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5221: Fix TSaslTransport negotiation order
......................................................................


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/7116/1/be/src/transport/TSasl.h
File be/src/transport/TSasl.h:

PS1, Line 69: virtual
why is this one virtual? and how about disposeSaslContext()?


Line 71:     conn = NULL;
doesn't this need to be wrapped in "if (conn != nullptr)"? i.e. can't we get here with conn already nullptr?


Line 131:    /* List of registered callbacks */
who owns it? and any way to be more descriptive than "registered callbacks"? Maybe at least "Callbacks registered with the Sasl connection"?


-- 
To view, visit http://gerrit.cloudera.org:8080/7116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7d509e14dfe46fa28d0626cf84daf6de82955d80
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5221: Avoid re-use of stale SASL contexts.

Posted by "Sailesh Mukil (Code Review)" <ge...@cloudera.org>.
Sailesh Mukil has uploaded a new patch set (#2).

Change subject: IMPALA-5221: Avoid re-use of stale SASL contexts.
......................................................................

IMPALA-5221: Avoid re-use of stale SASL contexts.

The TSaslTransport is written as a thrift extension that is a wrapper
around the Cyrus-SASL APIs. This transport is then used by Impala's
RPC layer.

On RHEL7 systems that use newer versions of the Cyrus-SASL library,
we noticed that we sometimes crash inside the Cyrus-SASL thirdparty
while trying to lock an internal mutex. During my investigation, I
found that we needed to fix the order of negotiation that happens in
an edge case.

The steps to use the Cyrus-SASL APIs for SASL negitiation are the
following (Replace '_client_' with '_server_' for server calls):
sasl_client_new()
sasl_client_start()
sasl_client_step()
sasl_client_dispose()   < --- When we're done with the connection.

sasl_client_new() was being called in the constructor TSaslClient()
which is invoked from SaslAuthProvider::WrapClientTransport().

sasl_client_start() and sasl_client_step() were being called under
TSaslTransport::open(). If for some reason we hit an error during
SASL negotiation, the TSaslTransport::open() call would fail. When
we fail, the ThriftClientImpl::OpenWithRetry() does a retry, which
directly retries the negotiation from sasl_client_start(). This
caused the use of already freed resources from the first negotiation
failure, hence causing the crash.

To fix this, we make sure that on a negotiation failure, we dispose
of all the resources properly by calling sasl_dispose() and retry
the negotiation from the start by calling sasl_client_new() first, and
then the remaining steps. This is done by moving the sasl_client_new()
and sasl_server_new() calls out of the TSaslClient/TSaslServer
constructors and into a new call called TSasl*::setupNegotiation(),
which is called under TSaslTransport::open().

The patch is fairly large for the above mentioned change, however,
most of it is just plumbing.

Testing: Tested on systems with older SASL versions to make sure
we don't regress. Also tested on systems with newer SASL versions
where we previously saw the crash and verified that we don't see
them anymore.

Change-Id: I7d509e14dfe46fa28d0626cf84daf6de82955d80
---
M be/src/transport/TSasl.cpp
M be/src/transport/TSasl.h
M be/src/transport/TSaslClientTransport.cpp
M be/src/transport/TSaslClientTransport.h
M be/src/transport/TSaslServerTransport.cpp
M be/src/transport/TSaslServerTransport.h
M be/src/transport/TSaslTransport.cpp
M be/src/transport/TSaslTransport.h
8 files changed, 232 insertions(+), 82 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/7116/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7d509e14dfe46fa28d0626cf84daf6de82955d80
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>

[Impala-ASF-CR] IMPALA-5221: Fix TSaslTransport negotiation order

Posted by "Henry Robinson (Code Review)" <ge...@cloudera.org>.
Henry Robinson has posted comments on this change.

Change subject: IMPALA-5221: Fix TSaslTransport negotiation order
......................................................................


Patch Set 1:

(11 comments)

Looks pretty good. Have you confirmed this works with LDAP as well as GSSAPI?

http://gerrit.cloudera.org:8080/#/c/7116/1//COMMIT_MSG
Commit Message:

PS1, Line 7: Fix TSaslTransport negotiation order
specifically: don't re-use old sasl contexts, right?


http://gerrit.cloudera.org:8080/#/c/7116/1/be/src/transport/TSasl.cpp
File be/src/transport/TSasl.cpp:

PS1, Line 119: this->service = service;
             :   this->serverFQDN = serverFQDN;
             :   this->callbacks = callbacks;
             :   chosenMech = mechList = mechanisms;
             :   authComplete = false;
             :   clientStarted = false;
move these initializations to the member initializer list (e.g. service(service), serverFQDN(serverFQDN) ...). The constant ones can be initialized in their declarations.


Line 129:   result = sasl_client_new(service.c_str(), serverFQDN.c_str(), NULL, NULL, callbacks,
IIUC, we should DCHECK(conn == nullptr) here to confirm we're not hitting the bug.


PS1, Line 128: int result;
             :   result =
int result = ...


PS1, Line 208: this->service = service;
             :   this->serverFQDN = serverFQDN;
             :   this->callbacks = callbacks;
             :   this->flags = flags;
             :   this->userRealm = userRealm;
             :   authComplete = false;
             :   serverStarted = false;
move to member initializer list etc.


PS1, Line 218: int result;
             :   result
int result = ...


http://gerrit.cloudera.org:8080/#/c/7116/1/be/src/transport/TSasl.h
File be/src/transport/TSasl.h:

PS1, Line 61: setupNegotiation
I think setupSaslContext is a better name.


PS1, Line 66: dispose
can't this lead to calling sasl_dispose(&nullptr) if there's a negotiation error? (i.e. dispose() is called from resetNegotiation() and then from the d'tor). Or if the TSasl is destroyed without ever actually setting up the negotiation.


PS1, Line 69: virtual void dispose() {
            :     sasl_dispose(&conn);
            :     conn = NULL;
            :   }
can this be protected?


http://gerrit.cloudera.org:8080/#/c/7116/1/be/src/transport/TSaslClientTransport.h
File be/src/transport/TSaslClientTransport.h:

PS1, Line 51: Setup
Set up


http://gerrit.cloudera.org:8080/#/c/7116/1/be/src/transport/TSaslServerTransport.h
File be/src/transport/TSaslServerTransport.h:

PS1, Line 44: /* Setup the Sasl server state for a connection. */
If this is a no-op, the comment shold be here.


-- 
To view, visit http://gerrit.cloudera.org:8080/7116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7d509e14dfe46fa28d0626cf84daf6de82955d80
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5221: Avoid re-use of stale SASL contexts.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5221: Avoid re-use of stale SASL contexts.
......................................................................


Patch Set 4: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7d509e14dfe46fa28d0626cf84daf6de82955d80
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5221: Avoid re-use of stale SASL contexts.

Posted by "Henry Robinson (Code Review)" <ge...@cloudera.org>.
Henry Robinson has posted comments on this change.

Change subject: IMPALA-5221: Avoid re-use of stale SASL contexts.
......................................................................


Patch Set 3: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7116/3/be/src/transport/TSasl.cpp
File be/src/transport/TSasl.cpp:

Line 115:       chosenMech(mechanisms),
Is it right to initialise chosenMech to a list of mechanisms? Should it stay blank until evaluateChallengeOrResponse()?


-- 
To view, visit http://gerrit.cloudera.org:8080/7116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7d509e14dfe46fa28d0626cf84daf6de82955d80
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5221: Avoid re-use of stale SASL contexts.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-5221: Avoid re-use of stale SASL contexts.
......................................................................


IMPALA-5221: Avoid re-use of stale SASL contexts.

The TSaslTransport is written as a thrift extension that is a wrapper
around the Cyrus-SASL APIs. This transport is then used by Impala's
RPC layer.

On RHEL7 systems that use newer versions of the Cyrus-SASL library,
we noticed that we sometimes crash inside the Cyrus-SASL thirdparty
while trying to lock an internal mutex. During my investigation, I
found that we needed to fix the order of negotiation that happens in
an edge case.

The steps to use the Cyrus-SASL APIs for SASL negitiation are the
following (Replace '_client_' with '_server_' for server calls):
sasl_client_new()
sasl_client_start()
sasl_client_step()
sasl_dispose()   < --- When we're done with the connection.

sasl_client_new() was being called in the constructor TSaslClient()
which is invoked from SaslAuthProvider::WrapClientTransport().

sasl_client_start() and sasl_client_step() were being called under
TSaslTransport::open(). If for some reason we hit an error during
SASL negotiation, the TSaslTransport::open() call would fail. When
we fail, the ThriftClientImpl::OpenWithRetry() does a retry, which
directly retries the negotiation from sasl_client_start(). This
caused the use of already freed resources from the first negotiation
failure, hence causing the crash.

To fix this, we make sure that on a negotiation failure, we dispose
of all the resources properly by calling sasl_dispose() and retry
the negotiation from the start by calling sasl_client_new() first, and
then the remaining steps. This is done by moving the sasl_client_new()
and sasl_server_new() calls out of the TSaslClient/TSaslServer
constructors and into a new call called TSasl*::setupSaslContext(),
which is called under TSaslTransport::open().

The patch is fairly large for the above mentioned change, however,
most of it is just plumbing.

Testing: Tested on systems with older SASL versions to make sure
we don't regress. Also tested on systems with newer SASL versions
where we previously saw the crash and verified that we don't see
them anymore.
Also, tested with GSSAPI and LDAP mechanisms.

Change-Id: I7d509e14dfe46fa28d0626cf84daf6de82955d80
Reviewed-on: http://gerrit.cloudera.org:8080/7116
Reviewed-by: Sailesh Mukil <sa...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M be/src/transport/TSasl.cpp
M be/src/transport/TSasl.h
M be/src/transport/TSaslClientTransport.cpp
M be/src/transport/TSaslClientTransport.h
M be/src/transport/TSaslServerTransport.cpp
M be/src/transport/TSaslServerTransport.h
M be/src/transport/TSaslTransport.cpp
M be/src/transport/TSaslTransport.h
8 files changed, 234 insertions(+), 82 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Sailesh Mukil: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/7116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I7d509e14dfe46fa28d0626cf84daf6de82955d80
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>

[Impala-ASF-CR] IMPALA-5221: Avoid re-use of stale SASL contexts.

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change.

Change subject: IMPALA-5221: Avoid re-use of stale SASL contexts.
......................................................................


Patch Set 3: Code-Review+2

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7116/3/be/src/transport/TSasl.h
File be/src/transport/TSasl.h:

PS3, Line 125: * C
nit: line break


http://gerrit.cloudera.org:8080/#/c/7116/3/be/src/transport/TSaslTransport.h
File be/src/transport/TSaslTransport.h:

PS3, Line 177: // 
nit: three / (or better to conform to other function comment style in this file).


-- 
To view, visit http://gerrit.cloudera.org:8080/7116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7d509e14dfe46fa28d0626cf84daf6de82955d80
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5221: Fix TSaslTransport negotiation order

Posted by "Sailesh Mukil (Code Review)" <ge...@cloudera.org>.
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-5221: Fix TSaslTransport negotiation order
......................................................................


Patch Set 1:

There's a lot more cleanup that can be done around the TSasl* layer, however, I think that can be done as a later patch so that it will be much easier for review at this point.

-- 
To view, visit http://gerrit.cloudera.org:8080/7116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7d509e14dfe46fa28d0626cf84daf6de82955d80
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5221: Avoid re-use of stale SASL contexts.

Posted by "Sailesh Mukil (Code Review)" <ge...@cloudera.org>.
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-5221: Avoid re-use of stale SASL contexts.
......................................................................


Patch Set 2:

> > (11 comments)
 > >
 > > Looks pretty good. Have you confirmed this works with LDAP as
 > well
 > > as GSSAPI?
 > 
 > Forgot to address the LDAP comment. I just got my hands on a
 > cluster with LDAP enabled, I'll test the patch on it and post an
 > update tomorrow.

Verified that there are no regressions on a cluster with LDAP enabled too.
- Deployed my patch onto an LDAP enabled cluster.
- I was able to connect with a client using LDAP credentials.
- Verified that I saw the following in the logs:

I0609 09:31:49.210407 11809 authentication.cc:249] Trying simple LDAP bind for: uid=admin,ou=users,dc=cloudera,dc=com
I0609 09:31:49.565165 11809 authentication.cc:261] LDAP bind successful
I0609 09:31:49.565322 11809 authentication.cc:459] Successfully authenticated client user "admin"

-- 
To view, visit http://gerrit.cloudera.org:8080/7116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7d509e14dfe46fa28d0626cf84daf6de82955d80
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5221: Avoid re-use of stale SASL contexts.

Posted by "Sailesh Mukil (Code Review)" <ge...@cloudera.org>.
Hello Henry Robinson, Dan Hecht,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/7116

to look at the new patch set (#4).

Change subject: IMPALA-5221: Avoid re-use of stale SASL contexts.
......................................................................

IMPALA-5221: Avoid re-use of stale SASL contexts.

The TSaslTransport is written as a thrift extension that is a wrapper
around the Cyrus-SASL APIs. This transport is then used by Impala's
RPC layer.

On RHEL7 systems that use newer versions of the Cyrus-SASL library,
we noticed that we sometimes crash inside the Cyrus-SASL thirdparty
while trying to lock an internal mutex. During my investigation, I
found that we needed to fix the order of negotiation that happens in
an edge case.

The steps to use the Cyrus-SASL APIs for SASL negitiation are the
following (Replace '_client_' with '_server_' for server calls):
sasl_client_new()
sasl_client_start()
sasl_client_step()
sasl_dispose()   < --- When we're done with the connection.

sasl_client_new() was being called in the constructor TSaslClient()
which is invoked from SaslAuthProvider::WrapClientTransport().

sasl_client_start() and sasl_client_step() were being called under
TSaslTransport::open(). If for some reason we hit an error during
SASL negotiation, the TSaslTransport::open() call would fail. When
we fail, the ThriftClientImpl::OpenWithRetry() does a retry, which
directly retries the negotiation from sasl_client_start(). This
caused the use of already freed resources from the first negotiation
failure, hence causing the crash.

To fix this, we make sure that on a negotiation failure, we dispose
of all the resources properly by calling sasl_dispose() and retry
the negotiation from the start by calling sasl_client_new() first, and
then the remaining steps. This is done by moving the sasl_client_new()
and sasl_server_new() calls out of the TSaslClient/TSaslServer
constructors and into a new call called TSasl*::setupSaslContext(),
which is called under TSaslTransport::open().

The patch is fairly large for the above mentioned change, however,
most of it is just plumbing.

Testing: Tested on systems with older SASL versions to make sure
we don't regress. Also tested on systems with newer SASL versions
where we previously saw the crash and verified that we don't see
them anymore.
Also, tested with GSSAPI and LDAP mechanisms.

Change-Id: I7d509e14dfe46fa28d0626cf84daf6de82955d80
---
M be/src/transport/TSasl.cpp
M be/src/transport/TSasl.h
M be/src/transport/TSaslClientTransport.cpp
M be/src/transport/TSaslClientTransport.h
M be/src/transport/TSaslServerTransport.cpp
M be/src/transport/TSaslServerTransport.h
M be/src/transport/TSaslTransport.cpp
M be/src/transport/TSaslTransport.h
8 files changed, 234 insertions(+), 82 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/7116/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7d509e14dfe46fa28d0626cf84daf6de82955d80
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>

[Impala-ASF-CR] IMPALA-5221: Avoid re-use of stale SASL contexts.

Posted by "Sailesh Mukil (Code Review)" <ge...@cloudera.org>.
Sailesh Mukil has uploaded a new patch set (#3).

Change subject: IMPALA-5221: Avoid re-use of stale SASL contexts.
......................................................................

IMPALA-5221: Avoid re-use of stale SASL contexts.

The TSaslTransport is written as a thrift extension that is a wrapper
around the Cyrus-SASL APIs. This transport is then used by Impala's
RPC layer.

On RHEL7 systems that use newer versions of the Cyrus-SASL library,
we noticed that we sometimes crash inside the Cyrus-SASL thirdparty
while trying to lock an internal mutex. During my investigation, I
found that we needed to fix the order of negotiation that happens in
an edge case.

The steps to use the Cyrus-SASL APIs for SASL negitiation are the
following (Replace '_client_' with '_server_' for server calls):
sasl_client_new()
sasl_client_start()
sasl_client_step()
sasl_dispose()   < --- When we're done with the connection.

sasl_client_new() was being called in the constructor TSaslClient()
which is invoked from SaslAuthProvider::WrapClientTransport().

sasl_client_start() and sasl_client_step() were being called under
TSaslTransport::open(). If for some reason we hit an error during
SASL negotiation, the TSaslTransport::open() call would fail. When
we fail, the ThriftClientImpl::OpenWithRetry() does a retry, which
directly retries the negotiation from sasl_client_start(). This
caused the use of already freed resources from the first negotiation
failure, hence causing the crash.

To fix this, we make sure that on a negotiation failure, we dispose
of all the resources properly by calling sasl_dispose() and retry
the negotiation from the start by calling sasl_client_new() first, and
then the remaining steps. This is done by moving the sasl_client_new()
and sasl_server_new() calls out of the TSaslClient/TSaslServer
constructors and into a new call called TSasl*::setupSaslContext(),
which is called under TSaslTransport::open().

The patch is fairly large for the above mentioned change, however,
most of it is just plumbing.

Testing: Tested on systems with older SASL versions to make sure
we don't regress. Also tested on systems with newer SASL versions
where we previously saw the crash and verified that we don't see
them anymore.
Also, tested with GSSAPI and LDAP mechanisms.

Change-Id: I7d509e14dfe46fa28d0626cf84daf6de82955d80
---
M be/src/transport/TSasl.cpp
M be/src/transport/TSasl.h
M be/src/transport/TSaslClientTransport.cpp
M be/src/transport/TSaslClientTransport.h
M be/src/transport/TSaslServerTransport.cpp
M be/src/transport/TSaslServerTransport.h
M be/src/transport/TSaslTransport.cpp
M be/src/transport/TSaslTransport.h
8 files changed, 232 insertions(+), 82 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/7116/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7116
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7d509e14dfe46fa28d0626cf84daf6de82955d80
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>