Posted to rampart-dev@ws.apache.org by Glen Daniels <gl...@thoughtcraft.com> on 2009/10/12 16:08:10 UTC

[axis2] Status on Axis2 1.5.1 and Rampart 1.5

Hi folks!

OK, so here are the results of my weekend investigations.  The lockup when
running the Rampart 1.5 tests with Axis2 1.5.1 was due to http connection
starvation.  I've fixed two issues and everything works now, but I'd like to
respin both Axis2 1.5.1 and Rampart 1.5 as a result.  Details below.

First, a quick summary of a major change in Axis2 1.5.1: we were formerly
creating new MultiThreadedHttpConnectionManagers all the time in the HTTP
sender code.  In typical usage you'd never see connection pool starvation
(since each new MHCM had a new pool), but two major problems occurred.  1)
Connection reuse wasn't really possible, and 2) we would eventually (in
high-volume situations) run into the OS limits for open sockets.  So I fixed
this so that 1.5.1 now re-uses a single MHCM for each ConfigurationContext,
which allows for sharing connections across ServiceClient instances.

The bigger problem *behind* the problem above is that users of the Commons
HttpClient library (like Axis2) need to call releaseConnection() on each and
every HttpMethod after they are finished.  The
ServiceClient.cleanupTransport() call does this, but since we never told
people to call that explicitly, no one was in the habit of doing it.  A
number of bugs about connection starvation came up, and we put in the
Options.setCallTransportCleanup() option, which automatically calls
cleanupTransport() after each call, but at a cost - since we're releasing
connection resources you need to make sure you've read everything, which
means building the whole Axiom tree.  Bye-bye, streaming.  So I also added a
different connection cleanup option which automatically cleans up the *last*
operation as you're setting up the next one.
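
As a rough illustration of why every finished call has to hand its connection
back, here is a minimal Java sketch.  It does not use Axis2 or Commons
HttpClient: ToyConnectionPool, ReleaseDemo, leakyCall and tidyCall are
hypothetical names, and a Semaphore stands in for the bounded connection pool.

```java
import java.util.concurrent.Semaphore;

/** Toy stand-in for a bounded HTTP connection pool (hypothetical, not Axis2 code). */
class ToyConnectionPool {
    private final Semaphore permits;

    ToyConnectionPool(int size) {
        permits = new Semaphore(size);
    }

    /** Takes a connection if one is free; returns false when the pool is exhausted. */
    boolean tryAcquire() {
        return permits.tryAcquire();
    }

    /** Hands a connection back, playing the role of releaseConnection(). */
    void release() {
        permits.release();
    }

    int available() {
        return permits.availablePermits();
    }
}

class ReleaseDemo {
    /** A caller that never releases: each successful call removes one connection for good. */
    static boolean leakyCall(ToyConnectionPool pool) {
        return pool.tryAcquire();
    }

    /** A caller that releases in finally: the pool is unchanged afterwards. */
    static boolean tidyCall(ToyConnectionPool pool) {
        if (!pool.tryAcquire()) {
            return false;
        }
        try {
            // ... send the request and fully read the response here ...
            return true;
        } finally {
            pool.release();
        }
    }

    public static void main(String[] args) {
        ToyConnectionPool pool = new ToyConnectionPool(2);
        for (int i = 0; i < 5; i++) {
            tidyCall(pool);
        }
        System.out.println("free after tidy calls: " + pool.available());
        leakyCall(pool);
        leakyCall(pool);
        System.out.println("third leaky call got a connection: " + leakyCall(pool));
    }
}
```

In this toy the exhausted pool simply reports failure.  A real pool that blocks
while waiting for a free connection would hang at that point instead.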

So, to make the Rampart story very short, the problem was this: a new
ServiceClient gets created to deal with SecureConversation interactions (see
STSClient.getServiceClient()).  This SC shares the same ConfigurationContext
with the outer (i.e. user) SC, so it shares a MHCM and a connection pool.
The problem is that, since the STS operations happen inside a user-level
operation, the record of the "last operation" gets overwritten, and as a
result my automatic cleanup mechanism can't catch both!  So we lose one
connection each time we go through the STS process, and that causes a hard
lock.
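
The leak described above can be sketched with a small Java model.  This is a
simplification under stated assumptions, not the Axis2 or Rampart code: it
assumes a single shared "last operation" slot and assumes the outer call
records itself after the nested call has run.  LastOpTracker and its methods
are hypothetical names, and the model throws where the real system would block.

```java
/** Toy model of "clean up the last operation when the next one starts". */
class LastOpTracker {
    private final int poolSize;
    private int inUse = 0;
    private String lastOp = null; // single shared record of the most recent operation

    LastOpTracker(int poolSize) {
        this.poolSize = poolSize;
    }

    /** Releases the connection held by the recorded last operation, if there is one. */
    private void cleanupLast() {
        if (lastOp != null) {
            inUse--;
            lastOp = null;
        }
    }

    /** Takes a connection, or fails where a blocking pool would hang. */
    private void acquire(String op) {
        if (inUse == poolSize) {
            throw new IllegalStateException("pool exhausted while starting " + op);
        }
        inUse++;
    }

    /** A plain call: clean up the previous call, take a connection, record ourselves. */
    void call(String op) {
        cleanupLast();
        acquire(op);
        lastOp = op;
    }

    /** A user call that triggers a nested (STS-like) call before it sends its own request. */
    void callWithNested(String outer, String nested) {
        cleanupLast();
        call(nested);   // the nested call records itself and keeps its connection
        acquire(outer);
        lastOp = outer; // overwrites the nested record, so that connection is never released
    }

    int inUse() {
        return inUse;
    }
}
```

With a pool of three, plain calls keep exactly one connection in use no matter
how many are made, while each callWithNested() leaves one more connection
behind until the pool runs dry.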

SOLUTION
--------

I did two things to fix this, both of which I think should be reflected in
the released code.  First, in Rampart, I added a call to
setCallTransportCleanup(true) in STSClient - this means that the STS
operations will be forced to build the complete Axiom tree (see above), but
solves the connection starvation issue.  Second, in Axis2, I added a default
30-second timeout while waiting for new connections - this doesn't change the
functionality at all, but it does mean that we can no longer get into
situations where the system just locks up forever.  With that change, we'll
now at least get an Exception if there's a starvation issue, which can then
be debugged.
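
To show the effect of a bounded wait, here is a minimal Java sketch of the
general technique.  It is not the actual Axis2 change: TimedPool is a
hypothetical class, a Semaphore stands in for the connection pool, and the
timeout is a constructor argument rather than a fixed 30 seconds.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Toy pool whose acquire() gives up after a timeout instead of waiting forever. */
class TimedPool {
    private final Semaphore permits;
    private final long timeoutMillis;

    TimedPool(int size, long timeoutMillis) {
        this.permits = new Semaphore(size);
        this.timeoutMillis = timeoutMillis;
    }

    /** Waits up to timeoutMillis for a free connection, then fails loudly. */
    void acquire() {
        try {
            if (!permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS)) {
                throw new IllegalStateException(
                        "timed out after " + timeoutMillis + " ms waiting for a connection");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            throw new IllegalStateException("interrupted while waiting for a connection", e);
        }
    }

    /** Hands a connection back to the pool. */
    void release() {
        permits.release();
    }
}
```

A starved caller now gets an exception with a message and a stack trace
pointing at the acquire, rather than a thread that never returns.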

Nandana/all, can you check what I did in Rampart and let me know if you
foresee any problems with it?  I'm going to respin Axis2 1.5.1 with this and
one other fix, and we should respin Rampart 1.5 as well.

Thoughts/comments?

Thanks,
--Glen

Re: [axis2] Status on Axis2 1.5.1 and Rampart 1.5

Posted by Ruwan Linton <ru...@gmail.com>.

+1 for re-spinning the votes.

Thanks,
Ruwan

On Tue, Oct 13, 2009 at 5:24 AM, Nandana Mihindukulasooriya <
nandana.cse@gmail.com> wrote:

> Hi Glen
>
>> Nandana/all, can you check what I did in Rampart and let me know if you
>> foresee any problems with it?  I'm going to respin Axis2 1.5.1 with this
>> and one other fix, and we should respin Rampart 1.5 as well.
>>
>
> +1 for re-spinning both Axis2 1.5.1 and Rampart 1.5. I checked the change in
> STSClient and it doesn't seem to have side effects on Rampart functionality.
>
> thanks,
> Nandana
>



-- 
Ruwan Linton
Technical Lead & Product Manager; WSO2 ESB; http://wso2.org/esb
WSO2 Inc.; http://wso2.org
email: ruwan@wso2.com; cell: +94 77 341 3097
blog: http://ruwansblog.blogspot.com

Re: [axis2] Status on Axis2 1.5.1 and Rampart 1.5

Posted by Nandana Mihindukulasooriya <na...@gmail.com>.

Hi Glen

> Nandana/all, can you check what I did in Rampart and let me know if you
> foresee any problems with it?  I'm going to respin Axis2 1.5.1 with this
> and one other fix, and we should respin Rampart 1.5 as well.

+1 for re-spinning both Axis2 1.5.1 and Rampart 1.5. I checked the change in
STSClient and it doesn't seem to have side effects on Rampart functionality.

thanks,
Nandana

Re: [axis2] Status on Axis2 1.5.1 and Rampart 1.5

Posted by Andreas Veithen <an...@gmail.com>.

On Mon, Oct 12, 2009 at 16:08, Glen Daniels <gl...@thoughtcraft.com> wrote:
> Hi folks!
>
> OK, so here are the results of my weekend investigations.  The lockup when
> running the Rampart 1.5 tests with Axis2 1.5.1 was due to http connection
> starvation.  I've fixed two issues and everything works now, but I'd like to
> respin both Axis2 1.5.1 and Rampart 1.5 as a result.  Details below.
>
> First, a quick summary of a major change in Axis2 1.5.1: we were formerly
> creating new MultiThreadedHttpConnectionManagers all the time in the HTTP
> sender code.  In typical usage you'd never see connection pool starvation
> (since each new MHCM had a new pool), but two major problems occurred.  1)
> Connection reuse wasn't really possible, and 2) we would eventually (in
> high-volume situations) run into the OS limits for open sockets.  So I fixed
> this so that 1.5.1 now re-uses a single MHCM for each ConfigurationContext,
> which allows for sharing connections across ServiceClient instances.
>
> The bigger problem *behind* the problem above is that users of the Commons
> HttpClient library (like Axis2) need to call releaseConnection() on each and
> every HttpMethod after they are finished.  The
> ServiceClient.cleanupTransport() call does this, but since we never told
> people to call that explicitly,

Well, I did :-) See [1] and [2].

Andreas

[1] http://markmail.org/message/c7wqfwzl23qrheic
[2] http://svn.apache.org/viewvc?view=rev&revision=748730

> no one was in the habit of doing it.  A
> number of bugs about connection starvation came up, and we put in the
> Options.setCallTransportCleanup() option, which automatically calls
> cleanupTransport() after each call, but at a cost - since we're releasing
> connection resources you need to make sure you've read everything, which
> means building the whole Axiom tree.  Bye-bye, streaming.  So I also added a
> different connection cleanup option which automatically cleans up the *last*
> operation as you're setting up the next one.
>
> So, to make the Rampart story very short, the problem was this: a new
> ServiceClient gets created to deal with SecureConversation interactions (see
> STSClient.getServiceClient()).  This SC shares the same ConfigurationContext
> with the outer (i.e. user) SC, so it shares a MHCM and a connection pool.
> The problem is since the STS operations happen inside a user-level operation,
> the record of the "last operation" gets overwritten, and as a result my
> automatic cleanup mechanism can't catch both!  So we lose one connection each
> time we go through the STS process, and that causes a hard lock.
>
> SOLUTION
> --------
>
> I did two things to fix this, both of which I think should be reflected in
> the released code.  First, in Rampart, I added a call to
> setCallTransportCleanup(true) in STSClient - this means that the STS
> operations will be forced to build the complete Axiom tree (see above), but
> solves the connection starvation issue.  Second, in Axis2, I added a default
> 30-second timeout while waiting for new connections - this doesn't change the
> functionality at all, but it does mean that we can no longer get into
> situations where the system just locks up forever.  With that change, we'll
> now at least get an Exception if there's a starvation issue, which can then
> be debugged.
>
> Nandana/all, can you check what I did in Rampart and let me know if you
> foresee any problems with it?  I'm going to respin Axis2 1.5.1 with this and
> one other fix, and we should respin Rampart 1.5 as well.
>
> Thoughts/comments?
>
> Thanks,
> --Glen
>
