Posted to dev@tomcat.apache.org by Shinta Tjio <st...@broadjump.com> on 2001/03/08 02:16:02 UTC

RE: Design Review for ajp13's changes: WAS problem w/ ajp13 - if Tomcat is shutdown

Attached are the unified diffs for the proposed changes.
They are diffs against the 3.2.1 release code. I hope this
is sufficient; I haven't gotten around to using the Solaris patch
tool yet. These were tested on Solaris 2.8. Change #1 is the
less platform-specific one, since it doesn't call any
socket APIs.

I will test these on Windows 2000 tomorrow. As for other
UNIXes, we don't have any in house, so if someone
can volunteer to test on other UNIX flavors, that
would be great!

Unified diffs for the proposed changes #1:
  jk_ajp13_worker.c.1.diff
  mod_jk.c.1.diff

Unified diffs for the proposed changes #2:
  jk_ajp13_worker.c.2.diff
  jk_connect.c.2.diff

thanks so much!
shinta

> -----Original Message-----
> From: cmanolache@yahoo.com [mailto:cmanolache@yahoo.com]
> Sent: Wednesday, March 07, 2001 6:57 PM
> To: tomcat-dev@jakarta.apache.org
> Subject: Re: Design Review for ajp13's changes: WAS problem w/ ajp13 - if Tomcat is shutdown
> 
> 
> Hi Shinta,
> 
> It sounds like a solution to a real problem; please send a patch.
> I'm sure someone will read it. Dan and Henri are the best people to ask
> about this, and I can also help a bit (I've been using RPMs lately; it's too
> easy to get them and not worry about compiling :-)
> 
> My only suggestion/concern is that the code should work on both Windows and
> Unix - or at least compile :-)
> 
> Costin
> 
> 
> > I would like to propose some changes to eliminate the
> > requirement to restart Apache, when you restart Tomcat.
> > I'm willing to give the code to anyone who needs it, 
> > when I'm done testing.
> > 
> > But I need some help/suggestions so that I can put in
> > the right code. If any of the proposed changes below
> > should never exist, I'm open to other suggestions.
> > This is my first time looking at mod_jk's ajp13 code,
> > so any pointers to make these better would be appreciated.
> > 
> > Right now, if you use ajp13 and you restart Tomcat, you
> > have to also restart Apache. See details in previous
> > postings. For us, having to restart Apache is not a
> > feasible solution in our customers' environment.
> > 
> > After looking at the code, I have two possible solutions:
> > 
> > 1. From mod_jk, I can detect that the socket has been
> >    closed by Tomcat. This is normally indicated by
> >    recv() failing with errno ECONNRESET. The recv() is
> >    called after the request has been sent to the socket.
> >    Unfortunately, send() doesn't report an error.
> > 
> >    The proposed fix is to check for errno ECONNRESET
> >    and then set the is_recoverable_error flag to TRUE in
> >    the service() function in jk_ajp13_worker.c. I also
> >    added code in mod_jk.c to check for this flag and
> >    call the service() method again if the flag is
> >    TRUE. The second time the service() method is called,
> >    it will reconnect to Tomcat as normal.
> > 
> > 2. Another solution would be to put a select() on the
> >    socket prior to send(), checking whether the socket
> >    is read-ready. Under normal conditions, this select()
> >    should return nothing. But if Tomcat shuts down
> >    the socket, this select() should report the socket
> >    as read-ready. When this happens, I issue a read()
> >    of 1 byte. If the read() returns 0, that indicates
> >    the socket was closed on the remote end. I then
> >    close the socket; the remaining logic already
> >    handles the reconnect, etc.
> > 
> > I have prototyped both of these solutions and tested them
> > minimally. Would anyone care to comment on which solution
> > fits better with the overall code? Would anyone volunteer
> > to review the code?
> > 
> > thanks,
> > shinta
> > 
> > > -----Original Message----- 
> > > From: Shinta Tjio 
> > > To: tomcat-dev@jakarta.apache.org 
> > > Cc: 'Dan Milstein' 
> > > Sent: 3/6/01 7:01 PM 
> > > Subject: RE: FW: problem w/ ajp13 - if Tomcat is shutdown 
> > > 
> > > I am using Tomcat 3.2.1, Apache 1.3.14, running on 
> > > Solaris 2.8, Sun machines. 
> > > 
> > > After various attempts of debugging this, I have 
> > > more information. 
> > > 
> > > 1. Even though I'm setting the worker's cache_size
> > > property to the default (1), I'm finding that there
> > > are up to 6 connections open from Apache to
> > > Tomcat. I deduced this by looking at the mod_jk.conf
> > > and by running netstat.
> > > 
> > > I finally found out that this is because my Apache
> > > is set to spawn a minimum of 6 children, and each
> > > of those children makes a separate connection
> > > to Tomcat.
> > >
> > > This is very bad because I ended up having to
> > > reload 6 times before Tomcat started serving me
> > > the page again. Each reload is handled by a different
> > > Apache child that has a defunct socket, so the
> > > more Apache children I have, the longer it takes
> > > me to recover from this problem.
> > > 
> > > 2. It seems that when Tomcat dies and restarts, the
> > > send() called by ajp13's jk_tcp_socket_sendfull() does
> > > not get an error. But the recv() does get an error, with
> > > errno ECONNRESET, after which the socket is properly
> > > closed.
> > > 
> > > 3. When I shut down Tomcat, the sockets that were
> > > open between Apache and Tomcat showed up in states
> > > CLOSE_WAIT and FIN_WAIT2. I think this is normally
> > > solved by calling the shutdown() API before closing
> > > the socket. However, this would have to be done from
> > > the Tomcat side in Ajp13ConnectionHandler.java, and
> > > I can't find the corresponding method of Socket
> > > in Java.
> > > So.. based on all of these, the only fix I can think 
> > > of putting is to make mod_jk retry the send() if 
> > > recv() comes back with an error ECONNRESET. The retry 
> > > should happen after the old socket is properly closed. 
> > > 
> > > Anyone wants to comment? 
> > > 
> > > shinta 
> > >   
> > > 
> > > > -----Original Message----- 
> > > > From: Dan Milstein [mailto:danmil@shore.net]
> > > > Sent: Tuesday, March 06, 2001 12:00 PM 
> > > > To: tomcat-dev@jakarta.apache.org 
> > > > Subject: Re: FW: problem w/ ajp13 - if Tomcat is shutdown 
> > > > 
> > > > 
> > > > What version of TC are you using?  What version of Apache? 
> > > > 
> > > > I would look into the mod_jk docs -- I think this is the spec'd behavior
> > > > (which, admittedly, is not great, but that makes it more of a feature
> > > > request than a bug ;-).  With ajp13, Apache opens up a persistent TCP/IP
> > > > connection to TC -- if TC restarts, I think that connection may just hang
> > > > up and then time out (since Apache doesn't know that TC has restarted).
> > > > 
> > > > If anyone wants to work on this, you would have the undying thanks of the
> > > > rest of the TC community -- having to restart Apache all the time bugs a
> > > > *lot* of people.
> > > > 
> > > > -Dan 
> > > > 
> > > > > Shinta Tjio wrote: 
> > > > > 
> > > > > I'm having a problem with mod_jk when ajp13 is used.
> > > > >
> > > > > The problem is often reproduced when Tomcat is shut
> > > > > down without Apache being shut down. When a request
> > > > > is fired through Apache as soon as Tomcat starts,
> > > > > I often get an Internal Server Error. The mod_jk.log
> > > > > will have the following:
> > > > > 
> > > > > > [jk_uri_worker_map.c (344)]: Into jk_uri_worker_map_t::map_uri_to_worker
> > > > > > [jk_uri_worker_map.c (406)]: jk_uri_worker_map_t::map_uri_to_worker, Found a match ajp13
> > > > > > [jk_worker.c (123)]: Into wc_get_worker_for_name ajp13
> > > > > > [jk_worker.c (127)]: wc_get_worker_for_name, done found a worker
> > > > > > [jk_ajp13_worker.c (651)]: Into jk_worker_t::get_endpoint
> > > > > > [jk_ajp13_worker.c (536)]: Into jk_endpoint_t::service
> > > > > > [jk_ajp13.c (346)]: Into ajp13_marshal_into_msgb
> > > > > > [jk_ajp13.c (480)]: ajp13_marshal_into_msgb - Done
> > > > > > [jk_ajp13_worker.c (203)]: connection_tcp_get_message: Error - jk_tcp_socket_recvfull failed
> > > > > > [jk_ajp13_worker.c (619)]: Error reading request
> > > > > > [jk_ajp13_worker.c (489)]: Into jk_endpoint_t::done
> > > > > 
> > > > > If I hit reload multiple times, eventually Tomcat will 
> > > > > serve the servlet fine. 
> > > > > 
> > > > > Has anyone seen this problem before? Is there any way
> > > > > around this?
> > > > > 
> > > > > shinta 
> > > > 
> > > > -- 
> > > > 
> > > > Dan Milstein // danmil@shore.net 
> > > > 
> > > 
> >   
> > 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: tomcat-dev-unsubscribe@jakarta.apache.org
> For additional commands, email: tomcat-dev-help@jakarta.apache.org
> 


Re: Design Review for ajp13's changes: WAS problem w/ ajp13 - if Tomcat is shutdown

Posted by Dan Milstein <da...@shore.net>.
First off, it's GREAT that you are working on this -- it's a very heavily
requested improvement.

I haven't had time to do a full review yet, but I've taken a quick look and
I have a few questions/suggestions:

 1) This work will end up being committed in the 3.3 branch, rather than the
3.2 branch.  3.2 is only bug fixes now, not new features.  The mod_jk C code
is very similar in 3.3 and 3.2, but not identical.  I can adapt your
patches, but if you wanted to work against 3.3, that would make things
easier.

 2) You seem to have managed to puzzle out some of the workings of mod_jk
already, but, FYI, I just committed some internal documentation a few days
ago.  It's in the 3.3 branch, in src/native/mod_jk/common/jk_service.h --
may be worth taking a look at.

 3) For option (1), I have a few questions:

  - Is there a way in which data could be lost?  Specifically, as you state,
the send() will return without error, and then it will only get the error on
the following read().  Is all the data always preserved so that simply
retrying will work correctly?  I think most of that state is in the
jk_ws_service_t object -- is it possible a read pointer will be advanced and
data will be lost?  This may be acceptable, but I'd like to understand it...

 - You only retry once.  If there are a number of connections open (from a
single Apache process), isn't it possible that Tomcat has come back up, and
that the next connection obtained (from the endpoint cache) will also be
stale?  Would it make sense in this case to trigger a shutdown of all the
connections currently in the cache (and then retry once)?  That would make
sense if there were no other ways to get an ECONNRESET error.

 - Or, more generally, just so I (and everyone) can understand, how does
this new code deal with the following stages:

  1) TC and Apache both up and running

  2) TC is shutdown
If mod_jk is in the middle of handling a request, what happens?  There was
an infinite loop in the 3.2.1 code, but that's been fixed in 3.2.2 and 3.3.

  3) TC is shutdown, Apache is still up.  While TC is down, requests come
in.  How are they handled?  Are there any loops Apache gets stuck in?

  4) TC starts back up.  Now requests get handled smoothly again? 


 4) For option (2):

 - If the user is on Win32, you're just punting, correct?  Why is that?  I
know nothing about Win32 socket programming, but I'm curious...  You say
you're testing on Win2k -- does Win2k support select(), but Win32 doesn't?
Does anyone know how widely select() is supported?
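For reference, the check in option (2) can be sketched with POSIX calls as below: poll the idle cached socket with a zero-timeout select(), and if it is unexpectedly readable, look at one byte to tell a remote close (recv() returns 0) from pending data. One deliberate substitution: this uses recv() with MSG_PEEK rather than the plain 1-byte read() described in the proposal, so any unexpected byte stays in the buffer. The function name socket_peer_closed() is hypothetical. Winsock does provide a select() call, but descriptor and error handling differ there, so this would need adapting rather than being dropped in as-is.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the peer appears to have closed or
 * reset the cached connection, 0 if it still looks usable. Uses a zero
 * timeout, so it never blocks the request. */
static int socket_peer_closed(int sd)
{
    fd_set rfds;
    struct timeval tv = { 0, 0 };
    char c;
    int rc;

    assert(sd >= 0 && sd < FD_SETSIZE);
    FD_ZERO(&rfds);
    FD_SET(sd, &rfds);

    rc = select(sd + 1, &rfds, NULL, NULL, &tv);
    if (rc == 0)
        return 0;        /* nothing to read: idle and presumably healthy */
    if (rc < 0)
        return 1;        /* treat a select() failure as an unusable socket */

    /* Readable while idle: EOF (0) or an error means the peer is gone.
     * A positive count means unexpected data is waiting; MSG_PEEK leaves
     * it in the buffer and the socket is reported as still open. */
    rc = (int)recv(sd, &c, 1, MSG_PEEK);
    return rc <= 0;
}
```

Because the check runs before send(), a stale socket can be closed and replaced before any request bytes are written, which sidesteps the question of whether a retried request might lose data.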

I'll try to take a more thorough look (and do some testing). 
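The suggestion in point 3 of closing every cached connection after an ECONNRESET, and only then retrying once, could look roughly like the sketch below. conn_cache_t, CACHE_SIZE and cache_close_all() are hypothetical; mod_jk's actual endpoint cache is structured differently.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

#define CACHE_SIZE 4   /* hypothetical cache size for this sketch */

/* Hypothetical per-process cache of connections to Tomcat. */
typedef struct conn_cache {
    int sd[CACHE_SIZE];   /* -1 marks an empty slot */
} conn_cache_t;

/* After an ECONNRESET, every cached connection predates the Tomcat
 * restart and is suspect, so close them all. Returns how many were
 * closed; the caller would then reconnect and retry once. */
static size_t cache_close_all(conn_cache_t *c)
{
    size_t closed = 0;

    assert(c != NULL);
    for (size_t i = 0; i < CACHE_SIZE; i++) {
        if (c->sd[i] >= 0) {
            close(c->sd[i]);
            c->sd[i] = -1;
            closed++;
        }
    }
    return closed;
}
```

Note that this only clears the cache of the Apache process that saw the error. Since each Apache child keeps its own connections, as observed earlier in the thread, other children would still hit their own stale sockets once each.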

Thanks again,
-Dan 


-- 

Dan Milstein // danmil@shore.net
