You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@tomcat.apache.org by Christopher Schultz <ch...@christopherschultz.net> on 2020/12/02 16:21:13 UTC

Re: Weirdest Tomcat Behavior Ever?

Mark,

On 11/26/20 05:14, Mark Thomas wrote:
> On 26/11/2020 04:57, Christopher Schultz wrote:
> 
> <snip/>
> 
>>> After a normal clean-up the parent then calls close on the two file
>>> descriptors associated with the pipe for a second time."
>>
>> So the child cleans them up AND the parent cleans them up? Or the parent
>> cleans when up twice? The child should be able to call close() as many
>> times as it wants and only poison itself. Does the child process ever
>> exit()?
> 
> With the caveat that some of the below is educated guess work because
> the strace was configured to look at the events we were interested in so
> I am having to fill in some of the gaps.
> 
> The parent "process" is a Java thread currently in native code in a 3rd
> party library.
> 
> The parent creates a pipe which comes with two file descriptors. One for
> the read end, one for the write end.
> 
> The parent process then forks. The child process now has copies of the
> two file descriptors. (see man fork for more details).
> 
> The parent closes its fd for the write end of the pipe. The child closes
> its fd for the read end of the pipe.
> 
> The child writes to the pipe and the parent reads from it.
> 
> The child exits and closes its fd for the write end of the pipe.
> 
> The parent closes its fd for the read end of the pipe.
> 
> At this point all is good. All the closes completely cleanly. Everything
> has been shutdown properly.

+1

> The two fds allocated to the parent are back in the pool any may be
> reused by other threads in the JVM.
> 
> The parent then attempts to close the fds associated with the pipe
> again. For each fd, if it has not been reallocated an EBADF error
> occurs. If it has been reallocated, it is closed thereby breaking
> whatever was using it legitimately.

Thanks for clarifying this. I was confused and thinking you were saying 
that the child process was the one breaking things, but it's the parent 
process. Since the parent is the JVM (the long-running process), all 
hell breaks loose.

>> The parent process must be the JVM process, right? And the parent
>> process (this native library, running within the JVM process)
>> double-closes file descriptors, with some measurable delay?
> 
> Correct. In the instance where I did most of the log analysis the delay
> was about 0.1 seconds. In other logs I did observe longer delays with
> what looked like a very different failure mode.
> 
>> That's the
>> only way this could make sense. And of course it mess mess everything up
>> in really *really* unpredictable ways.
> 
> Yep.

Fascinating.

Thanks for the wild ride, Eric and Mark :)

-chris

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org


Re: Weirdest Tomcat Behavior Ever?

Posted by Mark Thomas <ma...@apache.org>.
On 04/12/2020 20:36, Eric Robinson wrote:
<snip/>

> We'll begin the packet captures and straces again on Monday. Mark, besides network and fd tracing, is there anything else you want from strace to make the analysis more certain?

I think the following should be sufficient:

- Wireshark trace for the Tomcat instance where the issue was observed
- debug logs for the same Tomcat instance
- fd trace for the same Tomcat instance
- a thread dump of the Java process as soon as practical after the issue

Because you are now using longer lasting connections we may need to look
at a longer period of data to figure out exactly what is happening. To
reduce the amount of data we pass around, if I need more data I'll ask
for it. The important thing is that you keep hold of a copy.

Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org


RE: Weirdest Tomcat Behavior Ever?

Posted by Eric Robinson <er...@psmnv.com>.
> -----Original Message-----
> From: Christopher Schultz <ch...@christopherschultz.net>
> Sent: Wednesday, December 2, 2020 10:21 AM
> To: users@tomcat.apache.org
> Subject: Re: Weirdest Tomcat Behavior Ever?
>
> Mark,
>
> On 11/26/20 05:14, Mark Thomas wrote:
> > On 26/11/2020 04:57, Christopher Schultz wrote:
> >
> > <snip/>
> >
> >>> After a normal clean-up the parent then calls close on the two file
> >>> descriptors associated with the pipe for a second time."
> >>
> >> So the child cleans them up AND the parent cleans them up? Or the
> >> parent cleans when up twice? The child should be able to call close()
> >> as many times as it wants and only poison itself. Does the child
> >> process ever exit()?
> >
> > With the caveat that some of the below is educated guess work because
> > the strace was configured to look at the events we were interested in
> > so I am having to fill in some of the gaps.
> >
> > The parent "process" is a Java thread currently in native code in a
> > 3rd party library.
> >
> > The parent creates a pipe which comes with two file descriptors. One
> > for the read end, one for the write end.
> >
> > The parent process then forks. The child process now has copies of the
> > two file descriptors. (see man fork for more details).
> >
> > The parent closes its fd for the write end of the pipe. The child
> > closes its fd for the read end of the pipe.
> >
> > The child writes to the pipe and the parent reads from it.
> >
> > The child exits and closes its fd for the write end of the pipe.
> >
> > The parent closes its fd for the read end of the pipe.
> >
> > At this point all is good. All the closes completely cleanly.
> > Everything has been shutdown properly.
>
> +1
>
> > The two fds allocated to the parent are back in the pool any may be
> > reused by other threads in the JVM.
> >
> > The parent then attempts to close the fds associated with the pipe
> > again. For each fd, if it has not been reallocated an EBADF error
> > occurs. If it has been reallocated, it is closed thereby breaking
> > whatever was using it legitimately.
>
> Thanks for clarifying this. I was confused and thinking you were saying that
> the child process was the one breaking things, but it's the parent process.
> Since the parent is the JVM (the long-running process), all hell breaks loose.
>
> >> The parent process must be the JVM process, right? And the parent
> >> process (this native library, running within the JVM process)
> >> double-closes file descriptors, with some measurable delay?
> >
> > Correct. In the instance where I did most of the log analysis the
> > delay was about 0.1 seconds. In other logs I did observe longer delays
> > with what looked like a very different failure mode.
> >
> >> That's the
> >> only way this could make sense. And of course it mess mess everything
> >> up in really *really* unpredictable ways.
> >
> > Yep.
>
> Fascinating.
>
> Thanks for the wild ride, Eric and Mark :)
>
> -chris
>

I case anyone thought I had forgotten about all this... sorry, no such luck. You’re stuck with me!

Things have been quieter for the past several days because, at Mark's suggestion, we changed the nginx proxy and tomcat configurations to make them more tolerant of the underlying condition. Specifically, we configured nginx to use HTTP/1.1 instead of 1.0, enabled keepalives, and set maxKeepAliveRequests="-1"  in server.xml. This reduced the frequency of the issue.

The vendor was unable to dispute the quality of the analysis, so they accepted that the third-party component (which I can now name: PDFTron) could be the root cause. They disabled use of the component, and that seems to have quieting things down a bit more. We are still seeing occasional session disconnects, so it is possible that the component is leveraged for more than one function in their software and it was only disabled for a subset of them. The big difference now is that, instead of seeing a GET request from the proxy followed by a FIN from the upstream, now it’s a GET followed by an RST.

We'll begin the packet captures and straces again on Monday. Mark, besides network and fd tracing, is there anything else you want from strace to make the analysis more certain?

-Eric

Disclaimer : This email and any files transmitted with it are confidential and intended solely for intended recipients. If you are not the named addressee you should not disseminate, distribute, copy or alter this email. Any views or opinions presented in this email are solely those of the author and might not represent those of Physician Select Management. Warning: Although Physician Select Management has taken reasonable precautions to ensure no viruses are present in this email, the company cannot accept responsibility for any loss or damage arising from the use of this email or attachments.

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org