Posted to users@tomcat.apache.org by Andrew Seales <an...@ed.ac.uk> on 2015/03/26 10:54:34 UTC

Tomcat 8 on Solaris 10/11

Hi,

We are having a problem on our production servers where downloads of 
certain files are randomly truncated. This includes static 
JavaScript files, file downloads via servlets, etc., where the file is 
larger than about 100K. Most of the time the file downloads successfully, 
but some downloads get truncated. The truncation doesn't happen in 
exactly the same place every time.

I've been able to recreate the issue on our development servers using 
Tomcat 8.0.20 with Java 1.8.0_20 and 1.8.0_40. I've tried Solaris 10 and 11 
on both SPARC and x64 CPUs, and the same issue occurs. I tested on a fresh 
install of Tomcat 8, dropped one of our larger JavaScript files 
into the webapps/ROOT directory and made no other changes. I'm using a 
Perl script to continuously download the file and compare its md5 hash 
against a known good value to detect when the download breaks. The problem 
also seems to occur only when the network speed isn't very good. I use the 
following command on the client to limit the speed of my network interface:

sudo tc qdisc add dev eth0 root handle 1:0 netem rate 128kbit

I've also tested the same Tomcat on a Red Hat 6 server, and that appears 
to work fine.

If I revert to Tomcat 7.0.59, then Solaris works fine. The problem 
appears to only occur with Tomcat 8 on Solaris. I've tried v8.0.14 and 
8.0.20 and they both have the problem.

The Perl script is available from 
http://dlib-bauer.ucs.ed.ac.uk/testdata.pl
The JavaScript file is available from 
http://dlib-bauer.ucs.ed.ac.uk/ext-datadownload-20150323_1157.js
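
For anyone who wants to reproduce this without Perl, a minimal Python sketch 
of the same kind of check might look like the following. It is not the script 
linked above. The function names (md5_hex, fetch, check_downloads) and the 
idea of recording the received size are assumptions made for illustration.

```python
import hashlib
import http.client
import urllib.request


def md5_hex(data: bytes) -> str:
    """Return the hex md5 digest of the downloaded bytes."""
    return hashlib.md5(data).hexdigest()


def fetch(url: str, timeout: float = 60.0) -> bytes:
    """Download the whole body; a short body is returned rather than raised."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        try:
            return resp.read()
        except http.client.IncompleteRead as exc:
            # Fewer bytes than Content-Length arrived before the close.
            return exc.partial


def check_downloads(fetch_fn, url, expected_md5, attempts):
    """Download `attempts` times; return (attempt, size) for each bad hash."""
    failures = []
    for attempt in range(1, attempts + 1):
        body = fetch_fn(url)
        if md5_hex(body) != expected_md5:
            failures.append((attempt, len(body)))
    return failures
```

Calling check_downloads(fetch, url, known_good_md5, 100) against the file 
above, with the client interface rate-limited, should list the attempts that 
came back with a different hash and how many bytes each one received.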

Is anyone else running Tomcat 8 on Solaris 10 or 11 with Java 8, or does 
anyone know of any problems on the platform?

Regards,

-- 
Andrew Seales

EDINA                   tel: +44 (0) 131 650 3022
Edinburgh University    fax: +44 (0) 131 650 3308
Causewayside House      url: http://edina.ac.uk
160 Causewayside        email: andrew.seales@ed.ac.uk
Edinburgh EH9 1PR


The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org


Re: Tomcat 8 on Solaris 10/11

Posted by Andrew Seales <an...@ed.ac.uk>.
On 26/03/15 21:10, Aurélien Terrestris wrote:
> As suggested by Rainer, I would try with the blocking connector and compare.
>
> Otherwise, it could be that your file is using very long lines (only 5
> lines for more than 800k of data). Maybe a tomcat-dev could have a
> look at this.
>
> $ wc ext-datadownload-20150323_1157.js
>       5   7634 838044 ext-datadownload-20150323_1157.js
>
I can try the non-minified version to see if it makes a difference, but 
we're getting the same problem with binary files too.

-- 
Andrew Seales



Re: Tomcat 8 on Solaris 10/11

Posted by Aurélien Terrestris <at...@gmail.com>.
As suggested by Rainer, I would try with the blocking connector and compare.

Otherwise, it could be that your file is using very long lines (only 5
lines for more than 800k of data). Maybe a tomcat-dev could have a
look at this.

$ wc ext-datadownload-20150323_1157.js
     5   7634 838044 ext-datadownload-20150323_1157.js





Re: Tomcat 8 on Solaris 10/11

Posted by Rainer Jung <ra...@kippdata.de>.
On 27.03.2015 at 16:15, Rainer Jung wrote:
> I'm thinking about adding client port as a loggable item to
> the Tomcat AccessLog but that won't help you right now.

Done; it will be available starting with Tomcat 8.0.22 and 7.0.62. The log 
pattern is %{remote}p, as in Apache httpd.

Regards,

Rainer



Re: Tomcat 8 on Solaris 10/11

Posted by Andrew Seales <an...@ed.ac.uk>.
> Not necessarily a problem from the TCP layer point of view. But once 
> you can find a request that's truncated in the TCP log, you can look out
>
> - whether it was a normal connection shutdown or a reset
> - whether there were unusual pauses between packets triggering timeouts
>
> etc. That's why you would benefit from your test client being able to 
> log the client port when a failure arises so you can filter easily in 
> Wireshark. I'm thinking about adding client port as a loggable item to 
> the Tomcat AccessLog but that won't help you right now.
>
> I vaguely remember problems with TCP checksum offloading, but they 
> should be fixed long ago. See e.g.
>
> http://compgroups.net/comp.unix.solaris/disable-e1000-tcp-checksum-offloading-t5220/472801 
>

I can't see any RST packets, so I can only conclude it's a normal(ish) 
shutdown.

I have found that if I run my test client on a Linux server that's close 
to the Solaris servers, I don't have an issue with truncation. Perhaps 
there's a switch issue between our Solaris servers and the outside world.

>
> I also once at a customer saw a problem not of truncation, but the 
> last packet of a response was delayed quite noticeably. That was fixed 
> by applying the current Solaris patch cluster at that time.

I'm not sure there's a delay but I'm sure the patch level is way out of 
date.

Thanks for your help though. My workaround at the moment is to just run 
Tomcat 7. We're planning to migrate off Solaris onto Linux anyway, so we'll 
just have to wait until then before upgrading to Tomcat 8.

-- 
Andrew Seales



Re: Tomcat 8 on Solaris 10/11

Posted by Rainer Jung <ra...@kippdata.de>.
On 27.03.2015 at 15:48, Andrew Seales wrote:
>
>> Yes, we do, on Solaris 10. I don't know of any such problems, but I
>> can't introduce the slow network condition here to test.
> Good to know. In case it wasn't clear, the network limiting is done on
> the client side, not on the server.
>>
>> Is the file really truncated, i.e. too short, or is it corrupt? Can
>> the truncation also be seen in the Tomcat access log? If so, could you
>> replace the curl/md5sum based test with another HTTP client like
>> LWP::Simple in perl or "ab" coming with Apache httpd. Just to rule out
>> the client side of the picture.
> Yes the file is definitely truncated rather than corrupted. Users of our
> services with normal browsers are noticing the problem, it's what
> prompted me to use the Perl+Curl test script.
>>
>> Is truncation always happening at the same byte? Any pattern?
>>
>> Which connector are you using? NIO? APR?
> I've tried both the AJP13 and the standard HTTP/1.1 connectors; both have
> the same problem. When using AJP, the Apache log shows the file size as
> being truncated. I'll check the Tomcat log when using HTTP/1.1 to see if
> it agrees.
>>
>> I personally would try the following to provide additional analysis
>> data: Find a setup, where you can log the client port. Use this setup
>> and snoop network traffic during the test on the client and server
>> side. Once the problem happens, use the local port number and
>> timestamp to extract the communication pattern on the server and
>> client side. That way you can see, which side closed/aborted the
>> connection - or whether it is something in between client and server.
> Thanks, I'll give Wireshark or something like that a go to see if I can
> see any TCP problems.

Not necessarily a problem from the TCP layer's point of view. But once you 
find a request that's truncated in the TCP log, you can check:

- whether it was a normal connection shutdown or a reset
- whether there were unusual pauses between packets triggering timeouts

etc. That's why you would benefit from your test client being able to 
log the client port when a failure arises so you can filter easily in 
Wireshark. I'm thinking about adding client port as a loggable item to 
the Tomcat AccessLog but that won't help you right now.
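
As a rough illustration of those two checks, here is a minimal Python sketch. 
It assumes the packets of one connection have already been exported from the 
capture as (timestamp in seconds, TCP flags) pairs, sorted by time. That input 
format, the function name summarize_connection and the one-second pause 
threshold are all assumptions made for this example, not something Wireshark 
produces in this exact form.

```python
def summarize_connection(packets, pause_threshold=1.0):
    """Classify how a connection ended and list long gaps between packets.

    packets: list of (timestamp_seconds, flags) sorted by time, where flags
    is a comma-separated string such as "SYN", "PSH,ACK", "FIN,ACK" or "RST".
    Returns (close_kind, pauses); each pause is (start_time, gap_seconds).
    """
    saw_fin = False
    saw_rst = False
    pauses = []
    previous = None
    for timestamp, flags in packets:
        names = {name.strip().upper() for name in flags.split(",")}
        saw_fin = saw_fin or "FIN" in names
        saw_rst = saw_rst or "RST" in names
        # Record gaps long enough to suggest a timeout may have fired.
        if previous is not None and timestamp - previous > pause_threshold:
            pauses.append((previous, timestamp - previous))
        previous = timestamp
    if saw_rst:
        close_kind = "reset"
    elif saw_fin:
        close_kind = "normal shutdown"
    else:
        close_kind = "still open or capture incomplete"
    return close_kind, pauses
```

Comparing the result for the client-side and server-side captures of the same 
connection should show which side sent the first FIN or RST.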

I vaguely remember problems with TCP checksum offloading, but they 
should have been fixed long ago. See e.g.

http://compgroups.net/comp.unix.solaris/disable-e1000-tcp-checksum-offloading-t5220/472801

I also once saw a problem at a customer, not of truncation, but where the 
last packet of a response was noticeably delayed. That was fixed by 
applying the then-current Solaris patch cluster.

Regards,

Rainer



Re: Tomcat 8 on Solaris 10/11

Posted by Andrew Seales <an...@ed.ac.uk>.
> Yes, we do, on Solaris 10. I don't know of any such problems, but I 
> can't introduce the slow network condition here to test.
Good to know. In case it wasn't clear, the network limiting is done on 
the client side, not on the server.
>
> Is the file really truncated, i.e. too short, or is it corrupt? Can 
> the truncation also be seen in the Tomcat access log? If so, could you 
> replace the curl/md5sum based test with another HTTP client like 
> LWP::Simple in perl or "ab" coming with Apache httpd. Just to rule out 
> the client side of the picture.
Yes, the file is definitely truncated rather than corrupted. Users of our 
services with normal browsers are noticing the problem; that's what 
prompted me to use the Perl+Curl test script.
>
> Is truncation always happening at the same byte? Any pattern?
>
> Which connector are you using? NIO? APR?
I've tried both the AJP13 and the standard HTTP/1.1 connectors; both have the 
same problem. When using AJP, the Apache log shows the file size as being 
truncated. I'll check the Tomcat log when using HTTP/1.1 to see if it agrees.
>
> I personally would try the following to provide additional analysis 
> data: Find a setup, where you can log the client port. Use this setup 
> and snoop network traffic during the test on the client and server 
> side. Once the problem happens, use the local port number and 
> timestamp to extract the communication pattern on the server and 
> client side. That way you can see, which side closed/aborted the 
> connection - or whether it is something in between client and server.
Thanks, I'll give Wireshark or something like that a go to see if I can 
see any TCP problems.

-- 
Andrew Seales



Re: Tomcat 8 on Solaris 10/11

Posted by Rainer Jung <ra...@kippdata.de>.
On 26.03.2015 at 10:54, Andrew Seales wrote:
> [...]
>
> Is anyone else running Tomcat 8 on Solaris 10 or 11 with Java 8, or know
> of any problems on the platform?

Yes, we do, on Solaris 10. I don't know of any such problems, but I 
can't introduce the slow network condition here to test.

Is the file really truncated, i.e. too short, or is it corrupt? Can the 
truncation also be seen in the Tomcat access log? If so, could you 
replace the curl/md5sum based test with another HTTP client like 
LWP::Simple in Perl or "ab", which comes with Apache httpd. Just to rule out 
the client side of the picture.

Is truncation always happening at the same byte? Any pattern?
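
To tell the two cases apart and to see where each failure starts, something 
like the following Python sketch could be run on each failed download. It 
assumes a known-good copy of the file is available locally. The function name 
classify_download and its return values are made up for this example.

```python
def classify_download(received: bytes, reference: bytes):
    """Compare a download with a known-good copy.

    Returns (status, offset):
      ("ok", full length)        the bytes match exactly
      ("truncated", cut point)   a strict prefix of the reference arrived
      ("corrupt", first diff)    the content differs at `offset`
    """
    if received == reference:
        return "ok", len(reference)
    if len(received) < len(reference) and reference.startswith(received):
        return "truncated", len(received)
    # Find the first position where the two byte strings disagree.
    limit = min(len(received), len(reference))
    for index in range(limit):
        if received[index] != reference[index]:
            return "corrupt", index
    # One is a prefix of the other, but the download is the longer one.
    return "corrupt", limit
```

Collecting the offsets over many failed downloads would show whether the cut 
always lands on the same byte or follows some pattern, such as a multiple of 
a buffer size.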

Which connector are you using? NIO? APR?

I personally would try the following to provide additional analysis 
data: find a setup where you can log the client port. Use this setup 
and snoop network traffic during the test on the client and server side. 
Once the problem happens, use the local port number and timestamp to 
extract the communication pattern on the server and client side. That 
way you can see which side closed/aborted the connection, or whether 
it is something in between client and server.

Unfortunately, logging the client port is often not trivial to achieve. 
On the Tomcat side (access log), currently only the server port is 
available, not the remote port, although this would be very simple to 
add. In the short term, it might work to switch to Perl plus LWP 
and try to get the local port from LWP.
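
The same idea can be sketched in Python instead of Perl/LWP, which is an 
assumption for illustration only. The sketch below opens the connection 
explicitly, reads the client-side port from the socket before sending the 
request, and compares the bytes received with Content-Length. The function 
names fetch_with_local_port and format_result are hypothetical.

```python
import http.client
import time


def fetch_with_local_port(host, port, path, timeout=60.0):
    """GET `path`; return (local_port, expected_length, received_length)."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.connect()
        # The client-side port identifies this TCP connection in a
        # packet capture taken on either the client or the server.
        local_port = conn.sock.getsockname()[1]
        conn.request("GET", path)
        resp = conn.getresponse()
        header = resp.getheader("Content-Length")
        expected = int(header) if header is not None else None
        try:
            received = len(resp.read())
        except http.client.IncompleteRead as exc:
            # The connection closed before the announced length arrived.
            received = len(exc.partial)
        return local_port, expected, received
    finally:
        conn.close()


def format_result(local_port, expected, received, when=None):
    """Build one log line; flag a short body when Content-Length is known."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(when))
    short = expected is not None and received < expected
    status = "TRUNCATED" if short else "ok"
    return (f"{stamp} local_port={local_port} "
            f"expected={expected} received={received} {status}")
```

Printing format_result(*fetch_with_local_port(host, 8080, "/file.js")) in a 
loop gives a timestamp and port for every request, so a failing one can be 
looked up in the snoop or Wireshark capture. The host, port 8080 and path 
here are placeholders.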

Regards,

Rainer




