Posted to users@trafficserver.apache.org by Pranav Desai <pr...@gmail.com> on 2010/09/16 21:26:53 UTC

errors and shutdown message in 2.1.2 under load

Hi!

I am running a load test with some video files to see . I am using
curl-loader to generate the load. I have modified it to add a random
number to the URLs before sending so I can test with a single URL and
still stress the cache. The webserver is a lighttpd server with
rewrite rules to translate the random strings back to a common URL.
The URL is essentially a 15MB video file. I can provide more details
on the setup if needed.
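
A minimal sketch of the cache-busting scheme described above, assuming
the random number is added as an extra path segment and that the origin
strips it again with a rewrite rule. The base URL, the "/r<digits>/"
segment format, and both function names are hypothetical; the actual
curl-loader change and lighttpd rules are not shown in this message.

```python
import random
import re

# Hypothetical origin URL standing in for the 15MB video file.
BASE_URL = "http://origin.example.com/videos/sample.flv"

# Matches the random path segment added by randomize_url().
_TOKEN = re.compile(r"/r\d+/")


def randomize_url(url, rng=random):
    """Client side: add a random path segment so every request has a new cache key."""
    prefix, sep, name = url.partition("/videos/")
    token = rng.randrange(10**9)
    return f"{prefix}{sep}r{token}/{name}"


def canonical_url(url):
    """Origin side: drop the random segment, as a rewrite rule would."""
    return _TOKEN.sub("/", url, count=1)


if __name__ == "__main__":
    randomized = randomize_url(BASE_URL)
    print(randomized, "->", canonical_url(randomized))
```

Each randomized URL looks like a different object to the cache, while the
origin always serves the same file.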

Here are the errors I see in /var/log/messages
Sep 16 12:05:31 c1b14 traffic_server[30008]: {1095895360} NOTE:
OpenReadHead failed for cachekey F99D907A : vector inconsistency with
2408
Sep 16 12:05:33 c1b14 traffic_server[30008]: {1077999936} NOTE:
OpenReadHead failed for cachekey 8CEE75D9 : vector inconsistency with
2416
Sep 16 12:05:33 c1b14 traffic_server[30008]: {1079052608} NOTE:
OpenReadHead failed for cachekey 2DC0CAB : vector inconsistency with
2416
Sep 16 12:05:34 c1b14 traffic_server[30008]: {1083263296} NOTE:
OpenReadHead failed for cachekey 21712A98 : vector inconsistency with
2416
Sep 16 12:05:36 c1b14 traffic_server[30008]: {1105369408} NOTE:
OpenReadHead failed for cachekey FFD8902 : vector inconsistency with
2416
Sep 16 12:05:36 c1b14 traffic_server[30008]: {1074841920} NOTE:
OpenReadHead failed for cachekey C28AE2D3 : vector inconsistency with
2416
Sep 16 12:05:37 c1b14 traffic_server[30008]: {1083263296} NOTE:
OpenReadHead failed for cachekey 38E6F4CE : vector inconsistency with
2416
...
lots of them.

Sep 16 12:05:53 c1b14 traffic_server[30008]: {1102211392} NOTE:
OpenReadHead failed for cachekey 160CEE08 : vector inconsistency with
2416
Sep 16 12:05:55 c1b14 traffic_manager[29998]: {139744257029936} FATAL:
[LocalManager::pollMgmtProcessServer] Error in read (errno: 104)
Sep 16 12:05:55 c1b14 traffic_manager[29998]: {139744257029936} FATAL:
 (last system error 104: Connection reset by peer)
Sep 16 12:05:55 c1b14 traffic_manager[29998]: {139744257029936} NOTE:
[LocalManager::mgmtShutdown] Executing shutdown request.
Sep 16 12:05:55 c1b14 traffic_manager[29998]: {139744257029936} NOTE:
[LocalManager::processShutdown] Executing process shutdown request.
Sep 16 12:05:55 c1b14 traffic_manager[29998]: {139744257029936} ERROR:
[LocalManager::sendMgmtMsgToProcesses] Error writing message
Sep 16 12:05:55 c1b14 traffic_manager[29998]: {139744257029936} ERROR:
 (last system error 32: Broken pipe)
Sep 16 12:05:55 c1b14 traffic_cop[29996]: cop received child status
signal [29998 2816]
Sep 16 12:05:55 c1b14 traffic_cop[29996]: traffic_manager not running,
making sure traffic_server is dead
Sep 16 12:05:55 c1b14 traffic_cop[29996]: spawning traffic_manager
Sep 16 12:05:55 c1b14 traffic_manager[30116]: NOTE: --- Manager Starting ---
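
To gauge how fast the OpenReadHead notes accumulate, they can be tallied
per second. A minimal sketch, assuming the syslog layout shown above,
with entries possibly wrapped across lines as they are in this mail. The
function name and the regular expression are illustrative and not part
of Traffic Server.

```python
import re
from collections import Counter

# One flattened entry looks like:
# "Sep 16 12:05:33 c1b14 traffic_server[30008]: {1077999936} NOTE:
#  OpenReadHead failed for cachekey 8CEE75D9 : vector inconsistency with 2416"
_ENTRY = re.compile(
    r"(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+traffic_server\[\d+\]:\s+"
    r"\{\d+\}\s+NOTE:\s+OpenReadHead failed for cachekey\s+([0-9A-F]+)\s*:\s*"
    r"vector inconsistency with\s+(\d+)"
)


def tally_open_read_failures(text):
    """Count OpenReadHead failures per timestamp (one-second resolution)."""
    # Collapse all whitespace so entries wrapped over several lines match.
    flat = " ".join(text.split())
    per_second = Counter()
    for stamp, _cachekey, _value in _ENTRY.findall(flat):
        per_second[stamp] += 1
    return per_second
```

Feeding it the contents of /var/log/messages returns a Counter keyed by
timestamp.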

I don't know if it's generating a core; at least I couldn't find one.
It does have a bunch of these messages in traffic.out. I can send the
traffic.out file if it helps. I do have ulimit set correctly in the
shell that I run 'trafficserver start' from, but I'm not sure if it's
needed within the script as well?
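
On the ulimit question: resource limits are inherited across fork and
exec, so a limit set in the launching shell carries into the start
script and its children unless a process along the way changes it. A
minimal sketch for checking the core-size limit from inside a process
(Unix only); the function name is illustrative, and the check covers
only the rlimit, not other settings that can suppress core files.

```python
import resource


def core_limit_summary():
    """Describe the core-file size limit of the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)

    def show(value):
        return "unlimited" if value == resource.RLIM_INFINITY else str(value)

    # A soft limit of 0 means the kernel will not write a core file.
    verdict = "core dumps disabled" if soft == 0 else "core dumps allowed by rlimit"
    return f"soft={show(soft)} hard={show(hard)} ({verdict})"


if __name__ == "__main__":
    print(core_limit_summary())
```

Running it from the same shell that starts the server shows the limit
that child processes would inherit.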

-------------- End header heap dump -----------
WARNING: Unmarshal failed due to unknow obj type 121 after 776
bytes---- Dumping header heap @ 0x2aaab7284808 - len 1209 ------
0x2aaab7284808: 0xdcbafeed 0xa1e63532 0xb7284b80 0x2aaa
0x2aaab7284818: 0xb7284890 0x2aaa 0x378 0x907cbe00
0x2aaab7284828: 0x0 0x0 0x0 0x1fab0054
0x2aaab7284838: 0x0 0x0 0x1c01430 0x0
0x2aaab7284848: 0xb7284b80 0x2aaa 0x141 0xd48914a0
0x2aaab7284858: 0x1b564038 0x816a7c80 0x0 0x0
0x2aaab7284868: 0x7e29048b 0x6fbc0a8e 0x5b1c996 0x4daab81d


I don't see this behavior with 2.0.1.

Let me know if you need more information.

Thanks
-- Pranav

Re: errors and shutdown message in 2.1.2 under load

Posted by Pranav Desai <pr...@gmail.com>.
On Thu, Sep 16, 2010 at 2:03 PM, Leif Hedstrom <zw...@apache.org> wrote:
> On 09/16/2010 02:45 PM, Pranav Desai wrote:
>
> On Thu, Sep 16, 2010 at 12:33 PM, Leif Hedstrom <zw...@apache.org> wrote:
>
>  On 09/16/2010 01:26 PM, Pranav Desai wrote:
>
> Hi!
>
> I am running a load test with some video files to see . I am using
> curl-loader to generate the load. I have modified it to add a random
> number to the URLs before sending so I can test with a single URL and
> still stress the cache. The webserver is a lighttpd server with
> rewrite rules to translate the random strings back to a common URL.
> The URL is essentially a 15MB video file. I can provide more details
> on the setup if needed.
>
> Ok, I've created https://issues.apache.org/jira/browse/TS-441   with this
> information. If you can find a core file (or, run traffic_server under gdb),
> and get a stack trace, that would be very helpful. Also, when it crashes,
> you might get a stack trace in /var/log/messages and/or one of the log files
> in the .../var/log/trafficserver  directory.
>
> Will do that.
>
> Is there an architecture diagram or doc some where that briefly
> describes how the system works especially in terms of the roles of the
> processes like traffic_manager and traffic_server and how they
> interact, communicate. How the threads get the request, who runs the
> event loop etc. Even a general block diagram would work for me. It
> will of great help in understanding the system.
>
>
>
>
> Well, the admin guide (http://trafficserver.apache.org/docs/v2/admin/)
> should give some ideas on how the system overall runs (processes etc.), and
> the developers guide ( http://trafficserver.apache.org/docs/v2/sdk/) will
> give some ideas how things works under the hood.
>
> That much said, I can tell you what the three processes does (big picture);
>
> traffic_cop - This process starts the traffic_manager process, and it's
> primary task is to verify that the traffic_server process is responding to a
> built-in health check. If it doesn't, traffic_cop will kill the
> traffic_server process.
> traffic_manager - This process is responsible for various "admin" tasks, and
> also implements the (defunct) WebUI. In addition, it binds the "listen"
> port(s) that the server will accept requests on, and start the
> traffic_server process. If this process dies (either crashes or killed by
> traffic_cop), traffic_manager still has the ports bound, and will
> immediately restart traffic_server.
> traffic_server - This is the primary proxy server process. It will run <n>
> number of worker threads (net-threads), where n is configurable but
> defaulted to something reasonable based on the number of CPUs. It also runs
> <m> I/O threads, by default 4 threads per disk spindle (but this is also
> configurable, and 4 is probably to small in many cases). In addition, there
> are a few "helper" threads for things like logging etc.
>

That helps. I am looking at strace output, so it's giving me some more
info before I start looking into the code. I couldn't find the I/O
thread option, but I guess it's proxy.config.cache.threads_per_disk.
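
Assuming that guess is right and the setting sits in records.config in
the usual "CONFIG <name> <TYPE> <value>" form, the I/O thread count Leif
describes is threads-per-disk times the number of cache disks. A minimal
sketch; the sample line and the disk count are hypothetical, and the
parser is illustrative rather than Traffic Server's own.

```python
def parse_records_line(line):
    """Parse one 'CONFIG <name> <TYPE> <value>' line; return (name, value) or None."""
    parts = line.split(None, 3)
    if len(parts) != 4 or parts[0] != "CONFIG":
        return None
    _, name, kind, raw = parts
    value = int(raw) if kind == "INT" else raw.strip()
    return name, value


def total_io_threads(threads_per_disk, disks):
    """I/O threads = threads per disk spindle * number of cache disks."""
    return threads_per_disk * disks


if __name__ == "__main__":
    # Hypothetical line using the default of 4 mentioned in the thread.
    sample = "CONFIG proxy.config.cache.threads_per_disk INT 4"
    name, per_disk = parse_records_line(sample)
    # Hypothetical box with 6 cache disks.
    print(name, total_io_threads(per_disk, disks=6))
```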

Thanks

-- Pranav


> Cheers,
>
> -- leif
>
>

Re: errors and shutdown message in 2.1.2 under load

Posted by Leif Hedstrom <zw...@apache.org>.
  On 09/16/2010 02:45 PM, Pranav Desai wrote:
> On Thu, Sep 16, 2010 at 12:33 PM, Leif Hedstrom<zw...@apache.org>  wrote:
>>   On 09/16/2010 01:26 PM, Pranav Desai wrote:
>>> Hi!
>>>
>>> I am running a load test with some video files to see . I am using
>>> curl-loader to generate the load. I have modified it to add a random
>>> number to the URLs before sending so I can test with a single URL and
>>> still stress the cache. The webserver is a lighttpd server with
>>> rewrite rules to translate the random strings back to a common URL.
>>> The URL is essentially a 15MB video file. I can provide more details
>>> on the setup if needed.
>> Ok, I've created https://issues.apache.org/jira/browse/TS-441   with this
>> information. If you can find a core file (or, run traffic_server under gdb),
>> and get a stack trace, that would be very helpful. Also, when it crashes,
>> you might get a stack trace in /var/log/messages and/or one of the log files
>> in the .../var/log/trafficserver  directory.
>>
> Will do that.
>
> Is there an architecture diagram or doc some where that briefly
> describes how the system works especially in terms of the roles of the
> processes like traffic_manager and traffic_server and how they
> interact, communicate. How the threads get the request, who runs the
> event loop etc. Even a general block diagram would work for me. It
> will of great help in understanding the system.
>
>


Well, the admin guide (http://trafficserver.apache.org/docs/v2/admin/)
should give some ideas on how the system overall runs (processes etc.),
and the developers guide (http://trafficserver.apache.org/docs/v2/sdk/)
will give some ideas how things work under the hood.

That much said, I can tell you what the three processes do (big picture):

   1. traffic_cop - This process starts the traffic_manager process, and
      its primary task is to verify that the traffic_server process is
      responding to a built-in health check. If it doesn't, traffic_cop
      will kill the traffic_server process.
   2. traffic_manager - This process is responsible for various "admin"
      tasks, and also implements the (defunct) WebUI. In addition, it
      binds the "listen" port(s) that the server will accept requests
      on, and starts the traffic_server process. If traffic_server dies
      (either crashes or is killed by traffic_cop), traffic_manager still
      has the ports bound, and will immediately restart traffic_server.
   3. traffic_server - This is the primary proxy server process. It will
      run <n> worker threads (net-threads), where n is configurable but
      defaults to something reasonable based on the number of CPUs. It
      also runs <m> I/O threads, by default 4 threads per disk spindle
      (but this is also configurable, and 4 is probably too small in many
      cases). In addition, there are a few "helper" threads for things
      like logging etc.
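
The restart behaviour in item 2 relies on a general property of Unix
sockets: a listener bound by a parent stays bound for as long as the
parent holds it, and forked children inherit it. A minimal, Unix-only
sketch of that supervision pattern. It is not Traffic Server's code, all
function names are illustrative, and the stand-in worker only checks the
inherited socket and exits. How traffic_manager actually hands the
socket to traffic_server is not covered in this thread.

```python
import os
import socket


def bind_listener(host="127.0.0.1", port=0):
    """Manager side: bind and listen once; port 0 lets the OS pick a free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, port))
    sock.listen(16)
    return sock


def worker(sock, expected_port):
    """Server side: stand-in worker that only verifies the inherited listener."""
    return 0 if sock.getsockname()[1] == expected_port else 1


def supervise(restarts=3):
    """Start the worker `restarts` times in a row, reusing one bound socket."""
    sock = bind_listener()
    port = sock.getsockname()[1]
    exit_codes = []
    try:
        for _ in range(restarts):
            pid = os.fork()
            if pid == 0:
                # Child: use the inherited socket, then exit immediately.
                code = 1
                try:
                    code = worker(sock, port)
                finally:
                    os._exit(code)
            # Parent: wait for the worker to exit, then start a replacement.
            _, status = os.waitpid(pid, 0)
            exit_codes.append(os.WEXITSTATUS(status))
    finally:
        sock.close()
    return port, exit_codes


if __name__ == "__main__":
    print(supervise())
```

Every replacement worker sees the same port because the parent never
closes the listener between restarts.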


Cheers,

-- leif


Re: errors and shutdown message in 2.1.2 under load

Posted by Pranav Desai <pr...@gmail.com>.
On Thu, Sep 16, 2010 at 12:33 PM, Leif Hedstrom <zw...@apache.org> wrote:
>  On 09/16/2010 01:26 PM, Pranav Desai wrote:
>>
>> Hi!
>>
>> I am running a load test with some video files to see . I am using
>> curl-loader to generate the load. I have modified it to add a random
>> number to the URLs before sending so I can test with a single URL and
>> still stress the cache. The webserver is a lighttpd server with
>> rewrite rules to translate the random strings back to a common URL.
>> The URL is essentially a 15MB video file. I can provide more details
>> on the setup if needed.
>
> Ok, I've created https://issues.apache.org/jira/browse/TS-441   with this
> information. If you can find a core file (or, run traffic_server under gdb),
> and get a stack trace, that would be very helpful. Also, when it crashes,
> you might get a stack trace in /var/log/messages and/or one of the log files
> in the .../var/log/trafficserver  directory.
>

Will do that.

Is there an architecture diagram or doc somewhere that briefly
describes how the system works, especially in terms of the roles of the
processes like traffic_manager and traffic_server and how they
interact and communicate, how the threads get the request, who runs the
event loop, etc.? Even a general block diagram would work for me. It
will be of great help in understanding the system.

Thanks
-- Pranav

> -- leif
>
>

Re: errors and shutdown message in 2.1.2 under load

Posted by Leif Hedstrom <zw...@apache.org>.
  On 09/16/2010 01:26 PM, Pranav Desai wrote:
> Hi!
>
> I am running a load test with some video files to see . I am using
> curl-loader to generate the load. I have modified it to add a random
> number to the URLs before sending so I can test with a single URL and
> still stress the cache. The webserver is a lighttpd server with
> rewrite rules to translate the random strings back to a common URL.
> The URL is essentially a 15MB video file. I can provide more details
> on the setup if needed.

Ok, I've created https://issues.apache.org/jira/browse/TS-441 with
this information. If you can find a core file (or, run traffic_server
under gdb), and get a stack trace, that would be very helpful. Also,
when it crashes, you might get a stack trace in /var/log/messages and/or
one of the log files in the .../var/log/trafficserver directory.
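
For the core-file route, gdb can print every thread's stack without an
interactive session. A minimal sketch that builds such a command line;
the binary and core paths are hypothetical placeholders, the function
name is illustrative, and only the standard gdb options -batch and -ex
are assumed.

```python
import shlex


def gdb_backtrace_command(binary, core):
    """Return an argv list asking gdb for a backtrace of all threads, then exit."""
    return ["gdb", "-batch", "-ex", "thread apply all bt", binary, core]


if __name__ == "__main__":
    # Hypothetical paths; substitute the real traffic_server binary and core file.
    argv = gdb_backtrace_command("/path/to/bin/traffic_server", "/path/to/core")
    print(" ".join(shlex.quote(arg) for arg in argv))
```

The printed line can be pasted into a shell, or the list passed to
subprocess.run() to capture the trace for the ticket.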

-- leif