Posted to olio-user@incubator.apache.org by Joshua Schnee <jp...@gmail.com> on 2010/02/01 17:32:47 UTC

OLIO-Java harness runs forever - round 2

OK, so I'm seeing the harness just run indefinitely again, and this time it
isn't related to the maximum open files.

Details and Background information:
Faban 1.0 build 111109
Olio Java 0.2
The workload is being driven from a physical Windows client against 2 VMs on
a system that is doing many other tasks.

Usage Info:
Physical Client usage : Avg ~ 22.93%, Max ~ 42.15%
Total System under test Utilization ~ 95%
Web avg util = 5.3%, Max = 40.83% (during manual reload)
DB  avg util = 4.2%, Max = 55.6% (during auto reload)

Granted, the system under test is near saturation, but the client isn't.  I'm
not sure why the harness never exits.  Even if the VMs or the whole system
under test get so saturated that they can't respond to requests, shouldn't
the test, which is running on the under-utilized client, finish regardless
and report whatever results it can?  Shanti, you had previously asked me to
file a JIRA on this, which I forgot to do; I can do so now if you'd like.
Finally, GlassFish appears to be stuck: it's running but not responding to
requests, probably due to the SEVERE entry in the server.log file (see
below).

Faban\Logs:
  agent.log = empty
  cmdagent.log = No issues
  faban.log.xml = No issues

Master\Logs
  catalina.out = No issues
  localhost*log*txt = No issues

OlioDriver.3C\
  log.xml : Two types of issues, doTagSearch and the catastrophic "Forcefully
terminating benchmark run" : attached

GlassFish logs
   jvm.log : Numerous dependency_failed entries : attached
   server.log : Several SEVERE entries : most notably one where "a signal
was attempted before wait()" : attached

Any help resolving this would be much appreciated,
-Josh

Re: OLIO-Java harness runs forever - round 2

Posted by Akara Sucharitakul <Ak...@Sun.COM>.
I think you have it all covered. Unfortunately, there is no single place
to download them all. The Faban source that comes with the download is
more for IDE use than for building. I suggest you get the source from
github directly: http://github.com/akara/faban/

There is a Rel1.0b branch that holds the exact Faban 1.0 image. However,
I suggest you build from master (which has advanced quite a bit beyond 1.0).

I suggest adding logs to the connect method to observe the read timeout
setting, and adding some logs to the termination process that runs when the
2-minute timer expires. So far I have observed both being invoked
correctly. Please let me know what you find. Thanks.
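
As a purely illustrative sketch of that instrumentation (this is not the
shipped Faban source, and it omits the DriverContext start-time bookkeeping
shown in the TimedSocket.connect() code later in this thread), it could look
something like:

    import java.io.IOException;
    import java.net.Socket;
    import java.net.SocketAddress;
    import java.util.logging.Logger;

    public class TimedSocket extends Socket {

        // Logs through java.util.logging, so the messages show up once the
        // corresponding logger level is raised (see the logging.properties
        // note further down in this thread).
        private static final Logger logger =
                Logger.getLogger(TimedSocket.class.getName());

        @Override
        public void connect(SocketAddress endpoint, int timeout)
                throws IOException {
            if (timeout <= 0)
                timeout = 30000; // 30 second connect timeout, as in the original
            logger.finer("connect " + endpoint + ", connect timeout=" + timeout);
            super.connect(endpoint, timeout);
            setSoTimeout(30000); // 30 second socket read timeout
            logger.finer("connected " + endpoint + ", soTimeout=" + getSoTimeout());
        }
    }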

-Akara





Re: OLIO-Java harness runs forever - round 2

Posted by Joshua Schnee <jp...@gmail.com>.
Akara,

These are (1.6.x versions of) Sun JDKs on both the client and server end, so
oddball JVMs are probably not the source of the Faban read timeout failures.
To be able to better debug these kinds of issues on the harness side, we are
looking into setting up a build environment for the harness.  Toward that
end, we have so far identified the Faban, Olio, and HttpClient source
packages that we will need to download and patch together.  Are there any
other packages we'll have to find, or is there a single point where we can
download, say, a tarball with *all* of the source code used to build the
Faban harness?

Thanks,

Joshua

Re: OLIO-Java harness runs forever - round 2

Posted by Ak...@Sun.COM.
On 02/09/10 15:11, Joshua Schnee wrote:
> Akara,
> 
> Can you review his comments?
> 
> 
> Thanks,
> 
> Joshua
> 
> 
> His comments:
> 
>   > The code setting the timeout is in TimedSocket.connect().
> 
> But the point in the stack trace that, at least to my eyes, appears to be
> the source of the read is the UIDriver class, and that class is using
> URLConnections. So I don't see how a call to the methods in UIDriver has
> any such timeout. You don't explicitly create sockets, nor do you
> explicitly call connect() with URLConnections.

What source are you looking at? Is this Olio 0.2? The UIDriver is 
several layers away from the socket.
In Olio 0.2, all calls from UIDriver go through ApacheHC3Transport, 
which then goes through the Apache HttpClient stack and finally arrives 
at the socket layer. The UIDriver methods do not set or read any timeouts.

> 
> Now it is possible that somewhere several layers back, outside the scope 
> of these stack traces, *something* is creating such a connection with 
> a timeout and within that connection this URLConnection is created 
> without a timeout, so that effectively even if the blocking read on the 
> URLConnection will wait forever, the method that called it won't. But 
> these stack traces provide no hint of such a relationship.

Each simulated client is responsible for its own connection. It does not 
take a connection from somewhere else. I don't see how this could even 
happen. But to be really sure, we'll need to make all the connect calls 
expose state.

> 
>>  The only way a thread can block on the I/O forever is if
> 
>>  both of these conditions are true:
>>
>>  1. The socket does not obey setSoTimeout
>>  2. The socket I/O interrupt does not work. A thread.interrupt
> 
>>     does not cause the waiting read call to get interrupted.
> 
> or
> 
> 3. You are not using this connect() method to make the connection.

If you check the Java socket library source, it will always use this
connect call to make the connection, no matter which Socket constructor
you use - unless, of course, you have a strange JVM.
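
A minimal illustration of that point (the class and values below are made up
for the example; this is not Faban code): because java.net.Socket's
convenience constructors delegate to connect() internally, a subclass that
overrides connect(SocketAddress, int) sees the connection attempt no matter
how the socket was constructed.

    import java.io.IOException;
    import java.net.Socket;
    import java.net.SocketAddress;

    public class TracingSocket extends Socket {

        public TracingSocket(String host, int port) throws IOException {
            super(host, port); // the superclass constructor funnels into connect()
        }

        @Override
        public void connect(SocketAddress endpoint, int timeout)
                throws IOException {
            // This override runs even for sockets built via the constructor above.
            System.out.println("connect(" + endpoint + ", timeout=" + timeout + ")");
            super.connect(endpoint, timeout);
            setSoTimeout(30000); // read timeout applied regardless of entry point
        }
    }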

> 
> Lastly, I noticed there is "logging" code in the faban harness 
> (logging.finer(), etc). I haven't found how you can turn it on, but you 
> might want to ask this guy, as logging output would be a lot more 
> helpful than these stack traces for backtracking to help figure out what 
> is going on.

On the system where you want the extra logging, go to
$FABAN/config/logging.properties and change or add the logging level for the
class you want more detailed log messages from. Add a property of the form

<name>.level=<desired level>

e.g.

com.sun.faban.driver.level=FINEST
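
For concreteness, a hypothetical fragment of that file might look like the
following (with java.util.logging, a handler also filters on its own level;
whether Faban's default handler level needs raising too is an assumption):

    # $FABAN/config/logging.properties -- illustrative fragment
    com.sun.faban.driver.level=FINEST
    # Uncomment if FINER/FINEST records still do not appear in the output;
    # whether Faban routes its logs through FileHandler is an assumption:
    # java.util.logging.FileHandler.level=FINEST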

Logs are helpful. But if you have something hanging, stack dumps are better.

-Akara





Re: OLIO-Java harness runs forever - round 2

Posted by Joshua Schnee <jp...@gmail.com>.
Akara,

Can you review his comments?


Thanks,

Joshua


His comments:

  > The code setting the timeout is in TimedSocket.connect().

But the point in the stack trace that, at least to my eyes, appears to be the
source of the read is the UIDriver class, and that class is using
URLConnections. So I don't see how a call to the methods in UIDriver has any
such timeout. You don't explicitly create sockets, nor do you explicitly call
connect() with URLConnections.

Now it is possible that somewhere several layers back, outside the scope of
these stack traces, *something* is creating such a connection with a
timeout and within that connection this URLConnection is created without a
timeout, so that effectively even if the blocking read on the URLConnection
will wait forever, the method that called it won't. But these stack traces
provide no hint of such a relationship.

> The only way a thread can block on the I/O forever is if

> both of these conditions are true:
>
> 1. The socket does not obey setSoTimeout
> 2. The socket I/O interrupt does not work. A thread.interrupt

>     does not cause the waiting read call to get interrupted.

or

3. You are not using this connect() method to make the connection.

Lastly, I noticed there is "logging" code in the faban harness
(logging.finer(), etc). I haven't found how you can turn it on, but you
might want to ask this guy, as logging output would be a lot more helpful
than these stack traces for backtracking to help figure out what is going
on.



-- 
-Josh

Re: OLIO-Java harness runs forever - round 2

Posted by Akara Sucharitakul <Ak...@Sun.COM>.
We explicitly do this on the sockets right after the connect:

         setSoTimeout(30000); // 30 second socket read timeout.

Moreover, the master schedules a reaper that goes and calls terminate on 
each agent 60 seconds after the end of ramp down. The terminate will 
call interrupt on all driver threads that are still alive.

If you kill the run, the rampdown won't be observed. It will go and 
interrupt the threads right away.

The code setting the timeout is in TimedSocket.connect(). Here it is:

     public void connect(SocketAddress endpoint, int timeout) throws 
IOException {
         // Here we intercept the connect and capture the start time.
         DriverContext ctx = DriverContext.getContext();
         if (ctx != null)
             ctx.recordStartTime();
         if (timeout <= 0)
             timeout = 30000; // 30 second connect timeout.
         super.connect(endpoint, timeout);
         setSoTimeout(30000); // 30 second socket read timeout.
     }

Note that the read timeout of 30 seconds should be set no matter what 
connect timeout is set.

As for the 2 minute limit at the end, search the MasterImpl for 
"killAtEnd" and you'll see it calls the agent's terminate method at that 
end time. If you trace it down, you'll see that all threads waiting for 
socket read will get interrupted and terminated.
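
As a rough sketch of that termination pattern (the class and method names
below are made up for illustration; the real logic lives in MasterImpl and
the agent code):

    import java.util.List;
    import java.util.Timer;
    import java.util.TimerTask;

    public class ReaperSketch {

        // Schedule a one-shot task that fires a given delay after ramp-down
        // ends and interrupts every driver thread that is still alive.
        public static void scheduleKillAtEnd(final List<Thread> driverThreads,
                                             long delayAfterRampDownMillis) {
            Timer reaper = new Timer("reaper", true); // daemon timer thread
            reaper.schedule(new TimerTask() {
                @Override
                public void run() {
                    for (Thread t : driverThreads) {
                        if (t.isAlive()) {
                            t.interrupt();
                        }
                    }
                }
            }, delayAfterRampDownMillis);
        }
    }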

The only way a thread can block on the I/O forever is if both of these 
conditions are true:

1. The socket does not obey setSoTimeout
2. The socket I/O interrupt does not work. A thread.interrupt does not 
cause the waiting read call to get interrupted.

Thanks,
-Akara





Re: OLIO-Java harness runs forever - round 2

Posted by Joshua Schnee <jp...@gmail.com>.
Akara,

We are still experiencing this issue across several different physical
clients (3 now).  I've asked one of our lead Java developers to review the
stack traces; he agrees with your assessment that the harness, master, and
driver are all hung waiting for the driver to terminate, but he was somewhat
confused by the claim that Faban will time out an attempted read after 30
seconds.  He believes this to be a blocking read call, one that will wait
forever for data to be returned unless it is interrupted somehow.

He suggested putting in a 30-second read timeout on the threads at the end
of the rampdown to stop the harness from running forever, but was also
concerned about the run never reaching the end of the rampdown period.  Does
Faban still do the rampdown phase if a run is being aborted?

Also, could you possibly tell me what block of code in Faban assures the
timeout constraint?  He thought he might be able to work backwards from
there...

Thanks,
Joshua



-- 
-Josh

Re: OLIO-Java harness runs forever - round 2

Posted by Joshua Schnee <jp...@gmail.com>.
Akara,

We don't know of anything that is specific to our configuration, but one
thought is that it has something to do with our running it under a Windows
client.  We'll dig a bit deeper, but if you have any ideas, they would also
be appreciated.

Thanks,
Joshua



-- 
-Josh

Re: OLIO-Java harness runs forever - round 2

Posted by Akara Sucharitakul <Ak...@Sun.COM>.
Josh,

I've checked the stacks and it turns out all driver threads are waiting 
on socket read (reading from the server). Faban has a default read 
timeout of 30 seconds and interrupts the thread and therefore the I/O at 
2 minutes after the rampdown has ended. So it should not run forever.

Is there anything in your configuration that causes the socket not to 
time out as contracted, or that makes the read non-interruptible? 
Otherwise I can't see how it could run forever.

Here are some details:

1. pid 608 seems to be an old stale instance of the harness that does 
not want to terminate. Please kill it off.
2. pid 2504 is a driver agent. All threads are waiting on socket read.
3. pid 4508 is the master. It is waiting for the driver agent to finish 
up the threads (in a join).
4. pid 4080 is the harness. It is waiting for the master to terminate.

Thanks,
-Akara
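
For illustration only, the timeout-and-deadline mechanism described above
can be sketched in plain Java roughly as follows. This is not Faban's
actual implementation; the host, port, timeout values, and class name are
placeholders.

    // Rough sketch only (not Faban's code): a per-read socket timeout plus a
    // watchdog that forcibly closes the connection after a hard deadline, so a
    // thread stuck in a blocking read cannot hang forever.
    import java.io.InputStream;
    import java.net.Socket;

    public class GuardedRead {
        public static void main(String[] args) throws Exception {
            // Placeholder host/port; a real driver takes these from its config.
            final Socket socket = new Socket("sut.example.com", 8080);
            socket.setSoTimeout(30000);        // per-read timeout, analogous to the 30s default

            Thread watchdog = new Thread(new Runnable() {
                public void run() {
                    try {
                        Thread.sleep(120000);  // hard deadline, analogous to 2 min after ramp-down
                        socket.close();        // closing the socket aborts a read stuck in the kernel
                    } catch (Exception ignored) {
                    }
                }
            });
            watchdog.setDaemon(true);
            watchdog.start();

            InputStream in = socket.getInputStream();
            try {
                byte[] buf = new byte[8192];
                // read() throws SocketTimeoutException if no data arrives within the
                // timeout, and SocketException once the watchdog closes the socket.
                while (in.read(buf) != -1) {
                    // process the bytes read ...
                }
            } finally {
                socket.close();
            }
        }
    }

Note that in plain Java a socket timeout of zero means an infinite timeout,
so a configuration or code path that resets the timeout to zero would leave
only the hard deadline (or a manual kill) to break a stuck read, which is
the kind of behavior being asked about in this thread.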

Joshua Schnee wrote:
> Turns out I didn't get as much info as what is probably needed.  Here's 
> an updated zip file with more dumps...
> 
> Thanks,
> 
> 
> On Mon, Feb 1, 2010 at 5:45 PM, Joshua Schnee <jpschnee@gmail.com 
> <ma...@gmail.com>> wrote:
> 
>     Just realized the list didn't get copied.
> 
> 
>     ---------- Forwarded message ----------
>     From: *Joshua Schnee* <jpschnee@gmail.com <ma...@gmail.com>>
>     Date: Mon, Feb 1, 2010 at 1:46 PM
>     Subject: Re: OLIO-Java harness runs forever - round 2
>     To: Akara Sucharitakul <Akara.Sucharitakul@sun.com
>     <ma...@sun.com>>
> 
> 
>     Thanks,
> 
>     Here's two PIDs I dumped, let me know if these aren't the ones you
>     want as there are several.
> 
>     Thanks,
>     Joshua
> 
> 
>     On Mon, Feb 1, 2010 at 1:12 PM, Akara Sucharitakul
>     <Akara.Sucharitakul@sun.com <ma...@sun.com>> wrote:
> 
>         kill -QUIT <pid>
> 
>         or
> 
>         $JAVA_HOME/bin/jstack <pid>
> 
>         -Akara
> 
>         Joshua Schnee wrote:
> 
>             Can you tell me how to do this?  I'm not sure how do this
>             when the harness doesn't hit an exception...
> 
>             Thanks,
>             Joshua
> 
>             On Mon, Feb 1, 2010 at 12:17 PM, Akara Sucharitakul
>             <Akara.Sucharitakul@sun.com
>             <ma...@sun.com>
>             <mailto:Akara.Sucharitakul@sun.com
>             <ma...@sun.com>>> wrote:
> 
>                Can you please obtain me a stack dump of the Faban
>             harness? If it
>                hangs somewhere, we'll be able to diagnose better. Thanks.
> 
>                -Akara
> 
> 
>                Joshua Schnee wrote:
> 
>                    OK, so I'm seeing the harness just run indefinitely
>             again, and
>                    this time it isn't related to the maximum open files.
> 
>                    Details and Background information:
>                    Faban 1.0 build 111109
>                    Olio Java 0.2
>                    The workload is being driven from physical Windows client
>                    against 2 VMs on a system that is doing many other tasks.
> 
>                    Usage Info:
>                    Physical Client usage : Avg ~ 22.93%, Max ~ 42.15%
>                    Total System under test Utilization ~95%
>                    Web avg util = 5.3%, Max =40.83% (during manual reload)
>                    DB   avg uili = 4.2%, Max = 55.6% (during auto reload)
> 
>                    Granted, the system under test is near saturation,
>             the client
>                    isn't.  I'm not sure why the harness is never
>             exiting.  Even if
>                    the VMs or even the system under test gets so
>             saturated that
>                    they can't respond to requests, shouldn't the test,
>             which is
>                    running on the under utilized client finish regardless,
>                    reporting whatever results it can?  Shanti, you had
>             previously
>                    asked me to file a JIRA, on this, of which I forgot.
>              I can do
>                    so now, if you'd like.  Finally, glassfish appears to
>             be stuck,
>                    it's running, but not responding to requests,
>             probably due to
>                    the SEVERE entry in the server.log file (see below).
> 
>                    Faban\Logs:
>                     agent.log = empty
>                     cmdagent.log = No issues
>                     faban.log.xml = No issues
> 
>                    Master\Logs
>                     catalina.out = No issues
>                     localhost*log*txt = No issues
>                     OlioDriver.3C\
>                     log.xml : Two types of issues, doTagSearch and the
>             catastrophic
>                    "Forcefully terminating benchmark run" : attached
>                      GlassFish logs
>                      jvm.log : Numerous dependency_failed entries :
>             attached          server.log : Serveral SEVERE entires :
>             Most notibly one where "a
>                    signal was attempted before wait()" : Attached
>                    Any help resolving this would me much appreciated,
>                    -Josh
> 
> 
> 
> 
> 
>             -- 
>             -Josh
> 
> 
> 
> 
> 
>     -- 
>     -Josh
> 
> 
> 
> 
>     -- 
>     -Josh
> 
> 
> 
> 
> -- 
> -Josh
> 


Re: OLIO-Java harness runs forever - round 2

Posted by Joshua Schnee <jp...@gmail.com>.
Turns out I didn't get as much info as is probably needed.  Here's an
updated zip file with more dumps...

Thanks,


On Mon, Feb 1, 2010 at 5:45 PM, Joshua Schnee <jp...@gmail.com> wrote:

> Just realized the list didn't get copied.
>
>
> ---------- Forwarded message ----------
> From: Joshua Schnee <jp...@gmail.com>
> Date: Mon, Feb 1, 2010 at 1:46 PM
> Subject: Re: OLIO-Java harness runs forever - round 2
> To: Akara Sucharitakul <Ak...@sun.com>
>
>
> Thanks,
>
> Here's two PIDs I dumped, let me know if these aren't the ones you want as
> there are several.
>
> Thanks,
> Joshua
>
>
> On Mon, Feb 1, 2010 at 1:12 PM, Akara Sucharitakul <
> Akara.Sucharitakul@sun.com> wrote:
>
>> kill -QUIT <pid>
>>
>> or
>>
>> $JAVA_HOME/bin/jstack <pid>
>>
>> -Akara
>>
>> Joshua Schnee wrote:
>>
>>> Can you tell me how to do this?  I'm not sure how do this when the
>>> harness doesn't hit an exception...
>>>
>>> Thanks,
>>> Joshua
>>>
>>> On Mon, Feb 1, 2010 at 12:17 PM, Akara Sucharitakul <
>>> Akara.Sucharitakul@sun.com <ma...@sun.com>> wrote:
>>>
>>>    Can you please obtain me a stack dump of the Faban harness? If it
>>>    hangs somewhere, we'll be able to diagnose better. Thanks.
>>>
>>>    -Akara
>>>
>>>
>>>    Joshua Schnee wrote:
>>>
>>>        OK, so I'm seeing the harness just run indefinitely again, and
>>>        this time it isn't related to the maximum open files.
>>>
>>>        Details and Background information:
>>>        Faban 1.0 build 111109
>>>        Olio Java 0.2
>>>        The workload is being driven from physical Windows client
>>>        against 2 VMs on a system that is doing many other tasks.
>>>
>>>        Usage Info:
>>>        Physical Client usage : Avg ~ 22.93%, Max ~ 42.15%
>>>        Total System under test Utilization ~95%
>>>        Web avg util = 5.3%, Max =40.83% (during manual reload)
>>>        DB   avg uili = 4.2%, Max = 55.6% (during auto reload)
>>>
>>>        Granted, the system under test is near saturation, the client
>>>        isn't.  I'm not sure why the harness is never exiting.  Even if
>>>        the VMs or even the system under test gets so saturated that
>>>        they can't respond to requests, shouldn't the test, which is
>>>        running on the under utilized client finish regardless,
>>>        reporting whatever results it can?  Shanti, you had previously
>>>        asked me to file a JIRA, on this, of which I forgot.  I can do
>>>        so now, if you'd like.  Finally, glassfish appears to be stuck,
>>>        it's running, but not responding to requests, probably due to
>>>        the SEVERE entry in the server.log file (see below).
>>>
>>>        Faban\Logs:
>>>         agent.log = empty
>>>         cmdagent.log = No issues
>>>         faban.log.xml = No issues
>>>
>>>        Master\Logs
>>>         catalina.out = No issues
>>>         localhost*log*txt = No issues
>>>         OlioDriver.3C\
>>>         log.xml : Two types of issues, doTagSearch and the catastrophic
>>>        "Forcefully terminating benchmark run" : attached
>>>          GlassFish logs
>>>          jvm.log : Numerous dependency_failed entries : attached
>>>  server.log : Serveral SEVERE entires : Most notibly one where "a
>>>        signal was attempted before wait()" : Attached
>>>        Any help resolving this would me much appreciated,
>>>        -Josh
>>>
>>>
>>>
>>>
>>>
>>> --
>>> -Josh
>>>
>>>
>>
>
>
> --
> -Josh
>
>
>
>
> --
> -Josh
>
>


-- 
-Josh

Fwd: OLIO-Java harness runs forever - round 2

Posted by Joshua Schnee <jp...@gmail.com>.
Just realized the list didn't get copied.

---------- Forwarded message ----------
From: Joshua Schnee <jp...@gmail.com>
Date: Mon, Feb 1, 2010 at 1:46 PM
Subject: Re: OLIO-Java harness runs forever - round 2
To: Akara Sucharitakul <Ak...@sun.com>


Thanks,

Here are two PIDs I dumped; let me know if these aren't the ones you want,
as there are several.

Thanks,
Joshua


On Mon, Feb 1, 2010 at 1:12 PM, Akara Sucharitakul <
Akara.Sucharitakul@sun.com> wrote:

> kill -QUIT <pid>
>
> or
>
> $JAVA_HOME/bin/jstack <pid>
>
> -Akara
>
> Joshua Schnee wrote:
>
>> Can you tell me how to do this?  I'm not sure how do this when the harness
>> doesn't hit an exception...
>>
>> Thanks,
>> Joshua
>>
>> On Mon, Feb 1, 2010 at 12:17 PM, Akara Sucharitakul <
>> Akara.Sucharitakul@sun.com <ma...@sun.com>> wrote:
>>
>>    Can you please obtain me a stack dump of the Faban harness? If it
>>    hangs somewhere, we'll be able to diagnose better. Thanks.
>>
>>    -Akara
>>
>>
>>    Joshua Schnee wrote:
>>
>>        OK, so I'm seeing the harness just run indefinitely again, and
>>        this time it isn't related to the maximum open files.
>>
>>        Details and Background information:
>>        Faban 1.0 build 111109
>>        Olio Java 0.2
>>        The workload is being driven from physical Windows client
>>        against 2 VMs on a system that is doing many other tasks.
>>
>>        Usage Info:
>>        Physical Client usage : Avg ~ 22.93%, Max ~ 42.15%
>>        Total System under test Utilization ~95%
>>        Web avg util = 5.3%, Max =40.83% (during manual reload)
>>        DB   avg uili = 4.2%, Max = 55.6% (during auto reload)
>>
>>        Granted, the system under test is near saturation, the client
>>        isn't.  I'm not sure why the harness is never exiting.  Even if
>>        the VMs or even the system under test gets so saturated that
>>        they can't respond to requests, shouldn't the test, which is
>>        running on the under utilized client finish regardless,
>>        reporting whatever results it can?  Shanti, you had previously
>>        asked me to file a JIRA, on this, of which I forgot.  I can do
>>        so now, if you'd like.  Finally, glassfish appears to be stuck,
>>        it's running, but not responding to requests, probably due to
>>        the SEVERE entry in the server.log file (see below).
>>
>>        Faban\Logs:
>>         agent.log = empty
>>         cmdagent.log = No issues
>>         faban.log.xml = No issues
>>
>>        Master\Logs
>>         catalina.out = No issues
>>         localhost*log*txt = No issues
>>         OlioDriver.3C\
>>         log.xml : Two types of issues, doTagSearch and the catastrophic
>>        "Forcefully terminating benchmark run" : attached
>>          GlassFish logs
>>          jvm.log : Numerous dependency_failed entries : attached
>>  server.log : Serveral SEVERE entires : Most notibly one where "a
>>        signal was attempted before wait()" : Attached
>>        Any help resolving this would me much appreciated,
>>        -Josh
>>
>>
>>
>>
>>
>> --
>> -Josh
>>
>>
>


-- 
-Josh




-- 
-Josh

Re: OLIO-Java harness runs forever - round 2

Posted by Akara Sucharitakul <Ak...@Sun.COM>.
kill -QUIT <pid>

or

$JAVA_HOME/bin/jstack <pid>

-Akara
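
Both commands ask the JVM to dump every thread's stack: kill -QUIT sends
SIGQUIT, which the HotSpot JVM answers by printing the stacks to the
process's standard output, while jstack attaches to the pid and prints the
same information (on Windows, Ctrl-Break in the console window triggers the
same dump). If attaching from outside is inconvenient, a roughly equivalent
dump can also be produced from inside the JVM; the sketch below is
illustrative only, and the class name is made up.

    // Illustrative only: print every live thread's name, state, and stack,
    // similar in spirit to the output of kill -QUIT or jstack.
    import java.util.Map;

    public class StackDumper {
        public static void dumpAllStacks() {
            for (Map.Entry<Thread, StackTraceElement[]> entry
                    : Thread.getAllStackTraces().entrySet()) {
                Thread t = entry.getKey();
                System.err.println("\"" + t.getName() + "\" state=" + t.getState());
                for (StackTraceElement frame : entry.getValue()) {
                    System.err.println("\tat " + frame);
                }
                System.err.println();
            }
        }

        public static void main(String[] args) {
            dumpAllStacks();
        }
    }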

Joshua Schnee wrote:
> Can you tell me how to do this?  I'm not sure how do this when the 
> harness doesn't hit an exception...
> 
> Thanks,
> Joshua
> 
> On Mon, Feb 1, 2010 at 12:17 PM, Akara Sucharitakul 
> <Akara.Sucharitakul@sun.com <ma...@sun.com>> wrote:
> 
>     Can you please obtain me a stack dump of the Faban harness? If it
>     hangs somewhere, we'll be able to diagnose better. Thanks.
> 
>     -Akara
> 
> 
>     Joshua Schnee wrote:
> 
>         OK, so I'm seeing the harness just run indefinitely again, and
>         this time it isn't related to the maximum open files.
> 
>         Details and Background information:
>         Faban 1.0 build 111109
>         Olio Java 0.2
>         The workload is being driven from physical Windows client
>         against 2 VMs on a system that is doing many other tasks.
> 
>         Usage Info:
>         Physical Client usage : Avg ~ 22.93%, Max ~ 42.15%
>         Total System under test Utilization ~95%
>         Web avg util = 5.3%, Max =40.83% (during manual reload)
>         DB   avg uili = 4.2%, Max = 55.6% (during auto reload)
> 
>         Granted, the system under test is near saturation, the client
>         isn't.  I'm not sure why the harness is never exiting.  Even if
>         the VMs or even the system under test gets so saturated that
>         they can't respond to requests, shouldn't the test, which is
>         running on the under utilized client finish regardless,
>         reporting whatever results it can?  Shanti, you had previously
>         asked me to file a JIRA, on this, of which I forgot.  I can do
>         so now, if you'd like.  Finally, glassfish appears to be stuck,
>         it's running, but not responding to requests, probably due to
>         the SEVERE entry in the server.log file (see below).
> 
>         Faban\Logs:
>          agent.log = empty
>          cmdagent.log = No issues
>          faban.log.xml = No issues
> 
>         Master\Logs
>          catalina.out = No issues
>          localhost*log*txt = No issues
>          OlioDriver.3C\
>          log.xml : Two types of issues, doTagSearch and the catastrophic
>         "Forcefully terminating benchmark run" : attached
>           GlassFish logs
>           jvm.log : Numerous dependency_failed entries : attached  
>         server.log : Serveral SEVERE entires : Most notibly one where "a
>         signal was attempted before wait()" : Attached
>         Any help resolving this would me much appreciated,
>         -Josh
> 
> 
> 
> 
> 
> -- 
> -Josh
> 


Re: OLIO-Java harness runs forever - round 2

Posted by Joshua Schnee <jp...@gmail.com>.
Can you tell me how to do this?  I'm not sure how to do this when the harness
doesn't hit an exception...

Thanks,
Joshua

On Mon, Feb 1, 2010 at 12:17 PM, Akara Sucharitakul <
Akara.Sucharitakul@sun.com> wrote:

> Can you please obtain me a stack dump of the Faban harness? If it hangs
> somewhere, we'll be able to diagnose better. Thanks.
>
> -Akara
>
>
> Joshua Schnee wrote:
>
>> OK, so I'm seeing the harness just run indefinitely again, and this time
>> it isn't related to the maximum open files.
>>
>> Details and Background information:
>> Faban 1.0 build 111109
>> Olio Java 0.2
>> The workload is being driven from physical Windows client against 2 VMs on
>> a system that is doing many other tasks.
>>
>> Usage Info:
>> Physical Client usage : Avg ~ 22.93%, Max ~ 42.15%
>> Total System under test Utilization ~95%
>> Web avg util = 5.3%, Max =40.83% (during manual reload)
>> DB   avg uili = 4.2%, Max = 55.6% (during auto reload)
>>
>> Granted, the system under test is near saturation, the client isn't.  I'm
>> not sure why the harness is never exiting.  Even if the VMs or even the
>> system under test gets so saturated that they can't respond to requests,
>> shouldn't the test, which is running on the under utilized client finish
>> regardless, reporting whatever results it can?  Shanti, you had previously
>> asked me to file a JIRA, on this, of which I forgot.  I can do so now, if
>> you'd like.  Finally, glassfish appears to be stuck, it's running, but not
>> responding to requests, probably due to the SEVERE entry in the server.log
>> file (see below).
>>
>> Faban\Logs:
>>  agent.log = empty
>>  cmdagent.log = No issues
>>  faban.log.xml = No issues
>>
>> Master\Logs
>>  catalina.out = No issues
>>  localhost*log*txt = No issues
>>  OlioDriver.3C\
>>  log.xml : Two types of issues, doTagSearch and the catastrophic
>> "Forcefully terminating benchmark run" : attached
>>   GlassFish logs
>>   jvm.log : Numerous dependency_failed entries : attached   server.log :
>> Serveral SEVERE entires : Most notibly one where "a signal was attempted
>> before wait()" : Attached
>> Any help resolving this would me much appreciated,
>> -Josh
>>
>>
>


-- 
-Josh

Re: OLIO-Java harness runs forever - round 2

Posted by Akara Sucharitakul <Ak...@Sun.COM>.
Can you please send me a stack dump of the Faban harness? If it hangs 
somewhere, we'll be able to diagnose the problem better. Thanks.

-Akara

Joshua Schnee wrote:
> OK, so I'm seeing the harness just run indefinitely again, and this time 
> it isn't related to the maximum open files.
> 
> Details and Background information:
> Faban 1.0 build 111109
> Olio Java 0.2
> The workload is being driven from physical Windows client against 2 VMs 
> on a system that is doing many other tasks.
> 
> Usage Info:
> Physical Client usage : Avg ~ 22.93%, Max ~ 42.15%
> Total System under test Utilization ~95%
> Web avg util = 5.3%, Max =40.83% (during manual reload)
> DB   avg uili = 4.2%, Max = 55.6% (during auto reload)
> 
> Granted, the system under test is near saturation, the client isn't. 
>  I'm not sure why the harness is never exiting.  Even if the VMs or even 
> the system under test gets so saturated that they can't respond to 
> requests, shouldn't the test, which is running on the under utilized 
> client finish regardless, reporting whatever results it can?  Shanti, 
> you had previously asked me to file a JIRA, on this, of which I forgot. 
>  I can do so now, if you'd like.  Finally, glassfish appears to be 
> stuck, it's running, but not responding to requests, probably due to the 
> SEVERE entry in the server.log file (see below).
> 
> Faban\Logs:
>   agent.log = empty
>   cmdagent.log = No issues
>   faban.log.xml = No issues
> 
> Master\Logs
>   catalina.out = No issues
>   localhost*log*txt = No issues
>   
> OlioDriver.3C\
>   log.xml : Two types of issues, doTagSearch and the catastrophic 
> "Forcefully terminating benchmark run" : attached
>    
> GlassFish logs
>    jvm.log : Numerous dependency_failed entries : attached 
>    server.log : Serveral SEVERE entires : Most notibly one where "a 
> signal was attempted before wait()" : Attached 
> 
> Any help resolving this would me much appreciated,
> -Josh
>