Posted to user@flume.apache.org by You Hoken <ho...@gmail.com> on 2014/01/21 07:23:31 UTC

Stopping ExecSource takes a very long time (about 6 hours)

Hi,

I am using ExecSource to run a resident shell script via the "rsh" command.
The script is a simple one that runs "tail" on a log file located on the
remote server (AIX) reached through "rsh".

Flume: 1.3.1
JDK: 1.6.0
Linux executing Flume (ExecSource): SUSE Linux Enterprise Server 11 SP2
AIX: V5.2

In this setup, when I stop Flume, ExecSource takes a very long time
(about 6 hours) to stop.

The details are as follows.
About 6 hours passed between log lines (1) and (2).
(1) INFO  [node-shutdownHook] (org.apache.flume.source.ExecSource.stop:178)
     - Stopping exec source with command:rsh serverXXX sh YYY.sh
(2) INFO  [pool-4-thread-1] (org.apache.flume.source.ExecSource$ExecRunnable
     .run:307)  - Command rsh serverXXX sh YYY.sh] exited with 0

This happens every time.
I guess the TCP keepalive settings of the OS (SUSE Linux) affect this
situation, but I still don't know why it takes 6 hours to stop ExecSource.

So, to find the cause, I debugged these processes. The calls were executed
in the following order.
   1. ExecSource#stop:Process#destroy
   2. ExecSource#stop:Process#waitFor (starts waiting for No.1 to complete)
   3. ExecSource#run :Process#getErrorStream
   4. ExecSource#run :Process#destroy
   5. ExecSource#run :Process#waitFor (starts waiting for No.4 to complete)
   6. ExecSource#run :Process#waitFor (stops waiting for No.4)
   7. ExecSource#stop:Process#waitFor (stops waiting for No.1)

As you can see, the wait started at No.5 ends (No.6) before the wait started
at No.2 ends (No.7).
It seems to me that the thread safety (synchronized (process)) is not
effective.
Is this execution order correct?
Do you think this execution order caused my problem?
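
For reference, this is the ordering I expected if destroy and waitFor were
serialized between the two threads. It is only a sketch with made-up names
(KillOrderSketch, kill, demo), not the Flume code, and "sleep 30" stands in
for my rsh command. It uses a dedicated lock object instead of the Process
itself. That choice rests on an assumption I have not confirmed for my JDK:
as far as I understand, on some JDKs Process#waitFor is implemented with
Object#wait on the Process instance, which would release a monitor taken
with synchronized (process) while waiting.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative sketch only; names are hypothetical, not taken from Flume.
class KillOrderSketch {

    // Dedicated lock (assumption: safer than locking on the Process itself).
    private static final Object LOCK = new Object();

    /** destroy() followed by waitFor(), executed as one guarded unit. */
    static void kill(Process process, String caller, List<String> events) {
        synchronized (LOCK) {
            events.add(caller + ":destroy");
            process.destroy();
            events.add(caller + ":waitFor-start");
            try {
                process.waitFor();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            events.add(caller + ":waitFor-end");
        }
    }

    /** Calls kill() from two threads on one process; returns the event order. */
    static List<String> demo() {
        final List<String> events =
            Collections.synchronizedList(new ArrayList<String>());
        try {
            // "sleep 30" is a stand-in for the long-running rsh command.
            final Process process = new ProcessBuilder("sleep", "30").start();
            Thread stop = new Thread(new Runnable() {
                public void run() { kill(process, "stop", events); }
            });
            Thread run = new Thread(new Runnable() {
                public void run() { kill(process, "run", events); }
            });
            stop.start();
            run.start();
            stop.join();
            run.join();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return new ArrayList<String>(events);
    }

    public static void main(String[] args) {
        for (String event : demo()) {
            System.out.println(event);
        }
    }
}
```

With this lock, the three events of whichever thread enters first are
recorded before any event of the other thread, so an interleaving like
No.1 to No.7 above should not appear.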

From debugging, I am now sure of the following.
1. Two threads (ExecSource#stop and ExecSource#run) run at the same time.
2. ExecSource#stop seems to block in Process#waitFor after
   java.lang.Process#destroy.
3. After Process#getErrorStream, ExecSource#run seems to block in
   Process#waitFor after java.lang.Process#destroy.

Given the above, I am worried that if the external process writes to
standard error after destroy is called, the buffer on the client side might
fill up and cause a deadlock at Process#waitFor.

So I think standard error should be read in another thread before waitFor
is called (after destroy is called in ExecSource#stop).
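
To make the idea concrete, here is a minimal sketch of what I mean. It is
not the actual ExecSource code. The names (StderrDrainSketch, drain,
runAndWait) are made up for illustration, and the "sh -c" command in main is
only a stand-in for my rsh command. It starts the reader threads as soon as
the process starts, which is a slightly different choice from starting them
only after destroy, and it avoids newer language features so that it should
also compile on JDK 1.6.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

// Illustrative sketch only; names are hypothetical, not taken from Flume.
class StderrDrainSketch {

    /** Starts a daemon thread that reads the stream until EOF or close. */
    static Thread drain(final InputStream in, final StringBuilder sink) {
        Thread t = new Thread(new Runnable() {
            public void run() {
                try {
                    BufferedReader reader =
                        new BufferedReader(new InputStreamReader(in));
                    String line;
                    while ((line = reader.readLine()) != null) {
                        synchronized (sink) {
                            sink.append(line).append('\n');
                        }
                    }
                } catch (IOException e) {
                    // The stream was closed (for example after destroy()).
                }
            }
        }, "stream-drain");
        t.setDaemon(true);
        t.start();
        return t;
    }

    /** Runs a command, drains stdout and stderr, returns the exit code. */
    static int runAndWait(String[] command, StringBuilder stderr) {
        try {
            Process p = new ProcessBuilder(command).start();
            // Both pipes are drained so the child cannot block on a full pipe.
            Thread errDrainer = drain(p.getErrorStream(), stderr);
            Thread outDrainer = drain(p.getInputStream(), new StringBuilder());
            int exit = p.waitFor();
            // Give the readers a moment to consume any remaining output.
            errDrainer.join(1000);
            outDrainer.join(1000);
            return exit;
        } catch (IOException e) {
            throw new RuntimeException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        StringBuilder err = new StringBuilder();
        int exit = runAndWait(
            new String[] {"sh", "-c", "echo oops 1>&2; exit 3"}, err);
        System.out.println("exit=" + exit + " stderr=" + err.toString().trim());
    }
}
```

The point of the sketch is only that waitFor is never called while nobody
is reading standard error.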

What do you think?

regards,

YOU