Posted to user@flume.apache.org by You Hoken <ho...@gmail.com> on 2014/01/21 07:23:31 UTC
Stopping ExecSource takes a very long time (about 6 hours)
Hi,
I am using ExecSource to run a resident shell script through the "rsh"
command. The script is a simple program that runs "tail" on a log file
located on the remote server (AIX) reached via "rsh".
Flume: 1.3.1
JDK: 1.6.0
Linux executing Flume (ExecSource): SUSE Linux Enterprise Server 11 SP2
AIX: V5.2
In this setup, when I stop Flume, ExecSource takes a very long time
(about 6 hours) to stop.
The details are as follows.
About 6 hours passed between log lines (1) and (2).
(1) INFO [node-shutdownHook] (org.apache.flume.source.ExecSource.stop:178) - Stopping exec source with command:rsh serverXXX sh YYY.sh
(2) INFO [pool-4-thread-1] (org.apache.flume.source.ExecSource$ExecRunnable.run:307) - Command rsh serverXXX sh YYY.sh] exited with 0
This happens every time.
I suspect that the TCP keepalive settings of the OS (SUSE Linux) affect
this, but I still don't know why it takes 6 hours to stop ExecSource.
So, to find the cause, I debugged these processes. The calls were
executed in the following order.
1. ExecSource#stop:Process#destroy
2. ExecSource#stop:Process#waitFor (start waiting for response No.1)
3. ExecSource#run :Process#getErrorStream
4. ExecSource#run :Process#destroy
5. ExecSource#run :Process#waitFor (start waiting for response No.4)
6. ExecSource#run :Process#waitFor (end waiting for response No.4)
7. ExecSource#stop:Process#waitFor (end waiting for response No.1)
You can see that the waitFor started in No.5 returns (No.6) before the
waitFor started in No.2 returns (No.7).
It seems to me that the thread safety (synchronized (process)) is not
working as intended.
Is this execution order correct?
Do you think this execution order caused my problem?
From debugging, I am now sure of the following.
1. Two threads (ExecSource#stop and ExecSource#run) run at the same time.
2. ExecSource#stop seems to wait for a response at Process#waitFor after
java.lang.Process#destroy.
3. After Process#getErrorStream, ExecSource#run seems to wait for a
response at Process#waitFor after java.lang.Process#destroy.
Given the above, I am worried that if the external process writes to
standard error after destroy is called, the buffer on the client side
might fill up and cause a deadlock at Process#waitFor.
So I think standard error should be read in a separate thread before
waitFor is called (after destroy is called in ExecSource#stop).
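To make the idea concrete, here is a minimal Java sketch. It is not the
actual Flume code. The names StderrDrainSketch, drainInBackground and
stopProcess are hypothetical, the 1 second join timeout is an arbitrary
assumption, and I have not verified that this removes the 6 hour delay.
It is written to compile on JDK 1.6.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

// Hypothetical helper, not part of Flume.
class StderrDrainSketch {

    // Starts a daemon thread that reads the stream line by line until EOF
    // (or until the stream is closed), so the child process is less likely
    // to block on a full stderr pipe.
    static Thread drainInBackground(final InputStream in, final StringBuffer sink) {
        Thread t = new Thread(new Runnable() {
            public void run() {
                BufferedReader reader = new BufferedReader(new InputStreamReader(in));
                try {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        sink.append(line).append('\n');
                    }
                } catch (IOException e) {
                    // The stream may be closed when the process is destroyed;
                    // in that case we simply stop draining.
                } finally {
                    try {
                        reader.close();
                    } catch (IOException ignored) {
                        // Nothing useful to do if close fails.
                    }
                }
            }
        }, "stderr-drainer");
        t.setDaemon(true);
        t.start();
        return t;
    }

    // Assumed stop sequence: start draining stderr, destroy, then wait.
    static int stopProcess(Process process, StringBuffer stderrSink)
            throws InterruptedException {
        Thread drainer = drainInBackground(process.getErrorStream(), stderrSink);
        process.destroy();
        int exitCode = process.waitFor();
        // Bounded join so a stuck read cannot hang shutdown on its own.
        drainer.join(1000L);
        return exitCode;
    }
}
```

In ExecSource#stop, this would mean calling something like
stopProcess(process, new StringBuffer()) instead of calling destroy
and then waitFor directly.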
What do you think?
regards,
YOU