Posted to dev@airflow.apache.org by David Capwell <dc...@gmail.com> on 2017/10/03 01:02:12 UTC

Airflow stops reading stdout of forked process with BashOperator

We use the bash operator to call a Java command line. We notice that
sometimes the task stays running a long time (it never stops) and that the
logs in Airflow stop getting updated for the task. After debugging a bit, it
turns out that the JVM is blocked on the stdout FD because the buffer is
full. I manually drained the buffer (just called cat to dump it) and saw the
JVM halt cleanly, but the task stays stuck in Airflow; airflow run is still
running but the forked process is not.

Walking the code in bash_operator, I see that Airflow creates a shell script
and then has bash run it. I see the location of the script in the logs, but I
don't see it on the file system. I didn't check while the process was hung,
so I don't know whether bash was running or not.

We have seen this a few times. Any idea what's going on? I'm new to debugging
Python, and ptrace is disabled in our env, so I can't find a way to get the
state of the airflow run command.

Thanks for any help!

Airflow version: 1.8.0 and 1.8.2 (above was on 1.8.2 but we see this on
1.8.0 cluster as well)
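
One way to see where a hung Python process is stuck without ptrace is to have the process dump its own thread stacks when it receives a signal. The sketch below is only an illustration under stated assumptions: it uses the faulthandler module from the Python 3 standard library (on Python 2.7 it is only available as a separate backport package), the helper name install_stack_dump_handler is hypothetical, and the handler has to be installed inside the process before it hangs, which Airflow 1.8 does not do on its own.

```python
import faulthandler
import signal
import sys


def install_stack_dump_handler(signum=signal.SIGUSR1, stream=sys.stderr):
    """Ask faulthandler to write every thread's traceback to `stream`
    whenever the process receives `signum`.

    `stream` must have a working fileno() and must stay open for as long
    as the handler is registered.
    """
    faulthandler.register(signum, file=stream, all_threads=True, chain=False)
```

With the handler installed, running `kill -USR1 <pid>` against the stuck process writes the tracebacks to the chosen stream without attaching a debugger.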

Re: Airflow stops reading stdout of forked process with BashOperator

Posted by David Capwell <dc...@gmail.com>.
Python version is 2.7.6

On Oct 4, 2017 9:52 AM, "Driesprong, Fokko" <fo...@driesprong.frl> wrote:

> Hi David,
>
> Thank you for the question. The problem with the Spark-sql hook seems
> related <https://issues.apache.org/jira/browse/AIRFLOW-1647>, but the issue
> is different. In the spark-sql hook, the problem was that there were two
> iterators, one for stdout and one for stderr. The stdout iterator was read
> first, and only after it hit EOF was the stderr iterator emptied. This
> caused the stderr buffer to grow quickly (since stdout was being read
> first). This was fixed by redirecting stderr to stdout and using a single
> iterator. This is already the case in the BashOperator.
>
> The iterator should be read by Airflow, so the stdout buffer should be
> read. What version of Python are you using?
>
> Cheers, Fokko

Re: Airflow stops reading stdout of forked process with BashOperator

Posted by "Driesprong, Fokko" <fo...@driesprong.frl>.
Hi David,

Thank you for the question. The problem with the Spark-sql hook seems related
<https://issues.apache.org/jira/browse/AIRFLOW-1647>, but the issue is
different. In the spark-sql hook, the problem was that there were two
iterators, one for stdout and one for stderr. The stdout iterator was read
first, and only after it hit EOF was the stderr iterator emptied. This caused
the stderr buffer to grow quickly (since stdout was being read first). This
was fixed by redirecting stderr to stdout and using a single iterator. This
is already the case in the BashOperator.
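
The single-iterator approach described above can be sketched roughly as follows. This is a minimal illustration and not the actual BashOperator or spark-sql hook code; the function name run_and_stream is hypothetical, and it assumes bash is on the PATH.

```python
import subprocess


def run_and_stream(command):
    """Run `command` under bash and yield its output one line at a time."""
    proc = subprocess.Popen(
        ["bash", "-c", command],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout: one pipe to drain
    )
    # readline() returns b"" only at EOF, so this loop keeps draining the
    # pipe until every writer (the child and its descendants) has closed it.
    for raw_line in iter(proc.stdout.readline, b""):
        yield raw_line.decode("utf-8", "replace").rstrip("\n")
    proc.stdout.close()
    returncode = proc.wait()
    if returncode != 0:
        raise RuntimeError("command exited with status %d" % returncode)
```

Because stderr is merged into stdout, there is only one pipe that can fill up, and the loop above is the only reader it needs. The child can still block if the caller stops consuming the generator.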

The iterator should be read by Airflow, so the stdout buffer should be
read. What version of Python are you using?

Cheers, Fokko

2017-10-03 7:55 GMT+02:00 Bolke de Bruin <bd...@gmail.com>:

> Probably a buffer is full or not emptied in time (as you mentioned). I.e.,
> if we’re reading from stderr but stdout is full, it gets stuck. This was
> fixed for the SparkOperators, but we might need to do the same here.
>
> Bolke

Re: Airflow stops reading stdout of forked process with BashOperator

Posted by Bolke de Bruin <bd...@gmail.com>.
Probably a buffer is full or not emptied in time (as you mentioned). I.e., if we’re reading from stderr but stdout is full, it gets stuck. This was fixed for the SparkOperators, but we might need to do the same here.

Bolke

Sent from my iPad
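
The stuck case described above (reading one stream while the other pipe fills up) can be avoided by draining both pipes at the same time when stdout and stderr need to stay separate. The following is a minimal sketch of that idea, not Airflow code; the names _drain and run_with_separate_streams are hypothetical, and it assumes bash is on the PATH.

```python
import subprocess
import threading


def _drain(pipe, sink):
    """Read `pipe` to EOF, appending each decoded line to `sink`."""
    for raw_line in iter(pipe.readline, b""):
        sink.append(raw_line.decode("utf-8", "replace").rstrip("\n"))
    pipe.close()


def run_with_separate_streams(command):
    """Run `command` under bash; return (exit code, stdout lines, stderr lines)."""
    proc = subprocess.Popen(
        ["bash", "-c", command],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    out_lines, err_lines = [], []
    # One reader thread per pipe, so neither pipe can fill up while the
    # other one is being read.
    readers = [
        threading.Thread(target=_drain, args=(proc.stdout, out_lines)),
        threading.Thread(target=_drain, args=(proc.stderr, err_lines)),
    ]
    for reader in readers:
        reader.start()
    for reader in readers:
        reader.join()
    return proc.wait(), out_lines, err_lines
```

When the output does not need to be streamed, the standard library's Popen.communicate() also reads both pipes without deadlocking, at the cost of holding all output in memory until the process exits.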
