Posted to users@airflow.apache.org by Reed Villanueva <rv...@ucera.org> on 2019/12/09 20:48:10 UTC

Airflow scheduler complains no heartbeat when running daemon

Have a problem where the airflow (v1.10.5) webserver will complain...

The scheduler does not appear to be running. Last heartbeat was received 45
minutes ago.

But checking the scheduler daemon process (started via airflow scheduler -D),
I can see...

[airflow@airflowetl airflow]$ cat airflow-scheduler.pid
64186
[airflow@airflowetl airflow]$ ps -aux | grep 64186
airflow   64186  0.0  0.1 663340 67796 ?        S    15:03   0:00 /usr/bin/python3 /home/airflow/.local/bin/airflow scheduler -D
airflow   94305  0.0  0.0 112716   964 pts/4    R+   16:01   0:00 grep --color=auto 64186

And after some period of time the error message *goes away again*.

This happens very frequently off-and-on even after restarting both the
webserver and scheduler.

The airflow-scheduler.err file is empty, and the .out and .log files appear
innocuous (I need more time to look through them more deeply).

Running the scheduler in the terminal to watch the feed live, everything
seems to run fine until I see this output in the middle of the dag
execution:

[2019-11-29 15:51:57,825] {__init__.py:51} INFO - Using executor SequentialExecutor
[2019-11-29 15:51:58,259] {dagbag.py:90} INFO - Filling up the DagBag from /home/airflow/airflow/dags/my_dag_file.py

Once this pops up, I can see in the web UI that the scheduler heartbeat
error message appears. (Oddly, killing the scheduler process here does not
generate the heartbeat error message in the web UI). Checking for the
scheduler process, I see...

[airflow@airflowetl airflow]$ ps -aux | grep scheduler
airflow    3409  0.2  0.1 523336 67384 ?        S    Oct24 115:06 airflow scheduler -- DagFileProcessorManager
airflow   25569  0.0  0.0 112716   968 pts/4    S+   16:00   0:00 grep --color=auto scheduler
airflow   56771  0.0  0.1 662560 67264 ?        S    Nov26   4:09 airflow scheduler -- DagFileProcessorManager
airflow   64187  0.0  0.1 662564 67096 ?        S    Nov27   0:00 airflow scheduler -- DagFileProcessorManager
airflow  153959  0.1  0.1 662568 67232 ?        S    15:01   0:06 airflow scheduler -- DagFileProcessorManager

I don't know if this is normal or not.

I thought the problem may have been older scheduler processes that were
never cleaned up and were still running...

[airflow@airflowetl airflow]$ kill -9 3409 36771
bash: kill: (36771) - No such process
[airflow@airflowetl airflow]$ ps -aux | grep scheduler
airflow   56771  0.0  0.1 662560 67264 ?        S    Nov26   4:09 airflow scheduler -- DagFileProcessorManager
airflow   64187  0.0  0.1 662564 67096 ?        S    Nov27   0:00 airflow scheduler -- DagFileProcessorManager
airflow  153959  0.0  0.1 662568 67232 ?        S    Nov29   0:06 airflow scheduler -- DagFileProcessorManager
airflow  155741  0.0  0.0 112712   968 pts/2    R+   15:54   0:00 grep --color=auto scheduler

Notice all the various start times in the output.

Doing a kill -9 56771 64187 ... and then rerunning airflow scheduler -D does
not seem to have fixed the problem.
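For reference, the full reset I do looks roughly like this (just a sketch,
assuming everything runs as the airflow user from $AIRFLOW_HOME; pkill -f
also matches the DagFileProcessorManager children, and clearing the stale
pid file avoids lock conflicts when the daemon restarts):

pkill -f 'airflow scheduler'   # stop the scheduler and its DagFileProcessorManager children
rm -f airflow-scheduler.pid    # clear the stale daemon pid file
airflow scheduler -D           # start a fresh daemonized scheduler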

Note: the scheduler seems to consistently stop running after a task fails
to move a file from an FTP location to an HDFS one...

hadoop fs -Dfs.mapr.trace=debug -get \
        ftp://$FTP_CLIENT:$FTP_PASS@$FTP_IP/$FTP_DIR"$TABLENAME.TSV" \
        $PROJECT_HOME/tmp/"$TABLENAME.TSV" \
        | hadoop fs -moveFromLocal $PROJECT_HOME/tmp/"$TABLENAME.TSV" "$DATASTORE"
# see https://stackoverflow.com/a/46433847/8236733

Note there *is* a logic error in this line, since $DATASTORE is an HDFS
directory path, not a file path, but either way I don't think the airflow
scheduler should be missing heartbeats like this from something seemingly
so unrelated.

Anyone know what could be going on here or how to fix it?


Re: Airflow scheduler complains no heartbeat when running daemon

Posted by Reed Villanueva <rv...@ucera.org>.
I see, interesting.
Is this correlation mentioned anywhere in the airflow docs (for future
reference)? Any idea why it only popped up when the task errored? (I'm
still using the SequentialExecutor, but I no longer see the heartbeat
error after fixing the task's code error.)

On Tue, Dec 10, 2019 at 10:50 AM Ash Berlin-Taylor <as...@apache.org> wrote:

> Oh - don't use SequentialExecutor! It blocks the scheduler from
> heartbeating when running tasks! So if a task takes longer than the
> scheduler heart beat to run you'll see that message.
>
> I would suggest switching to the LocalExecutor instead.
>
> [...]


Re: Airflow scheduler complains no heartbeat when running daemon

Posted by Ash Berlin-Taylor <as...@apache.org>.
Oh - don't use SequentialExecutor! It blocks the scheduler from heartbeating while it is running tasks! So if a task takes longer than the scheduler heartbeat interval to run, you'll see that message.

I would suggest switching to the LocalExecutor instead.
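For reference, the switch is a one-line change in airflow.cfg, but the LocalExecutor also needs a metadata database that supports concurrent connections (SQLite won't work); the connection string below is only an illustration:

[core]
# LocalExecutor runs tasks in parallel, so use Postgres or MySQL, not SQLite
executor = LocalExecutor
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@localhost:5432/airflow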

> On 10 Dec 2019, at 19:52, Reed Villanueva <rv...@ucera.org> wrote:
> 
> [...]


Re: Airflow scheduler complains no heartbeat when running daemon

Posted by Reed Villanueva <rv...@ucera.org>.
I think the multiple other scheduler DagFileProcessorManagers were just
left over from previous runs: I would run the dag, the task with the
apparently offending code would run, the scheduler heartbeat error would
pop up, and I'd restart the scheduler via an "airflow scheduler -D" command
(when, I guess, the old processes were not really killed, just missing
heartbeats for whatever reason). So it's not as if starting the scheduler
would do something weird like start multiple copies of it.

I haven't seen anything else unusual about the dag files (to me, each of
the tasks is pretty simple and short), and since implementing the
previously mentioned change I have not seen the heartbeat error again. I
do think it's weird too, considering how unrelated I would have thought
the scheduler heartbeat would be to an HDFS file-not-found error.

I'm running airflow with the SequentialExecutor, but I'm not sure what you
mean by "process supervisor". Example?

On Tue, Dec 10, 2019 at 3:54 AM Ash Berlin-Taylor <as...@apache.org> wrote:

> Hmm, having more than one DagFileProcessorManager alive at the same time
> does indicate something has gone wrong -- there should only be one of those.
>
> Are you using sub-dags or doing anything else "unusual" in any of your dag
> files?
>
> What executor are you using? What process supervisor (if any) are you
> using to run your scheduler?
>
> -ash
>
> On 9 Dec 2019, at 20:48, Reed Villanueva <rv...@ucera.org> wrote:
>
> [...]


Re: Airflow scheduler complains no heartbeat when running daemon

Posted by Ash Berlin-Taylor <as...@apache.org>.
Hmm, having more than one DagFileProcessorManager alive at the same time does indicate something has gone wrong -- there should only be one of those.

Are you using sub-dags or doing anything else "unusual" in any of your dag files?

What executor are you using? And what process supervisor (if any) are you using to run your scheduler?
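By a process supervisor I mean something like systemd or supervisord, so the scheduler is restarted automatically if it dies. A minimal sketch of a systemd unit, with the user and paths assumed from the ps output earlier in this thread:

[Unit]
Description=Airflow scheduler
After=network.target

[Service]
User=airflow
Environment=AIRFLOW_HOME=/home/airflow/airflow
# Run in the foreground (no -D) so systemd can supervise and restart it
ExecStart=/home/airflow/.local/bin/airflow scheduler
Restart=always
RestartSec=5s

[Install]
WantedBy=multi-user.target

Saved as /etc/systemd/system/airflow-scheduler.service, it can be enabled with systemctl enable --now airflow-scheduler.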

-ash

> On 9 Dec 2019, at 20:48, Reed Villanueva <rv...@ucera.org> wrote:
> 
> [...]


Re: Airflow scheduler complains no heartbeat when running daemon

Posted by Reed Villanueva <rv...@ucera.org>.
Damian,
None of the tasks in the dag take that long.
The task that appears to coincide with the scheduler error only takes
about 1-3 min. on another server that I am trying to migrate the dag from
(which uses airflow v1.9.0, as opposed to v1.10.5 here on the new server),
and manually running the bash commands that the task is programmed to do
only takes about 1 min. in the terminal as well.

On Mon, Dec 9, 2019 at 11:24 AM Shaw, Damian P. <
damian.shaw.2@credit-suisse.com> wrote:

> If you’re using the Sequential Executor maybe some of your tasks are
> taking longer than 45 mins?
>
>
>
> For the sake of task isolation and many other reasons it’s very helpful to
> use a different Executor, first step is Local Executor and then maybe
> something more fancy.
>
>
>
> Damian
>
> [...]


RE: Airflow scheduler complains no heartbeat when running daemon

Posted by "Shaw, Damian P. " <da...@credit-suisse.com>.
If you’re using the Sequential Executor, maybe some of your tasks are taking longer than 45 mins?

For the sake of task isolation and many other reasons it’s very helpful to use a different Executor; the first step is the Local Executor, and then maybe something more fancy.

Damian

From: Reed Villanueva <rv...@ucera.org>
Sent: Monday, December 9, 2019 3:48 PM
To: users@airflow.apache.org
Subject: Airflow scheduler complains no heartbeat when running daemon


[...]




Re: Airflow scheduler complains no heartbeat when running daemon

Posted by Reed Villanueva <rv...@ucera.org>.
I seem to have found the problem. I had a piece of code like...

hadoop fs -Dfs.mapr.trace=debug -get \
        ftp://$FTP_CLIENT:$FTP_PASS@$FTP_IP/$FTP_DIR"$TABLENAME.TSV" \
        $PROJECT_HOME/tmp/"$TABLENAME.TSV" \
        | hadoop fs -moveFromLocal $PROJECT_HOME/tmp/"$TABLENAME.TSV" "$DATASTORE"

which I changed to

hadoop fs -Dfs.mapr.trace=debug -get \
        ftp://$FTP_CLIENT:$FTP_PASS@$FTP_IP/$FTP_DIR"$TABLENAME.TSV" \
        $PROJECT_HOME/tmp/"$TABLENAME.TSV"
hadoop fs -moveFromLocal $PROJECT_HOME/tmp/"$TABLENAME.TSV" "$DATASTORE"

I don't know why, but there seems to have been a problem with using the
pipe in this way. I *suspect* it has something to do with latency when
having to read from the local temp dir before writing via -moveFromLocal
(I would get similar 'file not found' errors when running the commands
manually in the shell chained together with a pipe).
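One plausible mechanism (an assumption on my part, not verified): the two
sides of a pipe are started concurrently, and -moveFromLocal takes a local
path argument rather than reading stdin, so it can start before -get has
finished writing the file. Sequencing the two commands with && makes the
ordering explicit and only runs the move if the download succeeded:

hadoop fs -Dfs.mapr.trace=debug -get \
        ftp://$FTP_CLIENT:$FTP_PASS@$FTP_IP/$FTP_DIR"$TABLENAME.TSV" \
        $PROJECT_HOME/tmp/"$TABLENAME.TSV" \
    && hadoop fs -moveFromLocal $PROJECT_HOME/tmp/"$TABLENAME.TSV" "$DATASTORE"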

I'm also not sure why any of this would cause the airflow scheduler to
have problems, but I have run the task several times since the change and
have not seen the scheduler error again, so I'm not sure what to make of
that.

If anyone can explain any of this weirdness, please do let me know so I
can make this answer a bit more complete. I will continue to debug and
update.

On Mon, Dec 9, 2019 at 12:17 PM Reed Villanueva <rv...@ucera.org>
wrote:

> I see thanks.
>
> Though have already tried manually restarting the scheduler, but still
> seeing the same error (ie. deleting all airflow-scheduler.* files and
> killing the scheduler -D process then running it again), so not super sure
> how setting an automated restart would help.
>
> On Mon, Dec 9, 2019 at 12:09 PM Aaron Grubb <Aa...@clearpier.com>
> wrote:
>
>> [...]


Re: Airflow scheduler complains no heartbeat when running daemon

Posted by Reed Villanueva <rv...@ucera.org>.
I see, thanks.

Though I have already tried manually restarting the scheduler (i.e.
deleting all airflow-scheduler.* files and killing the scheduler -D
process, then running it again) and am still seeing the same error, so I'm
not super sure how setting an automated restart would help.

On Mon, Dec 9, 2019 at 12:09 PM Aaron Grubb <Aa...@clearpier.com>
wrote:

> I should have been more specific. I meant set it to something low to test
> that restarting the scheduler fixes that problem, something like an hour,
> then if it does, increase it to the 24 hours recommended.
>
> [...]

-- 
This electronic message is intended only for the named 
recipient, and may 
contain information that is confidential or 
privileged. If you are not the 
intended recipient, you are 
hereby notified that any disclosure, copying, 
distribution or 
use of the contents of this message is strictly 
prohibited. If 
you have received this message in error or are not the 
named
recipient, please notify us immediately by contacting the 
sender at 
the electronic mail address noted above, and delete 
and destroy all copies 
of this message. Thank you.

RE: Airflow scheduler complains no heartbeat when running daemon

Posted by Aaron Grubb <Aa...@clearpier.com>.
I should have been more specific: I meant setting scheduler.run_duration to something low, such as an hour, to test whether periodically restarting the scheduler fixes the problem; if it does, increase it to the recommended 24 hours.
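A minimal sketch of what that test setting could look like, assuming the default config file at ~/airflow/airflow.cfg (the path and the one-hour value are illustrative):

# ~/airflow/airflow.cfg
[scheduler]
# exit after one hour while testing; raise to 86400 (24 hours) once the
# missing-heartbeat errors stop recurring
run_duration = 3600

Note that run_duration only makes the scheduler exit; something external (cron, systemd, a supervisor) still has to relaunch the daemon after each run.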


Re: Airflow scheduler complains no heartbeat when running daemon

Posted by Reed Villanueva <rv...@ucera.org>.
Aaron,
I'm pretty new to Airflow as well and curious why you think setting
scheduler.run_duration to something very low would help here. It seems odd
to have the scheduler restart every, say, 30 seconds (and I'm not sure how
that would affect the jobs that need to run throughout the day). From this
article (
https://www.astronomer.io/blog/7-common-errors-to-check-when-debugging-airflow-dag/),
restarting once every 24 hours seems to be the recommendation.
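For what it's worth, that webserver banner is driven by the most recent SchedulerJob heartbeat recorded in the metadata database, so the timestamps can be checked directly. A sketch, assuming the default SQLite backend at ~/airflow/airflow.db (the job table and latest_heartbeat column come from Airflow 1.10's BaseJob model):

sqlite3 ~/airflow/airflow.db \
  "SELECT id, state, latest_heartbeat FROM job \
   WHERE job_type = 'SchedulerJob' ORDER BY latest_heartbeat DESC LIMIT 5;"

If the newest row stops advancing while the daemon process is still alive, that would suggest the scheduler loop is hung rather than the process being dead, which would fit the on-and-off behavior described in the original post.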


RE: Airflow scheduler complains no heartbeat when running daemon

Posted by Aaron Grubb <Aa...@clearpier.com>.
Don’t take this at face value since I’m a novice with Airflow, but my understanding of best practice is to have the scheduler restart every so often (command line: -r <seconds>, or config: scheduler.run_duration = <seconds>). Kill all the scheduler processes, try setting it to something low, and if the problem goes away, increase it to a day or so.
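A sketch of that cleanup sequence, assuming all the stray processes match 'airflow scheduler' on their command lines and that the commands run from the directory holding the PID file shown in the original post:

# stop every lingering scheduler / DagFileProcessorManager process
pkill -f 'airflow scheduler'
# clear the stale PID file so the new daemon can write its own
rm -f airflow-scheduler.pid
# relaunch with a short run duration for testing (-r takes seconds)
airflow scheduler -D -r 3600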
