Posted to dev@airflow.apache.org by ma...@columbia.edu on 2018/04/30 19:16:58 UTC

How to consolidate log files?

Hi Mailing List,

Is there a way to consolidate Airflow task logs by stdout and stderr? Currently, the structure is something like /logs/taskname/task_run_date/log1.txt, which is the log for a particular task name at a particular run date. What I would like instead is two large log files covering all tasks, something like /logs/errors.txt and /logs/outputs.txt, containing all the stderr and stdout messages for all runs of all tasks regardless of run date. For example, if I have task A and task B, instead of two directories with subdirectories for A and B, I would just like two files: one with errors from A and B, and one with outputs from A and B. Does Airflow provide this?

Thanks!

Re: How to consolidate log files?

Posted by James Meickle <jm...@quantopian.com>.
I suspect that what you actually want here is to run an external log
ingestion service (e.g. an ELK stack) and watch the log directories on each
worker. They are laid out hierarchically, so it would be easy to grab
what you're looking for, tag it appropriately, and then view it in a
UI better suited to displaying logs from many concurrent sources.
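For illustration, the "tag them appropriately" step could be a small helper like the one below. This is only a sketch of my own, not an Airflow or ELK API; the /logs/taskname/run_date/logfile layout and the function name are assumptions taken from the question above:

```python
import os

def tags_from_path(path, log_root="/logs"):
    """Derive ingestion tags from an Airflow-style log path.

    Assumes the hierarchical layout described in the question:
    /logs/<taskname>/<run_date>/<logfile>. A log shipper could
    attach these fields to each event before forwarding it.
    """
    # Path relative to the log root encodes task name and run date.
    parts = os.path.relpath(path, log_root).split(os.sep)
    return {"task": parts[0], "run_date": parts[1], "file": parts[2]}
```

For example, tags_from_path("/logs/taskA/2018-04-30/log1.txt") yields {"task": "taskA", "run_date": "2018-04-30", "file": "log1.txt"}.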

On Mon, Apr 30, 2018 at 6:18 PM, Ruiqin Yang <yr...@gmail.com> wrote:


Re: How to consolidate log files?

Posted by Ruiqin Yang <yr...@gmail.com>.
AFAIK, Airflow doesn't provide logs in this way. Multiple tasks run in
different processes, potentially in parallel, so writing to the same
file at run time would produce a log file with interleaved lines from
different tasks. Also, I believe Airflow currently does not separate
stdout and stderr; they all go to the same place. I'm not sure there's
a good point in the code to consolidate the logs from different tasks.
Maybe you can have a separate script or service do the consolidation,
since the log structure and format are known.

Cheers,
Kevin Y

On Mon, Apr 30, 2018 at 12:16 PM, mad2271@columbia.edu <mad2271@columbia.edu> wrote: