Posted to dev@airflow.apache.org by Bolke de Bruin <bd...@gmail.com> on 2017/06/27 19:56:43 UTC

Airflow profiling

Just saw this tool on Hacker News:

https://github.com/stackimpact/stackimpact-python

Might be interesting for some profiling.

Bolke

Re: Airflow profiling

Posted by Ruslan Dautkhanov <da...@gmail.com>.
It may not be as powerful as StackImpact, but PyCharm's profiler is another nice option:
https://www.jetbrains.com/help/pycharm/profiler.html



-- 
Ruslan Dautkhanov

On Tue, Jun 27, 2017 at 2:40 PM, Alex Guziel <alex.guziel@airbnb.com.invalid> wrote:

> Yeah, actually we have set up New Relic for Airflow too at Airbnb, which
> gives decent insights into webserver perf. In terms of SQL queries, adding
> `echo=True` to the SQLAlchemy engine creation is pretty good for seeing
> which SQL queries get created. I tried some Python profilers before but
> they weren't super helpful.

Re: Airflow profiling

Posted by Matt Davis <ji...@gmail.com>.
It's not useful for online profiling, but to profile specific things I've
used the built-in cProfile module together with the `airflow test` CLI
command. There's an example here:
https://gist.github.com/jiffyclub/a3204998c8190abd8ea50090c30347fa
The context was profiling why a specific DAG was taking a long time to
build.
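
A minimal sketch of profiling a DAG build with cProfile. It does not reproduce the linked gist: `build_dag` is a hypothetical stand-in for whatever code constructs the slow DAG, and no Airflow APIs are used.

```python
import cProfile
import io
import pstats


def build_dag(n_tasks=2000):
    """Hypothetical stand-in for the code that constructs a large DAG."""
    tasks = {f"task_{i}": [] for i in range(n_tasks)}
    for i in range(1, n_tasks):
        # Chain each task to the one before it.
        tasks[f"task_{i}"].append(f"task_{i - 1}")
    return tasks


# Profile only the build step.
profiler = cProfile.Profile()
profiler.enable()
dag = build_dag()
profiler.disable()

# Collect the ten most expensive entries by cumulative time into a string.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(10)
report = buffer.getvalue()
print(report)
```

Sorting by cumulative time tends to surface the high-level call that dominates the build, which is usually the first thing to look at.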

On Wed, Jun 28, 2017 at 9:45 AM Bolke de Bruin <bd...@gmail.com> wrote:

> Are you able to share some of the results/insights from this? Particularly
> on Airflow’s internals of course.
>
> Bolke

Re: Airflow profiling

Posted by Bolke de Bruin <bd...@gmail.com>.
Are you able to share some of the results/insights from this? Particularly on Airflow’s internals of course.

Bolke

> On 27 Jun 2017, at 22:40, Alex Guziel <al...@airbnb.com.INVALID> wrote:
> 
> Yeah, actually we have set up New Relic for Airflow too at Airbnb, which
> gives decent insights into webserver perf. In terms of SQL queries, adding
> `echo=True` to the SQLAlchemy engine creation is pretty good for seeing
> which SQL queries get created. I tried some Python profilers before but
> they weren't super helpful.


Re: Airflow profiling

Posted by Alex Guziel <al...@airbnb.com.INVALID>.
Yeah, actually we have set up New Relic for Airflow too at Airbnb, which
gives decent insights into webserver performance. For SQL queries, adding
`echo=True` to the SQLAlchemy engine creation is pretty good for seeing
which SQL queries get generated. I tried some Python profilers before, but
they weren't super helpful.
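
A minimal sketch of the `echo=True` approach, assuming a throwaway in-memory SQLite database rather than Airflow's real metadata database. The connection URL and the query are illustrative only.

```python
from sqlalchemy import create_engine, text

# echo=True makes SQLAlchemy log every SQL statement it emits.
# "sqlite://" is an in-memory stand-in, not Airflow's configured connection.
engine = create_engine("sqlite://", echo=True)

with engine.connect() as conn:
    # The statement below is written to the log when it is executed.
    value = conn.execute(text("SELECT 1")).scalar()

print(value)
```

The statements go through the standard `logging` module under the `sqlalchemy.engine` logger, so they can also be routed to a file instead of the console.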

On Tue, Jun 27, 2017 at 1:27 PM, Maxime Beauchemin <maximebeauchemin@gmail.com> wrote:

> Nice. It would be great if DAG parsing were faster. Some of the
> endpoints on the website have also grown really slow as we've grown the
> number of DAGs, and on DAGs with a large number of tasks.
>
> I had the intuition that DAG parsing could be faster if operators
> late-imported hooks (which themselves import external libs), but I have no
> evidence or test to support it.
>
> I'm sure there's tons of low-hanging fruit, and this type of tool should
> make it really clear.
>
> We've set up New Relic (which seems similar to this tooling at first sight)
> for Superset at Airbnb and it gave us great insight.
>
> Max

Re: Airflow profiling

Posted by Maxime Beauchemin <ma...@gmail.com>.
Nice. It would be great if DAG parsing were faster. Some of the
endpoints on the website have also grown really slow as we've grown the
number of DAGs, and on DAGs with a large number of tasks.

I had the intuition that DAG parsing could be faster if operators
late-imported hooks (which themselves import external libs), but I have no
evidence or test to support it.
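
A minimal sketch of the late-import idea, not taken from Airflow's code. `LazyHookOperator` is a hypothetical class, and the standard-library `sqlite3` module stands in for a heavy external library that a hook would pull in.

```python
class LazyHookOperator:
    """Hypothetical operator that defers importing its hook's dependency."""

    def __init__(self, task_id):
        # Constructing the operator (what happens at DAG parse time)
        # does not import the dependency.
        self.task_id = task_id

    def get_hook(self):
        # Late import: the cost is paid when the task runs,
        # not when the DAG file is parsed.
        import sqlite3  # stand-in for a heavy external library

        return sqlite3.connect(":memory:")

    def execute(self):
        conn = self.get_hook()
        try:
            return conn.execute("SELECT 1").fetchone()[0]
        finally:
            conn.close()


op = LazyHookOperator(task_id="example")
result = op.execute()
print(result)
```

Whether this actually helps would need measuring. On Python 3.7 and later, running a DAG file with `python -X importtime` prints per-module import times, which is one way to check.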

I'm sure there's tons of low-hanging fruit, and this type of tool should
make it really clear.

We've set up New Relic (which seems similar to this tooling at first sight)
for Superset at Airbnb and it gave us great insight.

Max

On Tue, Jun 27, 2017 at 1:01 PM, Bolke de Bruin <bd...@gmail.com> wrote:

> There's a free version too, maybe more suited to integration testing and benchmarking.
>
> https://stackimpact.com/pricing/
>
> B.

Re: Airflow profiling

Posted by Bolke de Bruin <bd...@gmail.com>.
There's a free version too, maybe more suited to integration testing and benchmarking.

https://stackimpact.com/pricing/

B.

> On 27 Jun 2017, at 22:00, Chris Riccomini <cr...@apache.org> wrote:
> 
> Seems you have to pay?


Re: Airflow profiling

Posted by Chris Riccomini <cr...@apache.org>.
Seems you have to pay?

On Tue, Jun 27, 2017 at 12:56 PM, Bolke de Bruin <bd...@gmail.com> wrote:

> Just saw this tool on Hacker News:
>
> https://github.com/stackimpact/stackimpact-python
>
> Might be interesting for some profiling.
>
> Bolke