Posted to dev@airflow.apache.org by Jarek Potiuk <ja...@potiuk.com> on 2021/07/20 16:22:57 UTC

[DISCUSSION] Should we be more explicit about SQLite using for dev only (or kill it for non-dev entirely????)

Hello Community,

Recently several people complained (on Slack) that Airflow 2.1 is slow at
scheduling tasks. After some discussion it usually turned out that those
people were using SQLite + the Sequential executor. I think this gives users
a very bad impression. We even had one user who almost gave up on Airflow
after seeing how slow it is at scheduling tasks (!).

I think that while in Airflow 1.10 the difference was not as noticeable,
Airflow 2 with Postgres/MySQL is lightning fast compared to SQLite. It's like
a different world.

First-time users might get a very bad impression when their first contact
with Airflow is via SQLite + the Sequential executor.

Many people pick SQLite first when they try Airflow (SQLite is generally
seen as a solid choice in many cases, and people are afraid that setting up
MySQL/Postgres might take them a lot of time).

However, with the current Docker-Compose quick-start by Kamil, it is already
rather quick to get a working setup with Postgres.

My idea is - why don't we make SQLite a "development-only" choice? That would
require an editable, development version of Airflow to run, and fail hard when
Airflow is installed as a regular package (with an appropriate "Use a proper
database - MySQL/Postgres" message - and MSSQL once we release MSSQL support
in 2.2).
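
A minimal sketch of how such a "fail hard" check could look, assuming an
editable install is detected through the PEP 610 direct_url.json file that
recent pip versions write for "pip install -e" (older "setup.py develop"
installs may not have it, so a negative result is not conclusive). The
function names and error text are hypothetical, not existing Airflow APIs,
and the connection string is assumed to come from [core] sql_alchemy_conn.

```python
import json
from importlib import metadata
from typing import Optional


def editable_from_direct_url(raw: Optional[str]) -> bool:
    """Return True if a PEP 610 direct_url.json payload marks an editable install."""
    if not raw:
        return False  # file absent: regular wheel/sdist install (or an older tool)
    try:
        info = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(info, dict):
        return False
    dir_info = info.get("dir_info")
    return isinstance(dir_info, dict) and dir_info.get("editable") is True


def is_editable_install(dist_name: str = "apache-airflow") -> bool:
    """Best-effort check of how the given distribution was installed."""
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return False
    # read_text returns None when the metadata file does not exist.
    return editable_from_direct_url(dist.read_text("direct_url.json"))


def metadata_db_error(sql_alchemy_conn: str, editable: bool) -> Optional[str]:
    """Return an error message when SQLite is used outside an editable (dev) install."""
    if sql_alchemy_conn.lower().startswith("sqlite") and not editable:
        return ("SQLite is for development only. "
                "Use a proper database: MySQL or Postgres.")
    return None
```

At startup, a caller would compute metadata_db_error(conn, is_editable_install())
and raise SystemExit with the message whenever it is not None.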

I think that would be possible, it would not violate backwards
compatibility (SQLite was meant for development only anyway) and it would
help Airflow be seen as more "snappy".

Any other ideas?

WDYT?

J.



-- 
+48 660 796 129

Re: [DISCUSSION] Should we be more explicit about SQLite using for dev only (or kill it for non-dev entirely????)

Posted by Kaxil Naik <ka...@gmail.com>.
Agreed with Andrew and others: we should not remove SQLite.

Red flashy messages are for errors. I have created
https://github.com/apache/airflow/pull/17133 to take care of showing a
message on the webserver.

Regards,
Kaxil

On Wed, Jul 21, 2021 at 8:14 AM Jarek Potiuk <ja...@potiuk.com> wrote:


Re: [DISCUSSION] Should we be more explicit about SQLite using for dev only (or kill it for non-dev entirely????)

Posted by Jarek Potiuk <ja...@potiuk.com>.
Yep. Good points, Lars and Andrew. I think adding a BIG RED FLASHY message in
the webserver is the way to go.

It's way better than logs, because the "Airflow is slow" perception actually
forms when you watch the progress in the UI and impatiently hit "refresh" to
see why it is still running :D

Is anyone with UI experience willing to add such a big red flashy message
("Hey - you are using the Sequential executor and SQLite - things here are
many times slower than if you use MySQL/Postgres/MSSQL")? Happy to review and
test :).

J.


On Wed, Jul 21, 2021 at 8:24 AM Lars Winderling <la...@posteo.de>
wrote:



Re: [DISCUSSION] Should we be more explicit about SQLite using for dev only (or kill it for non-dev entirely????)

Posted by Lars Winderling <la...@posteo.de>.
Also, as a user: having SQLite around for testing, e.g. in a single
self-built Docker image (no Compose), is reaaaally handy. I would
definitely miss that. I always use the upstream pip package to align my
testing and prod envs, so I wouldn't like to distinguish between an
editable and a non-editable package.

So maybe we should instead check whether SQLite and/or the Sequential
executor have been configured in the Airflow instance, to recognize a dev
environment.

Instead of failing hard, it would be really cool to show a hint (big,
red, flashy, …) directly in the UI to the users, at least to those with
admin privileges. We could remind them that they are using the dev setup,
add some info text naming the disadvantages of the dev setup (being slow,
etc.), and then point to an easy alternative like docker-compose that one
/could/ use in production as well.
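
A minimal sketch of the config-based detection and the admin-only hint,
assuming the two values are read from [core] sql_alchemy_conn and
[core] executor in airflow.cfg (the Airflow 2.1 layout). Both helpers and
the hint wording are hypothetical, not existing Airflow APIs.

```python
from typing import List


def dev_setup_hints(sql_alchemy_conn: str, executor: str) -> List[str]:
    """Return UI hints when the instance looks like a development setup."""
    hints = []
    if sql_alchemy_conn.lower().startswith("sqlite"):
        hints.append(
            "You are using SQLite as the metadata database. It is meant for "
            "development only and is much slower than MySQL/Postgres."
        )
    if executor == "SequentialExecutor":
        hints.append(
            "You are using the SequentialExecutor, which runs one task at a "
            "time. Consider the docker-compose quick-start with Postgres."
        )
    return hints


def hints_for_user(hints: List[str], is_admin: bool) -> List[str]:
    """Minimal variant of the suggestion: show the hints to admins only."""
    return list(hints) if is_admin else []
```

Detecting the setup from configuration rather than from how the package was
installed keeps the upstream pip package identical across test and prod
environments.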

I am not sure if we have this atm, but pre-configuring Airflow with
SQLite + SequentialExecutor is key to getting new users to use Airflow in
the first place. Those with little infrastructure expertise would just be
scared off right away, I fear, and so would anyone who just wants to give
it a shot.

Best, Lars


On 7/21/21 12:36 AM, Andrew Godwin wrote:


Re: [DISCUSSION] Should we be more explicit about SQLite using for dev only (or kill it for non-dev entirely????)

Posted by Andrew Godwin <an...@astronomer.io.INVALID>.
We explored a similar idea with Django many years ago, and the conclusion
back then, which I would also put forward here, is that having a project
scale down to an easy developer install is of crucial importance, and so I
think SQLite has to stay in that role (as there is no reasonable
alternative, at least not yet).

I do think it should be heavily discouraged in production installs, though.
If there's a way you think we can pull that off while not making
development annoying, I'd be all for it.

Andrew

On Tue, Jul 20, 2021 at 12:24 PM Shaw, Damian P. <
damian.shaw.2@credit-suisse.com> wrote:


RE: [DISCUSSION] Should we be more explicit about SQLite using for dev only (or kill it for non-dev entirely????)

Posted by "Shaw, Damian P. " <da...@credit-suisse.com>.
Some thoughts as a user of Airflow:

I wouldn’t have adopted Airflow in the first place if I couldn’t test it with SQLite. And it would be the same today: accessing Docker isn’t always easy in some companies.

But having a warning, when it’s enabled, that SQLite is development-only and much slower than other solutions seems fair. Also, forcing first-time users to edit the config on first run is, I think, acceptable, as they will need to get used to doing that frequently anyway if they’re rolling their own install.
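
One way to sketch the "force first-time users to edit the config" idea,
under loud assumptions: the launcher records a SHA-256 fingerprint of the
default airflow.cfg when it generates it, and refuses to start while the
file is byte-for-byte unchanged. The fingerprinting scheme and the function
names are hypothetical, not part of Airflow.

```python
import hashlib
from typing import Optional


def fingerprint(cfg_bytes: bytes) -> str:
    """SHA-256 hex digest of a config file's raw contents."""
    return hashlib.sha256(cfg_bytes).hexdigest()


def first_run_error(current_cfg: bytes, generated_fingerprint: str) -> Optional[str]:
    """Return an error while the generated config is still unedited.

    generated_fingerprint is assumed to be stored when the default config
    is first written (hypothetical mechanism).
    """
    if fingerprint(current_cfg) == generated_fingerprint:
        return ("airflow.cfg has not been edited yet. Please choose a database "
                "and executor first; SQLite + SequentialExecutor are for "
                "development only.")
    return None
```

Any edit at all (even a comment confirming the SQLite choice) clears the
check, so SQLite stays available to users who cannot run Docker.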

Damian

From: Jarek Potiuk <ja...@potiuk.com>
Sent: Tuesday, July 20, 2021 12:23
To: dev@airflow.apache.org
Subject: [DISCUSSION] Should we be more explicit about SQLite using for dev only (or kill it for non-dev entirely????)


=============================================================================== 
Please access the attached hyperlink for an important electronic communications disclaimer: 
http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html 
===============================================================================