Posted to dev@airflow.apache.org by Siddharth Anand <sa...@apache.org> on 2016/05/04 08:18:49 UTC

Why do we need SQLite in Airflow?

From time to time, we run into bugs with the SQLite dialect in SQLAlchemy and close them as "wont-fix" because we don't want to be in the business of fixing such bugs. We deem SQLite a "non-serious" database that no one [in his/her right mind] would run in his/her staging, QA, or production environments. However, we rely on the SequentialExecutor and on the SQLite DB for our tests.
What should we do with SQLite? Should we lift up the hood and fix it for our needs or find either a different ORM or a different option for DB backend?
Examples of bugs we encounter and close as won't-fix:
1. Deleting a task instance: https://github.com/airbnb/airflow/issues/955
2. Weird pickle issue: https://issues.apache.org/jira/browse/AIRFLOW-46
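
For context on why the tests lean on SQLite: a SQLite database can live entirely in memory, so a test needs no server and no cleanup. Below is a minimal sketch using Python's standard sqlite3 module. It is not Airflow's test code and does not go through SQLAlchemy; the task_instance table, its columns, and both helper functions are made up for illustration.

```python
import sqlite3


def make_test_db():
    """Create a throwaway in-memory database with a hypothetical table."""
    conn = sqlite3.connect(":memory:")  # nothing is written to disk
    conn.execute("CREATE TABLE task_instance (task_id TEXT, state TEXT)")
    conn.executemany(
        "INSERT INTO task_instance VALUES (?, ?)",
        [("extract", "success"), ("load", "failed")],
    )
    conn.commit()
    return conn


def delete_task_instance(conn, task_id):
    """Delete the rows for task_id and return how many were removed."""
    cur = conn.execute(
        "DELETE FROM task_instance WHERE task_id = ?", (task_id,)
    )
    conn.commit()
    return cur.rowcount
```

A test can call make_test_db(), exercise delete_task_instance(), and simply drop the connection afterwards. Any replacement backend would have to offer a comparably cheap setup.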




Re: Why do we need SQLite in Airflow?

Posted by Chris Riccomini <cr...@apache.org>.
Yeah, I'm referring specifically to the idea that making Docker a
requirement for a hello-world Airflow setup will make things better. I don't
think it will.


Re: Why do we need SQLite in Airflow?

Posted by Lance Norskog <la...@gmail.com>.
We use Docker at Edmodo and it really helped with Airflow.

It's easy to say "pip install airflow" itself, but some of the database
drivers require pip installs that then require dev versions of host .rpm or
.deb packages because they want a .h file to compile against.

We are porting a large complex Hadoop-based ETL to Airflow and used Docker
to package web services that we call from Airflow.

Another part of our system is that we want to set up Amazon Auto Scaling
Groups (ASG) to launch more Airflow executor servers when our main server
becomes overloaded. We run a few large-memory Java jobs and this will be a
problem soon. Our tooling lets us easily set this up with Docker. (We wrote
something just like Docker Compose that talks to ASG. It's incredibly
useful.)

So, yeah, "pip install airflow" is fine for kicking the tires but we needed
binary management rather quickly after that.

Cheers,

Lance




-- 
Lance Norskog
lance.norskog@gmail.com
Redwood City, CA

Re: Why do we need SQLite in Airflow?

Posted by Chris Riccomini <cr...@apache.org>.
> As far as ease of use, while Docker is definitely getting more popular, it
> is hard to beat the current pip install flow for people not quite up to date
> on how to set up Docker. It seems like one more hurdle if you just want to
> get started.

Strongly agree. We tried to use Vagrant and then Docker with a prior
project, and it was a pain. Another project that I'm working with now uses
Docker for its hello-world stuff, and it's really troublesome. You will get
WAY more questions if you go this route than the current simple pip/sqlite
route.


Re: Why do we need SQLite in Airflow?

Posted by Maxime Beauchemin <ma...@gmail.com>.
Yeah I'd be curious to see how the Docker setup instructions (from scratch)
would compare to the current ones.


Re: Why do we need SQLite in Airflow?

Posted by Arthur Wiedmer <ar...@gmail.com>.
+1, but it feels like just piling on.

One thing we could consider is which part we would like to fix.

- If it is having a serious, production-ready DB that is still a local
DB/client, we could try something like Firebird.
It has a relatively small footprint, can do multithreading, and is supported
by SQLAlchemy, though it is not as easy to install as SQLite on most *nixes.
We could spend some cycles baking this into containers as well.

- As far as ease of use, while Docker is definitely getting more popular,
it is hard to beat the current pip install flow for people not quite up to
date on how to set up Docker. It seems like one more hurdle if you just want
to get started.

Best,
Arthur



Re: Why do we need SQLite in Airflow?

Posted by Maxime Beauchemin <ma...@gmail.com>.
Making it frictionless for people to get their feet wet is extremely
important. It's been a requirement since the early prototypes and I feel
strongly about keeping it that way. It's hard to test this hypothesis, but
it could be a defining factor in the success of this project (to date and
future).

Docker may allow for more batteries to be included and offer even less
friction than the `pip install` path for folks who are familiar with it.
I'd have to look to see if the community-contributed Docker images are up
to date. We may want to make that "the way to go" and change the tutorial /
quick start instructions to reflect that if it makes sense. That may
require integrating the burning of images as part of the build and/or
release process.

Max


Re: Why do we need SQLite in Airflow?

Posted by Jeremiah Lowin <jl...@apache.org>.
+1, shipping Airflow "batteries included" is very important in my opinion.
There is a lot to grok and the easiest way to learn is by letting folks
spin up a working installation right away. Unfortunately I don't think
there's a viable alternative to SQLite that is also supported by SQLAlchemy.


Re: Why do we need SQLite in Airflow?

Posted by Prateek Rungta <pr...@gmail.com>.
It's documented pretty well that it's only for people to get their feet wet
with. From the quickstart
<http://pythonhosted.org/airflow/start.html?highlight=sqlite>:

Out of the box, Airflow uses a sqlite database, which you should outgrow
fairly quickly since no parallelization is possible using this database
backend. It works in conjunction with the SequentialExecutor which will
only run task instances sequentially. While this is very limiting, it
allows you to get up and running quickly and take a tour of the UI and the
command line utilities.

FWIW, I'm now on day 2 of using Airflow. And while I wouldn't dream of
deploying Airflow using SQLite beyond my laptop, I quite appreciated being
able to mess with Airflow without any of the infrastructural constraints.
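
The quickstart passage quoted above says no parallelization is possible with the SQLite backend. One likely reason is that SQLite allows only one writer at a time per database file. The sketch below illustrates that with Python's standard sqlite3 module rather than SQLAlchemy or Airflow itself. The task_instance table, the file name, and both function names are assumptions made up for the example.

```python
import os
import sqlite3
import tempfile


def second_writer_is_blocked(db_path):
    """Return True if a second connection cannot start a write
    transaction while the first connection still holds one."""
    # isolation_level=None: we issue BEGIN/ROLLBACK ourselves.
    # timeout=0: fail immediately instead of waiting for the lock.
    first = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    second = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    try:
        first.execute(
            "CREATE TABLE IF NOT EXISTS task_instance "
            "(task_id TEXT, state TEXT)"
        )
        first.execute("BEGIN IMMEDIATE")  # first writer takes the write lock
        first.execute("INSERT INTO task_instance VALUES ('t1', 'running')")
        try:
            second.execute("BEGIN IMMEDIATE")  # second writer tries as well
        except sqlite3.OperationalError:
            return True  # SQLite refused: the database is locked
        second.execute("ROLLBACK")
        return False
    finally:
        if first.in_transaction:
            first.execute("ROLLBACK")
        first.close()
        second.close()


def demo_single_writer():
    """Run the check against a throwaway database file."""
    with tempfile.TemporaryDirectory() as tmp:
        return second_writer_is_blocked(os.path.join(tmp, "airflow_demo.db"))
```

With several workers updating task state at once, writers would queue up or fail like this, which fits the pairing of SQLite with the SequentialExecutor.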


