Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/01/22 13:12:43 UTC

[GitHub] [airflow] mik-laj opened a new issue #13838: Setting up Airflow architecture for local development is hard

mik-laj opened a new issue #13838:
URL: https://github.com/apache/airflow/issues/13838


   Hello,
   
   Many users have trouble setting up Airflow in their local environment. I have noticed that there are several problems that we need to solve.
   - We lack quick start guides that allow us to quickly launch Airflow in a more development-friendly environment. I am working on [one quick start](https://github.com/apache/airflow/pull/13660), but I think other configurations are also worth describing. For example, see: [Apache Pinot](https://docs.pinot.apache.org/basics/getting-started), [Minio](https://docs.min.io/docs/minio-quickstart-guide.html). Currently, the quick start describes an environment that uses SequentialExecutor, so it needs to be adjusted in most cases.
   - The Airflow architecture consists of many components, which is problematic for beginners. Only the documentation for Airflow 2.0 contains [a general description of the architecture](https://airflow.apache.org/docs/apache-airflow/stable/start.html#basic-airflow-architecture) contributed by @vikramkoka. 
   - Recommendations for a new local environment are not clearly stated in the documentation. A recommended combination (CeleryExecutor/PostgreSQL/Redis) circulates in the community, but some users are not aware of it and try to configure less common environments, which run into more problems, and less familiar ones.
   - Windows is not officially supported, but this is not clearly documented and no workaround is described.
   - We are missing docker-compose files that would allow us to run some popular configurations more easily. We have Breeze for contributors, but no tools for other people. See: https://github.com/apache/airflow/issues/8605
   - The Helm Chart has not been released, so its use is not described in the documentation, and using it is harder because it requires cloning the full repository. See: https://github.com/apache/airflow/pull/12755 https://github.com/apache/airflow/issues/10523
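
To make the CeleryExecutor/PostgreSQL/Redis combination from the list above more concrete, here is a minimal sketch of the environment variables such a setup might pass to every Airflow container. The option names are believed to match the Airflow 2.0 configuration reference; the host names `postgres` and `redis`, the credentials, and the helper `celery_stack_env` are assumptions made up for this illustration, not an official recommendation.

```python
def celery_stack_env(db_host="postgres", broker_host="redis",
                     user="airflow", password="airflow", database="airflow"):
    """Return environment variables for a CeleryExecutor + PostgreSQL + Redis setup.

    Host names and credentials are placeholder assumptions; adjust them to
    match the services defined in your own docker-compose file.
    """
    db_location = f"{user}:{password}@{db_host}/{database}"
    return {
        # Run tasks on Celery workers instead of the default SequentialExecutor.
        "AIRFLOW__CORE__EXECUTOR": "CeleryExecutor",
        # Metadata database used by the scheduler, webserver and workers.
        "AIRFLOW__CORE__SQL_ALCHEMY_CONN": f"postgresql+psycopg2://{db_location}",
        # Celery stores task results in the same PostgreSQL database.
        "AIRFLOW__CELERY__RESULT_BACKEND": f"db+postgresql://{db_location}",
        # Redis acts as the Celery message broker.
        "AIRFLOW__CELERY__BROKER_URL": f"redis://{broker_host}:6379/0",
    }


if __name__ == "__main__":
    for name, value in celery_stack_env().items():
        print(f"{name}={value}")
```

The same key/value pairs could be written under an `environment:` section shared by the services of a docker-compose file.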
   
   I think it is very important that we deal with this soon, because the installation problems are described as a [disadvantage of Airflow.](https://towardsdatascience.com/is-apache-airflow-2-0-good-enough-for-current-data-engineering-needs-6e152455775c#450a) Users who cannot install Airflow look for alternatives that are easier to use. These problems also cause a lot of messages on Slack. When Airflow is easier to install, we will have to spend less time supporting users and the community will be able to grow faster.
   
   Does anyone have any comments regarding the installation of Airflow for a new user? I would like to do some work in this area this quarter, so if anyone has any thoughts, I'll be happy to hear them.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] vikramkoka commented on issue #13838: Setting up Airflow for local development is hard

Posted by GitBox <gi...@apache.org>.
vikramkoka commented on issue #13838:
URL: https://github.com/apache/airflow/issues/13838#issuecomment-765505600


   @mik-laj I am glad that you raised this topic, since I have been thinking about this too. 
   I have also been seeing the questions and comments pop up in various places about installation docs and upgrade to 2.0 docs and the overall installation process. 
   I have been collecting my thoughts on this, so I will put them together and share them. 





[GitHub] [airflow] mik-laj commented on issue #13838: Setting up Airflow for local development is hard

Posted by GitBox <gi...@apache.org>.
mik-laj commented on issue #13838:
URL: https://github.com/apache/airflow/issues/13838#issuecomment-766286812


   >  Go would be nicer for portability (we could release convenience static binaries for different platforms) where python is better for our community to maintain.
   
   I don't know if this is needed as it would still be just a `docker-compose` wrapper, which would be limiting by nature and would create another layer of abstraction. In development and maintenance, we may find it easier to provide one docker-compose file and one wrapper similar to the [`aws-cli.sh`](https://github.com/KlubJagiellonski/pola-backend/blob/master/scripts/aws-cli.sh)/[`mc.sh`](https://github.com/KlubJagiellonski/pola-backend/blob/master/scripts/mc.sh)/[`kadmin.sh`](https://github.com/mik-laj/presto-hive-kerberos-docker/blob/master/kadmin.sh) scripts to be able to access the CLI easily. I don't want us to deal with creating another docker-compose alternative if docker-compose is a widely used tool that probably has everything we need. We have to remember that our users have different requirements and use cases that will be difficult for us to support if we build too thick an abstraction. For example, I recently heard that one company would like to use Istio/Kerberos, which may require a major change to `docker-compose.yaml` to be able to get the network configuration working.
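
A thin wrapper of the kind described above could be little more than a function that forwards its arguments to `docker-compose run`. The sketch below is only an illustration: the service name `airflow-cli`, the file name `docker-compose.yaml`, and the helper `airflow_cli_command` are hypothetical, and it assumes the service's image accepts `airflow ...` as its command.

```python
import shlex


def airflow_cli_command(airflow_args, service="airflow-cli",
                        compose_file="docker-compose.yaml"):
    """Build a docker-compose invocation that runs one Airflow CLI command.

    The service name and compose file are assumptions for this sketch.
    """
    return [
        "docker-compose", "-f", compose_file,
        "run", "--rm",           # one-off container, removed after it exits
        service,
        "airflow", *airflow_args,
    ]


if __name__ == "__main__":
    # Print the command instead of executing it, so the sketch runs without Docker.
    print(shlex.join(airflow_cli_command(["dags", "list"])))
```

Because the wrapper only prepends a fixed prefix, users stay free to edit the compose file itself for unusual networking or authentication setups.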





[GitHub] [airflow] marcosmarxm removed a comment on issue #13838: Setting up Airflow for local development is hard

Posted by GitBox <gi...@apache.org>.
marcosmarxm removed a comment on issue #13838:
URL: https://github.com/apache/airflow/issues/13838#issuecomment-790714532


   @mik-laj maybe documentation on how to create/maintain providers? 
   - create a checklist and the common commands to test for specific providers.





[GitHub] [airflow] mik-laj edited a comment on issue #13838: Setting up Airflow for local development is hard

Posted by GitBox <gi...@apache.org>.
mik-laj edited a comment on issue #13838:
URL: https://github.com/apache/airflow/issues/13838#issuecomment-766286812









[GitHub] [airflow] sodiqafolayan commented on issue #13838: Setting up Airflow for local development is hard

Posted by GitBox <gi...@apache.org>.
sodiqafolayan commented on issue #13838:
URL: https://github.com/apache/airflow/issues/13838#issuecomment-765714096


   I am most delighted to read this. Coming from a beginner who really wants to learn, understand and use Airflow, I have always had trouble with installation and making the parts work together. I commend this initiative and hope it will help newbies get things together with less installation stress.





[GitHub] [airflow] marcosmarxm commented on issue #13838: Setting up Airflow for local development is hard

Posted by GitBox <gi...@apache.org>.
marcosmarxm commented on issue #13838:
URL: https://github.com/apache/airflow/issues/13838#issuecomment-790714532


   @mik-laj maybe documentation on how to create/maintain providers? 
   - create a checklist and the common commands to test for specific providers.





[GitHub] [airflow] potiuk commented on issue #13838: Setting up Airflow for local development is hard

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #13838:
URL: https://github.com/apache/airflow/issues/13838#issuecomment-765611601


   My thoughts: 
   
   All for it as well. 
   
   Breeze is not at all designed for end-users :). I love @kaxil's idea of integrating running Airflow into the `airflow` command line. However, there is one caveat: the tool should be supported on Windows, much more so than Breeze is. And it should not be necessary to check out the Airflow code to use it. So installing Airflow via PyPI and running "start-local-airflow" (or something like that) is one of the ways to do it.
   
   Unfortunately, Airflow as a package is notoriously difficult even to install on Windows (not to mention running it): see #10388 and #12874 mentioned by @mik-laj, but also today's discussion on Slack (setproctitle prevented installation of Airflow for a user who just wanted to set up a virtualenv on Windows for IntelliJ autocompletion): https://apache-airflow.slack.com/archives/CCQ7EGB1P/p1611316718050700 
   
   So either we should improve the installability of the Airflow package on Windows (not runnability - that's much more complex :)), or we should have a different tool (written either in Go, similar to astro-cli, or in Python). Go would be nicer for portability (we could release convenience static binaries for different platforms), whereas Python is better for our community to maintain.
   
   





[GitHub] [airflow] potiuk commented on issue #13838: Setting up Airflow for local development is hard

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #13838:
URL: https://github.com/apache/airflow/issues/13838#issuecomment-765613602


   Initially, however, an easy-to-copy or generated set of docker-compose files that the user can simply copy and paste is good enough for a start.





[GitHub] [airflow] kaxil commented on issue #13838: Setting up Airflow for local development is hard

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #13838:
URL: https://github.com/apache/airflow/issues/13838#issuecomment-765603491


   I would love to have those docker-compose files soon, as a first step toward making it easy.
   
   And then integrate this docker-compose file (similar to Breeze but for users) natively into the Airflow CLI, similar to astro-cli (https://github.com/astronomer/astro-cli), so that `airflow dev start` would run `docker-compose up` under the hood.
   
   This is just an idea, but I mainly suggest docker-compose so that users can easily run Airflow with CeleryExecutor instead of Sequential or Local. Specifically, it is cumbersome to get a DB (Postgres / MySQL) set up and working. 
   
   Having a docker-compose file eliminates the need to make sure that they have a version of Postgres / MySQL supported by Airflow.
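
To illustrate the `airflow dev start` idea, here is a minimal sketch of how such a subcommand might map onto `docker-compose`. No `dev` subcommand exists in Airflow; the action names, the `ACTIONS` table, and the functions `build_compose_command` and `run_dev` are hypothetical, and only the `docker-compose` flags used (`-f`, `up -d`, `down`) are real.

```python
import shlex
import subprocess

# Hypothetical mapping from "airflow dev <action>" to docker-compose arguments.
ACTIONS = {
    "start": ["up", "-d"],   # start all services in the background
    "stop": ["down"],        # stop and remove the containers
}


def build_compose_command(action, compose_file="docker-compose.yaml"):
    """Translate a dev action into the docker-compose command line."""
    if action not in ACTIONS:
        raise ValueError(f"unknown dev action: {action!r}")
    return ["docker-compose", "-f", compose_file, *ACTIONS[action]]


def run_dev(action, compose_file="docker-compose.yaml", dry_run=False):
    """Run (or, with dry_run=True, only describe) the docker-compose command."""
    command = build_compose_command(action, compose_file)
    if not dry_run:
        # Requires Docker and docker-compose to be installed locally.
        subprocess.run(command, check=True)
    return shlex.join(command)


if __name__ == "__main__":
    print(run_dev("start", dry_run=True))
```

Keeping the mapping in a plain table means the compose file remains the single place where the services themselves are defined.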

