Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/12/01 18:44:48 UTC

[GitHub] [airflow] potiuk opened a new issue #12744: Difference of extras Airflow 2.0 vs. Airflow 1.10

potiuk opened a new issue #12744:
URL: https://github.com/apache/airflow/issues/12744


   **Description**
   
   When Airflow 2.0 is installed from PyPI, providers are not installed by default. In order to install them, you should add an appropriate extra. While this behavior is identical in Airflow 1.10 for those "providers" that required additional packages, there were a few "providers" that did not require any extras to function (for example http, ftp) - we have "http", "ftp" extras for them now, but maybe some of those are .
   
   We have to make a decision now:
   
   - [ ] should all of them (or some of them) be included by default when you install Airflow?
   - [ ] if we decide to exclude only some (or none), we should add them in UPGRADING_to_2_0 and in UPDATING documentation.
   
   **Use case / motivation**
   
   We want people to get a familiar experience when installing Airflow. While we provide a familiar mechanism (extras), and people will expect slightly different configuration and installation, and we can describe the differences, maybe some of those providers are so popular that we should include them by default?
   
   **Related Issues**
   
   #12685 - where we discuss which of the extras should be included in the Production Image of 2.0.
   
   
   **Additional info**
   
   Here is the list of all "providers" that were present in 1.10 and had no additional dependencies - so basically they would work out of the box in 1.10, but they need an appropriate "extra" in 2.0.
   
   
   *  "apache.pig": [],
   *  "apache.sqoop": [],
   *  "dingding": [],
   *  "discord": [],
   *  "ftp": [],
   *  "http": [],
   *  "imap": [],
   *  "openfaas": [],
   *  "opsgenie": [],
   *  "sqlite": [],
   
   Also here I appeal to the wisdom of the crowd: @ashb, @dimberman, @kaxil, @turbaszek, @mik-laj, @XD-DENG, @feluelle, @eladkal, @ryw, @vikramkoka, @KevinYang21 - let me know WDYT before I bring it to the devlist?
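
As a rough illustration of what "adding an appropriate extra" means for the list above, here is a minimal Python sketch that builds the requirement string a user would pass to `pip install`. The helper name `pip_requirement` and the version `2.0.0` are assumptions made for the example only; they are not part of Airflow or of this issue.

```python
# Extras from the list above: no additional dependencies in 1.10,
# but an explicit extra is needed in 2.0.
NO_DEPENDENCY_EXTRAS = [
    "apache.pig", "apache.sqoop", "dingding", "discord", "ftp",
    "http", "imap", "openfaas", "opsgenie", "sqlite",
]


def pip_requirement(extras, version=None):
    """Build a requirement string (hypothetical helper, for illustration only)."""
    unknown = sorted(set(extras) - set(NO_DEPENDENCY_EXTRAS))
    if unknown:
        raise ValueError(f"not in the list above: {unknown}")
    requirement = "apache-airflow"
    if extras:
        # Sort and de-duplicate so the result is stable.
        requirement += "[" + ",".join(sorted(set(extras))) + "]"
    if version:
        # The version is an example value chosen by the caller.
        requirement += f"=={version}"
    return requirement


# prints: apache-airflow[ftp,http]==2.0.0
print(pip_requirement(["http", "ftp"], version="2.0.0"))
```

The resulting string would be quoted on the command line, for example `pip install "apache-airflow[ftp,http]==2.0.0"`.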


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on issue #12744: Difference of extras Airflow 2.0 vs. Airflow 1.10

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #12744:
URL: https://github.com/apache/airflow/issues/12744#issuecomment-737106323


   > @potiuk doesn't that mean that we keep them in core and make them available to all users, but they still have to refactor their DAGs (due to import changes)? Should we limit the number of changes required in users' DAGs?
   
   I think moving them to core now is NOT a good idea, and I think most of the "core" operators were moved inside the core anyway - at least they changed module names to conform to AIP-21. I do not think there is a big difference whether they are moved inside the core, or whether they are moved to providers.
   
   ```
   http_operator -> http
   contrib.ftp_operator -> ftp 
   ```
   
   
   etc
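
To make the renames above concrete, here is a small sketch of the kind of import rewrite a DAG author would face. The two module paths in the mapping are given as illustrative examples of the 1.10 and 2.0 layouts, not as a complete list, and `rewrite_import` is a hypothetical helper, not an Airflow tool.

```python
# Illustrative (not exhaustive) mapping of 1.10 module paths to 2.0 provider paths.
OLD_TO_NEW_MODULES = {
    "airflow.operators.http_operator": "airflow.providers.http.operators.http",
    "airflow.contrib.hooks.ftp_hook": "airflow.providers.ftp.hooks.ftp",
}


def rewrite_import(line):
    """Replace any known 1.10 module path in a source line with its 2.0 path."""
    for old, new in OLD_TO_NEW_MODULES.items():
        line = line.replace(old, new)
    return line


print(rewrite_import("from airflow.operators.http_operator import SimpleHttpOperator"))
print(rewrite_import("from airflow.contrib.hooks.ftp_hook import FTPHook"))
```

The same edit to the DAG file is needed whether the module ends up in core or in a provider package, which is the point made above.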
   





[GitHub] [airflow] turbaszek commented on issue #12744: Difference of extras Airflow 2.0 vs. Airflow 1.10

Posted by GitBox <gi...@apache.org>.
turbaszek commented on issue #12744:
URL: https://github.com/apache/airflow/issues/12744#issuecomment-736780062


   I think the http should be part of core, see discussion in https://github.com/apache/airflow/pull/12252





[GitHub] [airflow] ryw commented on issue #12744: Difference of extras Airflow 2.0 vs. Airflow 1.10

Posted by GitBox <gi...@apache.org>.
ryw commented on issue #12744:
URL: https://github.com/apache/airflow/issues/12744#issuecomment-736981514


   I like adding `imap` -- essentially we're saying lower-level protocols are core (ftp, http), so imap fits into that list





[GitHub] [airflow] XD-DENG commented on issue #12744: Difference of extras Airflow 2.0 vs. Airflow 1.10

Posted by GitBox <gi...@apache.org>.
XD-DENG commented on issue #12744:
URL: https://github.com/apache/airflow/issues/12744#issuecomment-737056615


   > The following should require explicitly installing them:
   > 
   > "apache.pig": [],
   > "apache.sqoop": [],
   > "dingding": [],
   > "discord": [],
   > "openfaas": [],
   > "opsgenie": [],
   > "sqlite": [],
   
   I agree with @kaxil , **other than `sqlite`**.
   
   Personally I think `sqlite` should come together with Airflow core by default, without explicit extra installation.
   Consider two examples:
   - Most Linux distributions & macOS have SQLite available by default.
   - Python has `sqlite3` as one of its built-in standard libraries.
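
A minimal sketch of the second point: the standard-library `sqlite3` module works without installing anything beyond Python itself. The table name and values are made up for the example.

```python
import sqlite3


def sqlite_roundtrip():
    """Create an in-memory SQLite database, insert two rows, and read them back."""
    connection = sqlite3.connect(":memory:")
    try:
        connection.execute("CREATE TABLE extras (name TEXT)")
        connection.executemany(
            "INSERT INTO extras VALUES (?)", [("http",), ("ftp",)]
        )
        rows = connection.execute(
            "SELECT name FROM extras ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    return [name for (name,) in rows]


# prints: ['ftp', 'http']
print(sqlite_roundtrip())
```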
   





[GitHub] [airflow] mik-laj commented on issue #12744: Difference of extras Airflow 2.0 vs. Airflow 1.10

Posted by GitBox <gi...@apache.org>.
mik-laj commented on issue #12744:
URL: https://github.com/apache/airflow/issues/12744#issuecomment-736751984


   > you should add an appropriate extra. 
   
   I am not sure that this is a good idea. I think it would be worthwhile for the user to pin a specific version so that they do not accidentally install a newer version that may contain regressions.





[GitHub] [airflow] potiuk closed issue #12744: Difference of extras Airflow 2.0 vs. Airflow 1.10

Posted by GitBox <gi...@apache.org>.
potiuk closed issue #12744:
URL: https://github.com/apache/airflow/issues/12744


   





[GitHub] [airflow] kaxil commented on issue #12744: Difference of extras Airflow 2.0 vs. Airflow 1.10

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #12744:
URL: https://github.com/apache/airflow/issues/12744#issuecomment-736894181


   The following should require explicitly installing them:
   
   "apache.pig": [],
   "apache.sqoop": [],
   "dingding": [],
   "discord": [],
   "openfaas": [],
   "opsgenie": [],
   "sqlite": [],





[GitHub] [airflow] potiuk commented on issue #12744: Difference of extras Airflow 2.0 vs. Airflow 1.10

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #12744:
URL: https://github.com/apache/airflow/issues/12744#issuecomment-737095334


   Looks like ["http", "ftp", "sqlite", "imap"] is the winning set. They are all rather small and they increase the size of installation by likely less than 1%.
   
   > I am not sure that this is a good idea. I think it would be worthwhile for the user to pin a specific version so that they do not accidentally install a newer version that may contain regressions.
   
   @mik-laj -> I do not think we have to move them to the "core". I can easily make those extras "enabled" by default, as extras that are always used implicitly. This means that they will be installed by default in their latest version: even a plain `pip install airflow` will also install those 4 providers. There will be no "constraints" for those - the user will have to explicitly upgrade them and will keep the possibility of downgrading them. I will update the FAQs explaining this behavior.
   
   One more comment: I also think it will be great to have a few providers installed from day zero. People might not fully realize that there are providers, and they might be surprised not to see those other integrations installed, but with a few providers pre-installed this will be much more obvious. Simply running `pip freeze | grep apache-airflow` will show them what provider packages look like.
   
   If there will be no more comments shortly, I will write this proposal to the devlist.
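
For readers who prefer to do the `pip freeze | grep apache-airflow` check from Python, here is a minimal sketch using the standard-library `importlib.metadata` (Python 3.8+). It assumes provider packages are named with the `apache-airflow-providers-` prefix; the function names are hypothetical.

```python
from importlib.metadata import distributions

# Assumed naming convention for provider distributions.
PROVIDER_PREFIX = "apache-airflow-providers-"


def filter_provider_names(names):
    """Return the sorted, de-duplicated names that look like provider packages."""
    normalized = (name.lower().replace("_", "-") for name in names if name)
    return sorted({name for name in normalized if name.startswith(PROVIDER_PREFIX)})


def installed_providers():
    """List provider packages installed in the current environment."""
    return filter_provider_names(dist.metadata["Name"] for dist in distributions())


# Prints an empty list when no provider packages are installed.
print(installed_providers())
```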
   
   





[GitHub] [airflow] ashb edited a comment on issue #12744: Difference of extras Airflow 2.0 vs. Airflow 1.10

Posted by GitBox <gi...@apache.org>.
ashb edited a comment on issue #12744:
URL: https://github.com/apache/airflow/issues/12744#issuecomment-737500438


   Anyone know how pip would cope with circular dependencies? I.e. could `apache-airflow` depend upon `apache-airflow-providers-http` (which in turn depends upon `apache-airflow`) without giving pip a heart attack?
   
   That way we can have "batteries included" but still keep the advantages of smaller releases/easier updating of providers.
   
   Edit: oh Jarek has a plan already. Cool





[GitHub] [airflow] potiuk commented on issue #12744: Difference of extras Airflow 2.0 vs. Airflow 1.10

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #12744:
URL: https://github.com/apache/airflow/issues/12744#issuecomment-738027087


   > Anyone know how pip would cope with circular dependencies? I.e. could `apache-airflow` depend upon `apache-airflow-providers-http` (which in turn depends upon `apache-airflow`) without giving pip a heart attack?
   > 
   > That way we can have "batteries included" but still keep the advantages of smaller releases/easier updating of providers.
   > 
   > Edit: oh Jarek has a plan already. Cool
   
   Yep. This is already happening with all providers when we specify extras; pip is cool with that :)





[GitHub] [airflow] kaxil commented on issue #12744: Difference of extras Airflow 2.0 vs. Airflow 1.10

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #12744:
URL: https://github.com/apache/airflow/issues/12744#issuecomment-736893808


   http (& even ftp) do seem like they should be part of core. At least HTTP only uses internal hooks and requirements that are already part of Airflow core's requirements.





[GitHub] [airflow] potiuk commented on issue #12744: Difference of extras Airflow 2.0 vs. Airflow 1.10

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #12744:
URL: https://github.com/apache/airflow/issues/12744#issuecomment-737106323


   > @potiuk doesn't that mean that we keep them in core and make them available to all users, but they still have to refactor their DAGs (due to import changes)? Should we limit the number of changes required in users' DAGs?
   
   I think moving them to core now is NOT a good idea, and I think most of the "core" operators were moved inside the core anyway - at least they changed module names to conform to AIP-21. I do not think there is a big difference whether they are moved inside the core, or whether they are moved to providers.





[GitHub] [airflow] ashb commented on issue #12744: Difference of extras Airflow 2.0 vs. Airflow 1.10

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #12744:
URL: https://github.com/apache/airflow/issues/12744#issuecomment-737500438


   Anyone know how pip would cope with circular dependencies? I.e. could `apache-airflow` depend upon `apache-airflow-providers-http` (which in turn depends upon `apache-airflow`) without giving pip a heart attack?
   
   That way we can have "batteries included" but still keep the advantages of smaller releases/easier updating of providers.





[GitHub] [airflow] turbaszek commented on issue #12744: Difference of extras Airflow 2.0 vs. Airflow 1.10

Posted by GitBox <gi...@apache.org>.
turbaszek commented on issue #12744:
URL: https://github.com/apache/airflow/issues/12744#issuecomment-737098468


   > I do not think we have to move them to the "core".
   
   @potiuk doesn't that mean that we keep them in core and make them available to all users, but they still have to refactor their DAGs (due to import changes)? Should we limit the number of changes required in users' DAGs?





[GitHub] [airflow] vikramkoka commented on issue #12744: Difference of extras Airflow 2.0 vs. Airflow 1.10

Posted by GitBox <gi...@apache.org>.
vikramkoka commented on issue #12744:
URL: https://github.com/apache/airflow/issues/12744#issuecomment-736935563


   Absolutely agree that http should be part of core. 
   Strongly in favor of ftp as well being part of core, assuming no additional dependencies. 
   Tempted with imap, but unsure on the dependencies. 
   
   Nothing else comes close IMHO

