Posted to users@airflow.apache.org by Ash Berlin-Taylor <as...@apache.org> on 2020/12/17 17:35:55 UTC

Apache Airflow 2.0.0 is released!

I am proud to announce that Apache Airflow 2.0.0 has been released.

The source release, as well as the binary "wheel" release (no sdist 
this time), are available here

We have also made this version available on PyPI for convenience (`pip 
install apache-airflow`):

📦 PyPI: <https://pypi.org/project/apache-airflow/2.0.0>

The documentation is available on:
<https://airflow.apache.org/>
📚 Docs: <http://airflow.apache.org/docs/apache-airflow/2.0.0/>

Docker images will be available shortly -- check 
<https://hub.docker.com/r/apache/airflow/tags?page=1&ordering=last_updated&name=2.0.0> 
for them to appear.


The full changelog is about 3,000 lines long (already excluding 
everything backported to 1.10), so for now I'll simply share some of 
the major features in 2.0.0 compared to 1.10.14:

*A new way of writing DAGs: the TaskFlow API (AIP-31)*

(Known in 2.0.0alphas as Functional DAGs.)

DAGs are now much nicer to author, especially when using 
PythonOperator. Dependencies are handled more clearly and XCom is nicer 
to use.

Read more here:

TaskFlow API Tutorial 
<http://airflow.apache.org/docs/apache-airflow/stable/tutorial_taskflow_api.html>
TaskFlow API Documentation 
<https://airflow.apache.org/docs/apache-airflow/stable/concepts.html#decorated-flows>

A quick teaser of what DAGs can now look like:

```
from airflow.decorators import dag, task
from airflow.utils.dates import days_ago


@dag(
    default_args={'owner': 'airflow'},
    schedule_interval=None,
    start_date=days_ago(2),
)
def tutorial_taskflow_api_etl():
    @task
    def extract():
        # Stand-in for reading order data from an external system.
        return {"1001": 301.27, "1002": 433.21, "1003": 502.22}

    @task
    def transform(order_data_dict: dict) -> dict:
        # Sum the order values into a single total.
        total_order_value = 0

        for value in order_data_dict.values():
            total_order_value += value

        return {"total_order_value": total_order_value}

    @task
    def load(total_order_value: float):
        print("Total order value is: %.2f" % total_order_value)

    # Calling the tasks wires up both the dependencies and the XCom hand-off.
    order_data = extract()
    order_summary = transform(order_data)
    load(order_summary["total_order_value"])


tutorial_etl_dag = tutorial_taskflow_api_etl()
```

*Fully specified REST API (AIP-32)*

We now have a fully supported, no-longer-experimental API with a 
comprehensive OpenAPI specification.

Read more here:

REST API Documentation 
<http://airflow.apache.org/docs/apache-airflow/stable/stable-rest-api-ref.html>.
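
As a rough illustration of what a call to the stable API looks like, the 
sketch below builds (but does not send) a request for the list of DAGs. 
It assumes a webserver at `http://localhost:8080`, that the basic-auth 
API backend is enabled, and a made-up `admin`/`admin` account; 
`build_list_dags_request` is a hypothetical helper, not part of Airflow.

```python
import base64
import urllib.request


def build_list_dags_request(base_url, username, password, limit=10):
    """Build a GET request for /api/v1/dags using HTTP Basic auth."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/dags?limit={limit}",
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
        method="GET",
    )


# Assumed local webserver and credentials; adjust for your deployment.
request = build_list_dags_request("http://localhost:8080", "admin", "admin")
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request).read()
```

Because every operation requires authorization, a request without the 
`Authorization` header should be rejected rather than served anonymously.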

*Massive Scheduler performance improvements*

As part of AIP-15 (Scheduler HA+performance) and other work Kamil did, 
we significantly improved the performance of the Airflow Scheduler. It 
now starts tasks much, MUCH quicker.

Over at Astronomer.io we've benchmarked the scheduler, and it's fast 
<https://www.astronomer.io/blog/airflow-2-scheduler> (we had to triple-check 
the numbers as we didn't quite believe them at first!).

*Scheduler is now HA compatible (AIP-15)*

It's now possible and supported to run more than a single scheduler 
instance. This is super useful for both resiliency (in case a scheduler 
goes down) and scheduling performance.

To fully use this feature you need Postgres 9.6+ or MySQL 8+ (MySQL 5 
and MariaDB won't work with more than one scheduler, I'm afraid).

There's no config or other setup required to run more than one 
scheduler: just start up a scheduler somewhere else (ensuring it has 
access to the DAG files) and it will cooperate with your existing 
schedulers through the database.

For more information, read the Scheduler HA documentation 
<http://airflow.apache.org/docs/apache-airflow/stable/scheduler.html#running-more-than-one-scheduler>.
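
As I understand the Scheduler HA documentation, the schedulers coordinate 
through row-level database locks (`SELECT ... FOR UPDATE SKIP LOCKED`), 
which is why a recent Postgres or MySQL is needed. The toy below imitates 
that idea in plain Python, with `threading.Lock` standing in for a row 
lock. The `Row` class and `scheduler_loop` function are invented for this 
illustration and are not Airflow code.

```python
import threading


class Row:
    """Stand-in for a database row that can be locked independently."""

    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()
        self.handled_by = None


def scheduler_loop(scheduler_id, rows):
    """Claim every row that is neither locked nor already handled."""
    for row in rows:
        # Non-blocking acquire mimics SKIP LOCKED: a busy row is skipped, not waited on.
        if not row.lock.acquire(blocking=False):
            continue
        try:
            if row.handled_by is None:
                row.handled_by = scheduler_id
        finally:
            row.lock.release()


rows = [Row(f"dag_run_{i}") for i in range(20)]
schedulers = [
    threading.Thread(target=scheduler_loop, args=(f"scheduler-{n}", rows))
    for n in range(3)
]
for thread in schedulers:
    thread.start()
for thread in schedulers:
    thread.join()

# Each row ends up claimed by exactly one scheduler.
print(sum(row.handled_by is not None for row in rows), "rows handled")
```

The point of skipping rather than waiting is that an extra scheduler adds 
throughput instead of queueing behind the others.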

*Task Groups (AIP-34)*

SubDAGs were commonly used for grouping tasks in the UI, but they had 
many drawbacks in their execution behaviour (primarily that they only 
executed one task at a time!). To improve this experience, 
we've introduced "Task Groups": a method for organizing tasks 
which provides the same grouping behaviour as a SubDAG without any of 
the execution-time drawbacks.

SubDAGs will still work for now, but we think that any previous use of 
SubDAGs can now be replaced with Task Groups. If you find an example 
where this isn't the case, please let us know by opening an issue on 
GitHub.

For more information, check out the Task Group documentation 
<http://airflow.apache.org/docs/apache-airflow/stable/concepts.html#taskgroup>.

*Refreshed UI*

We've given the Airflow UI a visual refresh and updated some of the 
styling. Check out the UI section of the docs 
<http://airflow.apache.org/docs/apache-airflow/stable/ui.html> for 
screenshots.

We have also added an option to auto-refresh task states in Graph View 
so you no longer need to continuously press the refresh button :).

*Smart Sensors for reduced load from sensors (AIP-17)*

If you make heavy use of sensors in your Airflow cluster, you might 
find that sensor execution takes up a significant proportion of your 
cluster even with "reschedule" mode. To improve this, we've added 
a new mode called "Smart Sensors".

This feature is in "early-access": it's been well-tested by 
Airbnb and is "stable"/usable, but we reserve the right to make 
backwards-incompatible changes to it in a future release (if we have 
to; we'll try very hard not to!).

Read more about it in the Smart Sensors documentation 
<https://airflow.apache.org/docs/apache-airflow/stable/smart-sensor.html>.

*Simplified KubernetesExecutor*

For Airflow 2.0, we have re-architected the KubernetesExecutor in a 
fashion that is simultaneously faster, easier to understand, and more 
flexible for Airflow users. Users will now be able to access the full 
Kubernetes API to create a .yaml pod_template_file instead of 
specifying parameters in their airflow.cfg.

We have also replaced the executor_config dictionary with the 
pod_override parameter, which takes a Kubernetes V1Pod object for a 1:1 
setting override. These changes have removed over three thousand lines 
of code from the KubernetesExecutor, which makes it run faster and 
creates fewer potential errors.

Read more here:

Docs on pod_template_file 
<https://airflow.apache.org/docs/apache-airflow/stable/executor/kubernetes.html?highlight=pod_override#pod-template-file>
Docs on pod_override 
<https://airflow.apache.org/docs/apache-airflow/stable/executor/kubernetes.html?highlight=pod_override#pod-override>

*Airflow core and providers: Splitting Airflow into 60+ packages*

Airflow 2.0 is not a monolithic "one to rule them all" package. 
We've split Airflow into core and 61 (for now) provider packages. 
Each provider package is for either a particular external service 
(Google, Amazon, Microsoft, Snowflake), a database (Postgres, MySQL), 
or a protocol (HTTP/FTP). Now you can create a custom Airflow 
installation from building blocks and choose only what you need, 
plus add whatever other requirements you might have. Some of the common 
providers (ftp, http, imap, sqlite) are installed automatically as they 
are commonly used. Other providers are automatically installed when you 
choose appropriate extras when installing Airflow.

The provider architecture should make it much easier to get a fully 
customized, yet consistent runtime with the right set of Python 
dependencies.

But that's not all: you can write your own custom providers and add 
things like custom connection types, customizations of the Connection 
Forms, and extra links to your operators in a manageable way. You can 
build your own provider and install it as a Python package and have 
your customizations visible right in the Airflow UI.
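
A minimal sketch of what a custom provider package might expose is 
below. It assumes the discovery mechanism described in the provider docs: 
an entry point in the `apache_airflow_provider` group that points at a 
function returning a metadata dictionary. The key names are as I recall 
them from those docs, so check them against the provider schema; the 
`acme_provider` package and every class path in it are fictional.

```python
def get_provider_info():
    """Return metadata that Airflow reads when it discovers the package."""
    return {
        "package-name": "acme-airflow-provider",  # hypothetical distribution name
        "name": "Acme",
        "description": "Hooks and operators for the (fictional) Acme service.",
        "versions": ["0.1.0"],
        # Hook classes listed here can contribute custom connection types.
        "hook-class-names": ["acme_provider.hooks.acme.AcmeHook"],
        # Operator extra-link classes to surface in the UI.
        "extra-links": ["acme_provider.links.AcmeConsoleLink"],
    }


# Assumed packaging metadata (e.g. passed to setup(entry_points=...)).
ENTRY_POINTS = {
    "apache_airflow_provider": [
        "provider_info=acme_provider:get_provider_info",
    ]
}

print(sorted(get_provider_info()))
```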

Our very own Jarek Potiuk has written about providers in much more 
detail <https://www.polidea.com/blog/airflow-2-providers/> on the 
Polidea blog.

Docs on the providers concept and writing custom providers 
<http://airflow.apache.org/docs/apache-airflow-providers/>
Docs on all the provider packages available 
<http://airflow.apache.org/docs/apache-airflow-providers/packages-ref.html>

*Security*

As part of the Airflow 2.0 effort, there has been a conscious focus on 
security and reducing areas of exposure. This shows up in different 
forms across different functional areas. For example, in the new 
REST API, all operations now require authorization. Similarly, in the 
configuration settings, the Fernet key is now required to be specified.
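
For anyone who has not set a Fernet key before, here is a small sketch 
of generating one with only the standard library. A Fernet key is 32 
random bytes in URL-safe base64, which is also what 
`cryptography.fernet.Fernet.generate_key()` produces. The helper name 
`generate_fernet_key` is made up for this example.

```python
import base64
import os


def generate_fernet_key():
    """Return 32 random bytes, URL-safe base64 encoded, as text."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode()


key = generate_fernet_key()
# Airflow reads [core] fernet_key; this environment variable overrides it.
print(f"export AIRFLOW__CORE__FERNET_KEY={key}")
```

Keep the generated value secret and identical across all Airflow 
components, since it is used to encrypt stored connection credentials.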

*Configuration*

Configuration in the form of the airflow.cfg file has been rationalized 
further into distinct sections, specifically around "core". 
Additionally, a significant number of configuration options have been 
deprecated or moved to individual component-specific configuration 
files, such as the pod-template-file for Kubernetes execution-related 
configuration.
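
To make the section layout concrete, the sketch below parses a tiny, 
made-up airflow.cfg fragment and prints the environment variable that 
overrides each option, following the `AIRFLOW__{SECTION}__{KEY}` naming 
convention. The file contents and the `env_var_name` helper are 
illustrative assumptions, not a recommended configuration.

```python
import configparser

# Made-up fragment; a real airflow.cfg has many more sections and options.
SAMPLE_CFG = """
[core]
executor = KubernetesExecutor

[kubernetes]
pod_template_file = /opt/airflow/pod_template.yaml
"""


def env_var_name(section, key):
    """Name of the environment variable that overrides section/key."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


parser = configparser.ConfigParser()
parser.read_string(SAMPLE_CFG)
for section in parser.sections():
    for option, value in parser[section].items():
        print(env_var_name(section, option), "=", value)
```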

*Thanks to all of you*

We've tried to make as few breaking changes as possible and to 
provide a deprecation path in the code, especially in the case of 
anything called in the DAG. That said, please read through UPDATING.md 
to check what might affect you. For example: we re-organized the 
layout of operators (they now all live under airflow.providers.*) but 
the old names should continue to work; you'll just notice a lot of 
DeprecationWarnings that need to be fixed up.
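
One way to track those warnings down is to record them while your DAG 
code is imported. The sketch below shows the mechanism with the standard 
`warnings` module; `legacy_import` is a stand-in for an import from an 
old 1.10 path and `collect_deprecations` is a hypothetical helper, 
neither of which exists in Airflow.

```python
import warnings


def legacy_import():
    """Stand-in for importing an operator from its old 1.10 location."""
    warnings.warn(
        "This module is deprecated. Please use the airflow.providers path.",
        DeprecationWarning,
        stacklevel=2,
    )
    return "operator"


def collect_deprecations(func):
    """Run func and return the DeprecationWarning messages it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always", DeprecationWarning)
        func()
    return [
        str(w.message) for w in caught if issubclass(w.category, DeprecationWarning)
    ]


messages = collect_deprecations(legacy_import)
print(len(messages), "deprecation warning(s) found")
```

Running Python with `-W error::DeprecationWarning` is a blunter 
alternative that turns the first such warning into an exception.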

Thank you so much to all the contributors who got us to this point, in 
no particular order: Kaxil Naik, Daniel Imberman, Jarek Potiuk, Tomek 
Urbaszek, Kamil Breguła, Gerard Casas Saez, Xiaodong DENG, Kevin Yang, 
James Timmins, Yingbo Wang, Qian Yu, Ryan Hamilton and the hundreds of 
others who keep making Airflow better for everyone.


Re: Apache Airflow 2.0.0 is released!

Posted by Karolina Rosół <ka...@polidea.com>.
Thanks to everyone involved for your hard work and dedication! 🥳🎉

Karolina Rosół
Polidea <https://www.polidea.com/> | Head of Cloud & OSS



On Thu, Dec 17, 2020 at 8:33 PM Manuel Martinez <ma...@gmail.com>
wrote:

> Big THANK YOU to everyone that made this work!

Re: Apache Airflow 2.0.0 is released!

Posted by Manuel Martinez <ma...@gmail.com>.
Big THANK YOU to everyone that made this work!


Re: Apache Airflow 2.0.0 is released!

Posted by Vikram Koka <vi...@astronomer.io>.
Great job team, this is awesome!


On Thu, Dec 17, 2020 at 9:48 AM Tomasz Urbaszek <tu...@apache.org>
wrote:

> Wooooooow! This is amazing news. Congrats everyone!
>
> Tomek

Re: Apache Airflow 2.0.0 is released!

Posted by Tomasz Urbaszek <tu...@apache.org>.
Wooooooow! This is amazing news. Congrats everyone!

Tomek


Re: Apache Airflow 2.0.0 is released!

Posted by Daniel Standish <dp...@gmail.com>.
Huge congratulations to the committers and the community

UI is beautiful, and scheduler performance was amazing ❤️

Well done



Re: Apache Airflow 2.0.0 is released!

Posted by Michał Słowikowski <mi...@polidea.com>.
Congrats folks, this is great news!

On Thu, Dec 17, 2020 at 11:25 PM Eugen Kosteev <eu...@kosteev.com> wrote:

> 📣🎇
>
> On Fri, Dec 18, 2020 at 12:16 AM Sid Anand <sa...@apache.org> wrote:
>
>> Woot!!! Wonderful work everyone. A truly long-awaited milestone for the
>> project -- almost since the beginning of incubation itself!
>>
>> -s
>>
>> On Thu, Dec 17, 2020 at 2:14 PM Aizhamal Nurmamat kyzy <
>> aizhamal@apache.org> wrote:
>>
>>> Thanks to everyone who put an incredible amount of work into making this
>>> happen! 🎉 🎊
>>>
>>> On Thu, Dec 17, 2020 at 1:58 PM Xinbin Huang <bi...@gmail.com>
>>> wrote:
>>>
>>>> Amazing to see this! 🎉 🎉 🎉 🎉 🎉 🎉
>>>>
>>>> On Thu, Dec 17, 2020 at 1:54 PM kumar pavan <pa...@gmail.com>
>>>> wrote:
>>>>
>>>>> Congrats everyone
>>>>>
>>>>>
>>>>> Thanks & Regards
>>>>> Pavan
>>>>>
>>>>>
>>>>> On Thu, Dec 17, 2020 at 12:36 PM Ash Berlin-Taylor <as...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> I am proud to announce that Apache Airflow 2.0.0 has been released.
>>>>>>
>>>>>> The source release, as well as the binary "wheel" release (no sdist
>>>>>> this time), are available here
>>>>>>
>>>>>> We also made this version available on PyPI for convenience (`pip
>>>>>> install apache-airflow`):
>>>>>>
>>>>>> 📦 PyPI: https://pypi.org/project/apache-airflow/2.0.0
>>>>>>
>>>>>> The documentation is available on:
>>>>>> https://airflow.apache.org/
>>>>>> 📚 Docs: http://airflow.apache.org/docs/apache-airflow/2.0.0/
>>>>>>
>>>>>> Docker images will be available shortly -- check
>>>>>> https://hub.docker.com/r/apache/airflow/tags?page=1&ordering=last_updated&name=2.0.0
>>>>>> for them to appear.
>>>>>>
>>>>>>
>>>>>> The full changelog is about 3,000 lines long (already excluding
>>>>>> everything backported to 1.10), so for now I'll simply share some of the
>>>>>> major features in 2.0.0 compared to 1.10.14:
>>>>>>
>>>>>> *A new way of writing dags: the TaskFlow API (AIP-31)*
>>>>>>
>>>>>> (Known in 2.0.0alphas as Functional DAGs.)
>>>>>>
>>>>>> DAGs are now much, much nicer to author, especially when using
>>>>>> PythonOperator. Dependencies are handled more clearly and XCom is nicer to
>>>>>> use.
>>>>>>
>>>>>> Read more here:
>>>>>>
>>>>>> TaskFlow API Tutorial
>>>>>> <http://airflow.apache.org/docs/apache-airflow/stable/tutorial_taskflow_api.html>
>>>>>> TaskFlow API Documentation
>>>>>> <https://airflow.apache.org/docs/apache-airflow/stable/concepts.html#decorated-flows>
>>>>>>
>>>>>> A quick teaser of what DAGs can now look like:
>>>>>>
>>>>>> ```
>>>>>> from airflow.decorators import dag, task
>>>>>> from airflow.utils.dates import days_ago
>>>>>>
>>>>>> @dag(default_args={'owner': 'airflow'}, schedule_interval=None,
>>>>>> start_date=days_ago(2))
>>>>>> def tutorial_taskflow_api_etl():
>>>>>>    @task
>>>>>>    def extract():
>>>>>>        return {"1001": 301.27, "1002": 433.21, "1003": 502.22}
>>>>>>
>>>>>>    @task
>>>>>>    def transform(order_data_dict: dict) -> dict:
>>>>>>        total_order_value = 0
>>>>>>
>>>>>>        for value in order_data_dict.values():
>>>>>>            total_order_value += value
>>>>>>
>>>>>>        return {"total_order_value": total_order_value}
>>>>>>
>>>>>>    @task()
>>>>>>    def load(total_order_value: float):
>>>>>>
>>>>>>        print("Total order value is: %.2f" % total_order_value)
>>>>>>
>>>>>>    order_data = extract()
>>>>>>    order_summary = transform(order_data)
>>>>>>    load(order_summary["total_order_value"])
>>>>>>
>>>>>> tutorial_etl_dag = tutorial_taskflow_api_etl()
>>>>>> ```
>>>>>>
>>>>>> *Fully specified REST API (AIP-32)*
>>>>>>
>>>>>> We now have a fully supported, no-longer-experimental API with a
>>>>>> comprehensive OpenAPI specification
>>>>>>
>>>>>> Read more here:
>>>>>>
>>>>>> REST API Documentation
>>>>>> <http://airflow.apache.org/docs/apache-airflow/stable/stable-rest-api-ref.html>
>>>>>> .
>>>>>>
>>>>>> *Massive Scheduler performance improvements*
>>>>>>
>>>>>> As part of AIP-15 (Scheduler HA+performance) and other work Kamil
>>>>>> did, we significantly improved the performance of the Airflow Scheduler. It
>>>>>> now starts tasks much, MUCH quicker.
>>>>>>
>>>>>> Over at Astronomer.io weā€™ve benchmarked the schedulerā€”itā€™s fast
>>>>>> <https://www.astronomer.io/blog/airflow-2-scheduler> (we had to
>>>>>> triple check the numbers as we donā€™t quite believe them at first!)
>>>>>>
>>>>>> *Scheduler is now HA compatible (AIP-15)*
>>>>>>
>>>>>> Itā€™s now possible and supported to run more than a single scheduler
>>>>>> instance. This is super useful for both resiliency (in case a scheduler
>>>>>> goes down) and scheduling performance.
>>>>>>
>>>>>> To fully use this feature you need Postgres 9.6+ or MySQL 8+ (MySQL
>>>>>> 5, and MariaDB wonā€™t work with more than one scheduler Iā€™m afraid).
>>>>>>
>>>>>> Thereā€™s no config or other set up required to run more than one
>>>>>> schedulerā€”just start up a scheduler somewhere else (ensuring it has access
>>>>>> to the DAG files) and it will cooperate with your existing schedulers
>>>>>> through the database.
>>>>>>
>>>>>> For more information, read the Scheduler HA documentation
>>>>>> <http://airflowapache.org/docs/apache-airflow/stable/scheduler.html#running-more-than-one-scheduler>
>>>>>> .
>>>>>>
>>>>>> *Task Groups (AIP-34)*
>>>>>>
>>>>>> SubDAGs were commonly used for grouping tasks in the UI, but they had
>>>>>> many drawbacks in their execution behaviour (primarirly that they only
>>>>>> executed a single task in parallel!) To improve this experience, weā€™ve
>>>>>> introduced ā€œTask Groupsā€: a method for organizing tasks which provides the
>>>>>> same grouping behaviour as a subdag without any of the execution-time
>>>>>> drawbacks.
>>>>>>
>>>>>> SubDAGs will still work for now, but we think that any previous use
>>>>>> of SubDAGs can now be replaced with task groups. If you find an example
>>>>>> where this isnā€™t the case, please let us know by opening an issue on GitHub
>>>>>>
>>>>>> For more information, check out the Task Group documentation
>>>>>> <http://airflow.apache.org/docs/apache-airflow/stable/concepts.html#taskgroup>
>>>>>> .
>>>>>>
>>>>>> *Refreshed UI*
>>>>>>
>>>>>> Weā€™ve given the Airflow UI a visual refresh and updated some of the
>>>>>> styling. Check out the UI section of the docs
>>>>>> <http://0.0.0.0:8000/docs/apache-airflow/stable/ui.html> for
>>>>>> screenshots.
>>>>>>
>>>>>> We have also added an option to auto-refresh task states in Graph
>>>>>> View so you no longer need to continuously press the refresh button :).
>>>>>>
>>>>>> ## Smart Sensors for reduced load from sensors (AIP-17)
>>>>>>
>>>>>> If you make heavy use of sensors in your Airflow cluster, you might
>>>>>> find that sensor execution takes up a significant proportion of your
>>>>>> cluster even with ā€œrescheduleā€ mode. To improve this, weā€™ve added a new
>>>>>> mode called ā€œSmart Sensorsā€.
>>>>>>
>>>>>> This feature is in ā€œearly-accessā€: itā€™s been well-tested by AirBnB
>>>>>> and is ā€œstableā€/usable, but we reserve the right to make backwards
>>>>>> incompatible changes to it in a future release (if we have to. Weā€™ll try
>>>>>> very hard not to!)
>>>>>>
>>>>>> Read more about it in the Smart Sensors documentation
>>>>>> <https://airflow.apache.org/docs/apache-airflow/stable/smart-sensor.html>
>>>>>> .
>>>>>>
>>>>>> *Simplified KubernetesExecutor*
>>>>>>
>>>>>> For Airflow 2.0, we have re-architected the KubernetesExecutor in a
>>>>>> fashion that is simultaneously faster, easier to understand, and more
>>>>>> flexible for Airflow users. Users will now be able to access the full
>>>>>> Kubernetes API to create a .yaml pod_template_file instead of specifying
>>>>>> parameters in their airflow.cfg.
>>>>>>
>>>>>> We have also replaced the executor_config dictionary with the
>>>>>> pod_override parameter, which takes a Kubernetes V1Pod object for a 1:1
>>>>>> setting override. These changes have removed over three thousand lines of
>>>>>> code from the KubernetesExecutor, which makes it run faster and creates
>>>>>> fewer potential errors.
>>>>>>
>>>>>> Read more here:
>>>>>>
>>>>>> Docs on pod_template_file
>>>>>> <https://airflow.apache.org/docs/apache-airflow/stable/executor/kubernetes.html?highlight=pod_override#pod-template-file>
>>>>>> Docs on pod_override
>>>>>> <https://airflow.apache.org/docs/apache-airflow/stable/executor/kubernetes.html?highlight=pod_override#pod-override>
>>>>>>
>>>>>> *Airflow core and providers: Splitting Airflow into 60+ packages*
>>>>>>
>>>>>> Airflow 2.0 is not a monolithic ā€œone to rule them allā€ package. Weā€™ve
>>>>>> split Airflow into core and 61 (for now) provider packages. Each provider
>>>>>> package is for either a particular external service (Google, Amazon,
>>>>>> Microsoft, Snowflake), a database (Postgres, MySQL), or a protocol
>>>>>> (HTTP/FTP). Now you can create a custom Airflow installation from
>>>>>> ā€œbuildingā€ blocks and choose only what you need, plus add whatever other
>>>>>> requirements you might have. Some of the common providers are installed
>>>>>> automatically (ftp, http, imap, sqlite) as they are commonly used. Other
>>>>>> providers are automatically installed when you choose appropriate extras
>>>>>> when installing Airflow.
>>>>>>
>>>>>> The provider architecture should make it much easier to get a fully
>>>>>> customized, yet consistent runtime with the right set of Python
>>>>>> dependencies.
>>>>>>
>>>>>> But thatā€™s not all: you can write your own custom providers and add
>>>>>> things like custom connection types, customizations of the Connection
>>>>>> Forms, and extra links to your operators in a manageable way. You can build
>>>>>> your own provider and install it as a Python package and have your
>>>>>> customizations visible right in the Airflow UI.
>>>>>>
>>>>>> Our very own Jarek Potiuk has written about providers in much more
>>>>>> detail <https://www.polidea.com/blog/airflow-2-providers/> on the
>>>>>> Polidea blog.
>>>>>>
>>>>>> Docs on the providers concept and writing custom providers
>>>>>> <http://airflow.apache.org/docs/apache-airflow-providers/>
>>>>>> Docs on the all providers packages available
>>>>>> <http://airflow.apache.org/docs/apache-airflow-providers/packages-ref.html>
>>>>>>
>>>>>> *Security*
>>>>>>
>>>>>> As part of Airflow 2.0 effort, there has been a conscious focus on
>>>>>> Security and reducing areas of exposure. This is represented across
>>>>>> different functional areas in different forms. For example, in the new REST
>>>>>> API, all operations now require authorization. Similarly, in the
>>>>>> configuration settings, the Fernet key is now required to be specified.
>>>>>>
>>>>>> *Configuration*
>>>>>>
>>>>>> Configuration in the form of the airflow.cfg file has been
>>>>>> rationalized further in distinct sections, specifically around ā€œcoreā€.
>>>>>> Additionally, a significant amount of configuration options have been
>>>>>> deprecated or moved to individual component-specific configuration files,
>>>>>> such as the pod-template-file for Kubernetes execution-related
>>>>>> configuration.
>>>>>>
>>>>>> *Thanks to all of you*
>>>>>>
>>>>>> Weā€™ve tried to make as few breaking changes as possible and to
>>>>>> provide deprecation path in the code, especially in the case of anything
>>>>>> called in the DAG. That said, please read throughUPDATING.md to check what
>>>>>> might affect you. For example: r We re-organized the layout of operators
>>>>>> (they now all live under airflow.providers.*) but the old names should
>>>>>> continue to work - youā€™ll just notice a lot of DeprecationWarnings that
>>>>>> need to be fixed up.
>>>>>>
>>>>>> Thank you so much to all the contributors who got us to this point,
>>>>>> in no particular order: Kaxil Naik, Daniel Imberman, Jarek Potiuk, Tomek
>>>>>> Urbaszek, Kamil Breguła, Gerard Casas Saez, Xiaodong DENG, Kevin Yang,
>>>>>> James Timmins, Yingbo Wang, Qian Yu, Ryan Hamilton and the 100s of others
>>>>>> who keep making Airflow better for everyone.
>>>>>>
>>>>>
>
> --
> Eugene
>


-- 

Michał Słowikowski
Polidea <https://www.polidea.com/> | Junior Software Engineer

E: michal.slowikowski@polidea.com

Unique Tech
Check out our projects! <https://www.polidea.com/our-work>

Re: Apache Airflow 2.0.0 is released!

Posted by Eugen Kosteev <eu...@kosteev.com>.
📣🎇

On Fri, Dec 18, 2020 at 12:16 AM Sid Anand <sa...@apache.org> wrote:

> Woot!!! Wonderful work everyone. A truly long-awaited milestone for the
> project -- almost since the beginning of incubation itself!
>
> -s
>
> On Thu, Dec 17, 2020 at 2:14 PM Aizhamal Nurmamat kyzy <
> aizhamal@apache.org> wrote:
>
>> Thanks to everyone who put an incredible amount of work into making this
>> happen! 🎉 🎊
>>
>> On Thu, Dec 17, 2020 at 1:58 PM Xinbin Huang <bi...@gmail.com>
>> wrote:
>>
>>> Amazing to see this! 🎉 🎉 🎉 🎉 🎉 🎉
>>>
>>> On Thu, Dec 17, 2020 at 1:54 PM kumar pavan <pa...@gmail.com>
>>> wrote:
>>>
>>>> Congrats EveryOne
>>>>
>>>>
>>>> Thanks & Regards
>>>> Pavan
>>>>
>>>>

-- 
Eugene

Re: Apache Airflow 2.0.0 is released!

Posted by Sid Anand <sa...@apache.org>.
Woot!!! Wonderful work everyone. A truly long-awaited milestone for the
project -- almost since the beginning of incubation itself!

-s

On Thu, Dec 17, 2020 at 2:14 PM Aizhamal Nurmamat kyzy <ai...@apache.org>
wrote:

> Thanks to everyone who put an incredible amount of work into making this
> happen! 🎉 🎊
>
> On Thu, Dec 17, 2020 at 1:58 PM Xinbin Huang <bi...@gmail.com>
> wrote:
>
>> Amazing to see this! 🎉 🎉 🎉 🎉 🎉 🎉
>>
>> On Thu, Dec 17, 2020 at 1:54 PM kumar pavan <pa...@gmail.com>
>> wrote:
>>
>>> Congrats EveryOne
>>>
>>>
>>> Thanks & Regards
>>> Pavan
>>>
>>>
>>> On Thu, Dec 17, 2020 at 12:36 PM Ash Berlin-Taylor <as...@apache.org>
>>> wrote:
>>>
>>>> I am proud to announce that Apache Airflow 2.0.0 has been released.
>>>>
>>>> The source release, as well as the binary "wheel" release (no sdist
>>>> this time), are available here
>>>>
>>>> We also made this version available on PyPi for convenience (`pip
>>>> install apache-airflow`):
>>>>
>>>> šŸ“¦ PyPI: https://pypi.org/project/apache-airflow/2.0.0
>>>>
>>>> The documentation is available on:
>>>> https://airflow.apache.org/
>>>> šŸ“š Docs: http://airflow.apache.org/docs/apache-airflow/2.0.0/
>>>>
>>>> Docker images will be available shortly -- check out
>>>> https://hub.docker.com/r/apache/airflow/tags?page=1&ordering=last_updated&name=2.0.0
>>>> for it to appear
>>>>
>>>>
>>>> The full changelog is about 3,000 lines long (already excluding
>>>> everything backported to 1.10), so for now Iā€™ll simply share some of the
>>>> major features in 2.0.0 compared to 1.10.14:
>>>>
>>>> *A new way of writing dags: the TaskFlow API (AIP-31)*
>>>>
>>>> (Known in 2.0.0alphas as Functional DAGs.)
>>>>
>>>> DAGs are now much much nicer to author especially when using
>>>> PythonOperator. Dependencies are handled more clearly and XCom is nicer to
>>>> use
>>>>
>>>> Read more here:
>>>>
>>>> TaskFlow API Tutorial
>>>> <http://airflow.apache.org/docs/apache-airflow/stable/tutorial_taskflow_api.html>
>>>> TaskFlow API Documentation
>>>> <https://airflow.apache.org/docs/apache-airflow/stable/concepts.html#decorated-flows>
>>>>
>>>> A quick teaser of what DAGs can now look like:
>>>>
>>>> ```
>>>> from airflow.decorators import dag, task
>>>> from airflow.utils.dates import days_ago
>>>>
>>>> @dag(default_args={'owner': 'airflow'}, schedule_interval=None,
>>>> start_date=days_ago(2))
>>>> def tutorial_taskflow_api_etl():
>>>>    @task
>>>>    def extract():
>>>>        return {"1001": 301.27, "1002": 433.21, "1003": 502.22}
>>>>
>>>>    @task
>>>>    def transform(order_data_dict: dict) -> dict:
>>>>        total_order_value = 0
>>>>
>>>>        for value in order_data_dict.values():
>>>>            total_order_value += value
>>>>
>>>>        return {"total_order_value": total_order_value}
>>>>
>>>>    @task()
>>>>    def load(total_order_value: float):
>>>>
>>>>        print("Total order value is: %.2f" % total_order_value)
>>>>
>>>>    order_data = extract()
>>>>    order_summary = transform(order_data)
>>>>    load(order_summary["total_order_value"])
>>>>
>>>> tutorial_etl_dag = tutorial_taskflow_api_etl()
>>>> ```
>>>>
>>>> *Fully specified REST API (AIP-32)*
>>>>
>>>> We now have a fully supported, no-longer-experimental API with a
>>>> comprehensive OpenAPI specification
>>>>
>>>> Read more here:
>>>>
>>>> REST API Documentation
>>>> <http://airflow.apache.org/docs/apache-airflow/stable/stable-rest-api-ref.html>
>>>> .
>>>>
>>>> *Massive Scheduler performance improvements*
>>>>
>>>> As part of AIP-15 (Scheduler HA+performance) and other work Kamil did,
>>>> we significantly improved the performance of the Airflow Scheduler. It now
>>>> starts tasks much, MUCH quicker.
>>>>
>>>> Over at Astronomer.io weā€™ve benchmarked the schedulerā€”itā€™s fast
>>>> <https://www.astronomer.io/blog/airflow-2-scheduler> (we had to triple
>>>> check the numbers as we donā€™t quite believe them at first!)
>>>>
>>>> *Scheduler is now HA compatible (AIP-15)*
>>>>
>>>> Itā€™s now possible and supported to run more than a single scheduler
>>>> instance. This is super useful for both resiliency (in case a scheduler
>>>> goes down) and scheduling performance.
>>>>
>>>> To fully use this feature you need Postgres 9.6+ or MySQL 8+ (MySQL 5,
>>>> and MariaDB wonā€™t work with more than one scheduler Iā€™m afraid).
>>>>
>>>> Thereā€™s no config or other set up required to run more than one
>>>> schedulerā€”just start up a scheduler somewhere else (ensuring it has access
>>>> to the DAG files) and it will cooperate with your existing schedulers
>>>> through the database.
>>>>
>>>> For more information, read the Scheduler HA documentation
>>>> <http://airflowapache.org/docs/apache-airflow/stable/scheduler.html#running-more-than-one-scheduler>
>>>> .
>>>>
>>>> *Task Groups (AIP-34)*
>>>>
>>>> SubDAGs were commonly used for grouping tasks in the UI, but they had
>>>> many drawbacks in their execution behaviour (primarirly that they only
>>>> executed a single task in parallel!) To improve this experience, weā€™ve
>>>> introduced ā€œTask Groupsā€: a method for organizing tasks which provides the
>>>> same grouping behaviour as a subdag without any of the execution-time
>>>> drawbacks.
>>>>
>>>> SubDAGs will still work for now, but we think that any previous use of
>>>> SubDAGs can now be replaced with task groups. If you find an example where
>>>> this isnā€™t the case, please let us know by opening an issue on GitHub
>>>>
>>>> For more information, check out the Task Group documentation
>>>> <http://airflow.apache.org/docs/apache-airflow/stable/concepts.html#taskgroup>
>>>> .
>>>>
>>>> *Refreshed UI*
>>>>
>>>> Weā€™ve given the Airflow UI a visual refresh and updated some of the
>>>> styling. Check out the UI section of the docs
>>>> <http://0.0.0.0:8000/docs/apache-airflow/stable/ui.html> for
>>>> screenshots.
>>>>
>>>> We have also added an option to auto-refresh task states in Graph View
>>>> so you no longer need to continuously press the refresh button :).
>>>>
>>>> ## Smart Sensors for reduced load from sensors (AIP-17)
>>>>
>>>> If you make heavy use of sensors in your Airflow cluster, you might
>>>> find that sensor execution takes up a significant proportion of your
>>>> cluster even with ā€œrescheduleā€ mode. To improve this, weā€™ve added a new
>>>> mode called ā€œSmart Sensorsā€.
>>>>
>>>> This feature is in ā€œearly-accessā€: itā€™s been well-tested by AirBnB and
>>>> is ā€œstableā€/usable, but we reserve the right to make backwards incompatible
>>>> changes to it in a future release (if we have to. Weā€™ll try very hard not
>>>> to!)
>>>>
>>>> Read more about it in the Smart Sensors documentation
>>>> <https://airflow.apache.org/docs/apache-airflow/stable/smart-sensor.html>
>>>> .
>>>>
>>>> *Simplified KubernetesExecutor*
>>>>
>>>> For Airflow 2.0, we have re-architected the KubernetesExecutor in a
>>>> fashion that is simultaneously faster, easier to understand, and more
>>>> flexible for Airflow users. Users will now be able to access the full
>>>> Kubernetes API to create a .yaml pod_template_file instead of specifying
>>>> parameters in their airflow.cfg.
>>>>
>>>> We have also replaced the executor_config dictionary with the
>>>> pod_override parameter, which takes a Kubernetes V1Pod object for a 1:1
>>>> setting override. These changes have removed over three thousand lines of
>>>> code from the KubernetesExecutor, which makes it run faster and creates
>>>> fewer potential errors.
>>>>
>>>> Read more here:
>>>>
>>>> Docs on pod_template_file
>>>> <https://airflow.apache.org/docs/apache-airflow/stable/executor/kubernetes.html?highlight=pod_override#pod-template-file>
>>>> Docs on pod_override
>>>> <https://airflow.apache.org/docs/apache-airflow/stable/executor/kubernetes.html?highlight=pod_override#pod-override>
>>>>
>>>> *Airflow core and providers: Splitting Airflow into 60+ packages*
>>>>
>>>> Airflow 2.0 is not a monolithic ā€œone to rule them allā€ package. Weā€™ve
>>>> split Airflow into core and 61 (for now) provider packages. Each provider
>>>> package is for either a particular external service (Google, Amazon,
>>>> Microsoft, Snowflake), a database (Postgres, MySQL), or a protocol
>>>> (HTTP/FTP). Now you can create a custom Airflow installation from
>>>> ā€œbuildingā€ blocks and choose only what you need, plus add whatever other
>>>> requirements you might have. Some of the common providers are installed
>>>> automatically (ftp, http, imap, sqlite) as they are commonly used. Other
>>>> providers are automatically installed when you choose appropriate extras
>>>> when installing Airflow.
>>>>
>>>> The provider architecture should make it much easier to get a fully
>>>> customized, yet consistent runtime with the right set of Python
>>>> dependencies.
>>>>
>>>> But thatā€™s not all: you can write your own custom providers and add
>>>> things like custom connection types, customizations of the Connection
>>>> Forms, and extra links to your operators in a manageable way. You can build
>>>> your own provider and install it as a Python package and have your
>>>> customizations visible right in the Airflow UI.
>>>>
>>>> Our very own Jarek Potiuk has written about providers in much more
>>>> detail <https://www.polidea.com/blog/airflow-2-providers/> on the
>>>> Polidea blog.
>>>>
>>>> Docs on the providers concept and writing custom providers
>>>> <http://airflow.apache.org/docs/apache-airflow-providers/>
>>>> Docs on the all providers packages available
>>>> <http://airflow.apache.org/docs/apache-airflow-providers/packages-ref.html>
>>>>
>>>> *Security*
>>>>
>>>> As part of Airflow 2.0 effort, there has been a conscious focus on
>>>> Security and reducing areas of exposure. This is represented across
>>>> different functional areas in different forms. For example, in the new REST
>>>> API, all operations now require authorization. Similarly, in the
>>>> configuration settings, the Fernet key is now required to be specified.
>>>>
>>>> *Configuration*
>>>>
>>>> Configuration in the form of the airflow.cfg file has been rationalized
>>>> further into distinct sections, specifically around “core”. Additionally, a
>>>> significant number of configuration options have been deprecated or moved
>>>> to individual component-specific configuration files, such as the
>>>> pod-template-file for Kubernetes execution-related configuration.
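
To picture the split between airflow.cfg and a component-specific file, the sketch below parses a small config fragment and reads the path of the pod template. The section and option names (`[kubernetes]`, `pod_template_file`) and the file path are assumptions for illustration; consult the configuration reference for the names your version actually uses.

```python
import configparser

# Illustrative fragment only; section and option names are assumptions about a 2.0 layout.
SAMPLE_CFG = """
[core]
executor = KubernetesExecutor

[kubernetes]
pod_template_file = /opt/airflow/pod_templates/default.yaml
"""


def read_pod_template_path(cfg_text: str) -> str:
    """Return the pod template path configured in the given airflow.cfg text."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return parser.get("kubernetes", "pod_template_file")


if __name__ == "__main__":
    print(read_pod_template_path(SAMPLE_CFG))
```

With this layout airflow.cfg holds only a pointer, and the pod settings themselves live in the YAML file it names.
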
>>>>
>>>> *Thanks to all of you*
>>>>
>>>> We’ve tried to make as few breaking changes as possible and to provide a
>>>> deprecation path in the code, especially in the case of anything called in
>>>> the DAG. That said, please read through UPDATING.md to check what might
>>>> affect you. For example: we re-organized the layout of operators (they
>>>> now all live under airflow.providers.*) but the old names should continue
>>>> to work - you’ll just notice a lot of DeprecationWarnings that need to be
>>>> fixed up.
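
One way to find the DeprecationWarnings mentioned above is to capture them with Python's `warnings` module. The sketch below is self-contained: `old_operator_import` is a hypothetical stand-in for an old import path that still works but warns, not a real Airflow function.

```python
import warnings


def old_operator_import() -> str:
    """Stand-in for a deprecated import path that still works but warns."""
    warnings.warn(
        "This module is deprecated. Please use the new provider path.",
        DeprecationWarning,
        stacklevel=2,
    )
    return "operator"


def collect_deprecations(func) -> list:
    """Call `func` and return the DeprecationWarning messages it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        # Python hides most DeprecationWarnings by default; always show them here.
        warnings.simplefilter("always", DeprecationWarning)
        func()
    return [str(w.message) for w in caught if issubclass(w.category, DeprecationWarning)]


if __name__ == "__main__":
    for message in collect_deprecations(old_operator_import):
        print("needs fixing:", message)
```

For a whole DAG file, running the interpreter with `-W error::DeprecationWarning` turns each such warning into an exception, which makes the old import paths easy to spot.
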
>>>>
>>>> Thank you so much to all the contributors who got us to this point, in
>>>> no particular order: Kaxil Naik, Daniel Imberman, Jarek Potiuk, Tomek
>>>> Urbaszek, Kamil Breguła, Gerard Casas Saez, Xiaodong DENG, Kevin Yang,
>>>> James Timmins, Yingbo Wang, Qian Yu, Ryan Hamilton and the 100s of others
>>>> who keep making Airflow better for everyone.
>>>>
>>>

Re: Apache Airflow 2.0.0 is released!

Posted by Aizhamal Nurmamat kyzy <ai...@apache.org>.
Thanks to everyone who put an incredible amount of work into making this
happen! 🎉 🎊


Re: Apache Airflow 2.0.0 is released!

Posted by Xinbin Huang <bi...@gmail.com>.
Amazing to see this! 🎉 🎉 🎉 🎉 🎉 🎉


Re: Apache Airflow 2.0.0 is released!

Posted by kumar pavan <pa...@gmail.com>.
Congrats, everyone


Thanks & Regards
Pavan


On Thu, Dec 17, 2020 at 12:36 PM Ash Berlin-Taylor <as...@apache.org> wrote:

> I am proud to announce that Apache Airflow 2.0.0 has been released.
>
> The source release, as well as the binary "wheel" release (no sdist this
> time), are available here
>
> We also made this version available on PyPi for convenience (`pip install
> apache-airflow`):
>
> šŸ“¦ PyPI: https://pypi.org/project/apache-airflow/2.0.0
>
> The documentation is available on:
> https://airflow.apache.org/
> šŸ“š Docs: http://airflow.apache.org/docs/apache-airflow/2.0.0/
>
> Docker images will be available shortly -- check out
> https://hub.docker.com/r/apache/airflow/tags?page=1&ordering=last_updated&name=2.0.0
> for it to appear
>
>
> The full changelog is about 3,000 lines long (already excluding everything
> backported to 1.10), so for now Iā€™ll simply share some of the major
> features in 2.0.0 compared to 1.10.14:
>
> *A new way of writing dags: the TaskFlow API (AIP-31)*
>
> (Known in 2.0.0alphas as Functional DAGs.)
>
> DAGs are now much much nicer to author especially when using
> PythonOperator. Dependencies are handled more clearly and XCom is nicer to
> use
>
> Read more here:
>
> TaskFlow API Tutorial
> <http://airflow.apache.org/docs/apache-airflow/stable/tutorial_taskflow_api.html>
> TaskFlow API Documentation
> <https://airflow.apache.org/docs/apache-airflow/stable/concepts.html#decorated-flows>
>
> A quick teaser of what DAGs can now look like:
>
> ```
> from airflow.decorators import dag, task
> from airflow.utils.dates import days_ago
>
> @dag(default_args={'owner': 'airflow'}, schedule_interval=None,
> start_date=days_ago(2))
> def tutorial_taskflow_api_etl():
>    @task
>    def extract():
>        return {"1001": 301.27, "1002": 433.21, "1003": 502.22}
>
>    @task
>    def transform(order_data_dict: dict) -> dict:
>        total_order_value = 0
>
>        for value in order_data_dict.values():
>            total_order_value += value
>
>        return {"total_order_value": total_order_value}
>
>    @task()
>    def load(total_order_value: float):
>
>        print("Total order value is: %.2f" % total_order_value)
>
>    order_data = extract()
>    order_summary = transform(order_data)
>    load(order_summary["total_order_value"])
>
> tutorial_etl_dag = tutorial_taskflow_api_etl()
> ```
>
> *Fully specified REST API (AIP-32)*
>
> We now have a fully supported, no-longer-experimental API with a
> comprehensive OpenAPI specification
>
> Read more here:
>
> REST API Documentation
> <http://airflow.apache.org/docs/apache-airflow/stable/stable-rest-api-ref.html>
> .
>
> *Massive Scheduler performance improvements*
>
> As part of AIP-15 (Scheduler HA+performance) and other work Kamil did, we
> significantly improved the performance of the Airflow Scheduler. It now
> starts tasks much, MUCH quicker.
>
> Over at Astronomer.io weā€™ve benchmarked the schedulerā€”itā€™s fast
> <https://www.astronomer.io/blog/airflow-2-scheduler> (we had to triple
> check the numbers as we donā€™t quite believe them at first!)
>
> *Scheduler is now HA compatible (AIP-15)*
>
> Itā€™s now possible and supported to run more than a single scheduler
> instance. This is super useful for both resiliency (in case a scheduler
> goes down) and scheduling performance.
>
> To fully use this feature you need Postgres 9.6+ or MySQL 8+ (MySQL 5, and
> MariaDB wonā€™t work with more than one scheduler Iā€™m afraid).
>
> Thereā€™s no config or other set up required to run more than one
> schedulerā€”just start up a scheduler somewhere else (ensuring it has access
> to the DAG files) and it will cooperate with your existing schedulers
> through the database.
>
> For more information, read the Scheduler HA documentation
> <http://airflowapache.org/docs/apache-airflow/stable/scheduler.html#running-more-than-one-scheduler>
> .
>
> *Task Groups (AIP-34)*
>
> SubDAGs were commonly used for grouping tasks in the UI, but they had many
> drawbacks in their execution behaviour (primarirly that they only executed
> a single task in parallel!) To improve this experience, weā€™ve introduced
> ā€œTask Groupsā€: a method for organizing tasks which provides the same
> grouping behaviour as a subdag without any of the execution-time drawbacks.
>
> SubDAGs will still work for now, but we think that any previous use of
> SubDAGs can now be replaced with task groups. If you find an example where
> this isnā€™t the case, please let us know by opening an issue on GitHub
>
> For more information, check out the Task Group documentation
> <http://airflow.apache.org/docs/apache-airflow/stable/concepts.html#taskgroup>
> .
>
> *Refreshed UI*
>
> Weā€™ve given the Airflow UI a visual refresh and updated some of the
> styling. Check out the UI section of the docs
> <http://0.0.0.0:8000/docs/apache-airflow/stable/ui.html> for screenshots.
>
> We have also added an option to auto-refresh task states in Graph View so
> you no longer need to continuously press the refresh button :).
>
> ## Smart Sensors for reduced load from sensors (AIP-17)
>
> If you make heavy use of sensors in your Airflow cluster, you might find
> that sensor execution takes up a significant proportion of your cluster
> even with ā€œrescheduleā€ mode. To improve this, weā€™ve added a new mode called
> ā€œSmart Sensorsā€.
>
> This feature is in ā€œearly-accessā€: itā€™s been well-tested by AirBnB and is
> ā€œstableā€/usable, but we reserve the right to make backwards incompatible
> changes to it in a future release (if we have to. Weā€™ll try very hard not
> to!)
>
> Read more about it in the Smart Sensors documentation
> <https://airflow.apache.org/docs/apache-airflow/stable/smart-sensor.html>.
>
> *Simplified KubernetesExecutor*
>
> For Airflow 2.0, we have re-architected the KubernetesExecutor in a
> fashion that is simultaneously faster, easier to understand, and more
> flexible for Airflow users. Users will now be able to access the full
> Kubernetes API to create a .yaml pod_template_file instead of specifying
> parameters in their airflow.cfg.
>
> We have also replaced the executor_config dictionary with the pod_override
> parameter, which takes a Kubernetes V1Pod object for a 1:1 setting
> override. These changes have removed over three thousand lines of code from
> the KubernetesExecutor, which makes it run faster and creates fewer
> potential errors.
>
> Read more here:
>
> Docs on pod_template_file
> <https://airflow.apache.org/docs/apache-airflow/stable/executor/kubernetes.html?highlight=pod_override#pod-template-file>
> Docs on pod_override
> <https://airflow.apache.org/docs/apache-airflow/stable/executor/kubernetes.html?highlight=pod_override#pod-override>
>
> *Airflow core and providers: Splitting Airflow into 60+ packages*
>
> Airflow 2.0 is not a monolithic ā€œone to rule them allā€ package. Weā€™ve
> split Airflow into core and 61 (for now) provider packages. Each provider
> package is for either a particular external service (Google, Amazon,
> Microsoft, Snowflake), a database (Postgres, MySQL), or a protocol
> (HTTP/FTP). Now you can create a custom Airflow installation from
> ā€œbuildingā€ blocks and choose only what you need, plus add whatever other
> requirements you might have. Some of the common providers are installed
> automatically (ftp, http, imap, sqlite) as they are commonly used. Other
> providers are automatically installed when you choose appropriate extras
> when installing Airflow.
>
> The provider architecture should make it much easier to get a fully
> customized, yet consistent runtime with the right set of Python
> dependencies.
>
> But thatā€™s not all: you can write your own custom providers and add things
> like custom connection types, customizations of the Connection Forms, and
> extra links to your operators in a manageable way. You can build your own
> provider and install it as a Python package and have your customizations
> visible right in the Airflow UI.
>
> Our very own Jarek Potiuk has written about providers in much more detail
> <https://www.polidea.com/blog/airflow-2-providers/> on the Polidea blog.
>
> Docs on the providers concept and writing custom providers
> <http://airflow.apache.org/docs/apache-airflow-providers/>
> Docs on the all providers packages available
> <http://airflow.apache.org/docs/apache-airflow-providers/packages-ref.html>
>
> *Security*
>
> As part of Airflow 2.0 effort, there has been a conscious focus on
> Security and reducing areas of exposure. This is represented across
> different functional areas in different forms. For example, in the new REST
> API, all operations now require authorization. Similarly, in the
> configuration settings, the Fernet key is now required to be specified.
>
> *Configuration*
>
> Configuration in the form of the airflow.cfg file has been rationalized
> further in distinct sections, specifically around ā€œcoreā€. Additionally, a
> significant amount of configuration options have been deprecated or moved
> to individual component-specific configuration files, such as the
> pod-template-file for Kubernetes execution-related configuration.
>
> *Thanks to all of you*
>
> We’ve tried to make as few breaking changes as possible and to provide a
> deprecation path in the code, especially in the case of anything called in
> the DAG. That said, please read through UPDATING.md to check what might
> affect you. For example: we re-organized the layout of operators (they
> now all live under airflow.providers.*) but the old names should continue
> to work - you’ll just notice a lot of DeprecationWarnings that need to be
> fixed up.
>
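
To get ahead of those DeprecationWarnings, a DAG file can be scanned for old import paths. The sketch below is only an illustration: the mapping lists three moves that I believe are correct, the helper name `find_deprecated_imports` is made up, and UPDATING.md remains the authoritative list.

```python
import re

# Illustrative subset of moved modules; UPDATING.md has the full list.
MOVED_MODULES = {
    "airflow.operators.postgres_operator": "airflow.providers.postgres.operators.postgres",
    "airflow.contrib.operators.ssh_operator": "airflow.providers.ssh.operators.ssh",
    "airflow.hooks.S3_hook": "airflow.providers.amazon.aws.hooks.s3",
}

# Captures the module path of "import x.y" and "from x.y import z" statements.
IMPORT_RE = re.compile(r"^\s*(?:from|import)\s+([\w.]+)", re.MULTILINE)


def find_deprecated_imports(source: str):
    """Return (old_module, new_module) pairs for moved modules imported in source."""
    hits = []
    for module in IMPORT_RE.findall(source):
        if module in MOVED_MODULES:
            hits.append((module, MOVED_MODULES[module]))
    return hits


if __name__ == "__main__":
    dag_source = "from airflow.operators.postgres_operator import PostgresOperator\n"
    for old, new in find_deprecated_imports(dag_source):
        print(f"{old} -> {new}")
```

This only inspects import statements as text, so it will not catch dynamic imports.
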
> Thank you so much to all the contributors who got us to this point, in no
> particular order: Kaxil Naik, Daniel Imberman, Jarek Potiuk, Tomek
> Urbaszek, Kamil Breguła, Gerard Casas Saez, Xiaodong DENG, Kevin Yang,
> James Timmins, Yingbo Wang, Qian Yu, Ryan Hamilton and the 100s of others
> who keep making Airflow better for everyone.
>

Re: Apache Airflow 2.0.0 is released!

Posted by Manuel Martinez <ma...@gmail.com>.
Big THANK YOU to everyone that made this work!


Re: Apache Airflow 2.0.0 is released!

Posted by Felix Uellendall <fe...@pm.me.INVALID>.
Great job everyone! 🎉

Really amazing work from all of you!

Thanks.
-Felix

Sent from ProtonMail Mobile

On Thu, Dec 17, 2020 at 18:59, Jarek Potiuk <Ja...@polidea.com> wrote:

> WOHOO!
>
> On Thu, Dec 17, 2020 at 6:54 PM Shaw, Damian P. <da...@credit-suisse.com> wrote:
>
>> Great news! Is there a single web page that highlights these major features as you’ve listed them?
>>
>> Damian
>
> --
>
> Jarek Potiuk
> [Polidea](https://www.polidea.com/) | Principal Software Engineer
>
> M: [+48 660 796 129](tel:+48660796129)
> [Polidea](https://www.polidea.com/)

Re: Apache Airflow 2.0.0 is released!

Posted by Felix Uellendall <fe...@pm.me>.
Great job everyone! šŸŽ‰šŸ‘

Really amazing work from all of you!

Thanks.
-Felix

Sent from ProtonMail Mobile

On Thu, Dec 17, 2020 at 18:59, Jarek Potiuk <Ja...@polidea.com> wrote:

> WOHOO!
>
> On Thu, Dec 17, 2020 at 6:54 PM Shaw, Damian P. <da...@credit-suisse.com> wrote:
>
>> Great news! Is there a single web page that highlights these major features as youā€™ve listed them?
>>
>> Damian
>>
>> From: Ash Berlin-Taylor <as...@apache.org>
>> Sent: Thursday, December 17, 2020 12:36
>> To: users@airflow.apache.org
>> Cc: announce@apache.org; dev@airflow.apache.org
>> Subject: Apache Airflow 2.0.0 is released!
>>
>> I am proud to announce that Apache Airflow 2.0.0 has been released.
>>
>> The source release, as well as the binary "wheel" release (no sdist this time), are available here
>>
>> We also made this version available on PyPi for convenience (`pip install apache-airflow`):
>>
>> šŸ“¦ PyPI: https://pypi.org/project/apache-airflow/2.0.0
>>
>> The documentation is available on:
>>
>> https://airflow.apache.org/
>>
>> šŸ“š Docs: http://airflow.apache.org/docs/apache-airflow/2.0.0/
>>
>> Docker images will be available shortly -- check out https://hub.docker.com/r/apache/airflow/tags?page=1&ordering=last_updated&name=2.0.0 for it to appear
>>
>> The full changelog is about 3,000 lines long (already excluding everything backported to 1.10), so for now Iā€™ll simply share some of the major features in 2.0.0 compared to 1.10.14:
>>
>> A new way of writing dags: the TaskFlow API (AIP-31)
>>
>> (Known in 2.0.0alphas as Functional DAGs.)
>>
>> DAGs are now much much nicer to author especially when using PythonOperator. Dependencies are handled more clearly and XCom is nicer to use
>>
>> Read more here:
>>
>> [TaskFlow API Tutorial](http://airflow.apache.org/docs/apache-airflow/stable/tutorial_taskflow_api.html)
>>
>> [TaskFlow API Documentation](https://airflow.apache.org/docs/apache-airflow/stable/concepts.html#decorated-flows)
>>
>> A quick teaser of what DAGs can now look like:
>>
>> ```
>>
>> from airflow.decorators import dag, task
>> from airflow.utils.dates import days_ago
>>
>> @dag(default_args={'owner': 'airflow'}, schedule_interval=None, start_date=days_ago(2))
>> def tutorial_taskflow_api_etl():
>> @task
>> def extract():
>> return {"1001": 301.27, "1002": 433.21, "1003": 502.22}
>>
>> @task
>> def transform(order_data_dict: dict) -> dict:
>> total_order_value = 0
>>
>> for value in order_data_dict.values():
>> total_order_value += value
>>
>> return {"total_order_value": total_order_value}
>>
>> @task()
>> def load(total_order_value: float):
>>
>> print("Total order value is: %.2f" % total_order_value)
>>
>> order_data = extract()
>> order_summary = transform(order_data)
>> load(order_summary["total_order_value"])
>>
>> tutorial_etl_dag = tutorial_taskflow_api_etl()
>>
>> ```
>>
>> Fully specified REST API (AIP-32)
>>
>> We now have a fully supported, no-longer-experimental API with a comprehensive OpenAPI specification
>>
>> Read more here:
>>
>> [REST API Documentation](http://airflow.apache.org/docs/apache-airflow/stable/stable-rest-api-ref.html).
>>
>> Massive Scheduler performance improvements
>>
>> As part of AIP-15 (Scheduler HA+performance) and other work Kamil did, we significantly improved the performance of the Airflow Scheduler. It now starts tasks much, MUCH quicker.
>>
>> Over at Astronomer.io weā€™ve [benchmarked the schedulerā€”itā€™s fast](https://www.astronomer.io/blog/airflow-2-scheduler) (we had to triple check the numbers as we donā€™t quite believe them at first!)
>>
>> Scheduler is now HA compatible (AIP-15)
>>
>> Itā€™s now possible and supported to run more than a single scheduler instance. This is super useful for both resiliency (in case a scheduler goes down) and scheduling performance.
>>
>> To fully use this feature you need Postgres 9.6+ or MySQL 8+ (MySQL 5, and MariaDB wonā€™t work with more than one scheduler Iā€™m afraid).
>>
>> Thereā€™s no config or other set up required to run more than one schedulerā€”just start up a scheduler somewhere else (ensuring it has access to the DAG files) and it will cooperate with your existing schedulers through the database.
>>
>> For more information, read the [Scheduler HA documentation](http://airflowapache.org/docs/apache-airflow/stable/scheduler.html#running-more-than-one-scheduler).
>>
>> Task Groups (AIP-34)
>>
>> SubDAGs were commonly used for grouping tasks in the UI, but they had many drawbacks in their execution behaviour (primarirly that they only executed a single task in parallel!) To improve this experience, weā€™ve introduced ā€œTask Groupsā€: a method for organizing tasks which provides the same grouping behaviour as a subdag without any of the execution-time drawbacks.
>>
>> SubDAGs will still work for now, but we think that any previous use of SubDAGs can now be replaced with task groups. If you find an example where this isnā€™t the case, please let us know by opening an issue on GitHub
>>
>> For more information, check out the [Task Group documentation](http://airflow.apache.org/docs/apache-airflow/stable/concepts.html#taskgroup).
>>
>> Refreshed UI
>>
>> Weā€™ve given the Airflow UI a visual refresh and updated some of the styling. Check out the [UI section of the docs](http://0.0.0.0:8000/docs/apache-airflow/stable/ui.html) for screenshots.
>>
>> We have also added an option to auto-refresh task states in Graph View so you no longer need to continuously press the refresh button :).
>>
>> ## Smart Sensors for reduced load from sensors (AIP-17)
>>
>> If you make heavy use of sensors in your Airflow cluster, you might find that sensor execution takes up a significant proportion of your cluster even with ā€œrescheduleā€ mode. To improve this, weā€™ve added a new mode called ā€œSmart Sensorsā€.
>>
>> This feature is in ā€œearly-accessā€: itā€™s been well-tested by AirBnB and is ā€œstableā€/usable, but we reserve the right to make backwards incompatible changes to it in a future release (if we have to. Weā€™ll try very hard not to!)
>>
>> Read more about it in the [Smart Sensors documentation](https://airflow.apache.org/docs/apache-airflow/stable/smart-sensor.html).
>>
>> Simplified KubernetesExecutor
>>
>> For Airflow 2.0, we have re-architected the KubernetesExecutor in a fashion that is simultaneously faster, easier to understand, and more flexible for Airflow users. Users will now be able to access the full Kubernetes API to create a .yaml pod_template_file instead of specifying parameters in their airflow.cfg.
>>
>> We have also replaced the executor_config dictionary with the pod_override parameter, which takes a Kubernetes V1Pod object for a 1:1 setting override. These changes have removed over three thousand lines of code from the KubernetesExecutor, which makes it run faster and creates fewer potential errors.
>>
>> Read more here:
>>
>> Docs on [pod_template_file](https://airflow.apache.org/docs/apache-airflow/stable/executor/kubernetes.html?highlight=pod_override#pod-template-file)
>>
>> Docs on [pod_override](https://airflow.apache.org/docs/apache-airflow/stable/executor/kubernetes.html?highlight=pod_override#pod-override)
>>
>> Airflow core and providers: Splitting Airflow into 60+ packages
>>
>> Airflow 2.0 is not a monolithic ā€œone to rule them allā€ package. Weā€™ve split Airflow into core and 61 (for now) provider packages. Each provider package is for either a particular external service (Google, Amazon, Microsoft, Snowflake), a database (Postgres, MySQL), or a protocol (HTTP/FTP). Now you can create a custom Airflow installation from ā€œbuildingā€ blocks and choose only what you need, plus add whatever other requirements you might have. Some of the common providers are installed automatically (ftp, http, imap, sqlite) as they are commonly used. Other providers are automatically installed when you choose appropriate extras when installing Airflow.
>>
>> The provider architecture should make it much easier to get a fully customized, yet consistent runtime with the right set of Python dependencies.
>>
>> But thatā€™s not all: you can write your own custom providers and add things like custom connection types, customizations of the Connection Forms, and extra links to your operators in a manageable way. You can build your own provider and install it as a Python package and have your customizations visible right in the Airflow UI.
>>
>> Our very own Jarek Potiuk has written about [providers in much more detail](https://www.polidea.com/blog/airflow-2-providers/) on the Polidea blog.
>>
>> Docs on the [providers concept and writing custom providers](http://airflow.apache.org/docs/apache-airflow-providers/)
>>
>> Docs on the [all providers packages available](http://airflow.apache.org/docs/apache-airflow-providers/packages-ref.html)
>>
>> Security
>>
>> As part of Airflow 2.0 effort, there has been a conscious focus on Security and reducing areas of exposure. This is represented across different functional areas in different forms. For example, in the new REST API, all operations now require authorization. Similarly, in the configuration settings, the Fernet key is now required to be specified.
>>
>> Configuration
>>
>> Configuration in the form of the airflow.cfg file has been rationalized further in distinct sections, specifically around ā€œcoreā€. Additionally, a significant amount of configuration options have been deprecated or moved to individual component-specific configuration files, such as the pod-template-file for Kubernetes execution-related configuration.
>>
>> Thanks to all of you
>>
>> Weā€™ve tried to make as few breaking changes as possible and to provide deprecation path in the code, especially in the case of anything called in the DAG. That said, please read throughUPDATING.md to check what might affect you. For example: r We re-organized the layout of operators (they now all live under airflow.providers.*) but the old names should continue to work - youā€™ll just notice a lot of DeprecationWarnings that need to be fixed up.
>>
>> Thank you so much to all the contributors who got us to this point, in no particular order: Kaxil Naik, Daniel Imberman, Jarek Potiuk, Tomek Urbaszek, Kamil Breguła, Gerard Casas Saez, Xiaodong DENG, Kevin Yang, James Timmins, Yingbo Wang, Qian Yu, Ryan Hamilton and the 100s of others who keep making Airflow better for everyone.
>>
>
> --
>
> Jarek Potiuk
> [Polidea](https://www.polidea.com/) | Principal Software Engineer
>
> M: [+48 660 796 129](tel:+48660796129)
> [Polidea](https://www.polidea.com/)

Re: Apache Airflow 2.0.0 is released!

Posted by Jarek Potiuk <Ja...@polidea.com>.
WOHOO!

On Thu, Dec 17, 2020 at 6:54 PM Shaw, Damian P. <
damian.shaw.2@credit-suisse.com> wrote:

> Great news! Is there a single web page that highlights these major
> features as you’ve listed them?
>
>
>
> Damian
>
>
>
> *From:* Ash Berlin-Taylor <as...@apache.org>
> *Sent:* Thursday, December 17, 2020 12:36
> *To:* users@airflow.apache.org
> *Cc:* announce@apache.org; dev@airflow.apache.org
> *Subject:* Apache Airflow 2.0.0 is released!
>
>
>
> I am proud to announce that Apache Airflow 2.0.0 has been released.
>
>
>
> The source release, as well as the binary "wheel" release (no sdist this
> time), are available here
>
>
>
> We also made this version available on PyPi for convenience (`pip install
> apache-airflow`):
>
>
>
> 📦 PyPI: https://pypi.org/project/apache-airflow/2.0.0
>
>
>
> The documentation is available on:
>
> https://airflow.apache.org/
>
> 📚 Docs: http://airflow.apache.org/docs/apache-airflow/2.0.0/
>
>
>
> Docker images will be available shortly -- check out
> https://hub.docker.com/r/apache/airflow/tags?page=1&ordering=last_updated&name=2.0.0
> for it to appear
>
>
>
>
>
> The full changelog is about 3,000 lines long (already excluding everything
> backported to 1.10), so for now I’ll simply share some of the major
> features in 2.0.0 compared to 1.10.14:
>
>
>
> *A new way of writing dags: the TaskFlow API (AIP-31)*
>
>
>
> (Known in 2.0.0alphas as Functional DAGs.)
>
>
>
> DAGs are now much nicer to author, especially when using
> PythonOperator. Dependencies are handled more clearly and XCom is nicer to
> use.
>
>
>
> Read more here:
>
>
>
> TaskFlow API Tutorial
> <http://airflow.apache.org/docs/apache-airflow/stable/tutorial_taskflow_api.html>
>
> TaskFlow API Documentation
> <https://airflow.apache.org/docs/apache-airflow/stable/concepts.html#decorated-flows>
>
>
>
> A quick teaser of what DAGs can now look like:
>
>
>
> ```
>
> from airflow.decorators import dag, task
> from airflow.utils.dates import days_ago
>
> @dag(default_args={'owner': 'airflow'}, schedule_interval=None,
> start_date=days_ago(2))
> def tutorial_taskflow_api_etl():
>    @task
>    def extract():
>        return {"1001": 301.27, "1002": 433.21, "1003": 502.22}
>
>    @task
>    def transform(order_data_dict: dict) -> dict:
>        total_order_value = 0
>
>        for value in order_data_dict.values():
>            total_order_value += value
>
>        return {"total_order_value": total_order_value}
>
>    @task()
>    def load(total_order_value: float):
>
>        print("Total order value is: %.2f" % total_order_value)
>
>    order_data = extract()
>    order_summary = transform(order_data)
>    load(order_summary["total_order_value"])
>
> tutorial_etl_dag = tutorial_taskflow_api_etl()
>
> ```
>
>
>
> *Fully specified REST API (AIP-32)*
>
>
>
> We now have a fully supported, no-longer-experimental API with a
> comprehensive OpenAPI specification.
>
>
>
> Read more here:
>
>
>
> REST API Documentation
> <http://airflow.apache.org/docs/apache-airflow/stable/stable-rest-api-ref.html>
> .
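As a rough illustration of what calling the stable REST API can look like, the sketch below builds (but does not send) a request that lists DAGs. It assumes the webserver is reachable at `http://localhost:8080`, that the basic-auth API backend (`airflow.api.auth.backend.basic_auth`) is enabled, and that an `admin`/`admin` user exists. The host, the credentials, and the helper name `build_list_dags_request` are illustrative assumptions, not something taken from the announcement.

```python
import base64
import urllib.request


def build_list_dags_request(base_url: str, username: str, password: str,
                            limit: int = 10) -> urllib.request.Request:
    """Build a GET request for the stable API's DAG listing endpoint."""
    # HTTP basic auth: base64("user:password") in the Authorization header.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    url = f"{base_url.rstrip('/')}/api/v1/dags?limit={limit}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
        method="GET",
    )


# Hypothetical local deployment and credentials.
request = build_list_dags_request("http://localhost:8080", "admin", "admin")
print(request.full_url)

# To actually send it against a running webserver:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Because every operation requires authorization, a request without the `Authorization` header should be rejected rather than answered anonymously.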
>
>
>
> *Massive Scheduler performance improvements*
>
>
>
> As part of AIP-15 (Scheduler HA+performance) and other work Kamil did, we
> significantly improved the performance of the Airflow Scheduler. It now
> starts tasks much, MUCH quicker.
>
>
>
> Over at Astronomer.io we’ve benchmarked the scheduler: it’s fast
> <https://www.astronomer.io/blog/airflow-2-scheduler> (we had to triple
> check the numbers as we didn’t quite believe them at first!)
>
>
>
> *Scheduler is now HA compatible (AIP-15)*
>
>
>
> It’s now possible and supported to run more than a single scheduler
> instance. This is super useful for both resiliency (in case a scheduler
> goes down) and scheduling performance.
>
>
>
> To fully use this feature you need Postgres 9.6+ or MySQL 8+ (MySQL 5 and
> MariaDB won’t work with more than one scheduler, I’m afraid).
>
>
>
> There’s no config or other setup required to run more than one
> scheduler: just start up a scheduler somewhere else (ensuring it has access
> to the DAG files) and it will cooperate with your existing schedulers
> through the database.
>
>
>
> For more information, read the Scheduler HA documentation
> <http://airflow.apache.org/docs/apache-airflow/stable/scheduler.html#running-more-than-one-scheduler>
> .
>
>
>
> *Task Groups (AIP-34)*
>
>
>
> SubDAGs were commonly used for grouping tasks in the UI, but they had many
> drawbacks in their execution behaviour (primarily that they only executed
> a single task in parallel!). To improve this experience, we’ve introduced
> “Task Groups”: a method for organizing tasks which provides the same
> grouping behaviour as a SubDAG without any of the execution-time drawbacks.
>
>
>
> SubDAGs will still work for now, but we think that any previous use of
> SubDAGs can now be replaced with task groups. If you find an example where
> this isn’t the case, please let us know by opening an issue on GitHub.
>
>
>
> For more information, check out the Task Group documentation
> <http://airflow.apache.org/docs/apache-airflow/stable/concepts.html#taskgroup>
> .
>
>
>
> *Refreshed UI*
>
>
>
> We’ve given the Airflow UI a visual refresh and updated some of the
> styling. Check out the UI section of the docs
> <https://airflow.apache.org/docs/apache-airflow/stable/ui.html> for screenshots.
>
>
>
> We have also added an option to auto-refresh task states in Graph View so
> you no longer need to continuously press the refresh button :).
>
>
>
> *Smart Sensors for reduced load from sensors (AIP-17)*
>
>
>
> If you make heavy use of sensors in your Airflow cluster, you might find
> that sensor execution takes up a significant proportion of your cluster
> even with “reschedule” mode. To improve this, we’ve added a new mode called
> “Smart Sensors”.
>
>
>
> This feature is in “early-access”: it’s been well-tested by AirBnB and is
> “stable”/usable, but we reserve the right to make backwards-incompatible
> changes to it in a future release (if we have to. We’ll try very hard not
> to!)
>
>
>
> Read more about it in the Smart Sensors documentation
> <https://airflow.apache.org/docs/apache-airflow/stable/smart-sensor.html>.
>
>
>
> *Simplified KubernetesExecutor*
>
>
>
> For Airflow 2.0, we have re-architected the KubernetesExecutor in a
> fashion that is simultaneously faster, easier to understand, and more
> flexible for Airflow users. Users will now be able to access the full
> Kubernetes API to create a .yaml pod_template_file instead of specifying
> parameters in their airflow.cfg.
>
>
>
> We have also replaced the executor_config dictionary with the pod_override
> parameter, which takes a Kubernetes V1Pod object for a 1:1 setting
> override. These changes have removed over three thousand lines of code from
> the KubernetesExecutor, which makes it run faster and creates fewer
> potential errors.
>
>
>
> Read more here:
>
>
>
> Docs on pod_template_file
> <https://airflow.apache.org/docs/apache-airflow/stable/executor/kubernetes.html?highlight=pod_override#pod-template-file>
>
> Docs on pod_override
> <https://airflow.apache.org/docs/apache-airflow/stable/executor/kubernetes.html?highlight=pod_override#pod-override>
>
>
>
> *Airflow core and providers: Splitting Airflow into 60+ packages*
>
>
>
> Airflow 2.0 is not a monolithic “one to rule them all” package. We’ve
> split Airflow into core and 61 (for now) provider packages. Each provider
> package is for either a particular external service (Google, Amazon,
> Microsoft, Snowflake), a database (Postgres, MySQL), or a protocol
> (HTTP/FTP). Now you can create a custom Airflow installation from
> “building” blocks and choose only what you need, plus add whatever other
> requirements you might have. Some of the common providers are installed
> automatically (ftp, http, imap, sqlite) as they are commonly used. Other
> providers are automatically installed when you choose appropriate extras
> when installing Airflow.
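To make the packaging concrete: each provider is published as its own distribution named `apache-airflow-providers-<provider>`, with its code importable from `airflow.providers.<provider>`. The helper below is a small sketch of that naming convention only; the function names are made up for illustration and are not part of Airflow.

```python
def provider_package_name(provider_id: str) -> str:
    """Return the PyPI distribution name for a provider id such as 'cncf.kubernetes'."""
    # Dots in the provider id become dashes in the distribution name.
    return "apache-airflow-providers-" + provider_id.replace(".", "-")


def provider_import_path(provider_id: str) -> str:
    """Return the Python package under which the provider's code lives."""
    return f"airflow.providers.{provider_id}"


for provider in ("google", "postgres", "cncf.kubernetes"):
    print(provider_package_name(provider), "->", provider_import_path(provider))
```

Choosing an extra at install time, for example `pip install 'apache-airflow[google]'`, pulls in the matching provider package.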
>
>
>
> The provider architecture should make it much easier to get a fully
> customized, yet consistent runtime with the right set of Python
> dependencies.
>
>
>
> But that’s not all: you can write your own custom providers and add things
> like custom connection types, customizations of the Connection Forms, and
> extra links to your operators in a manageable way. You can build your own
> provider and install it as a Python package and have your customizations
> visible right in the Airflow UI.
>
>
>
> Our very own Jarek Potiuk has written about providers in much more detail
> <https://www.polidea.com/blog/airflow-2-providers/> on the Polidea blog.
>
>
>
> Docs on the providers concept and writing custom providers
> <http://airflow.apache.org/docs/apache-airflow-providers/>
>
> Docs on all available provider packages
> <http://airflow.apache.org/docs/apache-airflow-providers/packages-ref.html>
>
>
>
> *Security*
>
>
>
> As part of the Airflow 2.0 effort, there has been a conscious focus on
> security and reducing areas of exposure. This is represented across
> different functional areas in different forms. For example, in the new REST
> API, all operations now require authorization. Similarly, in the
> configuration settings, the Fernet key is now required to be specified.
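For readers who have never set one: a Fernet key is 32 random bytes encoded as URL-safe base64. The sketch below generates one using only the standard library and places it in the `AIRFLOW__CORE__FERNET_KEY` environment variable, which is one way of supplying the `[core] fernet_key` setting. The helper name is hypothetical, and in a real deployment the key would come from a secret store rather than being generated at start-up.

```python
import base64
import os


def generate_fernet_key() -> str:
    """Generate a key in the Fernet format: 32 random bytes, URL-safe base64."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


fernet_key = generate_fernet_key()

# Environment to hand to the Airflow processes. Every component
# (webserver, scheduler, workers) needs to see the same key.
airflow_env = dict(os.environ, AIRFLOW__CORE__FERNET_KEY=fernet_key)
```

Generating a fresh key on every start would make previously encrypted connection passwords unreadable, so the key should be created once and kept stable.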
>
>
>
> *Configuration*
>
>
>
> Configuration in the form of the airflow.cfg file has been rationalized
> further into distinct sections, specifically around “core”. Additionally, a
> significant number of configuration options have been deprecated or moved
> to individual component-specific configuration files, such as the
> pod-template-file for Kubernetes execution-related configuration.
>
>
>
> *Thanks to all of you*
>
>
>
> We’ve tried to make as few breaking changes as possible and to provide a
> deprecation path in the code, especially in the case of anything called in
> the DAG. That said, please read through UPDATING.md to check what might
> affect you. For example: we re-organized the layout of operators (they
> now all live under airflow.providers.*) but the old names should continue
> to work - you’ll just notice a lot of DeprecationWarnings that need to be
> fixed up.
>
>
>
> Thank you so much to all the contributors who got us to this point, in no
> particular order: Kaxil Naik, Daniel Imberman, Jarek Potiuk, Tomek
> Urbaszek, Kamil Breguła, Gerard Casas Saez, Xiaodong DENG, Kevin Yang,
> James Timmins, Yingbo Wang, Qian Yu, Ryan Hamilton and the 100s of others
> who keep making Airflow better for everyone.
>
>
>


-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>

Re: Apache Airflow 2.0.0 is released!

Posted by Kevin Yang <yr...@gmail.com>.
🎉🎉🎉 Kudos to everyone who worked so hard on making this happen!


Cheers,
Kevin Y

On Thu, Dec 17, 2020 at 10:13 AM Felix Uellendall <fe...@pm.me.invalid>
wrote:

> Great job everyone! 🎉
>
> Really amazing work from all of you!
>
> Thanks.
> -Felix
>
> Sent from ProtonMail Mobile
>
>
> On Thu, Dec 17, 2020 at 19:08, Gerard Casas Saez <
> gcasassaez@twitter.com.INVALID> wrote:
>
> Yass! 🎉 🎉 🎉 🎉 🎉 🎉
> Great news!
>
> Gerard Casas Saez
> Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
>
>
> On Thu, Dec 17, 2020 at 11:00 AM Tomasz Urbaszek <tu...@apache.org>
> wrote:
>
>> There's an official Apache Airflow blog post with similar content to Ash's mail:
>> https://airflow.apache.org/blog/airflow-two-point-oh-is-here/
>>
>> On Thu, Dec 17, 2020 at 6:59 PM Ry Walker <ry...@rywalker.com> wrote:
>>
>>> we have a webpage on it https://www.astronomer.io/airflow and a
>>> blogpost https://www.astronomer.io/blog/introducing-airflow-2-0
>>>
>>> On Thu, Dec 17, 2020 at 12:54 PM Shaw, Damian P. <
>>> damian.shaw.2@credit-suisse.com> wrote:
>>>
>>>> Great news! Is there a single web page that highlights these major
>>>> features as you’ve listed them?
>>>>
>>>>
>>>>
>>>> Damian
>>>>
>>>>
>>>>
>>>
>
>

Re: Apache Airflow 2.0.0 is released!

Posted by Felix Uellendall <fe...@pm.me.INVALID>.
Great job everyone! 🎉

Really amazing work from all of you!

Thanks.
-Felix

Sent from ProtonMail Mobile

On Thu, Dec 17, 2020 at 19:08, Gerard Casas Saez <gc...@twitter.com.INVALID> wrote:

> Yass! 🎉 🎉 🎉 🎉 🎉 🎉
> Great news!
>
> Gerard Casas Saez
> Twitter | Cortex | [@casassaez](http://twitter.com/casassaez)
>
> On Thu, Dec 17, 2020 at 11:00 AM Tomasz Urbaszek <tu...@apache.org> wrote:
>
>> There's an official Apache Airflow blog post with similar content to Ash's mail:
>>
>> https://airflow.apache.org/blog/airflow-two-point-oh-is-here/
>>
>> On Thu, Dec 17, 2020 at 6:59 PM Ry Walker <ry...@rywalker.com> wrote:
>>
>>> we have a webpage on it https://www.astronomer.io/airflow and a blogpost https://www.astronomer.io/blog/introducing-airflow-2-0
>>>
>>> On Thu, Dec 17, 2020 at 12:54 PM Shaw, Damian P. <da...@credit-suisse.com> wrote:
>>>
>>>> Great news! Is there a single web page that highlights these major features as you’ve listed them?
>>>>
>>>> Damian
>>>>
>>>> From: Ash Berlin-Taylor <as...@apache.org>
>>>> Sent: Thursday, December 17, 2020 12:36
>>>> To: users@airflow.apache.org
>>>> Cc: announce@apache.org; dev@airflow.apache.org
>>>> Subject: Apache Airflow 2.0.0 is released!
>>>>
>>>> I am proud to announce that Apache Airflow 2.0.0 has been released.
>>>>
>>>> The source release, as well as the binary "wheel" release (no sdist this time), are available here
>>>>
>>>> We also made this version available on PyPi for convenience (`pip install apache-airflow`):
>>>>
>>>> 📦 PyPI: https://pypi.org/project/apache-airflow/2.0.0
>>>>
>>>> The documentation is available on:
>>>>
>>>> https://airflow.apache.org/
>>>>
>>>> 📚 Docs: http://airflow.apache.org/docs/apache-airflow/2.0.0/
>>>>
>>>> Docker images will be available shortly -- check out https://hub.docker.com/r/apache/airflow/tags?page=1&ordering=last_updated&name=2.0.0 for it to appear
>>>>
>>>> The full changelog is about 3,000 lines long (already excluding everything backported to 1.10), so for now I’ll simply share some of the major features in 2.0.0 compared to 1.10.14:
>>>>
>>>> A new way of writing dags: the TaskFlow API (AIP-31)
>>>>
>>>> (Known in 2.0.0alphas as Functional DAGs.)
>>>>
>>>> DAGs are now much nicer to author, especially when using PythonOperator. Dependencies are handled more clearly and XCom is nicer to use.
>>>>
>>>> Read more here:
>>>>
>>>> [TaskFlow API Tutorial](http://airflow.apache.org/docs/apache-airflow/stable/tutorial_taskflow_api.html)
>>>>
>>>> [TaskFlow API Documentation](https://airflow.apache.org/docs/apache-airflow/stable/concepts.html#decorated-flows)
>>>>
>>>> A quick teaser of what DAGs can now look like:
>>>>
>>>> ```
>>>>
>>>> from airflow.decorators import dag, task
>>>> from airflow.utils.dates import days_ago
>>>>
>>>> @dag(default_args={'owner': 'airflow'}, schedule_interval=None, start_date=days_ago(2))
>>>> def tutorial_taskflow_api_etl():
>>>>    @task
>>>>    def extract():
>>>>        return {"1001": 301.27, "1002": 433.21, "1003": 502.22}
>>>>
>>>>    @task
>>>>    def transform(order_data_dict: dict) -> dict:
>>>>        total_order_value = 0
>>>>
>>>>        for value in order_data_dict.values():
>>>>            total_order_value += value
>>>>
>>>>        return {"total_order_value": total_order_value}
>>>>
>>>>    @task()
>>>>    def load(total_order_value: float):
>>>>
>>>>        print("Total order value is: %.2f" % total_order_value)
>>>>
>>>>    order_data = extract()
>>>>    order_summary = transform(order_data)
>>>>    load(order_summary["total_order_value"])
>>>>
>>>> tutorial_etl_dag = tutorial_taskflow_api_etl()
>>>>
>>>> ```
>>>>
>>>> Fully specified REST API (AIP-32)
>>>>
>>>> We now have a fully supported, no-longer-experimental API with a comprehensive OpenAPI specification.
>>>>
>>>> Read more here:
>>>>
>>>> [REST API Documentation](http://airflow.apache.org/docs/apache-airflow/stable/stable-rest-api-ref.html).
>>>>
>>>> Massive Scheduler performance improvements
>>>>
>>>> As part of AIP-15 (Scheduler HA+performance) and other work Kamil did, we significantly improved the performance of the Airflow Scheduler. It now starts tasks much, MUCH quicker.
>>>>
>>>> Over at Astronomer.io we’ve [benchmarked the scheduler: it’s fast](https://www.astronomer.io/blog/airflow-2-scheduler) (we had to triple check the numbers as we didn’t quite believe them at first!)
>>>>
>>>> Scheduler is now HA compatible (AIP-15)
>>>>
>>>> It’s now possible and supported to run more than a single scheduler instance. This is super useful for both resiliency (in case a scheduler goes down) and scheduling performance.
>>>>
>>>> To fully use this feature you need Postgres 9.6+ or MySQL 8+ (MySQL 5 and MariaDB won’t work with more than one scheduler, I’m afraid).
>>>>
>>>> There’s no config or other setup required to run more than one scheduler: just start up a scheduler somewhere else (ensuring it has access to the DAG files) and it will cooperate with your existing schedulers through the database.
>>>>
>>>> For more information, read the [Scheduler HA documentation](http://airflow.apache.org/docs/apache-airflow/stable/scheduler.html#running-more-than-one-scheduler).
>>>>
>>>> Task Groups (AIP-34)
>>>>
>>>> SubDAGs were commonly used for grouping tasks in the UI, but they had many drawbacks in their execution behaviour (primarily that they only executed a single task in parallel!). To improve this experience, we’ve introduced “Task Groups”: a method for organizing tasks which provides the same grouping behaviour as a SubDAG without any of the execution-time drawbacks.
>>>>
>>>> SubDAGs will still work for now, but we think that any previous use of SubDAGs can now be replaced with task groups. If you find an example where this isn’t the case, please let us know by opening an issue on GitHub.
>>>>
>>>> For more information, check out the [Task Group documentation](http://airflow.apache.org/docs/apache-airflow/stable/concepts.html#taskgroup).
>>>>
>>>> Refreshed UI
>>>>
>>>> We’ve given the Airflow UI a visual refresh and updated some of the styling. Check out the [UI section of the docs](https://airflow.apache.org/docs/apache-airflow/stable/ui.html) for screenshots.
>>>>
>>>> We have also added an option to auto-refresh task states in Graph View so you no longer need to continuously press the refresh button :).
>>>>
>>>> Smart Sensors for reduced load from sensors (AIP-17)
>>>>
>>>> If you make heavy use of sensors in your Airflow cluster, you might find that sensor execution takes up a significant proportion of your cluster even with “reschedule” mode. To improve this, we’ve added a new mode called “Smart Sensors”.
>>>>
>>>> This feature is in “early-access”: it’s been well-tested by AirBnB and is “stable”/usable, but we reserve the right to make backwards-incompatible changes to it in a future release (if we have to. We’ll try very hard not to!)
>>>>
>>>> Read more about it in the [Smart Sensors documentation](https://airflow.apache.org/docs/apache-airflow/stable/smart-sensor.html).
>>>>
>>>> Simplified KubernetesExecutor
>>>>
>>>> For Airflow 2.0, we have re-architected the KubernetesExecutor in a fashion that is simultaneously faster, easier to understand, and more flexible for Airflow users. Users will now be able to access the full Kubernetes API to create a .yaml pod_template_file instead of specifying parameters in their airflow.cfg.
>>>>
>>>> We have also replaced the executor_config dictionary with the pod_override parameter, which takes a Kubernetes V1Pod object for a 1:1 setting override. These changes have removed over three thousand lines of code from the KubernetesExecutor, which makes it run faster and creates fewer potential errors.
>>>>
>>>> Read more here:
>>>>
>>>> Docs on [pod_template_file](https://airflow.apache.org/docs/apache-airflow/stable/executor/kubernetes.html?highlight=pod_override#pod-template-file)
>>>>
>>>> Docs on [pod_override](https://airflow.apache.org/docs/apache-airflow/stable/executor/kubernetes.html?highlight=pod_override#pod-override)
>>>>
>>>> Airflow core and providers: Splitting Airflow into 60+ packages
>>>>
>>>> Airflow 2.0 is not a monolithic “one to rule them all” package. We’ve split Airflow into core and 61 (for now) provider packages. Each provider package is for either a particular external service (Google, Amazon, Microsoft, Snowflake), a database (Postgres, MySQL), or a protocol (HTTP/FTP). Now you can create a custom Airflow installation from “building blocks” and choose only what you need, plus add whatever other requirements you might have. Some of the most commonly used providers (ftp, http, imap, sqlite) are installed automatically. Other providers are installed automatically when you choose the appropriate extras when installing Airflow.
>>>>
>>>> The provider architecture should make it much easier to get a fully customized, yet consistent runtime with the right set of Python dependencies.
>>>>
>>>> But that’s not all: you can write your own custom providers and add things like custom connection types, customizations of the Connection Forms, and extra links to your operators in a manageable way. You can build your own provider and install it as a Python package and have your customizations visible right in the Airflow UI.
>>>>
>>>> Our very own Jarek Potiuk has written about [providers in much more detail](https://www.polidea.com/blog/airflow-2-providers/) on the Polidea blog.
>>>>
>>>> Docs on the [providers concept and writing custom providers](http://airflow.apache.org/docs/apache-airflow-providers/)
>>>>
>>>> Docs on [all the provider packages available](http://airflow.apache.org/docs/apache-airflow-providers/packages-ref.html)
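
As a rough illustration of the package split, the snippet below lists whichever provider distributions happen to be installed in the current environment. It relies only on the `apache-airflow-providers-` naming prefix used for the provider packages on PyPI; the helper name `installed_providers` is made up for this sketch and is not part of Airflow.

```python
from importlib import metadata

# Provider distributions are published as apache-airflow-providers-<name>.
PROVIDER_PREFIX = "apache-airflow-providers-"


def installed_providers():
    """Return sorted (name, version) pairs for installed provider packages."""
    found = set()
    for dist in metadata.distributions():
        # Normalise the name so underscore and hyphen spellings both match.
        name = (dist.metadata["Name"] or "").lower().replace("_", "-")
        if name.startswith(PROVIDER_PREFIX):
            found.add((name, str(dist.version)))
    return sorted(found)


if __name__ == "__main__":
    for name, version in installed_providers():
        print(f"{name}=={version}")
```

On an environment without Airflow this simply prints nothing; after installing Airflow with some extras it should show the providers those extras pulled in.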
>>>>
>>>> Security
>>>>
>>>> As part of the Airflow 2.0 effort, there has been a conscious focus on security and reducing areas of exposure. This shows up in different forms across different functional areas. For example, in the new REST API, all operations now require authorization. Similarly, in the configuration settings, the Fernet key is now required to be specified.
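
Since a Fernet key has to be set, here is a minimal sketch of generating one with only the standard library. A Fernet key is 32 random bytes in URL-safe base64, which is the same shape `cryptography.fernet.Fernet.generate_key()` produces. The function name `make_fernet_key` is made up for this example; where the key goes (the `fernet_key` option under `[core]`, or the `AIRFLOW__CORE__FERNET_KEY` environment variable) is worth double-checking against the configuration docs for your version.

```python
import base64
import secrets


def make_fernet_key() -> str:
    """Return a new Fernet-compatible key: 32 random bytes, URL-safe base64."""
    return base64.urlsafe_b64encode(secrets.token_bytes(32)).decode("ascii")


if __name__ == "__main__":
    # Keep the printed value secret; anyone holding it can decrypt
    # the connection passwords stored in the metadata database.
    print(make_fernet_key())
```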
>>>>
>>>> Configuration
>>>>
>>>> Configuration in the form of the airflow.cfg file has been rationalized further into distinct sections, specifically around “core”. Additionally, a significant number of configuration options have been deprecated or moved to individual component-specific configuration files, such as the pod-template-file for Kubernetes execution-related configuration.
>>>>
>>>> Thanks to all of you
>>>>
>>>> We’ve tried to make as few breaking changes as possible and to provide a deprecation path in the code, especially in the case of anything called in the DAG. That said, please read through UPDATING.md to check what might affect you. For example: we re-organized the layout of operators (they now all live under airflow.providers.*), but the old names should continue to work. You’ll just notice a lot of DeprecationWarnings that need to be fixed up.
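
One way to track down old import paths is to capture DeprecationWarnings while loading your DAG code. The sketch below uses only the standard `warnings` module; `legacy_import` is a stand-in for a deprecated path that still works but warns, and both it and `collect_deprecations` are hypothetical names, not Airflow functions.

```python
import warnings


def legacy_import():
    """Stand-in for an old import path that still works but emits a warning."""
    warnings.warn(
        "old path is deprecated; use the provider package instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return "operator"


def collect_deprecations(func):
    """Call func and return (result, list of DeprecationWarning messages)."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every DeprecationWarning instead of applying default filters.
        warnings.simplefilter("always", DeprecationWarning)
        result = func()
    messages = [
        str(w.message) for w in caught if issubclass(w.category, DeprecationWarning)
    ]
    return result, messages


if __name__ == "__main__":
    result, messages = collect_deprecations(legacy_import)
    print(result, messages)
```

For a stricter check, running Python with `-W error::DeprecationWarning` turns these warnings into exceptions, so a DAG file that still uses an old path fails on import.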
>>>>
>>>> Thank you so much to all the contributors who got us to this point, in no particular order: Kaxil Naik, Daniel Imberman, Jarek Potiuk, Tomek Urbaszek, Kamil Breguła, Gerard Casas Saez, Xiaodong DENG, Kevin Yang, James Timmins, Yingbo Wang, Qian Yu, Ryan Hamilton and the 100s of others who keep making Airflow better for everyone.
>>>>
>>>> ==============================================================================
>>>> Please access the attached hyperlink for an important electronic communications disclaimer:
>>>> http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html
>>>> ==============================================================================

Re: Apache Airflow 2.0.0 is released!

Posted by Gerard Casas Saez <gc...@twitter.com.INVALID>.
Yass! 🎉 🎉 🎉 🎉 🎉 🎉
Great news!

Gerard Casas Saez
Twitter | Cortex | @casassaez <http://twitter.com/casassaez>


On Thu, Dec 17, 2020 at 11:00 AM Tomasz Urbaszek <tu...@apache.org>
wrote:

> There's an official Apache Airflow blog post with similar content to Ash's mail:
> https://airflow.apache.org/blog/airflow-two-point-oh-is-here/
>
> On Thu, Dec 17, 2020 at 6:59 PM Ry Walker <ry...@rywalker.com> wrote:
>
>> we have a webpage on it https://www.astronomer.io/airflow and a blogpost
>> https://www.astronomer.io/blog/introducing-airflow-2-0
>>
>> On Thu, Dec 17, 2020 at 12:54 PM Shaw, Damian P. <
>> damian.shaw.2@credit-suisse.com> wrote:
>>
>>> Great news! Is there a single web page that highlights these major
>>> features as you’ve listed them?
>>>
>>>
>>>
>>> Damian

Re: Apache Airflow 2.0.0 is released!

Posted by Tomasz Urbaszek <tu...@apache.org>.
There's an official Apache Airflow blog post with similar content to Ash's mail:
https://airflow.apache.org/blog/airflow-two-point-oh-is-here/

On Thu, Dec 17, 2020 at 6:59 PM Ry Walker <ry...@rywalker.com> wrote:

> we have a webpage on it https://www.astronomer.io/airflow and a blogpost
> https://www.astronomer.io/blog/introducing-airflow-2-0
>
> On Thu, Dec 17, 2020 at 12:54 PM Shaw, Damian P. <
> damian.shaw.2@credit-suisse.com> wrote:
>
>> Great news! Is there a single web page that highlights these major
>> features as you’ve listed them?
>>
>>
>>
>> Damian

Re: Apache Airflow 2.0.0 is released!

Posted by Ry Walker <ry...@rywalker.com>.
we have a webpage on it https://www.astronomer.io/airflow and a blogpost
https://www.astronomer.io/blog/introducing-airflow-2-0

On Thu, Dec 17, 2020 at 12:54 PM Shaw, Damian P. <
damian.shaw.2@credit-suisse.com> wrote:

> Great news! Is there a single web page that highlights these major
> features as you’ve listed them?
>
>
>
> Damian
>
>
>
> *From:* Ash Berlin-Taylor <as...@apache.org>
> *Sent:* Thursday, December 17, 2020 12:36
> *To:* users@airflow.apache.org
> *Cc:* announce@apache.org; dev@airflow.apache.org
> *Subject:* Apache Airflow 2.0.0 is released!
>
>
>
> I am proud to announce that Apache Airflow 2.0.0 has been released.
>
>
>
> The source release, as well as the binary "wheel" release (no sdist this
> time), are available here
>
>
>
> We also made this version available on PyPI for convenience (`pip install
> apache-airflow`):
>
>
>
> 📦 PyPI: https://pypi.org/project/apache-airflow/2.0.0
>
>
>
> The documentation is available on:
>
> https://airflow.apache.org/
>
> 📚 Docs: http://airflow.apache.org/docs/apache-airflow/2.0.0/
>
>
>
> Docker images will be available shortly -- check out
> https://hub.docker.com/r/apache/airflow/tags?page=1&ordering=last_updated&name=2.0.0
> for them to appear.
>
>
>
>
>
> The full changelog is about 3,000 lines long (already excluding everything
> backported to 1.10), so for now I’ll simply share some of the major
> features in 2.0.0 compared to 1.10.14:
>
>
>
> *A new way of writing dags: the TaskFlow API (AIP-31)*
>
>
>
> (Known in 2.0.0alphas as Functional DAGs.)
>
>
>
> DAGs are now much nicer to author, especially when using
> PythonOperator. Dependencies are handled more clearly and XCom is nicer to
> use.
>
>
>
> Read more here:
>
>
>
> TaskFlow API Tutorial
> <http://airflow.apache.org/docs/apache-airflow/stable/tutorial_taskflow_api.html>
>
> TaskFlow API Documentation
> <https://airflow.apache.org/docs/apache-airflow/stable/concepts.html#decorated-flows>
>
>
>
> A quick teaser of what DAGs can now look like:
>
>
>
> ```
>
> from airflow.decorators import dag, task
> from airflow.utils.dates import days_ago
>
> @dag(default_args={'owner': 'airflow'}, schedule_interval=None,
> start_date=days_ago(2))
> def tutorial_taskflow_api_etl():
>    @task
>    def extract():
>        return {"1001": 301.27, "1002": 433.21, "1003": 502.22}
>
>    @task
>    def transform(order_data_dict: dict) -> dict:
>        total_order_value = 0
>
>        for value in order_data_dict.values():
>            total_order_value += value
>
>        return {"total_order_value": total_order_value}
>
>    @task()
>    def load(total_order_value: float):
>
>        print("Total order value is: %.2f" % total_order_value)
>
>    order_data = extract()
>    order_summary = transform(order_data)
>    load(order_summary["total_order_value"])
>
> tutorial_etl_dag = tutorial_taskflow_api_etl()
>
> ```
>
>
>
> *Fully specified REST API (AIP-32)*
>
>
>
> We now have a fully supported, no-longer-experimental API with a
> comprehensive OpenAPI specification.
>
>
>
> Read more here:
>
>
>
> REST API Documentation
> <http://airflow.apache.org/docs/apache-airflow/stable/stable-rest-api-ref.html>
> .
>
>
>
> *Massive Scheduler performance improvements*
>
>
>
> As part of AIP-15 (Scheduler HA+performance) and other work Kamil did, we
> significantly improved the performance of the Airflow Scheduler. It now
> starts tasks much, MUCH quicker.
>
>
>
> Over at Astronomer.io we’ve benchmarked the scheduler, and it’s fast
> <https://www.astronomer.io/blog/airflow-2-scheduler> (we had to triple-check
> the numbers as we didn’t quite believe them at first!).
>
>
>
> *Scheduler is now HA compatible (AIP-15)*
>
>
>
> It's now possible and supported to run more than a single scheduler
> instance. This is super useful for both resiliency (in case a scheduler
> goes down) and scheduling performance.
>
>
>
> To fully use this feature you need Postgres 9.6+ or MySQL 8+ (MySQL 5 and
> MariaDB won't work with more than one scheduler, I'm afraid).
>
>
>
> There's no config or other setup required to run more than one
> scheduler: just start up a scheduler somewhere else (ensuring it has access
> to the DAG files) and it will cooperate with your existing schedulers
> through the database.
>
>
>
> For more information, read the Scheduler HA documentation
> <http://airflow.apache.org/docs/apache-airflow/stable/scheduler.html#running-more-than-one-scheduler>
> .
>
>
>
> *Task Groups (AIP-34)*
>
>
>
> SubDAGs were commonly used for grouping tasks in the UI, but they had many
> drawbacks in their execution behaviour (primarily that they only executed
> a single task at a time!). To improve this experience, we've introduced
> "Task Groups": a method for organizing tasks which provides the same
> grouping behaviour as a SubDAG without any of the execution-time drawbacks.
>
>
>
> SubDAGs will still work for now, but we think that any previous use of
> SubDAGs can now be replaced with task groups. If you find an example where
> this isn't the case, please let us know by opening an issue on GitHub.
>
>
>
> For more information, check out the Task Group documentation
> <http://airflow.apache.org/docs/apache-airflow/stable/concepts.html#taskgroup>
> .
>
>
>
> *Refreshed UI*
>
>
>
> We've given the Airflow UI a visual refresh and updated some of the
> styling. Check out the UI section of the docs
> <http://airflow.apache.org/docs/apache-airflow/stable/ui.html> for screenshots.
>
>
>
> We have also added an option to auto-refresh task states in Graph View so
> you no longer need to continuously press the refresh button :).
>
>
>
> *Smart Sensors for reduced load from sensors (AIP-17)*
>
>
>
> If you make heavy use of sensors in your Airflow cluster, you might find
> that sensor execution takes up a significant proportion of your cluster
> even with "reschedule" mode. To improve this, we've added a new mode called
> "Smart Sensors".
>
>
>
> This feature is in "early-access": it's been well-tested by Airbnb and is
> "stable"/usable, but we reserve the right to make backwards-incompatible
> changes to it in a future release (if we have to; we'll try very hard not
> to!).
>
>
>
> Read more about it in the Smart Sensors documentation
> <https://airflow.apache.org/docs/apache-airflow/stable/smart-sensor.html>.
>
>
>
> *Simplified KubernetesExecutor*
>
>
>
> For Airflow 2.0, we have re-architected the KubernetesExecutor in a
> fashion that is simultaneously faster, easier to understand, and more
> flexible for Airflow users. Users will now be able to access the full
> Kubernetes API to create a .yaml pod_template_file instead of specifying
> parameters in their airflow.cfg.
>
>
>
> We have also replaced the executor_config dictionary with the pod_override
> parameter, which takes a Kubernetes V1Pod object for a 1:1 setting
> override. These changes have removed over three thousand lines of code from
> the KubernetesExecutor, which makes it run faster and creates fewer
> potential errors.
>
>
>
> Read more here:
>
>
>
> Docs on pod_template_file
> <https://airflow.apache.org/docs/apache-airflow/stable/executor/kubernetes.html?highlight=pod_override#pod-template-file>
>
> Docs on pod_override
> <https://airflow.apache.org/docs/apache-airflow/stable/executor/kubernetes.html?highlight=pod_override#pod-override>
>
>
>
> *Airflow core and providers: Splitting Airflow into 60+ packages*
>
>
>
> Airflow 2.0 is not a monolithic "one to rule them all" package. We've
> split Airflow into core and 61 (for now) provider packages. Each provider
> package is for either a particular external service (Google, Amazon,
> Microsoft, Snowflake), a database (Postgres, MySQL), or a protocol
> (HTTP/FTP). Now you can create a custom Airflow installation from
> building blocks and choose only what you need, plus add whatever other
> requirements you might have. Some of the common providers are installed
> automatically (ftp, http, imap, sqlite) as they are commonly used. Other
> providers are automatically installed when you choose appropriate extras
> when installing Airflow.
>
>
>
> The provider architecture should make it much easier to get a fully
> customized, yet consistent runtime with the right set of Python
> dependencies.
>
>
>
> But that's not all: you can write your own custom providers and add things
> like custom connection types, customizations of the Connection Forms, and
> extra links to your operators in a manageable way. You can build your own
> provider and install it as a Python package and have your customizations
> visible right in the Airflow UI.
>
>
>
> Our very own Jarek Potiuk has written about providers in much more detail
> <https://www.polidea.com/blog/airflow-2-providers/> on the Polidea blog.
>
>
>
> Docs on the providers concept and writing custom providers
> <http://airflow.apache.org/docs/apache-airflow-providers/>
>
> Docs on all the provider packages available
> <http://airflow.apache.org/docs/apache-airflow-providers/packages-ref.html>
>
>
>
> *Security*
>
>
>
> As part of the Airflow 2.0 effort, there has been a conscious focus on
> security and reducing areas of exposure. This is represented across
> different functional areas in different forms. For example, in the new REST
> API, all operations now require authorization. Similarly, in the
> configuration settings, the Fernet key is now required to be specified.
>
>
>
> *Configuration*
>
>
>
> Configuration in the form of the airflow.cfg file has been rationalized
> further in distinct sections, specifically around "core". Additionally, a
> significant number of configuration options have been deprecated or moved
> to individual component-specific configuration files, such as the
> pod-template-file for Kubernetes execution-related configuration.
>
>
>
> *Thanks to all of you*
>
>
>
> We've tried to make as few breaking changes as possible and to provide a
> deprecation path in the code, especially in the case of anything called in
> the DAG. That said, please read through UPDATING.md to check what might
> affect you. For example: we re-organized the layout of operators (they
> now all live under airflow.providers.*) but the old names should continue
> to work - you'll just notice a lot of DeprecationWarnings that need to be
> fixed up.
>
>
>
> Thank you so much to all the contributors who got us to this point, in no
> particular order: Kaxil Naik, Daniel Imberman, Jarek Potiuk, Tomek
> Urbaszek, Kamil Breguła, Gerard Casas Saez, Xiaodong DENG, Kevin Yang,
> James Timmins, Yingbo Wang, Qian Yu, Ryan Hamilton and the 100s of others
> who keep making Airflow better for everyone.
>
>
>

Re: Apache Airflow 2.0.0 is released!

Posted by Jarek Potiuk <Ja...@polidea.com>.
WOHOO!

On Thu, Dec 17, 2020 at 6:54 PM Shaw, Damian P. <
damian.shaw.2@credit-suisse.com> wrote:

> Great news! Is there a single web page that highlights these major
> features as you've listed them?
>
>
>
> Damian


-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>

RE: Apache Airflow 2.0.0 is released!

Posted by "Shaw, Damian P. " <da...@credit-suisse.com>.
Great news! Is there a single web page that highlights these major features as you've listed them?

Damian

From: Ash Berlin-Taylor <as...@apache.org>
Sent: Thursday, December 17, 2020 12:36
To: users@airflow.apache.org
Cc: announce@apache.org; dev@airflow.apache.org
Subject: Apache Airflow 2.0.0 is released!

I am proud to announce that Apache Airflow 2.0.0 has been released.

The source release, as well as the binary "wheel" release (no sdist this time), are available here

We also made this version available on PyPI for convenience (`pip install apache-airflow`):

📦 PyPI: https://pypi.org/project/apache-airflow/2.0.0

The documentation is available on:
https://airflow.apache.org/
📚 Docs: http://airflow.apache.org/docs/apache-airflow/2.0.0/

Docker images will be available shortly -- check out https://hub.docker.com/r/apache/airflow/tags?page=1&ordering=last_updated&name=2.0.0 for it to appear


The full changelog is about 3,000 lines long (already excluding everything backported to 1.10), so for now I'll simply share some of the major features in 2.0.0 compared to 1.10.14:

A new way of writing dags: the TaskFlow API (AIP-31)

(Known in 2.0.0alphas as Functional DAGs.)

DAGs are now much, much nicer to author, especially when using PythonOperator. Dependencies are handled more clearly and XCom is nicer to use.

Read more here:

TaskFlow API Tutorial<http://airflow.apache.org/docs/apache-airflow/stable/tutorial_taskflow_api.html>
TaskFlow API Documentation<https://airflow.apache.org/docs/apache-airflow/stable/concepts.html#decorated-flows>

A quick teaser of what DAGs can now look like:

```
from airflow.decorators import dag, task
from airflow.utils.dates import days_ago


@dag(default_args={'owner': 'airflow'}, schedule_interval=None, start_date=days_ago(2))
def tutorial_taskflow_api_etl():
    # Each @task-decorated function becomes a task in the DAG.
    @task
    def extract():
        return {"1001": 301.27, "1002": 433.21, "1003": 502.22}

    @task
    def transform(order_data_dict: dict) -> dict:
        total_order_value = 0

        for value in order_data_dict.values():
            total_order_value += value

        return {"total_order_value": total_order_value}

    @task()
    def load(total_order_value: float):
        print("Total order value is: %.2f" % total_order_value)

    # Calling the tasks sets up the dependencies; return values are passed via XCom.
    order_data = extract()
    order_summary = transform(order_data)
    load(order_summary["total_order_value"])


# Calling the decorated function creates the DAG object.
tutorial_etl_dag = tutorial_taskflow_api_etl()
```

Fully specified REST API (AIP-32)

We now have a fully supported, no-longer-experimental API with a comprehensive OpenAPI specification.

Read more here:

REST API Documentation<http://airflow.apache.org/docs/apache-airflow/stable/stable-rest-api-ref.html>.
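
As a rough illustration of what calling the stable API can look like, here is a minimal Python sketch that builds (but does not send) an authenticated request for the DAG list. The host `localhost:8080`, the `admin`/`admin` credentials and the `build_list_dags_request` helper are placeholders of mine, and the sketch assumes the basic-auth API backend has been enabled in your configuration; check the REST API reference for the endpoints available in your version.

```python
import base64
import urllib.request


def build_list_dags_request(base_url: str, username: str, password: str) -> urllib.request.Request:
    """Build a GET request for the stable REST API's DAG list endpoint."""
    # Every operation requires authorization; HTTP basic auth is assumed here.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/dags",
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
        method="GET",
    )


# Placeholder host and credentials; replace with your own deployment's values.
request = build_list_dags_request("http://localhost:8080", "admin", "admin")
print(request.get_method(), request.full_url)
# To actually send it, pass the request to urllib.request.urlopen() and
# decode the JSON body of the response.
```

Building the request separately from sending it keeps the sketch runnable without a live webserver.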

Massive Scheduler performance improvements

As part of AIP-15 (Scheduler HA+performance) and other work Kamil did, we significantly improved the performance of the Airflow Scheduler. It now starts tasks much, MUCH quicker.

Over at Astronomer.io we've benchmarked the scheduler, and it's fast<https://www.astronomer.io/blog/airflow-2-scheduler> (we had to triple-check the numbers as we didn't quite believe them at first!)

Scheduler is now HA compatible (AIP-15)

It's now possible and supported to run more than a single scheduler instance. This is super useful for both resiliency (in case a scheduler goes down) and scheduling performance.

To fully use this feature you need Postgres 9.6+ or MySQL 8+ (MySQL 5 and MariaDB won't work with more than one scheduler, I'm afraid).

There's no config or other setup required to run more than one scheduler: just start up a scheduler somewhere else (ensuring it has access to the DAG files) and it will cooperate with your existing schedulers through the database.

For more information, read the Scheduler HA documentation<http://airflow.apache.org/docs/apache-airflow/stable/scheduler.html#running-more-than-one-scheduler>.

Task Groups (AIP-34)

SubDAGs were commonly used for grouping tasks in the UI, but they had many drawbacks in their execution behaviour (primarily that they only executed a single task at a time!). To improve this experience, we've introduced "Task Groups": a method for organizing tasks which provides the same grouping behaviour as a SubDAG without any of the execution-time drawbacks.

SubDAGs will still work for now, but we think that any previous use of SubDAGs can now be replaced with task groups. If you find an example where this isn't the case, please let us know by opening an issue on GitHub.

For more information, check out the Task Group documentation<http://airflow.apache.org/docs/apache-airflow/stable/concepts.html#taskgroup>.

Refreshed UI

We've given the Airflow UI a visual refresh and updated some of the styling. Check out the UI section of the docs<http://airflow.apache.org/docs/apache-airflow/stable/ui.html> for screenshots.

We have also added an option to auto-refresh task states in Graph View so you no longer need to continuously press the refresh button :).

Smart Sensors for reduced load from sensors (AIP-17)

If you make heavy use of sensors in your Airflow cluster, you might find that sensor execution takes up a significant proportion of your cluster even with "reschedule" mode. To improve this, we've added a new mode called "Smart Sensors".

This feature is in "early-access": it's been well-tested by Airbnb and is "stable"/usable, but we reserve the right to make backwards-incompatible changes to it in a future release (if we have to; we'll try very hard not to!).

Read more about it in the Smart Sensors documentation<https://airflow.apache.org/docs/apache-airflow/stable/smart-sensor.html>.

Simplified KubernetesExecutor

For Airflow 2.0, we have re-architected the KubernetesExecutor in a fashion that is simultaneously faster, easier to understand, and more flexible for Airflow users. Users will now be able to access the full Kubernetes API to create a .yaml pod_template_file instead of specifying parameters in their airflow.cfg.

We have also replaced the executor_config dictionary with the pod_override parameter, which takes a Kubernetes V1Pod object for a 1:1 setting override. These changes have removed over three thousand lines of code from the KubernetesExecutor, which makes it run faster and creates fewer potential errors.

Read more here:

Docs on pod_template_file<https://airflow.apache.org/docs/apache-airflow/stable/executor/kubernetes.html?highlight=pod_override#pod-template-file>
Docs on pod_override<https://airflow.apache.org/docs/apache-airflow/stable/executor/kubernetes.html?highlight=pod_override#pod-override>

Airflow core and providers: Splitting Airflow into 60+ packages

Airflow 2.0 is not a monolithic "one to rule them all" package. We've split Airflow into core and 61 (for now) provider packages. Each provider package is for either a particular external service (Google, Amazon, Microsoft, Snowflake), a database (Postgres, MySQL), or a protocol (HTTP/FTP). Now you can create a custom Airflow installation from building blocks and choose only what you need, plus add whatever other requirements you might have. Some of the common providers are installed automatically (ftp, http, imap, sqlite) as they are commonly used. Other providers are automatically installed when you choose appropriate extras when installing Airflow.

The provider architecture should make it much easier to get a fully customized, yet consistent runtime with the right set of Python dependencies.

But that's not all: you can write your own custom providers and add things like custom connection types, customizations of the Connection Forms, and extra links to your operators in a manageable way. You can build your own provider and install it as a Python package and have your customizations visible right in the Airflow UI.

Our very own Jarek Potiuk has written about providers in much more detail<https://www.polidea.com/blog/airflow-2-providers/> on the Polidea blog.

Docs on the providers concept and writing custom providers<http://airflow.apache.org/docs/apache-airflow-providers/>
Docs on all the provider packages available<http://airflow.apache.org/docs/apache-airflow-providers/packages-ref.html>

Security

As part of the Airflow 2.0 effort, there has been a conscious focus on security and reducing areas of exposure. This is represented across different functional areas in different forms. For example, in the new REST API, all operations now require authorization. Similarly, in the configuration settings, the Fernet key is now required to be specified.
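
If you do not have a Fernet key yet, the following is a minimal sketch of producing one with only the Python standard library. It assumes the standard Fernet key format (32 random bytes, URL-safe base64 encoded); `generate_fernet_key` is a name of my own, and the `cryptography` package's own key generator is the more usual route. The environment variable shown is Airflow's environment form of the `fernet_key` option in the `[core]` section.

```python
import base64
import os


def generate_fernet_key() -> str:
    """Return a key in the standard Fernet format: 32 random bytes, URL-safe base64."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


key = generate_fernet_key()
# One way to supply the key is through the environment; the line printed
# here is a shell export statement you could add to your deployment.
print(f"export AIRFLOW__CORE__FERNET_KEY={key}")
```

Treat the generated value as a secret: anyone holding it can decrypt the connection passwords stored in the metadata database.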

Configuration

Configuration in the form of the airflow.cfg file has been rationalized further in distinct sections, specifically around "core". Additionally, a significant number of configuration options have been deprecated or moved to individual component-specific configuration files, such as the pod-template-file for Kubernetes execution-related configuration.

Thanks to all of you

We've tried to make as few breaking changes as possible and to provide a deprecation path in the code, especially in the case of anything called in the DAG. That said, please read through UPDATING.md to check what might affect you. For example: we re-organized the layout of operators (they now all live under airflow.providers.*) but the old names should continue to work - you'll just notice a lot of DeprecationWarnings that need to be fixed up.
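
One way to find the old import paths in your own DAG files is to record the DeprecationWarnings raised while they load. The sketch below uses only the standard `warnings` module; `legacy_import` is a hypothetical stand-in that mimics what a deprecated module does when imported, and `collect_deprecations` is an illustrative helper, not an Airflow function.

```python
import warnings


def legacy_import() -> str:
    """Hypothetical stand-in for importing an operator from its old 1.10 path."""
    warnings.warn(
        "This module is deprecated. Please use the new provider module.",
        DeprecationWarning,
        stacklevel=2,
    )
    return "operator"


def collect_deprecations(func) -> list:
    """Call func and return the DeprecationWarning messages it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        # Always record, even if the same warning was already shown once.
        warnings.simplefilter("always", DeprecationWarning)
        func()
    return [str(w.message) for w in caught if issubclass(w.category, DeprecationWarning)]


messages = collect_deprecations(legacy_import)
for message in messages:
    print("needs fixing:", message)
```

A stricter variant is to run your DAG tests with `python -W error::DeprecationWarning`, which turns the first old import into a failure.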

Thank you so much to all the contributors who got us to this point, in no particular order: Kaxil Naik, Daniel Imberman, Jarek Potiuk, Tomek Urbaszek, Kamil Breguła, Gerard Casas Saez, Xiaodong DENG, Kevin Yang, James Timmins, Yingbo Wang, Qian Yu, Ryan Hamilton and the 100s of others who keep making Airflow better for everyone.


Re: Apache Airflow 2.0.0 is released!

Posted by Kamil Olszewski <ka...@polidea.com>.
This is huge! Congratulations to everyone involved!

On Sat, Dec 19, 2020 at 2:05 PM Tobiasz Kędzierski <
tobiasz.kedzierski@polidea.com> wrote:

> Congrats and thank you to everyone that made this happen!
>
> On Fri, Dec 18, 2020 at 3:37 PM MONTMORY Alain <
> alain.montmory@thalesgroup.com> wrote:
>
>> Thanks to all for this great job. It is a nice gift :)
>>
>>
>>
>> *From:* Ash Berlin-Taylor <as...@apache.org>
>> *Sent:* Thursday, 17 December 2020 18:36
>> *To:* users@airflow.apache.org
>> *Cc:* announce@apache.org; dev@airflow.apache.org
>> *Subject:* Apache Airflow 2.0.0 is released!
>>
>>
>>
>> I am proud to announce that Apache Airflow 2.0.0 has been released.
>>
>>
>>
>> The source release, as well as the binary "wheel" release (no sdist this
>> time), are available here
>>
>>
>>
>> We also made this version available on PyPI for convenience (`pip install
>> apache-airflow`):
>>
>>
>>
>> 📦 PyPI: https://pypi.org/project/apache-airflow/2.0.0
>>
>>
>>
>> The documentation is available on:
>>
>> https://airflow.apache.org/
>>
>> 📚 Docs: http://airflow.apache.org/docs/apache-airflow/2.0.0/
>>
>>
>>
>> Docker images will be available shortly -- check out
>> https://hub.docker.com/r/apache/airflow/tags?page=1&ordering=last_updated&name=2.0.0
>> for it to appear
>>
>>
>>
>>
>>
>> The full changelog is about 3,000 lines long (already excluding
>> everything backported to 1.10), so for now I'll simply share some of the
>> major features in 2.0.0 compared to 1.10.14:
>>
>>
>>
>> *A new way of writing dags: the TaskFlow API (AIP-31)*
>>
>>
>>
>> (Known in 2.0.0alphas as Functional DAGs.)
>>
>>
>>
>> DAGs are now much, much nicer to author, especially when using
>> PythonOperator. Dependencies are handled more clearly and XCom is nicer to
>> use.
>>
>>
>>
>> Read more here:
>>
>>
>>
>> TaskFlow API Tutorial
>> <http://airflow.apache.org/docs/apache-airflow/stable/tutorial_taskflow_api.html>
>>
>> TaskFlow API Documentation
>> <https://airflow.apache.org/docs/apache-airflow/stable/concepts.html#decorated-flows>
>>
>>
>>
>> A quick teaser of what DAGs can now look like:
>>
>>
>>
>> ```
>>
>> from airflow.decorators import dag, task
>> from airflow.utils.dates import days_ago
>>
>> @dag(default_args={'owner': 'airflow'}, schedule_interval=None,
>> start_date=days_ago(2))
>> def tutorial_taskflow_api_etl():
>>    @task
>>    def extract():
>>        return {"1001": 301.27, "1002": 433.21, "1003": 502.22}
>>
>>    @task
>>    def transform(order_data_dict: dict) -> dict:
>>        total_order_value = 0
>>
>>        for value in order_data_dict.values():
>>            total_order_value += value
>>
>>        return {"total_order_value": total_order_value}
>>
>>    @task()
>>    def load(total_order_value: float):
>>
>>        print("Total order value is: %.2f" % total_order_value)
>>
>>    order_data = extract()
>>    order_summary = transform(order_data)
>>    load(order_summary["total_order_value"])
>>
>> tutorial_etl_dag = tutorial_taskflow_api_etl()
>>
>> ```
>>
>>
>>
>> *Fully specified REST API (AIP-32)*
>>
>>
>>
>> We now have a fully supported, no-longer-experimental API with a
>> comprehensive OpenAPI specification.
>>
>>
>>
>> Read more here:
>>
>>
>>
>> REST API Documentation
>> <http://airflow.apache.org/docs/apache-airflow/stable/stable-rest-api-ref.html>
>> .
>>
>>
>>
>> *Massive Scheduler performance improvements*
>>
>>
>>
>> As part of AIP-15 (Scheduler HA+performance) and other work Kamil did, we
>> significantly improved the performance of the Airflow Scheduler. It now
>> starts tasks much, MUCH quicker.
>>
>>
>>
>> Over at Astronomer.io we’ve benchmarked the scheduler - it’s fast
>> <https://www.astronomer.io/blog/airflow-2-scheduler> (we had to triple
>> check the numbers as we didn’t quite believe them at first!)
>>
>>
>>
>> *Scheduler is now HA compatible (AIP-15)*
>>
>>
>>
>> It’s now possible and supported to run more than a single scheduler
>> instance. This is super useful for both resiliency (in case a scheduler
>> goes down) and scheduling performance.
>>
>>
>>
>> To fully use this feature you need Postgres 9.6+ or MySQL 8+ (MySQL 5
>> and MariaDB won’t work with more than one scheduler, I’m afraid).
>>
>>
>>
>> There’s no config or other setup required to run more than one
>> scheduler - just start up a scheduler somewhere else (ensuring it has access
>> to the DAG files) and it will cooperate with your existing schedulers
>> through the database.
>>
>>
>>
>> For more information, read the Scheduler HA documentation
>> <http://airflow.apache.org/docs/apache-airflow/stable/scheduler.html#running-more-than-one-scheduler>
>> .
>>
>>
>>
>> *Task Groups (AIP-34)*
>>
>>
>>
>> SubDAGs were commonly used for grouping tasks in the UI, but they had
>> many drawbacks in their execution behaviour (primarily that they only
>> executed a single task in parallel!). To improve this experience, we’ve
>> introduced “Task Groups”: a method for organizing tasks which provides the
>> same grouping behaviour as a SubDAG without any of the execution-time
>> drawbacks.
>>
>>
>>
>> SubDAGs will still work for now, but we think that any previous use of
>> SubDAGs can now be replaced with task groups. If you find an example where
>> this isn’t the case, please let us know by opening an issue on GitHub.
>>
>>
>>
>> For more information, check out the Task Group documentation
>> <http://airflow.apache.org/docs/apache-airflow/stable/concepts.html#taskgroup>
>> .
>>
>>
>>
>> *Refreshed UI*
>>
>>
>>
>> We’ve given the Airflow UI a visual refresh and updated some of the
>> styling. Check out the UI section of the docs
>> <http://airflow.apache.org/docs/apache-airflow/stable/ui.html> for screenshots.
>>
>>
>>
>> We have also added an option to auto-refresh task states in Graph View so
>> you no longer need to continuously press the refresh button :).
>>
>>
>>
>> *Smart Sensors for reduced load from sensors (AIP-17)*
>>
>>
>>
>> If you make heavy use of sensors in your Airflow cluster, you might find
>> that sensor execution takes up a significant proportion of your cluster
>> even with “reschedule” mode. To improve this, we’ve added a new mode called
>> “Smart Sensors”.
>>
>>
>>
>> This feature is in “early-access”: it’s been well-tested by AirBnB and is
>> “stable”/usable, but we reserve the right to make backwards-incompatible
>> changes to it in a future release (if we have to. We’ll try very hard not
>> to!)
>>
>>
>>
>> Read more about it in the Smart Sensors documentation
>> <https://airflow.apache.org/docs/apache-airflow/stable/smart-sensor.html>
>> .
>>
>>
>>
>> *Simplified KubernetesExecutor*
>>
>>
>>
>> For Airflow 2.0, we have re-architected the KubernetesExecutor in a
>> fashion that is simultaneously faster, easier to understand, and more
>> flexible for Airflow users. Users will now be able to access the full
>> Kubernetes API to create a .yaml pod_template_file instead of specifying
>> parameters in their airflow.cfg.
>>
>>
>>
>> We have also replaced the executor_config dictionary with the
>> pod_override parameter, which takes a Kubernetes V1Pod object for a 1:1
>> setting override. These changes have removed over three thousand lines of
>> code from the KubernetesExecutor, which makes it run faster and creates
>> fewer potential errors.
>>
>>
>>
>> Read more here:
>>
>>
>>
>> Docs on pod_template_file
>> <https://airflow.apache.org/docs/apache-airflow/stable/executor/kubernetes.html?highlight=pod_override#pod-template-file>
>>
>> Docs on pod_override
>> <https://airflow.apache.org/docs/apache-airflow/stable/executor/kubernetes.html?highlight=pod_override#pod-override>
>>
>>
>>
>> *Airflow core and providers: Splitting Airflow into 60+ packages*
>>
>>
>>
>> Airflow 2.0 is not a monolithic “one to rule them all” package. We’ve
>> split Airflow into core and 61 (for now) provider packages. Each provider
>> package is for either a particular external service (Google, Amazon,
>> Microsoft, Snowflake), a database (Postgres, MySQL), or a protocol
>> (HTTP/FTP). Now you can create a custom Airflow installation from
>> “building” blocks and choose only what you need, plus add whatever other
>> requirements you might have. Some of the common providers are installed
>> automatically (ftp, http, imap, sqlite) as they are commonly used. Other
>> providers are automatically installed when you choose appropriate extras
>> when installing Airflow.
>>
>>
>>
>> The provider architecture should make it much easier to get a fully
>> customized, yet consistent runtime with the right set of Python
>> dependencies.
>>
>>
>>
>> But that’s not all: you can write your own custom providers and add
>> things like custom connection types, customizations of the Connection
>> Forms, and extra links to your operators in a manageable way. You can build
>> your own provider and install it as a Python package and have your
>> customizations visible right in the Airflow UI.
>>
>>
>>
>> Our very own Jarek Potiuk has written about providers in much more detail
>> <https://www.polidea.com/blog/airflow-2-providers/> on the Polidea blog.
>>
>>
>>
>> Docs on the providers concept and writing custom providers
>> <http://airflow.apache.org/docs/apache-airflow-providers/>
>>
>> Docs on all the provider packages available
>> <http://airflow.apache.org/docs/apache-airflow-providers/packages-ref.html>
>>
>>
>>
>> *Security*
>>
>>
>>
>> As part of the Airflow 2.0 effort, there has been a conscious focus on
>> security and reducing areas of exposure. This is represented across
>> different functional areas in different forms. For example, in the new REST
>> API, all operations now require authorization. Similarly, in the
>> configuration settings, the Fernet key is now required to be specified.
>>
>>
>>
>> *Configuration*
>>
>>
>>
>> Configuration in the form of the airflow.cfg file has been rationalized
>> further in distinct sections, specifically around “core”. Additionally, a
>> significant number of configuration options have been deprecated or moved
>> to individual component-specific configuration files, such as the
>> pod-template-file for Kubernetes execution-related configuration.
>>
>>
>>
>> *Thanks to all of you*
>>
>>
>>
>> We’ve tried to make as few breaking changes as possible and to provide a
>> deprecation path in the code, especially in the case of anything called in
>> the DAG. That said, please read through UPDATING.md to check what might
>> affect you. For example: we re-organized the layout of operators (they
>> now all live under airflow.providers.*) but the old names should continue
>> to work - you’ll just notice a lot of DeprecationWarnings that need to be
>> fixed up.
>>
>>
>>
>> Thank you so much to all the contributors who got us to this point, in no
>> particular order: Kaxil Naik, Daniel Imberman, Jarek Potiuk, Tomek
>> Urbaszek, Kamil Breguła, Gerard Casas Saez, Xiaodong DENG, Kevin Yang,
>> James Timmins, Yingbo Wang, Qian Yu, Ryan Hamilton and the 100s of others
>> who keep making Airflow better for everyone.
>>
>
>
> --
>


-- 

Kamil Olszewski
Polidea <https://www.polidea.com> | Software Engineer

M: +48 503 361 783
E: kamil.olszewski@polidea.com

Unique Tech
Check out our projects! <https://www.polidea.com/our-work>

Re: Apache Airflow 2.0.0 is released!

Posted by Tobiasz Kędzierski <to...@polidea.com>.
Congrats and thank you to everyone that made this happen!

On Fri, Dec 18, 2020 at 3:37 PM MONTMORY Alain <
alain.montmory@thalesgroup.com> wrote:

> Thanks to all for this great job. It is a nice gift ☺


--

RE: Apache Airflow 2.0.0 is released!

Posted by MONTMORY Alain <al...@thalesgroup.com>.
Thanks to all for this great job. It is a nice gift ☺

From: Ash Berlin-Taylor <as...@apache.org>
Sent: Thursday, December 17, 2020 18:36
To: users@airflow.apache.org
Cc: announce@apache.org; dev@airflow.apache.org
Subject: Apache Airflow 2.0.0 is released!

I am proud to announce that Apache Airflow 2.0.0 has been released.

The source release, as well as the binary "wheel" release (no sdist this time), are available here

We also made this version available on PyPI for convenience (`pip install apache-airflow`):

📦 PyPI: https://pypi.org/project/apache-airflow/2.0.0

The documentation is available on:
https://airflow.apache.org/
📚 Docs: http://airflow.apache.org/docs/apache-airflow/2.0.0/

Docker images will be available shortly -- check https://hub.docker.com/r/apache/airflow/tags?page=1&ordering=last_updated&name=2.0.0 for them to appear.


The full changelog is about 3,000 lines long (already excluding everything backported to 1.10), so for now I’ll simply share some of the major features in 2.0.0 compared to 1.10.14:

A new way of writing dags: the TaskFlow API (AIP-31)

(Known in 2.0.0alphas as Functional DAGs.)

DAGs are now much, much nicer to author, especially when using PythonOperator. Dependencies are handled more clearly and XCom is nicer to use.

Read more here:

TaskFlow API Tutorial<http://airflow.apache.org/docs/apache-airflow/stable/tutorial_taskflow_api.html>
TaskFlow API Documentation<https://airflow.apache.org/docs/apache-airflow/stable/concepts.html#decorated-flows>

A quick teaser of what DAGs can now look like:

```
from airflow.decorators import dag, task
from airflow.utils.dates import days_ago

@dag(default_args={'owner': 'airflow'}, schedule_interval=None, start_date=days_ago(2))
def tutorial_taskflow_api_etl():
   @task
   def extract():
       return {"1001": 301.27, "1002": 433.21, "1003": 502.22}

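   # multiple_outputs=True stores each dict key as its own XCom, so that
   # order_summary["total_order_value"] below resolves to a value
   @task(multiple_outputs=True)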
   def transform(order_data_dict: dict) -> dict:
       total_order_value = 0

       for value in order_data_dict.values():
           total_order_value += value

       return {"total_order_value": total_order_value}

   @task()
   def load(total_order_value: float):

       print("Total order value is: %.2f" % total_order_value)

   order_data = extract()
   order_summary = transform(order_data)
   load(order_summary["total_order_value"])
tutorial_etl_dag = tutorial_taskflow_api_etl()
```

Fully specified REST API (AIP-32)

We now have a fully supported, no-longer-experimental API with a comprehensive OpenAPI specification.

Read more here:

REST API Documentation<http://airflow.apache.org/docs/apache-airflow/stable/stable-rest-api-ref.html>.
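As a rough illustration of calling the stable API, here is a minimal sketch that lists DAGs via `GET /api/v1/dags` using only the Python standard library. It assumes the basic-auth API backend (`airflow.api.auth.backend.basic_auth`) has been enabled in the `[api]` section of airflow.cfg; the host, user name and password are placeholders, and the helper names are made up for this example.

```python
import base64
import json
import urllib.request


def build_list_dags_request(base_url, username, password, limit=10):
    """Build (but do not send) a GET request for the stable API's DAG list."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    url = f"{base_url.rstrip('/')}/api/v1/dags?limit={limit}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
        method="GET",
    )


def list_dag_ids(request):
    """Send the request and return the dag_id of every DAG in the response."""
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return [dag["dag_id"] for dag in payload["dags"]]


if __name__ == "__main__":
    # Hypothetical local webserver and credentials; adjust for your setup.
    request = build_list_dags_request("http://localhost:8080", "admin", "admin")
    print(request.full_url)
```

Calling `list_dag_ids(request)` against a running webserver would perform the actual HTTP call; the request is built separately here so it can be inspected without a server.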

Massive Scheduler performance improvements

As part of AIP-15 (Scheduler HA+performance) and other work Kamil did, we significantly improved the performance of the Airflow Scheduler. It now starts tasks much, MUCH quicker.

Over at Astronomer.io we’ve benchmarked the scheduler - it’s fast<https://www.astronomer.io/blog/airflow-2-scheduler> (we had to triple check the numbers as we didn’t quite believe them at first!)

Scheduler is now HA compatible (AIP-15)

It’s now possible and supported to run more than a single scheduler instance. This is super useful for both resiliency (in case a scheduler goes down) and scheduling performance.

To fully use this feature you need Postgres 9.6+ or MySQL 8+ (MySQL 5 and MariaDB won’t work with more than one scheduler, I’m afraid).

There’s no config or other setup required to run more than one scheduler - just start up a scheduler somewhere else (ensuring it has access to the DAG files) and it will cooperate with your existing schedulers through the database.

For more information, read the Scheduler HA documentation<http://airflow.apache.org/docs/apache-airflow/stable/scheduler.html#running-more-than-one-scheduler>.
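To give a feel for how several schedulers can share work through the database, here is a toy simulation of the row-locking idea (`SELECT ... FOR UPDATE SKIP LOCKED`), which is also why a database with that feature is needed. This is not Airflow's scheduler code: the class, function and scheduler names are invented for the sketch, and in-process locks stand in for database row locks.

```python
import threading


class DagRunRow:
    """Toy stand-in for one database row; the lock mimics a row-level lock."""

    def __init__(self, run_id):
        self.run_id = run_id
        self.lock = threading.Lock()
        self.scheduled_by = None


def schedule_batch(rows, scheduler_name, batch_size=2):
    """Mimic SELECT ... FOR UPDATE SKIP LOCKED followed by a commit."""
    locked = []
    for row in rows:
        if len(locked) == batch_size:
            break
        # Non-blocking acquire: rows held by another scheduler are skipped
        # instead of waited on.
        if row.lock.acquire(blocking=False):
            if row.scheduled_by is None:
                locked.append(row)
            else:
                row.lock.release()  # already handled in an earlier batch
    for row in locked:
        row.scheduled_by = scheduler_name
        row.lock.release()  # the "commit" releases the row lock
    return [row.run_id for row in locked]


def run_scheduler(rows, scheduler_name, results):
    """Keep claiming batches until this scheduler finds nothing left to do."""
    claimed = []
    while True:
        batch = schedule_batch(rows, scheduler_name)
        if not batch:
            break
        claimed.extend(batch)
    results[scheduler_name] = claimed


def simulate(num_rows=20, scheduler_names=("scheduler-a", "scheduler-b")):
    """Run the toy schedulers concurrently over a shared list of rows."""
    rows = [DagRunRow(f"run_{i}") for i in range(num_rows)]
    results = {}
    threads = [
        threading.Thread(target=run_scheduler, args=(rows, name, results))
        for name in scheduler_names
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return rows, results
```

However the threads interleave, each row ends up claimed by exactly one scheduler, with no coordination between the schedulers other than the shared rows.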

Task Groups (AIP-34)

SubDAGs were commonly used for grouping tasks in the UI, but they had many drawbacks in their execution behaviour (primarily that they only executed a single task in parallel!). To improve this experience, we’ve introduced “Task Groups”: a method for organizing tasks which provides the same grouping behaviour as a SubDAG without any of the execution-time drawbacks.

SubDAGs will still work for now, but we think that any previous use of SubDAGs can now be replaced with task groups. If you find an example where this isn’t the case, please let us know by opening an issue on GitHub.

For more information, check out the Task Group documentation<http://airflow.apache.org/docs/apache-airflow/stable/concepts.html#taskgroup>.

Refreshed UI

We’ve given the Airflow UI a visual refresh and updated some of the styling. Check out the UI section of the docs<http://airflow.apache.org/docs/apache-airflow/stable/ui.html> for screenshots.

We have also added an option to auto-refresh task states in Graph View so you no longer need to continuously press the refresh button :).

Smart Sensors for reduced load from sensors (AIP-17)

If you make heavy use of sensors in your Airflow cluster, you might find that sensor execution takes up a significant proportion of your cluster even with “reschedule” mode. To improve this, we’ve added a new mode called “Smart Sensors”.

This feature is in “early-access”: it’s been well-tested by AirBnB and is “stable”/usable, but we reserve the right to make backwards-incompatible changes to it in a future release (if we have to. We’ll try very hard not to!)

Read more about it in the Smart Sensors documentation<https://airflow.apache.org/docs/apache-airflow/stable/smart-sensor.html>.

Simplified KubernetesExecutor

For Airflow 2.0, we have re-architected the KubernetesExecutor in a fashion that is simultaneously faster, easier to understand, and more flexible for Airflow users. Users will now be able to access the full Kubernetes API to create a .yaml pod_template_file instead of specifying parameters in their airflow.cfg.

We have also replaced the executor_config dictionary with the pod_override parameter, which takes a Kubernetes V1Pod object for a 1:1 setting override. These changes have removed over three thousand lines of code from the KubernetesExecutor, which makes it run faster and creates fewer potential errors.

Read more here:

Docs on pod_template_file<https://airflow.apache.org/docs/apache-airflow/stable/executor/kubernetes.html?highlight=pod_override#pod-template-file>
Docs on pod_override<https://airflow.apache.org/docs/apache-airflow/stable/executor/kubernetes.html?highlight=pod_override#pod-override>

Airflow core and providers: Splitting Airflow into 60+ packages

Airflow 2.0 is not a monolithic “one to rule them all” package. We’ve split Airflow into core and 61 (for now) provider packages. Each provider package is for either a particular external service (Google, Amazon, Microsoft, Snowflake), a database (Postgres, MySQL), or a protocol (HTTP/FTP). Now you can create a custom Airflow installation from “building” blocks and choose only what you need, plus add whatever other requirements you might have. Some of the common providers are installed automatically (ftp, http, imap, sqlite) as they are commonly used. Other providers are automatically installed when you choose appropriate extras when installing Airflow.

The provider architecture should make it much easier to get a fully customized, yet consistent runtime with the right set of Python dependencies.

But that’s not all: you can write your own custom providers and add things like custom connection types, customizations of the Connection Forms, and extra links to your operators in a manageable way. You can build your own provider and install it as a Python package and have your customizations visible right in the Airflow UI.

Our very own Jarek Potiuk has written about providers in much more detail<https://www.polidea.com/blog/airflow-2-providers/> on the Polidea blog.

Docs on the providers concept and writing custom providers<http://airflow.apache.org/docs/apache-airflow-providers/>
Docs on all the provider packages available<http://airflow.apache.org/docs/apache-airflow-providers/packages-ref.html>

Security

As part of the Airflow 2.0 effort, there has been a conscious focus on security and reducing areas of exposure. This is represented across different functional areas in different forms. For example, in the new REST API, all operations now require authorization. Similarly, in the configuration settings, the Fernet key is now required to be specified.
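For anyone who needs to produce a Fernet key, here is a minimal sketch that generates one with the standard library alone. It relies on a Fernet key being 32 random bytes in URL-safe base64, and it assumes the key is then supplied through the `AIRFLOW__CORE__FERNET_KEY` environment variable (or `fernet_key` under `[core]` in airflow.cfg); the function name is made up for the example.

```python
import base64
import os


def generate_fernet_key():
    """Return a new Fernet key: 32 random bytes, URL-safe base64 encoded."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


if __name__ == "__main__":
    # Print a line that could be pasted into a shell profile or env file.
    print(f"export AIRFLOW__CORE__FERNET_KEY={generate_fernet_key()}")
```

Every scheduler, webserver and worker has to be given the same key, otherwise encrypted connection passwords cannot be decrypted.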

Configuration

Configuration in the form of the airflow.cfg file has been rationalized further in distinct sections, specifically around “core”. Additionally, a significant number of configuration options have been deprecated or moved to individual component-specific configuration files, such as the pod-template-file for Kubernetes execution-related configuration.

Thanks to all of you

We’ve tried to make as few breaking changes as possible and to provide a deprecation path in the code, especially in the case of anything called in the DAG. That said, please read through UPDATING.md to check what might affect you. For example: we re-organized the layout of operators (they now all live under airflow.providers.*) but the old names should continue to work - you’ll just notice a lot of DeprecationWarnings that need to be fixed up.
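As an illustration of tracking down those warnings, the sketch below records DeprecationWarnings with the standard `warnings` module and reports the new location for each old name. The three old-to-new paths are real moves from 1.10 to the 2.0 provider packages, but `legacy_import` is only a hypothetical stand-in for the actual import machinery, so the example runs without Airflow installed.

```python
import warnings

# Old import path -> Airflow 2.0 provider path for a few well-known classes.
NEW_LOCATIONS = {
    "airflow.operators.postgres_operator.PostgresOperator":
        "airflow.providers.postgres.operators.postgres.PostgresOperator",
    "airflow.operators.mysql_operator.MySqlOperator":
        "airflow.providers.mysql.operators.mysql.MySqlOperator",
    "airflow.hooks.S3_hook.S3Hook":
        "airflow.providers.amazon.aws.hooks.s3.S3Hook",
}


def legacy_import(old_path):
    """Hypothetical stand-in for importing a class by its 1.10 name."""
    new_path = NEW_LOCATIONS[old_path]
    warnings.warn(
        f"{old_path} is deprecated; use {new_path} instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    return new_path


def collect_deprecations(old_paths):
    """Run the legacy imports and return every DeprecationWarning message."""
    with warnings.catch_warnings(record=True) as caught:
        # Python hides most DeprecationWarnings by default; record them all.
        warnings.simplefilter("always", DeprecationWarning)
        for old_path in old_paths:
            legacy_import(old_path)
    return [
        str(item.message)
        for item in caught
        if issubclass(item.category, DeprecationWarning)
    ]


if __name__ == "__main__":
    for message in collect_deprecations(NEW_LOCATIONS):
        print(message)
```

In a real project, parsing your DAG files with `python -W error::DeprecationWarning` turns each warning into an exception, which makes the remaining old imports easy to find.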

Thank you so much to all the contributors who got us to this point, in no particular order: Kaxil Naik, Daniel Imberman, Jarek Potiuk, Tomek Urbaszek, Kamil Breguła, Gerard Casas Saez, Xiaodong DENG, Kevin Yang, James Timmins, Yingbo Wang, Qian Yu, Ryan Hamilton and the 100s of others who keep making Airflow better for everyone.

RE: Apache Airflow 2.0.0 is released!

Posted by MONTMORY Alain <al...@thalesgroup.com>.
Thanks to all for this great job. It is a nice gift ☺

From: Ash Berlin-Taylor <as...@apache.org>
Sent: Thursday, December 17, 2020 18:36
To: users@airflow.apache.org
Cc: announce@apache.org; dev@airflow.apache.org
Subject: Apache Airflow 2.0.0 is released!

I am proud to announce that Apache Airflow 2.0.0 has been released.

The source release, as well as the binary "wheel" release (no sdist this time), are available here

We also made this version available on PyPI for convenience (`pip install apache-airflow`):

📦 PyPI: https://pypi.org/project/apache-airflow/2.0.0

The documentation is available on:
https://airflow.apache.org/
📚 Docs: http://airflow.apache.org/docs/apache-airflow/2.0.0/

Docker images will be available shortly -- check https://hub.docker.com/r/apache/airflow/tags?page=1&ordering=last_updated&name=2.0.0 for them to appear.


The full changelog is about 3,000 lines long (already excluding everything backported to 1.10), so for now I’ll simply share some of the major features in 2.0.0 compared to 1.10.14:

A new way of writing dags: the TaskFlow API (AIP-31)

(Known in 2.0.0alphas as Functional DAGs.)

DAGs are now much, much nicer to author, especially when using PythonOperator. Dependencies are handled more clearly and XCom is nicer to use.

Read more here:

TaskFlow API Tutorial<http://airflow.apache.org/docs/apache-airflow/stable/tutorial_taskflow_api.html>
TaskFlow API Documentation<https://airflow.apache.org/docs/apache-airflow/stable/concepts.html#decorated-flows>

A quick teaser of what DAGs can now look like:

```
from airflow.decorators import dag, task
from airflow.utils.dates import days_ago

@dag(default_args={'owner': 'airflow'}, schedule_interval=None, start_date=days_ago(2))
def tutorial_taskflow_api_etl():
   @task
   def extract():
       return {"1001": 301.27, "1002": 433.21, "1003": 502.22}

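   # multiple_outputs=True stores each dict key as its own XCom, so that
   # order_summary["total_order_value"] below resolves to a value
   @task(multiple_outputs=True)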
   def transform(order_data_dict: dict) -> dict:
       total_order_value = 0

       for value in order_data_dict.values():
           total_order_value += value

       return {"total_order_value": total_order_value}

   @task()
   def load(total_order_value: float):

       print("Total order value is: %.2f" % total_order_value)

   order_data = extract()
   order_summary = transform(order_data)
   load(order_summary["total_order_value"])
tutorial_etl_dag = tutorial_taskflow_api_etl()
```

Fully specified REST API (AIP-32)

We now have a fully supported, no-longer-experimental API with a comprehensive OpenAPI specification.

Read more here:

REST API Documentation<http://airflow.apache.org/docs/apache-airflow/stable/stable-rest-api-ref.html>.

Massive Scheduler performance improvements

As part of AIP-15 (Scheduler HA+performance) and other work Kamil did, we significantly improved the performance of the Airflow Scheduler. It now starts tasks much, MUCH quicker.

Over at Astronomer.io we’ve benchmarked the scheduler - it’s fast<https://www.astronomer.io/blog/airflow-2-scheduler> (we had to triple check the numbers as we didn’t quite believe them at first!)

Scheduler is now HA compatible (AIP-15)

Itā€™s now possible and supported to run more than a single scheduler instance. This is super useful for both resiliency (in case a scheduler goes down) and scheduling performance.

To fully use this feature you need Postgres 9.6+ or MySQL 8+ (MySQL 5, and MariaDB wonā€™t work with more than one scheduler Iā€™m afraid).

Thereā€™s no config or other set up required to run more than one schedulerā€”just start up a scheduler somewhere else (ensuring it has access to the DAG files) and it will cooperate with your existing schedulers through the database.

For more information, read the Scheduler HA documentation<http://airflow.apache.org/docs/apache-airflow/stable/scheduler.html#running-more-than-one-scheduler>.
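
This announcement doesn't spell out how the schedulers cooperate. As far as I understand it, the mechanism is row-level locking in the database (`SELECT ... FOR UPDATE SKIP LOCKED`), which would explain the database version requirements. The toy model below imitates that idea in plain Python, with threads standing in for schedulers and non-blocking locks standing in for row locks. The `Row` class and `claim_rows` function are invented for this sketch and are not Airflow code.

```python
import threading


class Row:
    """Stand-in for one database row describing a task waiting to be scheduled."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.lock = threading.Lock()  # plays the role of a row-level lock
        self.claimed_by = None


def claim_rows(rows, scheduler_name, claimed):
    """Walk the table once, taking free rows and skipping locked ones."""
    for row in rows:
        # Non-blocking acquire: if another scheduler holds the row, skip it
        # instead of waiting (the "SKIP LOCKED" part).
        if not row.lock.acquire(blocking=False):
            continue
        try:
            if row.claimed_by is None:
                row.claimed_by = scheduler_name
                claimed.append(row.task_id)
        finally:
            row.lock.release()  # "commit": the claim is now visible to others


rows = [Row(f"task_{i}") for i in range(100)]
claims = {"scheduler_a": [], "scheduler_b": []}
threads = [
    threading.Thread(target=claim_rows, args=(rows, name, claimed))
    for name, claimed in claims.items()
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

all_claimed = claims["scheduler_a"] + claims["scheduler_b"]
print(len(all_claimed), "tasks claimed,", len(set(all_claimed)), "unique")
```

Neither worker ever waits on the other, and no task is picked up twice. That is the property that lets a second scheduler add throughput instead of contention.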

Task Groups (AIP-34)

SubDAGs were commonly used for grouping tasks in the UI, but they had many drawbacks in their execution behaviour (primarily that they only executed a single task at a time!). To improve this experience, we’ve introduced “Task Groups”: a method for organizing tasks which provides the same grouping behaviour as a SubDAG without any of the execution-time drawbacks.

SubDAGs will still work for now, but we think that any previous use of SubDAGs can now be replaced with task groups. If you find an example where this isn’t the case, please let us know by opening an issue on GitHub.

For more information, check out the Task Group documentation<http://airflow.apache.org/docs/apache-airflow/stable/concepts.html#taskgroup>.

Refreshed UI

We’ve given the Airflow UI a visual refresh and updated some of the styling. Check out the UI section of the docs<http://airflow.apache.org/docs/apache-airflow/stable/ui.html> for screenshots.

We have also added an option to auto-refresh task states in Graph View so you no longer need to continuously press the refresh button :).

Smart Sensors for reduced load from sensors (AIP-17)

If you make heavy use of sensors in your Airflow cluster, you might find that sensor execution takes up a significant proportion of your cluster even with “reschedule” mode. To improve this, we’ve added a new mode called “Smart Sensors”.

This feature is in “early-access”: it’s been well-tested by AirBnB and is “stable”/usable, but we reserve the right to make backwards-incompatible changes to it in a future release (if we have to; we’ll try very hard not to!).

Read more about it in the Smart Sensors documentation<https://airflow.apache.org/docs/apache-airflow/stable/smart-sensor.html>.

Simplified KubernetesExecutor

For Airflow 2.0, we have re-architected the KubernetesExecutor in a fashion that is simultaneously faster, easier to understand, and more flexible for Airflow users. Users will now be able to access the full Kubernetes API to create a .yaml pod_template_file instead of specifying parameters in their airflow.cfg.

We have also replaced the executor_config dictionary with the pod_override parameter, which takes a Kubernetes V1Pod object for a 1:1 setting override. These changes have removed over three thousand lines of code from the KubernetesExecutor, which makes it run faster and creates fewer potential errors.

Read more here:

Docs on pod_template_file<https://airflow.apache.org/docs/apache-airflow/stable/executor/kubernetes.html?highlight=pod_override#pod-template-file>
Docs on pod_override<https://airflow.apache.org/docs/apache-airflow/stable/executor/kubernetes.html?highlight=pod_override#pod-override>

Airflow core and providers: Splitting Airflow into 60+ packages

Airflow 2.0 is not a monolithic “one to rule them all” package. We’ve split Airflow into core and 61 (for now) provider packages. Each provider package is for either a particular external service (Google, Amazon, Microsoft, Snowflake), a database (Postgres, MySQL), or a protocol (HTTP/FTP). Now you can create a custom Airflow installation from “building blocks” and choose only what you need, plus add whatever other requirements you might have. A few providers (ftp, http, imap, sqlite) are installed automatically because they are so commonly used. Other providers are automatically installed when you choose appropriate extras when installing Airflow.

The provider architecture should make it much easier to get a fully customized, yet consistent runtime with the right set of Python dependencies.

But that’s not all: you can write your own custom providers and add things like custom connection types, customizations of the Connection Forms, and extra links to your operators in a manageable way. You can build your own provider and install it as a Python package and have your customizations visible right in the Airflow UI.

Our very own Jarek Potiuk has written about providers in much more detail<https://www.polidea.com/blog/airflow-2-providers/> on the Polidea blog.

Docs on the providers concept and writing custom providers<http://airflow.apache.org/docs/apache-airflow-providers/>
Docs on all the provider packages available<http://airflow.apache.org/docs/apache-airflow-providers/packages-ref.html>
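
To see which provider packages ended up in a given environment, something like the following sketch may help. It relies on provider distributions being published under names starting with `apache-airflow-providers-`; the helper names are invented for this example.

```python
from importlib import metadata

PROVIDER_PREFIX = "apache-airflow-providers-"


def is_provider(distribution_name):
    """True if a distribution name looks like an Airflow provider package."""
    return distribution_name.lower().startswith(PROVIDER_PREFIX)


def installed_providers():
    """Return sorted (name, version) pairs for providers in this environment."""
    found = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name and is_provider(name):
            found.add((name, dist.version))
    return sorted(found)


# Prints nothing in an environment without any provider packages.
for name, version in installed_providers():
    print(f"{name}=={version}")
```

Installing Airflow with an extra, for example `pip install 'apache-airflow[google]'`, should make the matching provider show up in this list.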

Security

As part of the Airflow 2.0 effort, there has been a conscious focus on security and reducing areas of exposure. This is represented across different functional areas in different forms. For example, in the new REST API, all operations now require authorization. Similarly, in the configuration settings, the Fernet key is now required to be specified.
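
If you need to generate a Fernet key, a sketch like this one should produce a value in the expected format using only the standard library. A Fernet key is 32 random bytes, URL-safe base64 encoded, which as far as I know is also what `Fernet.generate_key()` from the `cryptography` package returns. The `generate_fernet_key` helper is just a name chosen for this example.

```python
import base64
import os


def generate_fernet_key():
    """Return a new Fernet-format key: 32 random bytes, URL-safe base64 encoded."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


key = generate_fernet_key()
# The value belongs under [core] in airflow.cfg; treat it as a secret and
# keep it out of version control.
print(f"fernet_key = {key}")
```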

Configuration

Configuration in the form of the airflow.cfg file has been rationalized further in distinct sections, specifically around “core”. Additionally, a significant number of configuration options have been deprecated or moved to individual component-specific configuration files, such as the pod-template-file for Kubernetes execution-related configuration.
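
To make the section/option layout concrete, the sketch below parses a small made-up airflow.cfg fragment with Python's standard `configparser` and prints the `AIRFLOW__{SECTION}__{KEY}` environment variable that can override each option. The file contents and the template path are placeholders, not recommended values.

```python
import configparser

# A made-up fragment; a real airflow.cfg has many more sections and options.
SAMPLE_CFG = """
[core]
executor = KubernetesExecutor

[kubernetes]
pod_template_file = /opt/airflow/pod_templates/default.yaml
"""


def env_var_name(section, key):
    """Environment variable that overrides a given airflow.cfg option."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


config = configparser.ConfigParser()
config.read_string(SAMPLE_CFG)

for section in config.sections():
    for key, value in config[section].items():
        print(f"[{section}] {key} = {value}  (override: {env_var_name(section, key)})")
```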

Thanks to all of you

We’ve tried to make as few breaking changes as possible and to provide a deprecation path in the code, especially in the case of anything called in the DAG. That said, please read through UPDATING.md to check what might affect you. For example, we re-organized the layout of operators (they now all live under airflow.providers.*), but the old names should continue to work; you’ll just notice a lot of DeprecationWarnings that need to be fixed up.
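
One way to find old import paths that are still in use is to collect DeprecationWarnings while your DAG files are being imported. The sketch below shows the general pattern with a stand-in function: `old_style_import` is hypothetical and only simulates what a deprecated module shim does.

```python
import warnings


def old_style_import():
    """Stand-in for importing an operator from its pre-2.0 location."""
    warnings.warn(
        "This module is deprecated. Please use the new import path.",
        DeprecationWarning,
        stacklevel=2,
    )
    return "operator"


with warnings.catch_warnings(record=True) as caught:
    # Record every DeprecationWarning, not just the first one per location.
    warnings.simplefilter("always", DeprecationWarning)
    old_style_import()

deprecations = [w for w in caught if issubclass(w.category, DeprecationWarning)]
for warning in deprecations:
    print(f"{warning.filename}:{warning.lineno}: {warning.message}")
```

Running Python with `-W error::DeprecationWarning` is a blunter alternative: the first such warning is raised as an exception.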

Thank you so much to all the contributors who got us to this point, in no particular order: Kaxil Naik, Daniel Imberman, Jarek Potiuk, Tomek Urbaszek, Kamil Breguła, Gerard Casas Saez, Xiaodong DENG, Kevin Yang, James Timmins, Yingbo Wang, Qian Yu, Ryan Hamilton and the 100s of others who keep making Airflow better for everyone.

RE: Apache Airflow 2.0.0 is released!

Posted by "Shaw, Damian P. " <da...@credit-suisse.com>.
Great news! Is there a single web page that highlights these major features as you’ve listed them?

Damian
