Posted to users@airflow.apache.org by mi...@astronomer.io on 2022/03/31 16:39:42 UTC

What's new with Airflow | March 2022

Apache Airflow newsletter for anyone too busy to read the devlist.
March 2022 

The Apache Airflow Newsletter 
Airflow Community News & Events

Introduction
This year’s Airflow Summit is fast approaching. The organizers received a great number of proposals – over 100. The CFP closed on March 14, but you can still register <https://www.crowdcast.io/e/airflowsummit2022/register> to attend the biggest Airflow event of the year. 

March was another eventful month for the project. Airflow hit 2k contributors!  🎉  A new Helm chart release (1.5.0 <https://airflow.apache.org/docs/helm-chart/1.5.0/installing-helm-chart-from-sources.html>) appeared, along with new versions of the Airflow Providers packages <https://airflow.apache.org/docs/apache-airflow-providers/installing-from-sources>. They can be installed via PyPI <https://airflow.apache.org/docs/apache-airflow-providers/installing-from-pypi>. Also, long-awaited support for Python 3.10 and robust support for development on macOS with the M1 architecture are finally a reality. The relevant PRs (#22127 <https://github.com/apache/airflow/pull/22127> and #22050 <https://github.com/apache/airflow/pull/22050>) were merged on March 9 and March 21, respectively.

The newsletter is moving to a new platform next month. Please subscribe to ensure you always receive the latest issue: <http://eepurl.com/hXUA3r>.
Summit Stats & Facts

Free and online (with in-person events in many places) this May 23-27, the summit will feature not only tech talks but also deep dives, panels, lightning talks, workshops, and case studies from practitioners and data leaders across the global Airflow community. Sessions will be scheduled for different time zones to accommodate as many members as possible. This year’s session tracks are: Trends, Data Governance, Productionizing Machine Learning, Airflow Internals, Airflow in Enterprise, Community, and Open. 

A new kind of event will make this year’s summit unusual: local in-person meetups. Members in a number of cities will be able to attend in-person events, join watch sessions, and network. Events are planned for Amsterdam, Bengaluru, Lagos, London, Melbourne, New York City, Paris, Sao Paulo, Seattle, Sydney (tentative), Tel Aviv, Tokyo, and Warsaw! For more information, visit the summit website <https://airflowsummit.org/in-person-events/>.

From the organizers, we got a peek at how the conference is shaping up: a bird's-eye view of the session breakdown by format, length, and track.

[Image: session breakdown by format, length, and track] <https://www.crowdcast.io/e/airflowsummit2022/register>
🚀  Releases & Documentation Improvements 🚀
On March 11, we released Apache Airflow Helm chart 1.5.0.
📦 ArtifactHub: <https://artifacthub.io/packages/helm/apache-airflow/airflow>
📚 Docs: <https://airflow.apache.org/docs/helm-chart/1.5.0/>
🛠️ Changelog: <https://airflow.apache.org/docs/helm-chart/1.5.0/changelog.html>
🪶 Sources: <https://airflow.apache.org/docs/helm-chart/1.5.0/installing-helm-chart-from-sources.html>
On March 10, we released new versions of the Airflow Providers packages.
📚 Docs: <https://airflow.apache.org/docs/apache-airflow-providers/installing-from-pypi>
🪶 Sources: <https://airflow.apache.org/docs/apache-airflow-providers/installing-from-sources>
🗓️  Upcoming Events 🗓️
We can always use speakers for meetups. If you want to present your Airflow contribution, demo your project, or share your experience at our Community Meetups this year, let us know by filling out this form <http://bit.ly/3lhp1aR>. You could win some excellent swag! 
On April 5 at 2 pm ET, Astronomer’s Ross Turk, Senior Director of Community, and Michael Collado, Staff Software Engineer, will lead a webinar entitled “OpenLineage and Airflow: A Deeper Dive.” RSVP here <https://www.astronomer.io/events/webinars/openlineage-and-airflow-deeper-dive>.
On April 5, the Tel Aviv Airflow Meetup Group will hold a free, hybrid event at 6 pm in Tel Aviv (GMT +3). Register here <https://databand.ai/events/april-2022-apache-airflow-tlv-meetup/> to attend the Hebrew-language event.
There are only two months left until Airflow Summit. The CFP is closed, but you can still register <https://www.crowdcast.io/e/airflowsummit2022/register> to participate.
✔️  Recent Events ✔️
On March 25, Nisarg Shah gave a talk at the Python Web Conference entitled “Getting Started with Airflow for your Data Workflows.” <https://2022.pythonwebconf.com/presentations/getting-started-with-airflow-for-your-data-workflows> 
On March 23, Kenten Danas presented an “Introduction to Airflow” <https://2022.pythonwebconf.com/presentations/introduction-to-airflow> at the Python Web Conference.
On March 23, Julien Le Dem and Willy Lulciuc from the Observability & Lineage team at Astronomer gave a talk at Data Council Austin entitled “Data Lineage with Apache Airflow using OpenLineage.” <https://www.datacouncil.ai/talks/data-lineage-with-apache-airflow-using-openlineage?hsLang=en> During the session, Le Dem and Lulciuc announced that Datakin had recently joined Astronomer.
On March 22, Marc Lamberti, Head of Customer Training at Astronomer, and Viraj Parekh, Field CTO at Astronomer, gave a webinar entitled “Improve Your DAGs with Hidden Airflow Features.” You can register to watch it anytime here <https://www.astronomer.io/events/webinars/improve-your-dags-with-hidden-airflow-features>.
On March 16 at this month’s Apache Airflow Community Meetup, Software Engineer Sumeer Shukla gave a talk on “Data Migration Pipeline with Apache Airflow.” You can watch it on-demand here. <https://www.crowdcast.io/e/airflow-meetup-march/register>
On March 2, Martijn Beenker from Avanade gave a talk entitled “Scaling Up Apache Airflow to Enterprise Level” <https://www.dremio.com/subsurface/live/winter2022/speaker/martijn-beenker/> at Subsurface Live.
On February 28, Astronomer’s Michael Collado gave a talk <https://www.dremio.com/subsurface/live/winter2022/session/cross-platform-data-lineage-with-openlineage/> at Subsurface Live about using OpenLineage with Airflow and Apache Spark.
On February 28, the PyLadies BCN group in Barcelona hosted a meetup, “Introduction to Apache Airflow.” <https://www.meetup.com/PyLadies-BCN/events/283989295/>

#19857 Enable JSON serialization for secrets backend <https://github.com/apache/airflow/pull/19857>
Daniel <https://github.com/dstandish> says, “Until now, to store an Airflow connection in an environment variable or secrets backend you had to express the connection in an Airflow-specific URI format. (If you’ve done this you know how painful it can be.)  With this PR, we can now define connections using JSON, which is much more user-friendly.
“For me the idea first came to mind when in the course of reviewing a secrets backend PR I learned that our local filesystem secrets backend supported JSON. I didn’t work on it at the time, but later I had a use case at my company: using one creds store for both Airflow and a Jupyter notebook environment. So I implemented a JSON AWS SSM secrets backend, and it was so much nicer to work with. And then I knew as soon as I had a little extra time I wanted to contribute it, and not for just AWS SSM but all secrets backends.”
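
To make the difference concrete, here is a minimal sketch that builds the same connection in both forms using only the Python standard library. The connection details, the connection ID `my_postgres`, and the helper names are hypothetical and chosen for illustration. The JSON keys (`conn_type`, `login`, `password`, `host`, `port`, `schema`, `extra`) and the `AIRFLOW_CONN_<CONN_ID>` environment variable naming follow our understanding of Airflow's connection format; please check the documentation for your Airflow version before relying on them.

```python
import json
import os
from urllib.parse import quote, urlencode

# Hypothetical connection details, for illustration only.
conn = {
    "conn_type": "postgres",
    "login": "analytics",
    "password": "p@ss/word",
    "host": "db.example.com",
    "port": 5432,
    "schema": "warehouse",
    "extra": {"sslmode": "require"},
}


def as_uri(c):
    """Build the URI form; special characters must be percent-encoded by hand."""
    user = quote(c["login"], safe="")
    pwd = quote(c["password"], safe="")
    query = urlencode(c["extra"])
    return (
        f'{c["conn_type"]}://{user}:{pwd}@{c["host"]}:{c["port"]}'
        f'/{c["schema"]}?{query}'
    )


def as_json(c):
    """Build the JSON form; values are stored as-is, with no URL encoding."""
    return json.dumps(c)


uri_value = as_uri(conn)
json_value = as_json(conn)

# Assumption: Airflow reads connections from AIRFLOW_CONN_<CONN_ID>.
# "MY_POSTGRES" is a made-up connection ID.
os.environ["AIRFLOW_CONN_MY_POSTGRES"] = json_value

print(uri_value)
print(json_value)
```

In the URI form the password has to be percent-encoded and the extras squeezed into a query string, while the JSON form keeps every field readable and round-trips through `json.loads` unchanged.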

We’re eager to recognize the incredible people in the community for their work on Apache Airflow. This month we’re introducing Jed Cunningham (jedcunningham <https://github.com/jedcunningham>), Airflow PMC member and Senior Software Engineer on Astronomer’s OSS Airflow Engineering Team.

Where are you based? Denver, CO, USA
Why did you start contributing to the project? If I went through the effort to fix a bug or add a small feature, I wanted the whole community to benefit from my effort. I also didn’t want to maintain my own fork of Airflow.
What do you use it for? Currently I really only use Airflow when I’m working on Airflow :), but I’ve run Airflow for data teams who relied on it to orchestrate and have visibility into data-heavy jobs primarily using KubernetesPodOperator.
When was your first PR and what was it? I opened my first PR in August 2020 to show tracebacks of DAG import errors in the UI.
What was your latest? My latest PR added support for extraVolumeMounts in Flower in the Apache Airflow Helm Chart.
What do you like about working on OSS projects? I really enjoy seeing the output of a diverse group of overwhelmingly friendly people and being a part of the community. I wholeheartedly welcome others to get involved as well: grab a good first issue, and if you get stuck we are friendly in the #airflow-how-to-pr Slack channel!
📢  Communications Digest 📢
One of the most popular Astronomer articles about Airflow, “7 Common Errors to Check when Debugging DAGs <https://www.astronomer.io/blog/7-common-errors-to-check-when-debugging-airflow-dag/>,” was updated with new information on March 29.
Towards Data Science published two blog posts about Airflow by Dario Radečić <https://medium.com/@radecicdario?source=post_page-----8f4e20bee7d----------------------------------->, CEO of Deep Digital Data: “Apache Airflow for Data Science – How to Work with REST APIs” <https://towardsdatascience.com/apache-airflow-for-data-science-how-to-work-with-rest-apis-8f4e20bee7d> and “Apache Airflow for Data Science – How to Upload Files to Amazon S3.” <https://towardsdatascience.com/apache-airflow-for-data-science-how-to-upload-files-to-amazon-s3-5bdf6fcb1cea>
On March 10, Steven Hillion and Ula Rydiger wrote a blog post entitled “Apache Airflow for Data and Analytics Leaders.” <https://www.astronomer.io/blog/apache-airflow-for-data-and-analytics-leaders> 
The New York Apache Airflow group held a meetup on March 9. Benji Lampel, Airflow Engineering Advocate at Astronomer, demoed how to use the GreatExpectationsOperator to add data quality checking to ETL/ELT pipelines. A recording of the session can be found here <https://photos.app.goo.gl/67wMsJBK4Y5rDsxB6>.
💻  In-development 💻
Airflow 2.2.5 <https://github.com/apache/airflow/milestone/50>: bug fixes
Airflow 2.3.0 <https://github.com/apache/airflow/milestone/36>: new features, including Dynamic Task Mapping
Production Docker Image <https://hub.docker.com/r/apache/airflow>: finally attaining “official” status with multi-platform support and simpler customization with BuildKit (coming in Airflow 2.3.0)
Airflow Helm Chart 1.6.0 <https://github.com/apache/airflow/milestone/51>: bug fixes and new features
For more info about upcoming releases and in-development projects, visit the current projects issue (#10176 <https://github.com/apache/airflow/issues/10176>) in the Airflow repo.
🖋  Reminder: Please subscribe! 🖋
The newsletter is moving to a new platform, Mailchimp, next month. Please subscribe to ensure you always receive the latest issue: <http://eepurl.com/hXUA3r>.
That's all, Air-folks! See you in April 👋
Prepared by mschickensoup <https://github.com/mschickensoup>