Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/06/02 06:35:15 UTC

[GitHub] [airflow-site] mschickensoup commented on a change in pull request #268: adding a logo for sift use case

mschickensoup commented on a change in pull request #268:
URL: https://github.com/apache/airflow-site/pull/268#discussion_r433649947



##########
File path: landing-pages/site/content/en/use-cases/sift.md
##########
@@ -0,0 +1,32 @@
+---
+title: "Sift"
+linkTitle: "Sift"
+quote:
+    text: "Airflow helped us to define and organize our ML pipeline dependencies, and empowered us to introduce new, diverse batch processes at increasing scale."
+    author: "Handong Park"
+logo: "sift_logo.png"
+---
+
+##### What was the problem?
+
+At Sift, we’re constantly training machine learning models that feed into the core of Sift’s Digital Trust & Safety platform. The platform gives our customers a way to discern suspicious online behavior from trustworthy behavior, allowing our customers to protect their online transactions, maintain the integrity of their content platforms, and keep their users’ accounts secure. To make this possible, we’ve built model training pipelines that consist of hundreds of steps in MapReduce and Spark, with complex requirements between them. 
+
+When we built these workflows, we found that we needed a centralized way to organize the interactions between the many steps in each workflow. But before Airflow, we didn’t have an easy way to express those dependencies. And as we added steps to the workflows, it became increasingly difficult to coordinate their dependencies and keep ML experiments in sync.
+
+It soon became clear that we needed a way to orchestrate both the scheduled execution of our jobs and the dependencies between steps of not only a single workflow, but of multiple workflows. We needed a way to dynamically create several experimental ML workflows at once that could each have their own code, dependencies, and tasks. Additionally, we needed a way to monitor the status of tasks and to re-run or restart tasks from any given point in a workflow with ease.
+
+##### How did Apache Airflow help to solve this problem?
+
+Airflow makes it easy to clearly define the interactions between various jobs, expanding the scope of what we can do in our model training pipelines. We now have the ability to schedule and coordinate all jobs while managing the dependencies between them using DAGs. Each of our main workflows, including our model training pipeline and our ETL pipelines, has its own DAG code that manages its tasks’ dependencies and the execution schedule for the pipeline. We even define dependencies between separate DAGs by using Airflow’s ExternalTaskSensor. This allows our DAGs to actually depend on each other and allows us to keep each one focused and compact in its scope. 

Review comment:
```suggestion
Airflow makes it easy to clearly define the interactions between various jobs, expanding the scope of what we can do in our model training pipelines. We now have the ability to schedule and coordinate all jobs while managing the dependencies between them using DAGs. Each of our main workflows, including our model training pipeline and ETL pipelines, has its own DAG code that manages its tasks’ dependencies and the execution schedule for the pipeline. We even define dependencies between separate DAGs by using Airflow’s ExternalTaskSensor. This allows our DAGs to actually depend on each other and keep each one of them focused and compact in its scope.
```
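
For readers unfamiliar with the cross-DAG dependency pattern the patch describes, here is a minimal sketch of how an ExternalTaskSensor wires two DAGs together. The DAG and task IDs (`etl`, `load`, `model_training`) are hypothetical placeholders, not Sift's actual pipeline code; imports match the Airflow 1.10 layout current when this was posted.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.sensors.external_task_sensor import ExternalTaskSensor

# Hypothetical downstream DAG: training must not start until the
# upstream "etl" DAG's "load" task has succeeded for the same
# execution date.
with DAG(
    dag_id="model_training",
    start_date=datetime(2020, 6, 1),
    schedule_interval="@daily",
) as dag:
    wait_for_etl = ExternalTaskSensor(
        task_id="wait_for_etl",
        external_dag_id="etl",    # hypothetical upstream DAG id
        external_task_id="load",  # hypothetical task in that DAG
    )
    train = DummyOperator(task_id="train_model")
    wait_for_etl >> train
```

By default the sensor matches on execution date, so this sketch assumes both DAGs run on the same schedule; `execution_delta` or `execution_date_fn` can be passed when they do not.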



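The "several experimental ML workflows at once" requirement from the problem statement above is commonly met by generating DAG objects in a loop, since the Airflow scheduler picks up any DAG bound to a module-level global. A minimal sketch of that common pattern follows, assuming a hypothetical list of experiment names; this is illustrative, not Sift's published code.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

# Hypothetical experiment names; in practice these might come from config.
EXPERIMENTS = ["baseline", "wide_features", "new_sampler"]

def train(experiment):
    print("training model for experiment %s" % experiment)

for name in EXPERIMENTS:
    dag = DAG(
        dag_id="ml_experiment_%s" % name,
        start_date=datetime(2020, 6, 1),
        schedule_interval="@daily",
    )
    with dag:
        PythonOperator(
            task_id="train",
            python_callable=train,
            op_kwargs={"experiment": name},
        )
    # Each generated DAG must be reachable as a module-level global
    # for the scheduler to discover it.
    globals()[dag.dag_id] = dag
```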

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org