Posted to dev@airflow.apache.org by Dinesh Sharma <ds...@bandwidthx.com> on 2016/09/08 16:12:08 UTC

Airflow for business process workflows

Hi All,
I'm with BandwidthX, a wireless tech company in San Diego.
We're trying to have one workflow tool that can be used for both business
process workflows and data pipelines, and I think Airflow can do that. I
also think it would make a good case study for Airflow, given that I see
people using it primarily for data pipelines.

We're starting with the business process workflows first, wherein a user
action can lead to the scheduling of one-time tasks, e.g. activate a
particular device at a particular date/time. Such a task may or may not
have dependencies. A subsequent user action could change the date/time of
the scheduled task or cancel it altogether.

I think Airflow can do that with *schedule_interval='@once'* and
*start_date=scheduled_date_time*, ideally passed in as command line
parameters. I made it work by writing a Python script that takes these
params, generates the DAG file with the supplied start_date, and puts that
file in the DAGs folder. I also added a dependent cleanup task to this
script that deletes the .py and .pyc files of the dynamically generated
DAG.

Is there a better way to do this? Any resources you can point me to?

PS
I'm already part of https://gitter.im/apache/incubator-airflow.

Thanks

-- 
Dinesh Sharma
BandwidthX
dsharma@bandwidthx.com
(760) 203-4955 Ext. 121

Re: Airflow for business process workflows

Posted by Gerard Toonstra <gt...@gmail.com>.
Dinesh,

Interesting use case. I'm not sure how this will work out for you in the
long run compared to a specialized workflow tool, but here are some
considerations to help you evaluate your chances of success:

A complex business workflow will at some point require more complex input
from a user than a simple decision. Airflow has no UI for that, so there
has to be something else where these 'cases' are handled and the input can
be gathered and then merged into the main workflow.

If you allow users to individually run tasks like you describe and
delete/remove DAGs, it may become a headache pretty soon, not least
because these DAGs are so short-lived.


Consider working with 'processing lists' for such simple tasks. For
example, set up a Google Sheet with pages where people can enter device
IDs, then set up a single task that reads from that sheet and handles all
TBD devices in one go. What you win with this approach is that if people
need to go back to a specific device, you don't have to wade through a
complex interface; you just find that line and date in the sheet you set
up and delete or modify the line. Undoubtedly there will be cases where a
DAG doesn't get deleted and admins need to jump in to solve issues. With a
Google Sheet, you also get rudimentary access control, where people can
view things but not edit them, and you get dropdown lists (validation) for
particular fields of interest.
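
As a rough sketch of what that single reader task could look like, assuming
a service account and gspread for the sheet access (the sheet name, column
layout, schedule, and credentials file are all illustrative):

    # processing_list_dag.py -- one recurring DAG that works through a
    # "processing list" instead of one short-lived DAG per device.
    from datetime import datetime

    import gspread
    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator
    from oauth2client.service_account import ServiceAccountCredentials

    def process_pending_devices():
        scope = ["https://spreadsheets.google.com/feeds"]
        creds = ServiceAccountCredentials.from_json_keyfile_name(
            "service_account.json", scope)  # assumed credentials file
        sheet = gspread.authorize(creds).open("Device activations").sheet1
        # Row 1 holds the headers, so data rows start at index 2.
        for i, row in enumerate(sheet.get_all_records(), start=2):
            if row.get("status") == "TBD":
                activate_device(row["device_id"])
                sheet.update_cell(i, 3, "done")  # assumes status is column 3

    def activate_device(device_id):
        # Placeholder for the real activation call.
        print("activating device {}".format(device_id))

    dag = DAG(
        dag_id="process_device_list",
        schedule_interval="@hourly",  # poll the sheet on a cadence
        start_date=datetime(2016, 9, 1),
    )

    PythonOperator(
        task_id="process_pending_devices",
        python_callable=process_pending_devices,
        dag=dag,
    )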

For more complex workflows, you could look at Google Forms (the survey
tool). That allows you to send parameters on the URL that pre-populate
particular fields in that "survey" (your device ID). The questions are
then filled in, and the "response" of that survey goes to yet another
Google Sheet. From there, you should be able to point Airflow at that
response sheet, pick up surveys that were not yet processed, and take more
complex actions. The benefit of this approach is that you maintain all
history in one place, in a format that's easy to read. Through Airflow,
you can then generate URLs for devices to be handled and send them by
email to particular people.
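
Generating those pre-filled URLs is straightforward; a sketch (the form ID
and the entry.* field ID are placeholders that you'd read off your own
form's "Get pre-filled link" option):

    # Build pre-filled Google Form URLs for a batch of devices.
    from urllib.parse import urlencode

    FORM_URL = "https://docs.google.com/forms/d/e/FORM_ID/viewform"
    DEVICE_FIELD = "entry.123456789"  # form field that holds the device id

    def prefilled_url(device_id):
        return FORM_URL + "?" + urlencode({DEVICE_FIELD: device_id})

    # e.g. generate one URL per device and email it to the right person
    for device_id in ["dev-001", "dev-002"]:
        print(prefilled_url(device_id))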

Rgds,

Gerard


