Posted to dev@airflow.apache.org by Dennis O'Brien <de...@dennisobrien.net> on 2018/01/30 17:13:47 UTC

best way to handle version upgrades of libraries used by tasks

Hi All,

I have a number of jobs that use scikit-learn for scoring players.
Occasionally I need to upgrade scikit-learn to take advantage of some new
features.  We have a single conda environment that specifies all the
dependencies for Airflow as well as for all of our DAGs.  So currently
upgrading scikit-learn means upgrading it for all DAGs that use it, and
retraining all models for that version.  It becomes a very involved task
and I'm hoping to find a better way.

One option is to use BashOperator (or something that wraps BashOperator)
and have bash use a specific conda environment with that version of
scikit-learn.  While simple, I don't like the idea of limiting task input
to the command line.  Still, an option.
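
Roughly what I have in mind is something like this (a sketch only; the conda env name, paths, and DAG details are placeholders):

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator

    dag = DAG('score_players', start_date=datetime(2018, 1, 1), schedule_interval='@daily')

    score = BashOperator(
        task_id='score_players',
        # 'sklearn_019' is a hypothetical conda env pinned to the scikit-learn
        # version this DAG's models were trained against.
        bash_command='source activate sklearn_019 && python /opt/jobs/score_players.py',
        dag=dag,
    )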

Another option is the DockerOperator.  But when I asked around at a
previous Airflow Meetup, I couldn't find anyone actually using it.  It also
adds some complexity to the build and deploy process in that now I have to
maintain docker images for all my environments.  Still, not ruling it out.

And the last option I can think of is just heterogeneous workers.  We are
migrating our Airflow infrastructure to AWS ECS (from EC2) and plan on
having support for separate worker clusters, so this could include workers
with different conda environments.  I assume that as long as a few key
packages (airflow, redis, celery?) are identical between the scheduler and
worker instances, the rest can be whatever.

Has anyone faced this problem and have some advice?  Am I missing any
simpler options?  Any thoughts much appreciated.

thanks,
Dennis

Re: best way to handle version upgrades of libraries used by tasks

Posted by Gerard Toonstra <gt...@gmail.com>.
As long as the differences are in API methods and not a rearrangement of the
package structure, the latter option (heterogeneous workers) would work. This
is because the operators are imported by the scheduler, just not executed
there (so the scheduler may never call the version-specific operator methods).

If you serialize the parameters into a JSON string, you can simplify how data
is passed on the command line and reduce the number of arguments you'd have
to pass.
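
For illustration, a rough sketch of that (the parameter names and script path are made up):

    import json
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator

    dag = DAG('score_players', start_date=datetime(2018, 1, 1), schedule_interval='@daily')

    params = {'model_version': '2018-01-30', 'league': 'nhl'}  # hypothetical task parameters

    score = BashOperator(
        task_id='score_players',
        # One JSON argument instead of many flags; the script is assumed to
        # read it back with json.loads(sys.argv[1]).
        bash_command="python /opt/jobs/score_players.py '{}'".format(json.dumps(params)),
        dag=dag,
    )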

You could also look into the 'queue' parameter for a task, which forces the
task instance to run on a specific worker. Have you seen that? Then you don't
need to maintain all the different conda environments on every worker, and
you can use APIs to spin those specific workers up and down ahead of time.
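
A rough sketch of that, assuming a CeleryExecutor setup (the queue name is arbitrary):

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator

    dag = DAG('score_players', start_date=datetime(2018, 1, 1), schedule_interval='@daily')

    score = BashOperator(
        task_id='score_players',
        bash_command='python /opt/jobs/score_players.py',
        # Route this task instance to workers listening on the 'sklearn_new'
        # queue, e.g. workers started with: airflow worker -q sklearn_new
        queue='sklearn_new',
        dag=dag,
    )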

Rgds,

G>


Re: best way to handle version upgrades of libraries used by tasks

Posted by Shoumitra Srivastava <sh...@gmail.com>.
We have a somewhat similar solution to what Rob mentioned, except we store
our Docker images in Quay and use the ECSOperator to run those images on our
ECS clusters. The setup works fairly smoothly.  For some of our jobs that
require much larger machines, we use the BashOperator to run an AWS scale-up
before the job and a scale-down after it to add/remove machines from the
cluster.
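
For reference, a rough sketch of an ECSOperator task along those lines (the task definition, cluster, and container names are placeholders, and the exact import path depends on the Airflow version):

    from datetime import datetime

    from airflow import DAG
    from airflow.contrib.operators.ecs_operator import ECSOperator

    dag = DAG('score_players_ecs', start_date=datetime(2018, 1, 1), schedule_interval='@daily')

    score = ECSOperator(
        task_id='score_players',
        task_definition='score-players:3',   # hypothetical ECS task definition
        cluster='airflow-jobs',              # hypothetical ECS cluster
        overrides={
            'containerOverrides': [
                {'name': 'score-players', 'command': ['python', 'score_players.py']},
            ]
        },
        region_name='us-east-1',
        dag=dag,
    )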

-Shoumitra


Re: best way to handle version upgrades of libraries used by tasks

Posted by Rob Goretsky <ro...@gmail.com>.
My team has solved this with Docker.  When a developer works on a single
project, they freeze its Python library versions via
pip freeze > requirements.txt
for that project, and then we build one Docker image per project, using
something very similar to the official 'onbuild' variant of the Python
Docker image from https://hub.docker.com/_/python/.
We have Jenkins automatically build and push an updated image for each
project to ECR whenever code is pushed to that project's master branch on
GitHub.

We currently have 80 different Docker images (one per project) stored in ECR,
but each one is completely isolated from the others in terms of its
dependencies.  That means we never have to worry about the impact of
upgrading a Python library version for anything but the project we're
currently working on.  This has also opened up some nice opportunities to
start playing with Python 3.x while keeping all of our older stuff running
smoothly on Python 2.7.

Airflow then simply calls a version of the DockerOperator each time to run
the script/program within the project.  Working great for us!
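
For anyone curious, a minimal sketch of what such a task might look like (the ECR image name and command are placeholders):

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.docker_operator import DockerOperator

    dag = DAG('score_players_docker', start_date=datetime(2018, 1, 1), schedule_interval='@daily')

    score = DockerOperator(
        task_id='score_players',
        # Hypothetical per-project image that CI pushes to ECR
        image='123456789012.dkr.ecr.us-east-1.amazonaws.com/score-players:latest',
        command='python score_players.py',
        docker_url='unix://var/run/docker.sock',
        dag=dag,
    )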

-rob


Re: best way to handle version upgrades of libraries used by tasks

Posted by Dennis O'Brien <de...@dennisobrien.net>.
Hi Andrew,

I think the issue is that each worker has a single airflow entry point
(whatever `which airflow` points to) with an associated environment and list
of installed packages, whether those are managed via conda, virtualenv, or
the system Python environment.  So the executor would need to know which
environment you want to run in.  I don't know how this would be possible with
the LocalExecutor or SequentialExecutor since both are tied to the original
Python environment.  (Someone correct me if I am wrong here.  I'm definitely
not an expert on the Airflow internals.)

The BashOperator will allow you to run any process you want, including any
Python environment, but there is some plumbing overhead required if you want
access to the context, etc.  The CeleryExecutor (or any of the executors that
support distributed workers) plus a queue gets around the issue of the worker
environment being tied to the scheduler environment.
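
The plumbing I mean is roughly this: anything you need from the context has to be rendered into the command yourself, e.g. via templating (a sketch only, with made-up env name and paths):

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator

    dag = DAG('score_players', start_date=datetime(2018, 1, 1), schedule_interval='@daily')

    score = BashOperator(
        task_id='score_players',
        # The execution date (and anything else from the context) is passed
        # explicitly on the command line via Jinja template variables.
        bash_command=(
            'source activate sklearn_019 && '
            'python /opt/jobs/score_players.py --ds {{ ds }} --run-id {{ run_id }}'
        ),
        dag=dag,
    )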

That said, I don't want to discourage you from trying things out.  I am sure
there are some mysteries of Python that might make this possible.  For
example, this project from Armin Ronacher allows modules to use different
versions of installed libraries.  (Warning: I wouldn't use this in
production.  I think it was more a proof of concept.)
https://github.com/mitsuhiko/multiversion

cheers,
Dennis

Re: best way to handle version upgrades of libraries used by tasks

Posted by Andrew Maguire <an...@gmail.com>.
I am curious about a similar issue. I'm wondering if we could use
https://github.com/pypa/pipenv - so each DAG lives in its own folder, say,
and that folder has a Pipfile.lock that I think could then bundle the
required environment into the DAG's code folder itself.

I've not used this yet or anything but seems interesting...
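
One way that could look in practice, purely as a sketch (it assumes the DAG folder ships its own Pipfile/Pipfile.lock, that the workers have pipenv installed, and that all paths here are made up):

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator

    dag = DAG('score_players_pipenv', start_date=datetime(2018, 1, 1), schedule_interval='@daily')

    score = BashOperator(
        task_id='score_players',
        # Run the script inside the virtualenv pipenv builds from the
        # Pipfile.lock that lives alongside this DAG's code.
        bash_command=('cd /usr/local/airflow/dags/score_players && '
                      'pipenv run python score_players.py'),
        dag=dag,
    )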


Re: best way to handle version upgrades of libraries used by tasks

Posted by Dennis O'Brien <de...@dennisobrien.net>.
Thanks for the input!  I'll take a look at using queues for this.

thanks,
Dennis


Re: best way to handle version upgrades of libraries used by tasks

Posted by Hbw <br...@heisenbergwoodworking.com>.
Run them on different workers by using queues?
That way different workers can have different 3rd party libs while sharing the same af core.

B

Sent from a device with less than stellar autocorrect
