Posted to dev@airflow.apache.org by Bolke de Bruin <bd...@gmail.com> on 2016/07/22 09:31:08 UTC

Pickling vs plain text

Hi,

Although some efforts are underway to allow for syncing of DAGs by git or something else, I was wondering why we actually pickle? As we are supporting non-pickled DAGs as well, there seems to be no need for serialization, i.e. keeping the state of the DAG. As DAG definitions are relatively small, why don’t we send the whole DAG? Or am I completely overlooking something?

Cheers
Bolke

Re: Pickling vs plain text

Posted by Maxime Beauchemin <ma...@gmail.com>.
With an executor or any environment where the worker can't access updated
pipeline definitions. But then some templating features wouldn't work
(includes/import/...).

Max

On Wed, Jul 27, 2016 at 2:11 PM, Chris Riccomini <cr...@apache.org>
wrote:

> Under what conditions would one want to use pickling right now?

Re: Pickling vs plain text

Posted by Chris Riccomini <cr...@apache.org>.
Under what conditions would one want to use pickling right now?

On Fri, Jul 22, 2016 at 9:43 AM, Maxime Beauchemin
<ma...@gmail.com> wrote:
> Pickling is kind of a dead feature. The original idea was that it would be
> used to version and ship DAG definitions around, but in practice many
> important things are not pickleable (Jinja template objects, whatever may
> be in callbacks or attached to a DAG object). Also, I believe that the
> MesosExecutor relies on pickling. Note that there is a hack to pickle the
> template, but it has shortcomings (the pickled template cannot reference
> other files in `includes` or `extends` calls).
>
> We should phase pickles out, probably by parameterizing the behavior in the
> current version (generate_pickles=True), setting the default to True (to
> match the current behavior), and warning that this won't be an option in 2.0
> and that users should set the option to False to prepare for the 2.0
> migration.
>
> Max

Re: Pickling vs plain text

Posted by Maxime Beauchemin <ma...@gmail.com>.
Pickling is kind of a dead feature. The original idea was that it would be
used to version and ship DAG definitions around, but in practice many
important things are not pickleable (Jinja template objects, whatever may
be in callbacks or attached to a DAG object). Also, I believe that the
MesosExecutor relies on pickling. Note that there is a hack to pickle the
template, but it has shortcomings (the pickled template cannot reference
other files in `includes` or `extends` calls).
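
A minimal sketch of the pickleability problem, using only the standard
library. The dicts below are stand-ins for attributes attached to a DAG
object (an assumption for illustration; they are not real Airflow objects),
and `is_pickleable` is a hypothetical helper, not an Airflow function:

```python
import pickle


def is_pickleable(obj):
    """Return True if pickle.dumps(obj) succeeds, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        # pickle stores functions by qualified name, so lambdas and
        # locally defined callbacks cannot be serialized.
        return False


# Hypothetical stand-ins for what might hang off a DAG definition.
plain_dag = {"dag_id": "example", "schedule": "@daily"}
dag_with_callback = {"dag_id": "example", "on_failure": lambda context: None}

print(is_pickleable(plain_dag))
print(is_pickleable(dag_with_callback))
```

The second dict fails only because of the lambda, which is roughly what
happens when an arbitrary callback is attached to an otherwise simple
definition.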

We should phase pickles out, probably by parameterizing the behavior in the
current version (generate_pickles=True), setting the default to True (to
match the current behavior), and warning that this won't be an option in 2.0
and that users should set the option to False to prepare for the 2.0
migration.
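
One way the phase-out could look, as a rough sketch rather than an actual
implementation. Only the option name `generate_pickles` comes from the
proposal above; the helper `resolve_generate_pickles` and the warning text
are assumptions made for illustration:

```python
import warnings


def resolve_generate_pickles(generate_pickles=True):
    """Hypothetical helper: return the flag, warning while pickling is on."""
    if generate_pickles:
        warnings.warn(
            "generate_pickles=True will not be an option in 2.0; "
            "set it to False to prepare for the migration.",
            DeprecationWarning,
            stacklevel=2,
        )
    return generate_pickles


# Capture warnings so the default (current) behavior can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    enabled = resolve_generate_pickles()  # default matches current behavior

print(enabled, len(caught))
```

Defaulting to True keeps existing deployments unchanged, while the warning
gives users a release cycle to switch the option off.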

Max

On Fri, Jul 22, 2016 at 2:31 AM, Bolke de Bruin <bd...@gmail.com> wrote:

> Hi,
>
> Although some efforts are underway to allow for syncing of DAGs by git or
> something else, I was wondering why we actually pickle? As we are
> supporting non-pickled DAGs as well, there seems to be no need for
> serialization, i.e. keeping the state of the DAG. As DAG definitions are
> relatively small, why don’t we send the whole DAG? Or am I completely
> overlooking something?
>
> Cheers
> Bolke