Posted to dev@airflow.apache.org by Maxime Beauchemin <ma...@gmail.com> on 2018/10/01 15:51:10 UTC

Re: execution_date - can we stop the confusion?

I'm not against aliasing personally.

The downside is that it creates more vocabulary overall and most users will
need to learn the mapping of the given aliases at some point in their
learning curve anyways. Only users in environments free of `execution_date`
will benefit from less confusion, and it's likely that the pre-aliased
terms will live on in perpetuity (habit + legacy code).

I'm assuming that the scope of the aliasing would be BaseOperator, the
tutorial, examples, the web UI and CLI. If we start using `period_start` in
those user-facing locations, it creates a bit of a dissonance with the
object naming in the code base and database. Contributors will really need
to understand that aliasing, with `period_start` and `execution_date`
potentially being used interchangeably in the codebase.
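
To make the aliasing idea concrete, here is a minimal sketch in plain
Python. `RunContext` is a hypothetical stand-in, not an Airflow class, and
the `schedule_interval` argument is an assumption for illustration; only the
four attribute names come from this thread.

```python
from datetime import datetime, timedelta


class RunContext:
    """Hypothetical stand-in for an object that carries execution_date."""

    def __init__(self, execution_date, schedule_interval):
        # Original names: execution_date is the left bound of the interval.
        self.execution_date = execution_date
        self.schedule_interval = schedule_interval

    @property
    def next_execution_date(self):
        # Right bound of the interval, derived from the left bound.
        return self.execution_date + self.schedule_interval

    @property
    def period_start(self):
        # Proposed alias: a read-only view over execution_date.
        return self.execution_date

    @property
    def period_end(self):
        # Proposed alias: a read-only view over next_execution_date.
        return self.next_execution_date


ctx = RunContext(datetime(2018, 9, 30), timedelta(days=1))
print(ctx.execution_date, ctx.period_start)
print(ctx.next_execution_date, ctx.period_end)
```

Because the aliases only read from the original attributes, code written
against `execution_date` keeps working, but both vocabularies now coexist.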

I don't think anyone is pushing for this, but I feel strongly that any
campaign to deprecate the original interface would be a giant waste of
effort and time and alienate the community as a whole.

Max

On Sun, Sep 30, 2018 at 1:15 AM airflowuser
<ai...@protonmail.com.invalid> wrote:

> Yep.
> Aliasing seems a reasonable solution that preserves the structure and makes
> things simpler for new users.
>
> While I agree with everyone that learning a new technology has a learning
> curve, we can still see more and more technologies embrace the user-friendly
> flag.
>
>
> Sent with ProtonMail Secure Email.
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Saturday, September 29, 2018 9:47 AM, <as...@apache.org> wrote:
>
> > What about (aliasing) execution_date to period_start, and
> > next_execution_date to period_end? Would this help any do we think?
> >
> > (Though things like ds and ts might still be confusing? This is probably
> > where the OP got the idea for run_stamped from? One step at a time.)
> >
> > Ash
> >
> > On 27 September 2018 20:42:07 BST, George Leslie-Waksman
> > waksman@gmail.com wrote:
> >
> > > > I would like to challenge the notion that "execution_date" is well
> > > > documented. Looking at airflow.apache.org right now and searching for
> > > > all references to "execution_date", I find that the only definition of
> > > > execution_date is, "The execution date of the DAG". There are some
> > > > other passing references that imply more but nothing explicit.
> > > >
> > > > From the documentation, as currently published, it seems reasonable to
> > > > expect some concurrence between "execution_date" and when a dag
> > > > executes, especially given the heavy repetition of, "execution_date -
> > > > The execution date of the DAG".
> > > >
> > > > Personally, I think the problem is the word "execution", not with which
> > > > bound is used to label/define an interval. I think this is especially
> > > > difficult for people coming to Airflow with a cron background who are
> > > > not necessarily thinking about intervals.
> > > >
> > > On Thu, Sep 27, 2018 at 11:23 AM Brian Greene <
> > > brian@heisenbergwoodworking.com> wrote:
> > >
> > > > Second use of “inane” on this subject. Brilliant, less combative
> > > > response
> > > > Chris.
> > > > There’s another point.. left bound makes sense to some people, right
> > > > bound
> > > > to others.
> > > > There’s no way to know or measure how “hard” this is to new users, so
> > > > even
> > > > if the change was made - new name, use right bound... how can you be
> > > > sure
> > > > you’re not actually confusing a LARGER number of new users from that
> > > > point
> > > > on.
> > > > It’s like left handed versus right handed people, except there’s no
> > > > statistical basis for your argument that one group is larger than the
> > > > other, or that there would actually be a measurable uptick in
> > > > understanding
> > > > and usability across the ENTIRE user community.
> > > > So your proposal 100% breaks backwards compatibility of code AND
> > > > concept,
> > > > on anecdotal evidence that it would somehow make usage magically
> > > > easier?
> > > > Airflow is like a bulldozer made out of scalpels that can fly(not
> > > > well,
> > > > but it’s possible). A slick dag can accomplish a staggering amount
> > > > of work
> > > > with the smallest little bit of elegant code. Learning to “think in
> > > > airflow” though is so, so much more than understanding execution
> > > > date.
> > > > That’s barely table stakes in terms of concepts you’ll need to accept
> > > > to be
> > > > effective with airflow.
> > > > Maybe somebody just has a thing against lefty’s? Some kind of
> > > > left-bound-thinking conspiracy?
> > > > Sent from a device with less than stellar autocorrect
> > > >
> > > > > On Sep 27, 2018, at 12:56 PM, Chris Palmer chris@crpalmer.com
> > > > > wrote:
> > > >
> > > > > > While taking a step back makes some sense, we also need to identify
> > > > > > what the issue is. Simply saying 'execution_date behavior is
> > > > > > confusing to new users' isn't good enough. What is confusing about
> > > > > > it? Is it what it represents, or just the name itself?
> > > > > >
> > > > > > There are a number of different timestamps that might be of
> > > > > > interest, including (but not limited to):
> > > > > >
> > > > > > Identifying timestamp
> > > > > > For any time interval, there are two natural choices of timestamps
> > > > > > to represent that interval, the left and right bounds. For Airflow
> > > > > > the left bound has been chosen, and is called execution_date. For
> > > > > > various reasons, I think that makes a much better choice than the
> > > > > > right bound.
> > > > > >
> > > > > > Create/update/delete timestamps
> > > > > > Timestamps representing when particular database records were
> > > > > > created, updated and/or deleted. I don't believe that Airflow
> > > > > > currently records these.
> > > > > >
> > > > > > Runtime timestamps
> > > > > > The timestamps that a task or other process started and stopped.
> > > > > > Airflow records these for Tasks, but I think the implementation is
> > > > > > maybe a little lacking for DagRuns.
> > > > > >
> > > > > > So what's the confusion with execution_date? Is it what it
> > > > > > represents or the name itself?
> > > > > >
> > > > > > I think part of the learning curve with Airflow is understanding
> > > > > > that execution_date is the left bound of the interval. No matter
> > > > > > what name you use for the identifying timestamp I think new users
> > > > > > will need to learn what that choice means. Changing the name won't
> > > > > > magically make all the confusion go away.
> > > > > >
> > > > > > While I don't think execution_date is the greatest name in the
> > > > > > world, it's a lot better than the suggested alternative
> > > > > > run_stamped. Tasks also have an identifying timestamp, and if I saw
> > > > > > run_stamped on a Task I would have no idea what it means (stamped
> > > > > > by what?).
> > > > > >
> > > > > > While there may be better names than execution_date, I don't think
> > > > > > they are so much better that it is worth the effort to overhaul
> > > > > > such an integral part of Airflow. Maybe some improvements to the
> > > > > > documentation could be made, but nothing so drastic as to renaming
> > > > > > such a core item.
> > > > > >
> > > > > > As for the second suggestion to add "a new variable which indicated
> > > > > > the actual datetime when the DAG run was generated. call it
> > > > > > execution_start_date". It is very unclear what the desired outcome
> > > > > > is with this.
> > > > > >
> > > > > > To me "generated" implies creation time, i.e. recorded in the
> > > > > > database. However, creation of a DagRun record in the database is a
> > > > > > distinct event from when Tasks associated with that DagRun start
> > > > > > executing. Plus DagRuns themselves don't actually "run" - Tasks are
> > > > > > the only thing that really gets run by Airflow.
> > > > > >
> > > > > > What is actually desired here?
> > > > > >
> > > > > > -   The right bound of the schedule interval?
> > > > > > -   The time the DagRun was created?
> > > > > > -   The time that any Tasks associated with a DagRun were first
> > > > > >     considered by the scheduler?
> > > > > > -   The time that any Tasks associated with a DagRun were first
> > > > > >     scheduled?
> > > > > > -   The time that any Tasks associated with a DagRun were actually
> > > > > >     started by a worker?
> > > > > >
> > > > > > The lack of clarity and completeness around these suggestions,
> > > > > > alongside inane declarations like "This name won't cause people to
> > > > > > get confused" is hardly a good way to get people to take
> > > > > > suggestions seriously.
> > > > > >
> > > > > > Chris
> > > > > >
> > > > > On Wed, Sep 26, 2018 at 7:37 PM George Leslie-Waksman
> > > > > <waksman@gmail.com
> > > >
> > > > > wrote:
> > > > >
> > > > > > This comes up a lot. I've seen it on this mailing list multiple
> > > > > > times
> > > > > > and
> > > >
> > > > > > it's something that I have to explicitly call out to every single
> > > > > > person
> > > >
> > > > > > that I've helped train up on Airflow.
> > > > > > If we take a moment to set aside why things are the way they are,
> > > > > > what
> > > > > > the
> > > >
> > > > > > documentation says, and how experienced users feel things should
> > > > > > behave;
> > > >
> > > > > > there still remains the fact that a lot of new users get confused
> > > > > > by how
> > > >
> > > > > > "execution_date" works.
> > > > > > Whether it's a problem, whether we need to do something, and what
> > > > > > we
> > > > > > could
> > > >
> > > > > > do are all separate questions but I think it's important that we
> > > > > > acknowledge and start from:
> > > > > > A lot of new users get confused by how "execution_date" works.
> > > > > > I recognize that some of this is a learning curve issue and some
> > > > > > of
> > > > > > this is
> > > >
> > > > > > a mindset issue but it begs the question: do enough users benefit
> > > > > > from
> > > > > > the
> > > >
> > > > > > current structure to justify the harm to new users?
> > > > > > --George
> > > > > > On Wed, Sep 26, 2018 at 1:40 PM Brian Greene <
> > > > > > brian@heisenbergwoodworking.com> wrote:
> > > > > >
> > > > > > > It took a minute to grok, but in the larger context of how af
> > > > > > > works it
> > > >
> > > > > > > makes perfect sense the way it is. Changing something so
> > > > > > > fundamentally
> > > >
> > > > > > > breaking to every dag in existence should bring a comparable
> > > > > > > benefit.
> > > >
> > > > > > > Beyond the avoiding teaching a concept you disagree with, what
> > > > > > > benefits
> > > >
> > > > > > > does the proposal bring to offset the cost of change?
> > > > > > > I’m gonna make a meme - “do you even airflow bro?”
> > > > > > > Sent from a device with less than stellar autocorrect
> > > > > > >
> > > > > > > > On Sep 26, 2018, at 8:33 AM, Maxime Beauchemin <
> > > > > > > > maximebeauchemin@gmail.com> wrote:
> > > > > > > > > I think if you have a functional mindset (as in "functional
> > > > > > > > > data engineering
> > > > > > > > > <https://medium.com/@maximebeauchemin/functional-data-engineering-a-modern-paradigm-for-batch-data-processing-2327ec32c42a>")
> > > > > > > > > as opposed to a cron mindset, using the left bound of the
> > > > > > > > > time interval makes a lot of sense. Things like your daily
> > > > > > > > > table partition keys align with your Airflow execution_date.
> > > > > > > > >
> > > > > > > > > The main thing is that whatever we do we cannot break
> > > > > > > > > backwards compatibility. Offering both views (left
> > > > > > > > > bound/right bound), as it's been proposed before, either as
> > > > > > > > > an environment setting or a user personal preference is even
> > > > > > > > > more confusing to me personally. Users would have to switch
> > > > > > > > > context as they help each other or change environments.
> > > > > > > > >
> > > > > > > > > Also note that your intuition may differ from other people's
> > > > > > > > > intuition, and that "unlearning" something is way harder
> > > > > > > > > than learning something.
> > > > > > > > >
> > > > > > > > > My personal take on this is to make this a rite of passage.
> > > > > > > > > This is just one of the many things you have to learn when
> > > > > > > > > learning Airflow.
> > > > > > > > >
> > > > > > > > > Max
> > > > > > > >
> > > > > > > > > On Wed, Sep 26, 2018 at 8:18 AM Sam Elamin
> > > > > > > > > hussam.elamin@gmail.com
> > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > > Hi Bolke
> > > > > > > > > Speaking as a consultant who is constantly training other
> teams
> > > > > > > > > how
> > > > > > > > > to
> > > >
> > > > > > > use
> > > > > > >
> > > > > > > > > airflow, I do frequently see this confusion.
> > > > > > > > > Another one is how the batch_date is always batch_date +
> > > > > > > > > interval or
> > > >
> > > > > > as
> > > > > >
> > > > > > > the
> > > > > > >
> > > > > > > > > docs make it quite clear
> > > > > > > > > "Let’s Repeat That The scheduler runs your job one
> > > > > > > > > schedule_interval
> > > > >
> > > > > > > > > AFTER
> > > > > > > > > the start date, at the END of the period."
> > > > > > > > > Renaming it would make it simpler for newbies, but
> essentially
> > > > > > > > > they
> > > >
> > > > > > will
> > > > > >
> > > > > > > > > need to understand how Airflow behaves, execution_date
> being
> > > > > > > > > the
> > > > > > > > > batch
> > > >
> > > > > > > > > execution date not the run_date of the DAG
> > > > > > > > > I am actually in the process of writing a blog post
> > > > > > > > > <
> > >
> > > https://samelamin.github.io/2017/04/27/Building-A-Datapipeline-part1/>
> > >
> > > > > > > > > about this which I could use peoples feedback
> > > > > > > > > If it helps, I find that explaining how backfills work and
> why
> > > > > > > > > they
> > > >
> > > > > > are
> > > > > >
> > > > > > > > > important will drive home what the execution_date is :)
> > > > > > > > > Regards
> > > > > > > > > Sam
> > > > > > > > >
> > > > > > > > > > On Wed, Sep 26, 2018 at 4:10 PM Bolke de Bruin
> > > > > > > > > > bdbruin@gmail.com
> > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > > > I dont think this makes sense and I dont that think
> anyone had
> > > > > > > > > > a
> > > > > > > > > > real
> > > >
> > > > > > > > > > issue with this. Execution date has been clearly
> documented
> > > > > > > > > > and is
> > > >
> > > > > > > part
> > > > > > >
> > > > > > > > > of
> > > > > > > > >
> > > > > > > > > > the core principles of airflow. Renaming will create more
> > > > > > > > > > confusion.
> > > >
> > > > > > > > > > Please note that I do think that as an anonymous user you
> > > > > > > > > > cannot
> > > >
> > > > > > speak
> > > > > >
> > > > > > > > > for
> > > > > > > > >
> > > > > > > > > > any "new airflow user". That is a contradiction to me.
> > > > > > > > > > Thanks
> > > > > > > > > > Bolke
> > > > > > > > > > Sent from my iPhone
> > > > > > > > > >
> > > > > > > > > > > On 26 Sep 2018, at 07:59, airflowuser
> > > > > > > > > > > <airflowuser@protonmail.com
> > > >
> > > > > > > > > .INVALID>
> > > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > > One of the most annoying, hard to understand and against
> > > > > > > > > > > > all common sense is the execution_date behavior. I assume
> > > > > > > > > > > > that any new Airflow user has been struggling with it.
> > > > > > > > > > > >
> > > > > > > > > > > > The amount of questions with answers referring to :
> > > > > > > > > > > > https://airflow.apache.org/scheduler.html?scheduling-triggers
> > > > > > > > > > > > is uncountable.
> > > > > > > > > > > >
> > > > > > > > > > > > Most people mistakenly think that execution_date is the
> > > > > > > > > > > > datetime which the DAG started to run.
> > > > > > > > > > > >
> > > > > > > > > > > > I suggest the following changes:
> > > > > > > > > > > >
> > > > > > > > > > > > 1.  Renaming the execution_date to something else like:
> > > > > > > > > > > >     run_stamped
> > > > > > > > > > > >     This name won't cause people to get confused.
> > > > > > > > > > > > 2.  Adding a new variable which indicated the actual
> > > > > > > > > > > >     datetime when the DAG run was generated. call it
> > > > > > > > > > > >     execution_start_date. People seem to want the
> > > > > > > > > > > >     information when the DAG actually started to be
> > > > > > > > > > > >     executed/run.
> > > > > > > > > > > >
> > > > > > > > > > > > This is only naming changes. No need to actual change the
> > > > > > > > > > > > behavior - This will only make things simpler as when user
> > > > > > > > > > > > encounter run_stamped he won't be confused by the name
> > > > > > > > > > > > like execution_date
>
>

Re: execution_date - can we stop the confusion?

Posted by Deng Xiaodong <xd...@gmail.com>.
Both changing terms and aliasing may introduce another set of confusions.

Refining the documentation systematically may be a more feasible solution
to this sort of issue? Like having “execution_date” in the “Concepts” section,
or having a dedicated section named “Vocabularies” to list all potentially
confusing terms?
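
As an illustration of what such a vocabulary entry might pin down, here is a
minimal sketch in plain Python. `describe_run` is a hypothetical helper, not
an Airflow function, and the labels are only my wording; the timing rule
follows the scheduler documentation quoted earlier in this thread (the job
runs one schedule_interval after the start date, at the end of the period).

```python
from datetime import datetime, timedelta


def describe_run(execution_date, schedule_interval):
    """Return the timestamps a glossary entry for execution_date might list.

    Hypothetical helper for illustration only; not part of Airflow.
    """
    interval_end = execution_date + schedule_interval
    return {
        "execution_date (left bound of the interval)": execution_date,
        "interval end (right bound)": interval_end,
        # The run is triggered at the END of the period, so the right bound
        # is also the earliest moment the run can actually start.
        "earliest time the run can start": interval_end,
    }


entry = describe_run(datetime(2018, 10, 1), timedelta(days=1))
for term, value in entry.items():
    print(f"{term}: {value.isoformat()}")
```

A daily run labelled 2018-10-01 therefore cannot start before 2018-10-02,
which is exactly the gap between the name and the behavior that new users
trip over.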

Thanks.

XD


On Mon, Oct 1, 2018 at 23:51 Maxime Beauchemin <ma...@gmail.com>
wrote:

> I'm not against aliasing personally.
>
> The downside is that it creates more vocabulary overall and most users will
> need to learn the mapping of the given aliases at some point in their
> learning curve anyways. Only users in environments free of `execution_date`
> will benefit from less confusion, and it's likely that the pre-aliased
> terms will live on for perpetuity (habit + legacy code).
>
> I'm assuming that the scope of the aliasing would be BaseOperator, the
> tutorial, examples, the web UI and CLI. If we start using `period_start` in
> those user-facing locations, it creates a bit of a dissonance with the
> object naming in the code base and database. Contributors will really need
> to understand that aliasing, with `period_start` and `execution_date`
> potentially being used interchangeably in the codebase.
>
> I don't think anyone is pushing for this, but I feel strongly that any
> campaign to deprecate the original interface would be a giant waste of
> effort and time and alienate the community as whole.
>
> Max
>
> On Sun, Sep 30, 2018 at 1:15 AM airflowuser
> <ai...@protonmail.com.invalid> wrote:
>
> > Yep.
> > Aliasing seems a reasonable solution that preserve the structure and make
> > things simpler for new users.
> >
> > While I agree with everyone that learning a new technology has learning
> > curve still we can see more and more theologies embrace the user friendly
> > flag.
> >
> >
> > Sent with ProtonMail Secure Email.
> >
> > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> > On Saturday, September 29, 2018 9:47 AM, <as...@apache.org> wrote:
> >
> > > What about (aliasing) execution_date to period_start, and
> > next_execution_date to period_end? Would this help any do we think?
> > >
> > > (Though things like ds and ts might still be confusing? This is
> probably
> > where the OP got the idea for run_stamped from? One step at a time.)
> > >
> > > Ash
> > >
> > > On 27 September 2018 20:42:07 BST, George Leslie-Waksman
> > waksman@gmail.com wrote:
> > >
> > > > I would like to challenge the notion that "execution_date" is well
> > > > documented. Looking at airflow.apache.org right now and searching
> for
> > > > all
> > > > references to "execution_date", I find that the only definition of
> > > > execution_date is, "The execution date of the DAG". There are some
> > > > other
> > > > passing references that imply more but nothing explicit.
> > > > From the documentation, as currently published, it seems reasonable
> to
> > > > expect some concurrence between "execution_date" and when a dag
> > > > executes,
> > > > especially given the heavy repetition of, "execution_date - The
> > > > execution
> > > > date of the DAG".
> > > > Personally, I think the problem is the word "execution", not with
> which
> > > > bound is used to label/define an interval. I think this is especially
> > > > difficult for people coming to Airflow with a cron background who are
> > > > not
> > > > necessarily thinking about intervals.
> > > > On Thu, Sep 27, 2018 at 11:23 AM Brian Greene <
> > > > brian@heisenbergwoodworking.com> wrote:
> > > >
> > > > > Second use of “inane” on this subject. Brilliant, less combative
> > > > > response
> > > > > Chris.
> > > > > There’s another point.. left bound makes sense to some people,
> right
> > > > > bound
> > > > > to others.
> > > > > There’s no way to know or measure how “hard” this is to new users,
> so
> > > > > even
> > > > > if the change was made - new name, use right bound... how can you
> be
> > > > > sure
> > > > > you’re not actually confusing a LARGER number of new users from
> that
> > > > > point
> > > > > on.
> > > > > It’s like left handed versus right handed people, except there’s no
> > > > > statistical basis for your argument that one group is larger than
> the
> > > > > other, or that there would actually be a measurable uptick in
> > > > > understanding
> > > > > and usability across the ENTIRE user community.
> > > > > So your proposal 100% breaks backwards compatibility of code AND
> > > > > concept,
> > > > > on anecdotal evidence that it would somehow make usage magically
> > > > > easier?
> > > > > Airflow is like a bulldozer made out of scalpels that can fly(not
> > > > > well,
> > > > > but it’s possible). A slick dag can accomplish a staggering amount
> > > > > of work
> > > > > with the smallest little bit of elegant code. Learning to “think in
> > > > > airflow” though is so, so much more than understanding execution
> > > > > date.
> > > > > That’s barely table stakes in terms of concepts you’ll need to
> accept
> > > > > to be
> > > > > effective with airflow.
> > > > > Maybe somebody just has a thing against lefty’s? Some kind of
> > > > > left-bound-thinking conspiracy?
> > > > > Sent from a device with less than stellar autocorrect
> > > > >
> > > > > > On Sep 27, 2018, at 12:56 PM, Chris Palmer chris@crpalmer.com
> > > > > > wrote:
> > > > >
> > > > > > While taking a step back makes some sense, we also need to
> identify
> > > > > > what
> > > > >
> > > > > > the issue is. Simply saying 'execution_date behavior is confusing
> > > > > > to new
> > > > >
> > > > > > users' isn't good enough. What is confusing about it? Is it what
> it
> > > > > > represents, or just the name itself?
> > > > > > There are a number of different timestamps that might be of
> > > > > > interest,
> > > > >
> > > > > > including (but not limited to):
> > > > > > Identifying timestamp
> > > > > > For any time interval, there are two natural choices of
> timestamps
> > > > > > to
> > > > >
> > > > > > represent that interval, the left and right bounds. For Airflow
> the
> > > > > > left
> > > > >
> > > > > > bound has been chosen, and is called execution_date. For various
> > > > > > reasons, I
> > > > > > think that makes a much better choice than the right bound.
> > > > > > Create/update/delete timestamps
> > > > > > Timestamps representing when particular database records where
> > > > > > created,
> > > > >
> > > > > > updated and or deleted. I don't believe that Airflow currently
> > > > > > records
> > > > >
> > > > > > these.
> > > > > > Runtime timestamps
> > > > > > The timestamps that a task or other process started and stopped.
> > > > > > Airflow
> > > > >
> > > > > > records these for Tasks, but I think the implementation is maybe
> a
> > > > > > little
> > > > >
> > > > > > lacking for DagRuns.
> > > > > > So what's the confusion with execution_date? Is it what it
> > > > > > represents or
> > > > >
> > > > > > the name itself?
> > > > > > I think part of the learning curve with Airflow is understanding
> > > > > > that
> > > > >
> > > > > > execution_date is the left bound of the interval. No matter what
> > > > > > name you
> > > > >
> > > > > > use for the identifying timestamp I think new users will need to
> > > > > > learn
> > > > > > what
> > > > >
> > > > > > that choice means. Changing the name won't magically make all the
> > > > > > confusion
> > > > > > go away.
> > > > > > While I don't think execution_date is the greatest name in the
> > > > > > world,
> > > > > > it's
> > > > >
> > > > > > a lot better than the suggested alternative run_stamped. Tasks
> also
> > > > > > have
> > > > > > an
> > > > >
> > > > > > identifying timestamp, and if I saw run_stamped on a Task I would
> > > > > > have no
> > > > >
> > > > > > idea what it means (stamped by what?).
> > > > > > While there may be better names than execution_date, I don't
> think
> > > > > > they
> > > > > > are
> > > > >
> > > > > > so much better that it is worth the effort to overhaul such an
> > > > > > integral
> > > > >
> > > > > > part of Airflow. Maybe some improvements to the documentation
> could
> > > > > > be
> > > > >
> > > > > > made, but nothing so drastic as to renaming such a core item.
> > > > > > As for the second suggestion to add "a new variable which
> indicated
> > > > > > the
> > > > >
> > > > > > actual datetime when the DAG run was generated. call it
> > > > > > execution_start_date". It is very unclear what the desired
> outcome
> > > > > > is
> > > > > > with
> > > > >
> > > > > > this.
> > > > > > To me "generated" implies creation time, i.e. recorded in the
> > > > > > database.
> > > > >
> > > > > > However, creation of a DagRun record in the database is a
> distinct
> > > > > > event
> > > > >
> > > > > > from when Tasks associated with that DagRun start executing. Plus
> > > > > > DagRuns
> > > > >
> > > > > > themselves don't actually "run" - Tasks are the only thing that
> > > > > > really
> > > > > > gets
> > > > >
> > > > > > run by Airflow.
> > > > > > What is actually desired here?
> > > > > >
> > > > > > -   The right bound of the schedule interval?
> > > > > > -   The time the DagRun was created?
> > > > > > -   The time that any Tasks associated with a DagRun were first
> > > > > >     considered
> > > > > >
> > > > >
> > > > > > by the scheduler?
> > > > > >
> > > > > > -   The time that any Tasks associated with a DagRun were first
> > > > > >     scheduled?
> > > > > >
> > > > >
> > > > > > -   The time that any Tasks associated with a DagRun were
> actually
> > > > > >     started
> > > > > >
> > > > >
> > > > > > by a worker?
> > > > > > The lack of clarity and completeness around these suggestions,
> > > > > > alongside
> > > > >
> > > > > > inane declarations like "This name won't cause people to get
> > > > > > confused" is
> > > > >
> > > > > > hardly a good way to get people to take suggestions seriously.
> > > > > > Chris
> > > > > > On Wed, Sep 26, 2018 at 7:37 PM George Leslie-Waksman
> > > > > > <waksman@gmail.com
> > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > This comes up a lot. I've seen it on this mailing list multiple
> > > > > > > times
> > > > > > > and
> > > > >
> > > > > > > it's something that I have to explicitly call out to every
> single
> > > > > > > person
> > > > >
> > > > > > > that I've helped train up on Airflow.
> > > > > > > If we take a moment to set aside why things are the way they
> are,
> > > > > > > what
> > > > > > > the
> > > > >
> > > > > > > documentation says, and how experienced users feel things
> should
> > > > > > > behave;
> > > > >
> > > > > > > there still remains the fact that a lot of new users get
> confused
> > > > > > > by how
> > > > >
> > > > > > > "execution_date" works.
> > > > > > > Whether it's a problem, whether we need to do something, and
> what
> > > > > > > we
> > > > > > > could
> > > > >
> > > > > > > do are all separate questions but I think it's important that
> we
> > > > > > > acknowledge and start from:
> > > > > > > A lot of new users get confused by how "execution_date" works.
> > > > > > > I recognize that some of this is a learning curve issue and
> some
> > > > > > > of
> > > > > > > this is
> > > > >
> > > > > > > a mindset issue but it begs the question: do enough users
> benefit
> > > > > > > from
> > > > > > > the
> > > > >
> > > > > > > current structure to justify the harm to new users?
> > > > > > > --George
> > > > > > > On Wed, Sep 26, 2018 at 1:40 PM Brian Greene <
> > > > > > > brian@heisenbergwoodworking.com> wrote:
> > > > > > >
> > > > > > > > It took a minute to grok, but in the larger context of how af
> > > > > > > > works it
> > > > >
> > > > > > > > makes perfect sense the way it is. Changing something so
> > > > > > > > fundamentally
> > > > >
> > > > > > > > breaking to every dag in existence should bring a comparable
> > > > > > > > benefit.
> > > > >
> > > > > > > > Beyond the avoiding teaching a concept you disagree with,
> what
> > > > > > > > benefits
> > > > >
> > > > > > > > does the proposal bring to offset the cost of change?
> > > > > > > > I’m gonna make a meme - “do you even airflow bro?”
> > > > > > > > Sent from a device with less than stellar autocorrect
> > > > > > > >
> > > > > > > > > On Sep 26, 2018, at 8:33 AM, Maxime Beauchemin <
> > > > > > > > > maximebeauchemin@gmail.com> wrote:
> > > > > > > > > I think if you have a functional mindset (as in "functional
> > data
> > > > > > > > > engineering
> > > > > > > > > <
> > > >
> > > >
> >
> https://medium.com/@maximebeauchemin/functional-data-engineering-a-modern-paradigm-for-batch-data-processing-2327ec32c42a
> > > >
> > > > > > > > > ")
> > > > > > > > > as opposed to a cron mindset, using the left bound of the
> > time
> > > > > > > > > interval
> > > > > >
> > > > > > > > > makes a lot of sense. Things like your daily table
> partition
> > > > > > > > > keys
> > > > > > > > > align
> > > > >
> > > > > > > > > with your Airflow execution_date.
> > > > > > > > > The main thing is that whatever we do we cannot break
> > backwards
> > > > > > > > > compatibility. Offering both views (left bound/right
> bound),
> > as
> > > > > > > > > it's
> > > > >
> > > > > > > been
> > > > > > >
> > > > > > > > > proposed before, either as an environment setting or a user
> > > > > > > > > personal
> > > > >
> > > > > > > > > preference is even more confusing to me personally. Users
> > would
> > > > > > > > > have
> > > > > > > > > to
> > > > >
> > > > > > > > > switch context as they help each other or change
> > environments.
> > > > > > > > > Also note that your intuition may differ from other
> people's
> > > > > > > > > intuition,
> > > > > >
> > > > > > > > and
> > > > > > > >
> > > > > > > > > that "unlearning" something is way harder than learning
> > > > > > > > > something.
> > > > >
> > > > > > > > > My personal take on this is to make this a rite of passage. This
> > > > > > > > > is just one of the many things you have to learn when learning
> > > > > > > > > Airflow.
> > > > > > > > > Max
> > > > > > > > >
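The alignment described above, between the left bound of the time interval and daily partition keys, can be sketched in plain Python. This is an illustration only and not Airflow code: the helper names `covered_period` and `partition_key` are hypothetical, and a fixed daily interval is assumed.

```python
from datetime import datetime, timedelta


def covered_period(execution_date, schedule_interval):
    """Return the (left, right) bounds of the period a run processes.

    Assumption: execution_date is the left bound, as discussed in the thread.
    """
    return execution_date, execution_date + schedule_interval


def partition_key(execution_date):
    """Name the daily partition after the left bound of the period."""
    return execution_date.strftime("%Y-%m-%d")


execution_date = datetime(2018, 9, 25)
left, right = covered_period(execution_date, timedelta(days=1))
# The partition for data produced during [left, right) carries the date of `left`.
print(partition_key(execution_date), left, right)
```

With a right-bound convention the same data would land in a partition named after the following day, which is the mismatch the left-bound choice avoids.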
> > > > > > > > > > On Wed, Sep 26, 2018 at 8:18 AM Sam Elamin <hussam.elamin@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Hi Bolke
> > > > > > > > > > Speaking as a consultant who is constantly training other teams
> > > > > > > > > > on how to use Airflow, I do frequently see this confusion.
> > > > > > > > > >
> > > > > > > > > > Another one is how the run always happens at batch_date +
> > > > > > > > > > interval or, as the docs make quite clear:
> > > > > > > > > > "Let’s Repeat That: The scheduler runs your job one
> > > > > > > > > > schedule_interval AFTER the start date, at the END of the
> > > > > > > > > > period."
> > > > > > > > > >
> > > > > > > > > > Renaming it would make it simpler for newbies, but ultimately
> > > > > > > > > > they will still need to understand how Airflow behaves:
> > > > > > > > > > execution_date is the batch execution date, not the run date of
> > > > > > > > > > the DAG.
> > > > > > > > > >
> > > > > > > > > > I am actually in the process of writing a blog post
> > > > > > > > > > <https://samelamin.github.io/2017/04/27/Building-A-Datapipeline-part1/>
> > > > > > > > > > about this, and I could use people's feedback on it.
> > > > > > > > > >
> > > > > > > > > > If it helps, I find that explaining how backfills work and why
> > > > > > > > > > they are important will drive home what the execution_date is :)
> > > > > > > > > > Regards
> > > > > > > > > > Sam
> > > > > > > > > >
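The scheduling rule quoted from the docs, and the link to backfills, can be sketched as follows. This is a simplified model for illustration, not Airflow's scheduler: the function names are hypothetical, a fixed `timedelta` interval is assumed, and the backfill range is treated as inclusive on both ends.

```python
from datetime import datetime, timedelta


def earliest_trigger_time(execution_date, schedule_interval):
    """A run stamped with execution_date can only start once its period has ended."""
    return execution_date + schedule_interval


def backfill_execution_dates(start_date, end_date, schedule_interval):
    """List the execution dates a backfill over [start_date, end_date] would cover."""
    dates = []
    current = start_date
    while current <= end_date:
        dates.append(current)
        current += schedule_interval
    return dates


daily = timedelta(days=1)
for execution_date in backfill_execution_dates(
    datetime(2018, 9, 24), datetime(2018, 9, 26), daily
):
    # Each run is labelled with the start of its period but triggers at the end of it.
    print(execution_date.date(), "->", earliest_trigger_time(execution_date, daily))
```

In a backfill the wall-clock time of the run says nothing about the data being processed, which is why the run is identified by the period it covers.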
> > > > > > > > > > > On Wed, Sep 26, 2018 at 4:10 PM Bolke de Bruin <bdbruin@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > I don't think this makes sense and I don't think that anyone had
> > > > > > > > > > > a real issue with this. Execution date has been clearly
> > > > > > > > > > > documented and is part of the core principles of Airflow.
> > > > > > > > > > > Renaming will create more confusion.
> > > > > > > > > > >
> > > > > > > > > > > Please note that I do think that as an anonymous user you cannot
> > > > > > > > > > > speak for any "new airflow user". That is a contradiction to me.
> > > > > > > > > > > Thanks
> > > > > > > > > > > Bolke
> > > > > > > > > > > Sent from my iPhone
> > > > > > > > > > >
> > > > > > > > > > > > On 26 Sep 2018, at 07:59, airflowuser <airflowuser@protonmail.com.INVALID> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > One of the most annoying, hard-to-understand and
> > > > > > > > > > > > counterintuitive things in Airflow is the execution_date
> > > > > > > > > > > > behavior. I assume that any new Airflow user has struggled
> > > > > > > > > > > > with it.
> > > > > > > > > > > >
> > > > > > > > > > > > The number of questions with answers referring to
> > > > > > > > > > > > https://airflow.apache.org/scheduler.html?scheduling-triggers
> > > > > > > > > > > > is uncountable.
> > > > > > > > > > > >
> > > > > > > > > > > > Most people mistakenly think that execution_date is the
> > > > > > > > > > > > datetime at which the DAG started to run.
> > > > > > > > > > > >
> > > > > > > > > > > > I suggest the following changes:
> > > > > > > > > > > >
> > > > > > > > > > > > 1.  Renaming execution_date to something else, like
> > > > > > > > > > > >     run_stamped. This name won't cause people to get confused.
> > > > > > > > > > > >
> > > > > > > > > > > > 2.  Adding a new variable which indicates the actual datetime
> > > > > > > > > > > >     when the DAG run was generated. Call it
> > > > > > > > > > > >     execution_start_date. People seem to want the information
> > > > > > > > > > > >     about when the DAG actually started to be executed/run.
> > > > > > > > > > > >
> > > > > > > > > > > > These are only naming changes. There is no need to actually
> > > > > > > > > > > > change the behavior. This will only make things simpler: when
> > > > > > > > > > > > users encounter run_stamped, they won't be confused by the
> > > > > > > > > > > > name the way they are by execution_date.
> >
> >
> >
>
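The two names proposed in the original message can be compared side by side in a small sketch. Everything here is hypothetical and for illustration only: `RunInfo` is not an Airflow class, `run_stamped` is modelled as a read-only alias of `execution_date` (so existing code keeps working), and `execution_start_date` is the proposed wall-clock start time.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class RunInfo:
    # Existing name: the left bound of the period the run covers.
    execution_date: datetime
    # Proposed addition (hypothetical): when the run actually started.
    execution_start_date: datetime

    @property
    def run_stamped(self):
        """Proposed alias (hypothetical); returns the same value as execution_date."""
        return self.execution_date


run = RunInfo(
    execution_date=datetime(2018, 9, 25),
    execution_start_date=datetime(2018, 9, 26, 0, 0, 7),
)
# The alias and the original name agree; the start time is a separate value.
print(run.run_stamped == run.execution_date, run.execution_start_date)
```

Keeping the original attribute and adding the alias next to it is one way to add vocabulary without breaking backwards compatibility.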