Posted to dev@airflow.apache.org by Yongjun Park <th...@gmail.com> on 2017/04/19 12:41:15 UTC

Too many task instances

Hi folks.

I have a question about task instances.

Is it possible to delete old task instances that have run successfully?
Won't the scheduler try to backfill the missing tasks?

I have about 1,500 DAGs and the number keeps growing. There are currently
about 300 thousand task instances, and 10,000 new task instances are
created every day. That adds up to about 3.6 million rows in the MySQL
table in a year.

I am concerned that the table that stores task instances will grow large
enough to cause performance degradation.

How can I keep the table that stores task instances from becoming bloated?


Thanks,
Yongjun

Re: Too many task instances

Posted by Yongjun Park <th...@gmail.com>.
Not yet. I was just worried about the future possibility. Our environment
is on AWS, so I wanted to keep the database as small as I can.


2017-05-18 8:58 GMT+09:00 George Leslie-Waksman <
george@cloverhealth.com.invalid>:

> We're sitting at over 2.4M task instances in our metadata db without much
> trouble. Have you seen substantial performance degradation or are you just
> worried about the future possibility?

Re: Too many task instances

Posted by George Leslie-Waksman <ge...@cloverhealth.com.INVALID>.
We're sitting at over 2.4M task instances in our metadata db without much
trouble. Have you seen substantial performance degradation or are you just
worried about the future possibility?

Re: Too many task instances

Posted by Maxime Beauchemin <ma...@gmail.com>.
You can archive the `job` and `task_instance` tables. The scheduler won't
try to backfill the archived rows, because their respective DagRuns are not
in a `running` state. The scheduler only tries to schedule active DagRuns,
and only creates new [active] DagRuns forward from the latest one.

Note that the criterion for archiving `task_instance` rows should be based
on `start_date` and not `execution_date`, as you don't want the archiving
to interfere with backfills or anything ongoing.

Max
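
The archiving Max describes could be sketched roughly as follows. This is
only an illustration under stated assumptions: it uses an in-memory SQLite
database as a stand-in for the MySQL metadata database, a simplified
`task_instance` schema with only a few columns, a hypothetical
`task_instance_archive` table, and an arbitrary 90-day cutoff. None of
these come from the thread, and this is not an Airflow API. Rows are
selected by `start_date` rather than `execution_date`, and only `success`
rows are moved.

```python
import sqlite3
from datetime import datetime, timedelta

# Simplified, assumed schema; the real Airflow table has more columns.
COLUMNS = "(task_id TEXT, dag_id TEXT, execution_date TEXT, start_date TEXT, state TEXT)"


def make_demo_db():
    """Build an in-memory stand-in for the metadata database (assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE task_instance " + COLUMNS)
    # Hypothetical archive table with the same columns.
    conn.execute("CREATE TABLE task_instance_archive " + COLUMNS)
    rows = [
        # Old run that also started long ago: eligible for archiving.
        ("t1", "d1", "2016-01-01 00:00:00", "2016-01-01 00:05:00", "success"),
        # Old execution_date but started recently (a backfill): must be kept.
        ("t1", "d1", "2016-01-02 00:00:00", "2017-04-19 09:00:00", "success"),
        # Still running: must be kept.
        ("t1", "d1", "2017-04-18 00:00:00", "2017-04-19 00:05:00", "running"),
    ]
    with conn:
        conn.executemany("INSERT INTO task_instance VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def archive_task_instances(conn, cutoff):
    """Move successful rows that started before `cutoff` into the archive table.

    Returns the number of rows removed from task_instance.
    """
    cutoff_str = cutoff.isoformat(sep=" ")
    where = "WHERE state = 'success' AND start_date < ?"
    with conn:  # single transaction: the copy and the delete commit together
        conn.execute(
            "INSERT INTO task_instance_archive SELECT * FROM task_instance " + where,
            (cutoff_str,),
        )
        cur = conn.execute("DELETE FROM task_instance " + where, (cutoff_str,))
    return cur.rowcount


if __name__ == "__main__":
    demo = make_demo_db()
    now = datetime(2017, 4, 19, 12, 0, 0)  # fixed "now" so the demo is repeatable
    moved = archive_task_instances(demo, now - timedelta(days=90))
    print("archived rows:", moved)
```

In this demo only the first row is moved. The second row has an old
`execution_date` but a recent `start_date`, so filtering on `start_date`
leaves the recent backfill untouched, which is the point of the note above.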
