Posted to dev@airflow.apache.org by Yongjun Park <th...@gmail.com> on 2017/06/06 07:17:49 UTC

Re: Too many task instances

Not yet. I was just worried about the future possibility. Our environment is
on AWS, so I wanted to keep the database as small as I can.


2017-05-18 8:58 GMT+09:00 George Leslie-Waksman <
george@cloverhealth.com.invalid>:

> We're sitting at over 2.4M task instances in our metadata db without much
> trouble. Have you seen substantial performance degradation or are you just
> worried about the future possibility?
>
> On Wed, Apr 19, 2017 at 12:23 PM Maxime Beauchemin <
> maximebeauchemin@gmail.com> wrote:
>
> > You can archive the `job` and `task_instance` tables; the scheduler won't
> > try to backfill them as their respective DagRuns are not in a `running`
> > state. The scheduler only tries to schedule active DagRuns, and only
> > creates new [active] DagRuns forward from the latest one.
> >
> > Note that the criteria to archive `task_instance` should be based on
> > `start_date` and not `execution_date` as you don't want the archiving to
> > interfere with backfills or anything ongoing.
> >
> > Max
> >
> > On Wed, Apr 19, 2017 at 5:41 AM, Yongjun Park <th...@gmail.com>
> > wrote:
> >
> > > Hi folks.
> > >
> > > I have a question about task instances.
> > >
> > > Is it possible to delete old task instances that have run successfully?
> > > Won't the scheduler try to backfill the missing tasks?
> > >
> > > I have about 1,500 DAGs and the number is growing. There are currently
> > > about 300 thousand task instances, and about 10,000 more are created
> > > every day. At that rate the MySQL table will grow by roughly 3.6 million
> > > rows in a year.
> > >
> > > I am concerned that the table which stores task instances will grow
> > > large enough to cause performance degradation.
> > >
> > > How can I keep the table that stores task instances from becoming bloated?
> > >
> > >
> > > Thanks,
> > > Yongjun
> > >
> >
>
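
For readers who want to try the archiving approach Max describes in the quoted
reply, here is a minimal sketch. It is not Airflow code and not a tested
procedure for a production metadata database. It assumes a heavily simplified
`task_instance` table, a hypothetical `task_instance_archive` table with the
same columns, and an in-memory SQLite database standing in for MySQL. The
30-day retention window and the `state = 'success'` filter are also
assumptions, chosen to match the original question. The one detail taken from
the thread is that rows are selected by `start_date`, not `execution_date`.

```python
import sqlite3
from datetime import datetime, timedelta


def archive_task_instances(conn, cutoff):
    """Move successful task instances that started before `cutoff`
    into task_instance_archive (hypothetical table) and return the count."""
    cutoff_str = cutoff.isoformat(sep=" ")
    # Filter on start_date, as suggested in the thread, so that a recent
    # backfill of an old execution_date is left alone.
    where = "WHERE state = 'success' AND start_date < ?"
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            "INSERT INTO task_instance_archive "
            "SELECT * FROM task_instance " + where,
            (cutoff_str,),
        )
        cur = conn.execute("DELETE FROM task_instance " + where, (cutoff_str,))
    return cur.rowcount


def demo():
    """Build a toy database, archive rows older than 30 days, report results."""
    conn = sqlite3.connect(":memory:")
    # Simplified schema, assumed for illustration only.
    schema = (
        "(task_id TEXT, dag_id TEXT, execution_date TEXT, "
        "start_date TEXT, state TEXT)"
    )
    conn.execute("CREATE TABLE task_instance " + schema)
    conn.execute("CREATE TABLE task_instance_archive " + schema)
    rows = [
        # Old and successful: should be archived.
        ("t1", "d1", "2017-01-01 00:00:00", "2017-01-02 00:00:00", "success"),
        # Old execution_date but started recently (a backfill): kept.
        ("t2", "d1", "2016-01-01 00:00:00", "2017-04-18 00:00:00", "success"),
        # Old but not successful: kept.
        ("t3", "d1", "2017-01-01 00:00:00", "2017-01-02 00:00:00", "failed"),
    ]
    conn.executemany("INSERT INTO task_instance VALUES (?, ?, ?, ?, ?)", rows)

    cutoff = datetime(2017, 4, 19) - timedelta(days=30)
    moved = archive_task_instances(conn, cutoff)
    remaining = [r[0] for r in conn.execute(
        "SELECT task_id FROM task_instance ORDER BY task_id")]
    archived = [r[0] for r in conn.execute(
        "SELECT task_id FROM task_instance_archive ORDER BY task_id")]
    conn.close()
    return moved, remaining, archived


if __name__ == "__main__":
    print(demo())
```

Doing the copy and the delete in a single transaction means a failure partway
through leaves the live table unchanged. Against a real metadata database you
would want to take a backup first and check the actual column list, since the
schema used here is only a stand-in.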