Posted to dev@airflow.apache.org by Bolke de Bruin <bd...@gmail.com> on 2017/02/25 16:58:19 UTC

Cutting down on testing time

Hi All,

Jeremiah and I have been looking into optimising the time spent on tests. The reason for this is that Travis runs are taking more and more time and we are being throttled by Travis. As part of that effort we enabled color coding of test outcomes and timing of individual tests. The results were kind of surprising.
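For reference, nose can produce this kind of report through a timer plugin. A sketch using the nose-timer plugin (the exact plugin and flags we used may differ):

    pip install nose-timer
    nosetests --with-timer --timer-top-n 20 tests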

This is the top 20 of tests where we spend the most time, on MySQL (remember: concurrent access enabled) - https://s3.amazonaws.com/archive.travis-ci.org/jobs/205277617/log.txt:

tests.BackfillJobTest.test_backfill_examples: 287.9209s
tests.BackfillJobTest.test_backfill_multi_dates: 53.5198s
tests.SchedulerJobTest.test_scheduler_start_date: 36.4935s
tests.CoreTest.test_scheduler_job: 35.5852s
tests.CliTests.test_backfill: 29.7484s
tests.SchedulerJobTest.test_scheduler_multiprocessing: 26.1573s
tests.DaskExecutorTest.test_backfill_integration: 24.5456s
tests.CoreTest.test_schedule_dag_no_end_date_up_to_today_only: 17.3278s
tests.SubDagOperatorTests.test_subdag_deadlock: 16.1957s
tests.SensorTimeoutTest.test_timeout: 15.1000s
tests.SchedulerJobTest.test_dagrun_deadlock_ignore_depends_on_past: 13.8812s
tests.BackfillJobTest.test_cli_backfill_depends_on_past: 12.9539s
tests.SchedulerJobTest.test_dagrun_deadlock_ignore_depends_on_past_advance_ex_date: 12.8779s
tests.SchedulerJobTest.test_dagrun_success: 12.8177s
tests.SchedulerJobTest.test_dagrun_root_fail: 10.3953s
tests.SchedulerJobTest.test_dag_with_system_exit: 10.1132s
tests.TransferTests.test_mysql_to_hive: 8.5939s
tests.SchedulerJobTest.test_retry_still_in_executor: 8.1739s
tests.SchedulerJobTest.test_dagrun_fail: 7.9855s
tests.ImpersonationTest.test_default_impersonation: 7.4993s

Yes, we spend a whopping 5 minutes on executing all the examples. Another interesting one is “tests.CoreTest.test_scheduler_job”. This test just checks whether certain directories are created as part of logging. That could have been covered by a real unit test exercising only the function that creates the directories - instead it now takes 35s.
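As a sketch of what such a unit test could look like (ensure_log_directory is a hypothetical stand-in for whatever helper actually creates the log directories):

    import os
    import shutil
    import tempfile
    import unittest

    def ensure_log_directory(base, dag_id):
        # Hypothetical stand-in for the real log-directory helper.
        path = os.path.join(base, dag_id)
        if not os.path.exists(path):
            os.makedirs(path)
        return path

    class EnsureLogDirectoryTest(unittest.TestCase):
        def setUp(self):
            self.base = tempfile.mkdtemp()

        def tearDown(self):
            shutil.rmtree(self.base)

        def test_creates_directory(self):
            path = ensure_log_directory(self.base, 'example_dag')
            self.assertTrue(os.path.isdir(path))

A test like this runs in milliseconds, because it never needs to start a scheduler.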

We discussed several strategies for reducing this time, apart from rewriting some of the tests (that would be a herculean job!). What seems most optimal is:

1. Run the scheduler tests apart from all other tests.
2. Run “operator” integration tests in their own unit.
3. Run UI tests separately.
4. Run API tests separately.

This creates the following build matrix (warning ASCII art):

+----------+-----------+-----------+----+-----+
|          | Scheduler | Operators | UI | API |
+----------+-----------+-----------+----+-----+
| Python 2 |     x     |     x     | x  |  x  |
| Python 3 |     x     |     x     | x  |  x  |
| Kerberos |           |           | x  |  x  |
| LDAP     |           |           | x  |     |
| Hive     |           |     x     | x  |  x  |
| SSH      |           |     x     |    |     |
| Postgres |     x     |     x     | x  |  x  |
| MySQL    |     x     |     x     | x  |  x  |
| SQLite   |     x     |     x     | x  |  x  |
+----------+-----------+-----------+----+-----+


So from this build matrix one can deduce that Postgres and MySQL are generic services that will be present in every build. In addition, all builds will use Python 2 and Python 3, and I propose using Python 3.4 and Python 3.5.
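To make that concrete, here is a sketch of how the matrix could be expressed in .travis.yml (the TEST_GROUP variable and the run_tests.sh script are illustrative, not our actual names):

    language: python
    python:
      - "2.7"
      - "3.4"
      - "3.5"
    services:
      - mysql
      - postgresql
    env:
      - TEST_GROUP=scheduler
      - TEST_GROUP=operators
      - TEST_GROUP=ui
      - TEST_GROUP=api
    script: ./run_tests.sh "$TEST_GROUP"

Travis would then expand this into one job per (python version, TEST_GROUP) combination.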


Furthermore, I would like us to label our tests correctly, e.g. as unit tests or integration tests.
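nose already supports selecting tests by label through its attrib plugin, so the labelling could look roughly like this (the test name is just an example):

    from nose.plugins.attrib import attr

    @attr('integration')
    def test_mysql_to_hive_transfer():
        ...

Running only the unit tests would then be a matter of: nosetests -a '!integration'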

Re: Cutting down on testing time

Posted by Jeremiah Lowin <jl...@apache.org>.
(I am far from an expert in nose but) I tried running nose in parallel
simply by passing the --processes flag (
http://nose.readthedocs.io/en/latest/doc_tests/test_multiprocess/multiprocess.html
).

The SQLite envs ran about 2-3 minutes quicker than normal. All other envs
deadlocked and timed out. I suspect it's because Travis only provides
open-source projects with two cores, but I'm not sure.
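For reference, the invocation was along these lines (the process count and timeout here are illustrative):

    nosetests --processes=2 --process-timeout=120 tests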

On Mon, Feb 27, 2017 at 1:55 PM Dan Davydov <da...@airbnb.com.invalid>
wrote:

> This looks like a great effort to me at least in the short term (in the
> long term I think most of the integration tests should be run together if
> the infra allows this). Another thing we could start looking into is
> parallelizing tests (though this may require beefier machines from Travis).

Re: Cutting down on testing time

Posted by Dan Davydov <da...@airbnb.com.INVALID>.
This looks like a great effort to me at least in the short term (in the
long term I think most of the integration tests should be run together if
the infra allows this). Another thing we could start looking into is
parallelizing tests (though this may require beefier machines from Travis).
