You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@flink.apache.org by Piotr Nowojski <pn...@apache.org> on 2022/03/31 12:48:03 UTC

Speeding up the test builds

Hi devs,

Recently we were plagued with OOM errors and as a result we decided to
limit the number of testing forks/disable fork reuse for almost all tests.
Currently AFAIK each JVM fork has assigned 4GB of memory (2GB heap and 2GB
off heap) and we are running two forks, on a machine that has 7GB of RAM in
total.

What I would like to discuss is to introduce test categories, for example
small and large tests, where small tests would require less memory and thus
we would be able to increase the number of forks for them. We could
probably achieve this differentiation via different means (annotations,
test naming conventions, etc), but I would propose to just re-use our
pre-existing convention of suffixing more "unity" testing classes with
`Test` suffix, while more "integration like" classes with `ITCase` suffix.
We could keep running `ITCase`'s with two forks 4GB memory each, while for
example unit tests we could run with four forks 2GB memory each.

What do you think?

Best,
Piotrek

Re: Speeding up the test builds

Posted by Piotr Nowojski <pn...@apache.org>.
I have checked and it looks like this change (increasing forkCount) gives a
small improvement in the `test_ci_build core` module of 5-10 minutes. Let's
maybe double check this after FLINK-26995 is merged.

Best,
Piotrek

[1] https://issues.apache.org/jira/browse/FLINK-26995


czw., 31 mar 2022 o 16:29 Chesnay Schepler <ch...@apache.org> napisaƂ(a):

> Ah. I missed one very important line.
>
>  > [unit] tests would require less memory and thus we would be able to
> increase the number of forks for them
>
> That's definitely a good idea, and it should be trivial to implement it.
> It would be interesting to see how much that would give us.
>
> Note that in the long-term we could look deeper parallelizing the tests
> using junit5 (again, what Francesco already started in the table tests),
> removing the need for forks entirely.
>
> On 31/03/2022 15:40, Chesnay Schepler wrote:
> > > we decided to limit the number of testing forks/disable fork reuse
> > for almost all tests
> >
> > IT cases were generally not reusing forks for as long as I can
> > remember. A few modules opted into fork-reuse; most didn't.
> > The Table API recently enabled fork-reuse, started causing OOMs due to
> > memory leaks, which have since been fixed and fork-reuse has been
> > enabled again.
> >
> > > machine that has 7GB of RAM in total
> >
> > This is indeed the case for the azure-provided machines (used for e2e
> > tests and contributor builds).
> >
> > The machines we use for apache/flink / PRs have 64GB ram in total,
> > spread over 7 agents, so about ~8 per agent and roughly equivalent to
> > the azure machines if you account for some overhead.
> > I'm not aware of any explicit assignment of off-heap memory.
> > This has worked fine in the past, and anytime we ran into OOM errors
> > we were eventually able to pinpoint the cause to some memory leak in
> > Flink.
> > I thus assume that the OOMs we've recently seen are again due to some
> > new leak.
> >
> >
> > Overall I appreciate the sentiment of using less resources for unit
> > tests, but I don't see how it would really address the issue.
> > You still have the potential for all agents running ITCases at the
> > same time, leaving us in the status-quo.
> > It could reduce the likelihood of OOMs, but so long as leaks are there
> > it is inevitable that we hit one.
> > (And as mush as I hate it, if there is a leak I'd kinda prefer that to
> > be more visible, as it seems the only way to get people to look into
> > leaks at all is getting into a situation where everyone is seriously
> > annoyed.)
> >
> > Instead we could try decreasing the general memory usage slightly
> > (even 256mb less would give us a nice buffer on the alibaba machines),
> > and I would hazard a guess that it'd barely reduce testing times
> > (since most ITCases shouldn't hit that limit anyway).
> >
> > I do agree that in the long-term we should strive towards re-using
> > forks, and I know that Francesco also looked into executing test cases
> > in parallel using JUnit5 to speed things up.
> > But there are a number of classes that make this quite difficult;
> > singletons like FileSystem/GlobalConfiguration, anything using static
> > stuff like context environments (not sure if this still applies),
> > legacy testing code, class-loading bugs in libraries etc etc.
> >
> > I would actually split ITCases into 2 categories depending on whether
> > they work with fork-reuse.
> >
> > On 31/03/2022 14:48, Piotr Nowojski wrote:
> >> Hi devs,
> >>
> >> Recently we were plagued with OOM errors and as a result we decided to
> >> limit the number of testing forks/disable fork reuse for almost all
> >> tests.
> >> Currently AFAIK each JVM fork has assigned 4GB of memory (2GB heap
> >> and 2GB
> >> off heap) and we are running two forks, on a machine that has 7GB of
> >> RAM in
> >> total.
> >>
> >> What I would like to discuss is to introduce test categories, for
> >> example
> >> small and large tests, where small tests would require less memory
> >> and thus
> >> we would be able to increase the number of forks for them. We could
> >> probably achieve this differentiation via different means (annotations,
> >> test naming conventions, etc), but I would propose to just re-use our
> >> pre-existing convention of suffixing more "unity" testing classes with
> >> `Test` suffix, while more "integration like" classes with `ITCase`
> >> suffix.
> >> We could keep running `ITCase`'s with two forks 4GB memory each,
> >> while for
> >> example unit tests we could run with four forks 2GB memory each.
> >>
> >> What do you think?
> >>
> >> Best,
> >> Piotrek
> >>
> >
>
>

Re: Speeding up the test builds

Posted by Chesnay Schepler <ch...@apache.org>.
Ah. I missed one very important line.

 > [unit] tests would require less memory and thus we would be able to 
increase the number of forks for them

That's definitely a good idea, and it should be trivial to implement it. 
It would be interesting to see how much that would give us.

Note that in the long-term we could look deeper parallelizing the tests 
using junit5 (again, what Francesco already started in the table tests), 
removing the need for forks entirely.

On 31/03/2022 15:40, Chesnay Schepler wrote:
> > we decided to limit the number of testing forks/disable fork reuse 
> for almost all tests
>
> IT cases were generally not reusing forks for as long as I can 
> remember. A few modules opted into fork-reuse; most didn't.
> The Table API recently enabled fork-reuse, started causing OOMs due to 
> memory leaks, which have since been fixed and fork-reuse has been 
> enabled again.
>
> > machine that has 7GB of RAM in total
>
> This is indeed the case for the azure-provided machines (used for e2e 
> tests and contributor builds).
>
> The machines we use for apache/flink / PRs have 64GB ram in total, 
> spread over 7 agents, so about ~8 per agent and roughly equivalent to 
> the azure machines if you account for some overhead.
> I'm not aware of any explicit assignment of off-heap memory.
> This has worked fine in the past, and anytime we ran into OOM errors 
> we were eventually able to pinpoint the cause to some memory leak in 
> Flink.
> I thus assume that the OOMs we've recently seen are again due to some 
> new leak.
>
>
> Overall I appreciate the sentiment of using less resources for unit 
> tests, but I don't see how it would really address the issue.
> You still have the potential for all agents running ITCases at the 
> same time, leaving us in the status-quo.
> It could reduce the likelihood of OOMs, but so long as leaks are there 
> it is inevitable that we hit one.
> (And as mush as I hate it, if there is a leak I'd kinda prefer that to 
> be more visible, as it seems the only way to get people to look into 
> leaks at all is getting into a situation where everyone is seriously 
> annoyed.)
>
> Instead we could try decreasing the general memory usage slightly 
> (even 256mb less would give us a nice buffer on the alibaba machines),
> and I would hazard a guess that it'd barely reduce testing times 
> (since most ITCases shouldn't hit that limit anyway).
>
> I do agree that in the long-term we should strive towards re-using 
> forks, and I know that Francesco also looked into executing test cases 
> in parallel using JUnit5 to speed things up.
> But there are a number of classes that make this quite difficult; 
> singletons like FileSystem/GlobalConfiguration, anything using static 
> stuff like context environments (not sure if this still applies), 
> legacy testing code, class-loading bugs in libraries etc etc.
>
> I would actually split ITCases into 2 categories depending on whether 
> they work with fork-reuse.
>
> On 31/03/2022 14:48, Piotr Nowojski wrote:
>> Hi devs,
>>
>> Recently we were plagued with OOM errors and as a result we decided to
>> limit the number of testing forks/disable fork reuse for almost all 
>> tests.
>> Currently AFAIK each JVM fork has assigned 4GB of memory (2GB heap 
>> and 2GB
>> off heap) and we are running two forks, on a machine that has 7GB of 
>> RAM in
>> total.
>>
>> What I would like to discuss is to introduce test categories, for 
>> example
>> small and large tests, where small tests would require less memory 
>> and thus
>> we would be able to increase the number of forks for them. We could
>> probably achieve this differentiation via different means (annotations,
>> test naming conventions, etc), but I would propose to just re-use our
>> pre-existing convention of suffixing more "unity" testing classes with
>> `Test` suffix, while more "integration like" classes with `ITCase` 
>> suffix.
>> We could keep running `ITCase`'s with two forks 4GB memory each, 
>> while for
>> example unit tests we could run with four forks 2GB memory each.
>>
>> What do you think?
>>
>> Best,
>> Piotrek
>>
>


Re: Speeding up the test builds

Posted by Chesnay Schepler <ch...@apache.org>.
 > we decided to limit the number of testing forks/disable fork reuse 
for almost all tests

IT cases were generally not reusing forks for as long as I can remember. 
A few modules opted into fork-reuse; most didn't.
The Table API recently enabled fork-reuse, started causing OOMs due to 
memory leaks, which have since been fixed and fork-reuse has been 
enabled again.

 > machine that has 7GB of RAM in total

This is indeed the case for the azure-provided machines (used for e2e 
tests and contributor builds).

The machines we use for apache/flink / PRs have 64GB ram in total, 
spread over 7 agents, so about ~8 per agent and roughly equivalent to 
the azure machines if you account for some overhead.
I'm not aware of any explicit assignment of off-heap memory.
This has worked fine in the past, and anytime we ran into OOM errors we 
were eventually able to pinpoint the cause to some memory leak in Flink.
I thus assume that the OOMs we've recently seen are again due to some 
new leak.


Overall I appreciate the sentiment of using less resources for unit 
tests, but I don't see how it would really address the issue.
You still have the potential for all agents running ITCases at the same 
time, leaving us in the status-quo.
It could reduce the likelihood of OOMs, but so long as leaks are there 
it is inevitable that we hit one.
(And as mush as I hate it, if there is a leak I'd kinda prefer that to 
be more visible, as it seems the only way to get people to look into 
leaks at all is getting into a situation where everyone is seriously 
annoyed.)

Instead we could try decreasing the general memory usage slightly (even 
256mb less would give us a nice buffer on the alibaba machines),
and I would hazard a guess that it'd barely reduce testing times (since 
most ITCases shouldn't hit that limit anyway).

I do agree that in the long-term we should strive towards re-using 
forks, and I know that Francesco also looked into executing test cases 
in parallel using JUnit5 to speed things up.
But there are a number of classes that make this quite difficult; 
singletons like FileSystem/GlobalConfiguration, anything using static 
stuff like context environments (not sure if this still applies), legacy 
testing code, class-loading bugs in libraries etc etc.

I would actually split ITCases into 2 categories depending on whether 
they work with fork-reuse.

On 31/03/2022 14:48, Piotr Nowojski wrote:
> Hi devs,
>
> Recently we were plagued with OOM errors and as a result we decided to
> limit the number of testing forks/disable fork reuse for almost all tests.
> Currently AFAIK each JVM fork has assigned 4GB of memory (2GB heap and 2GB
> off heap) and we are running two forks, on a machine that has 7GB of RAM in
> total.
>
> What I would like to discuss is to introduce test categories, for example
> small and large tests, where small tests would require less memory and thus
> we would be able to increase the number of forks for them. We could
> probably achieve this differentiation via different means (annotations,
> test naming conventions, etc), but I would propose to just re-use our
> pre-existing convention of suffixing more "unity" testing classes with
> `Test` suffix, while more "integration like" classes with `ITCase` suffix.
> We could keep running `ITCase`'s with two forks 4GB memory each, while for
> example unit tests we could run with four forks 2GB memory each.
>
> What do you think?
>
> Best,
> Piotrek
>