Posted to dev@couchdb.apache.org by Joan Touzet <wo...@apache.org> on 2019/05/04 04:08:45 UTC

Cross-platform support update and request for advice

Hi again,

With the support of ASF Infra, we are now in a position to run arbitrary 
alternative platforms in our CI builds. This is great news for the 
people who have been waiting for support for ARM (aarch64), PowerPC/POWER 
(ppc64le), mainframe (s390x), and other architectures.

However, we have a challenge. Emulating other architectures is 
necessarily slower than running natively on the target hardware. I ran 
some measurements today and found that our test suites do not pass 
because tests are timing out.

For example, on native x86_64 hardware, this test finishes in under half 
a second:

     b64url_tests: encode_binary_test...[0.372s] ok

Running the same test on aarch64, emulated on the same x86_64 machine:

     b64url_tests: encode_binary_test...[4.493 s] ok

or roughly 12x longer. The next test (decode_iolist_test) fails, 
presumably because it hits eunit's 5s default timeout.

I need advice from the list. Should we:

* Increase the test timeouts significantly so these tests can complete
* Restrict ourselves to running only on actual hardware (which limits us
   to aarch64 and, at a stretch, ppc64le)
* Remove test timeouts entirely, rewriting the tests that would otherwise
   wait forever to do something different
* Something else that I haven't mentioned
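One possible reading of the third option: a test never blocks indefinitely on an event, but polls with an explicit, generous deadline and fails cleanly when it passes. The sketch below is Python purely for illustration (the CouchDB suites are Erlang eunit), and `wait_until` is a hypothetical helper name, not an existing CouchDB or eunit function.

```python
import time
from typing import Callable


def wait_until(
    predicate: Callable[[], bool], deadline_s: float, interval_s: float = 0.05
) -> bool:
    """Poll `predicate` until it returns True or `deadline_s` seconds elapse.

    Returns True on success and False once the deadline has passed, so a
    missing event fails the test instead of hanging the whole suite.
    """
    end = time.monotonic() + deadline_s
    while True:
        if predicate():
            return True
        if time.monotonic() >= end:
            return False
        time.sleep(interval_s)


if __name__ == "__main__":
    start = time.monotonic()
    # Becomes true after ~0.1s, well inside the 2s deadline.
    print(wait_until(lambda: time.monotonic() - start > 0.1, deadline_s=2.0))
    # Never true: gives up after ~0.1s instead of waiting forever.
    print(wait_until(lambda: False, deadline_s=0.1))
```

Because the deadline lives in the test itself, it can be set generously for slow platforms without relying on a harness-wide timeout.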

Regression testing on a variety of platforms benefits the project; 
we've ignored it for too long.

-Joan

Re: Cross-platform support update and request for advice

Posted by Joan Touzet <wo...@apache.org>.
Thanks Adam! I especially want to recognize the support of the
Packet.net "Works on ARM" team, who drove the aarch64 CI/build support,
whose work in turn produced the scripting needed for multi-arch CI runs.

So if I submit a PR that raises all timeouts 10x, what's the worst that
could happen? Tests that already have a very high timeout may take
significantly longer to time out. For instance, attachment write timeouts
are currently 10s, which would become 100s.

I guess the worst that can happen is we find some badly written test
cases... ;)

Since this is the easiest path forward, I'll give it a shot and report
back. It may be...a large set of PRs, though, since we're talking about
fixing the timeouts across all the sub-repos, unless I can work out how
to tweak eunit's 5s default timeout at a global level.
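eunit does let an individual test or fixture override its limit with a `{timeout, Seconds, Tests}` wrapper, but that still means touching each sub-repo. A single global knob is the usual alternative: keep each nominal timeout in the source and multiply it by a factor read from the environment. The sketch below is Python for illustration only; `TEST_TIMEOUT_SCALE` is a hypothetical environment variable, not an existing eunit or rebar option.

```python
import os
from collections.abc import Mapping

EUNIT_DEFAULT_TIMEOUT_S = 5.0  # eunit's default per-test timeout, in seconds


def timeout_scale(env: Mapping[str, str] = os.environ) -> float:
    """Return the global multiplier from hypothetical TEST_TIMEOUT_SCALE; 1.0 if unset."""
    raw = env.get("TEST_TIMEOUT_SCALE", "1")
    try:
        scale = float(raw)
    except ValueError:
        raise ValueError(f"TEST_TIMEOUT_SCALE must be a number, got {raw!r}") from None
    if not scale > 0:  # rejects zero, negatives and NaN
        raise ValueError("TEST_TIMEOUT_SCALE must be positive")
    return scale


def scaled_timeout(nominal_s: float, scale: float) -> float:
    """Multiply a test's nominal timeout by the global scale factor."""
    return nominal_s * scale


if __name__ == "__main__":
    scale = timeout_scale({"TEST_TIMEOUT_SCALE": "10"})  # what an emulated CI job might set
    print(scaled_timeout(EUNIT_DEFAULT_TIMEOUT_S, scale))  # eunit default: 5s -> 50s
    print(scaled_timeout(10.0, scale))                      # attachment writes: 10s -> 100s
```

Native jobs leave the variable unset and keep today's limits, so only emulated runs pay for the longer waits.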

-Joan

On 2019-05-09 15:34, Adam Kocoloski wrote:
> This is indeed great news.
> 
> I’m afraid I don’t have a strong opinion on how to make the tests more resilient to running in an emulated environment. Any of your suggestions are acceptable to me, assuming we can keep ppc64le support in scope.
> 
> Adam


Re: Cross-platform support update and request for advice

Posted by Adam Kocoloski <ko...@apache.org>.
This is indeed great news.

I’m afraid I don’t have a strong opinion on how to make the tests more resilient to running in an emulated environment. Any of your suggestions are acceptable to me, assuming we can keep ppc64le support in scope.

Adam
