Posted to dev@beam.apache.org by Etienne Chauchot <ec...@apache.org> on 2019/01/21 09:07:52 UTC

[DISCUSSION] UTests and embedded backends

Hi guys,

Lately I have been fixing various Elasticsearch flakiness issues in the UTests by introducing timeouts and countdown
latches, forcing refreshes, and decreasing the embedded cluster size.

These flakiness issues are due to the embedded Elasticsearch not coping well with the Jenkins overload. Still, IMHO
having an embedded backend for UTests is a lot better than mocks. Even if embedded backends are less tolerant to load, I
prefer UTests that are 100% representative of the real backend, with countermeasures added to protect against Jenkins
overload.

WDYT ?

Etienne
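As an illustration of the kind of countermeasure mentioned above, a test can poll the embedded backend against a deadline instead of asserting immediately after a write. This is only a hypothetical sketch, not code from the Beam repository; the helper name and polling interval are made up for illustration:

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Polls a condition (e.g. "the index has refreshed") until it holds or a deadline expires. */
public class BackendAwait {
  public static boolean await(BooleanSupplier condition, long timeout, TimeUnit unit)
      throws InterruptedException {
    long deadline = System.nanoTime() + unit.toNanos(timeout);
    while (System.nanoTime() < deadline) {
      if (condition.getAsBoolean()) {
        return true; // backend caught up within the deadline
      }
      Thread.sleep(50); // back off briefly instead of busy-waiting
    }
    return condition.getAsBoolean(); // one last check at the deadline
  }
}
```

A test would then wrap its visibility check, e.g. `BackendAwait.await(() -> countDocs() == expected, 30, TimeUnit.SECONDS)`, so a loaded CI worker gets extra time without slowing down the happy path.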



Re: [DISCUSSION] UTests and embedded backends

Posted by Etienne Chauchot <ec...@apache.org>.
Hi guys,

I just submitted the PR: https://github.com/apache/beam/pull/7751. It contains refactorings, test improvements/fixes,
and production code fixes.
I wanted to give a little feedback, because replacing the mock by a real instance allowed us to:
- improve the tests: fix bad tests
- add a missing split test
- and, more importantly, discover and fix a bug in the production code of the split.
=> So I would love it if we all agreed to avoid mocks when possible. Of course, as mentioned, sometimes mocks cannot be
avoided, e.g. for hosted backends.
Etienne
On Monday, January 28, 2019 at 11:16 +0100, Etienne Chauchot wrote:
> Guys,
> I will try using mocks where I see it is needed. As there is a current PR opened on Cassandra, I will take this
> opportunity to add the embedded cassandra server (https://github.com/jsevellec/cassandra-unit) to the UTests. A ticket
> was opened a while ago: https://issues.apache.org/jira/browse/BEAM-4164
> Etienne
> On Tuesday, January 22, 2019 at 09:26 +0100, Robert Bradshaw wrote:
> > On Mon, Jan 21, 2019 at 10:42 PM Kenneth Knowles <kl...@google.com> wrote:
> > Robert - you meant this as a mostly-automatic thing that we would engineer, yes?
> > Yes, something like TestPipeline that buffers up the pipelines and then executes them on class teardown (details TBD).
> > A lighter-weight fake, like using something in-process sharing a Java interface (versus today a locally running
> > service sharing an RPC interface) is still much better than a mock.
> > +1
> > 
> > Kenn
> > On Mon, Jan 21, 2019 at 7:17 AM Jean-Baptiste Onofré <jb...@nanthrax.net> wrote:
> > Hi,
> > it makes sense to use an embedded backend when:
> > 1. it's possible to easily embed the backend
> > 2. the backend is "predictable".
> > If it's easy to embed and the backend behavior is predictable, then it makes sense. In other cases, we can fall back
> > to a mock.
> > Regards
> > JB
> > On 21/01/2019 10:07, Etienne Chauchot wrote:
> > Hi guys,
> > Lately I have been fixing various Elasticsearch flakiness issues in the UTests by introducing timeouts and countdown
> > latches, forcing refreshes, and decreasing the embedded cluster size.
> > These flakiness issues are due to the embedded Elasticsearch not coping well with the Jenkins overload. Still, IMHO
> > having an embedded backend for UTests is a lot better than mocks. Even if embedded backends are less tolerant to
> > load, I prefer UTests that are 100% representative of the real backend, with countermeasures added to protect
> > against Jenkins overload.
> > WDYT ?
> > Etienne
> > 
> > 
> > --
> > Jean-Baptiste Onofré
> > jbonofre@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com

Re: [DISCUSSION] UTests and embedded backends

Posted by Etienne Chauchot <ec...@apache.org>.
Hi Robert,
Yes, this is something I really believe in: the test coverage offered by embedded instances is worth some temporary
flakiness (due to resource over-consumption).
I also deeply agree with your point on maintenance: some mocks could hide bugs in production code that would cost a lot
in the long term.
Etienne

On Monday, January 28, 2019 at 11:44 +0100, Robert Bradshaw wrote:
> I strongly agree with your original assessment: "IMHO having an embedded backend for UTests is a lot better than
> mocks." Mocks are sometimes necessary, but in my experience they are often an expensive (in production and
> maintenance) way to get what amounts to low true coverage.
> On Mon, Jan 28, 2019 at 11:16 AM Etienne Chauchot <ec...@apache.org> wrote:
> 
> Guys,
> I will try using mocks where I see it is needed. As there is a current PR opened on Cassandra, I will take this
> opportunity to add the embedded cassandra server (https://github.com/jsevellec/cassandra-unit) to the UTests. A ticket
> was opened a while ago: https://issues.apache.org/jira/browse/BEAM-4164
> Etienne
> On Tuesday, January 22, 2019 at 09:26 +0100, Robert Bradshaw wrote:
> On Mon, Jan 21, 2019 at 10:42 PM Kenneth Knowles <kl...@google.com> wrote:
> 
> Robert - you meant this as a mostly-automatic thing that we would engineer, yes?
> 
> Yes, something like TestPipeline that buffers up the pipelines and
> then executes them on class teardown (details TBD).
> 
> A lighter-weight fake, like using something in-process sharing a Java interface (versus today a locally running
> service sharing an RPC interface) is still much better than a mock.
> 
> +1
> 
> 
> Kenn
> 
> On Mon, Jan 21, 2019 at 7:17 AM Jean-Baptiste Onofré <jb...@nanthrax.net> wrote:
> 
> Hi,
> 
> it makes sense to use an embedded backend when:
> 
> 1. it's possible to easily embed the backend
> 2. the backend is "predictable".
> 
> If it's easy to embed and the backend behavior is predictable, then it
> makes sense.
> In other cases, we can fall back to a mock.
> 
> Regards
> JB
> 
> On 21/01/2019 10:07, Etienne Chauchot wrote:
> Hi guys,
> 
> Lately I have been fixing various Elasticsearch flakiness issues in the
> UTests by introducing timeouts and countdown latches, forcing refreshes,
> and decreasing the embedded cluster size.
> 
> These flakiness issues are due to the embedded Elasticsearch not coping
> well with the Jenkins overload. Still, IMHO having an embedded backend
> for UTests is a lot better than mocks. Even if embedded backends are
> less tolerant to load, I prefer UTests that are 100% representative of
> the real backend, with countermeasures added to protect against Jenkins overload.
> 
> WDYT ?
> 
> Etienne
> 
> 
> 
> --
> Jean-Baptiste Onofré
> jbonofre@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com

Re: [DISCUSSION] UTests and embedded backends

Posted by Robert Bradshaw <ro...@google.com>.
I strongly agree with your original assessment: "IMHO having an embedded
backend for UTests is a lot better than mocks." Mocks are sometimes
necessary, but in my experience they are often an expensive (in
production and maintenance) way to get what amounts to low true coverage.

On Mon, Jan 28, 2019 at 11:16 AM Etienne Chauchot <ec...@apache.org> wrote:
>
> Guys,
>
> I will try using mocks where I see it is needed. As there is a current PR opened on Cassandra, I will take this opportunity to add the embedded cassandra server (https://github.com/jsevellec/cassandra-unit) to the UTests.
> A ticket was opened a while ago: https://issues.apache.org/jira/browse/BEAM-4164
>
> Etienne
>
> On Tuesday, January 22, 2019 at 09:26 +0100, Robert Bradshaw wrote:
>
> On Mon, Jan 21, 2019 at 10:42 PM Kenneth Knowles <kl...@google.com> wrote:
>
>
> Robert - you meant this as a mostly-automatic thing that we would engineer, yes?
>
>
> Yes, something like TestPipeline that buffers up the pipelines and
>
> then executes them on class teardown (details TBD).
>
>
> A lighter-weight fake, like using something in-process sharing a Java interface (versus today a locally running service sharing an RPC interface) is still much better than a mock.
>
>
> +1
>
>
>
> Kenn
>
>
> On Mon, Jan 21, 2019 at 7:17 AM Jean-Baptiste Onofré <jb...@nanthrax.net> wrote:
>
>
> Hi,
>
>
> it makes sense to use an embedded backend when:
>
>
> 1. it's possible to easily embed the backend
>
> 2. the backend is "predictable".
>
>
> If it's easy to embed and the backend behavior is predictable, then it
>
> makes sense.
>
> In other cases, we can fall back to a mock.
>
>
> Regards
>
> JB
>
>
> On 21/01/2019 10:07, Etienne Chauchot wrote:
>
> Hi guys,
>
>
> Lately I have been fixing various Elasticsearch flakiness issues in the
> 
> UTests by introducing timeouts and countdown latches, forcing refreshes,
> 
> and decreasing the embedded cluster size.
>
>
> These flakiness issues are due to the embedded Elasticsearch not coping
>
> well with the Jenkins overload. Still, IMHO having an embedded
> 
> backend for UTests is a lot better than mocks. Even if embedded backends
> 
> are less tolerant to load, I prefer UTests that are 100% representative of
> 
> the real backend, with countermeasures added to protect against Jenkins overload.
>
>
> WDYT ?
>
>
> Etienne
>
>
>
>
> --
>
> Jean-Baptiste Onofré
>
> jbonofre@apache.org
>
> http://blog.nanthrax.net
>
> Talend - http://www.talend.com

Re: [DISCUSSION] UTests and embedded backends

Posted by Etienne Chauchot <ec...@apache.org>.
Guys,
I will try using mocks where I see it is needed. As there is a current PR opened on Cassandra, I will take this
opportunity to add the embedded cassandra server (https://github.com/jsevellec/cassandra-unit) to the UTests. A ticket
was opened a while ago: https://issues.apache.org/jira/browse/BEAM-4164
Etienne
On Tuesday, January 22, 2019 at 09:26 +0100, Robert Bradshaw wrote:
> On Mon, Jan 21, 2019 at 10:42 PM Kenneth Knowles <kl...@google.com> wrote:
> 
> Robert - you meant this as a mostly-automatic thing that we would engineer, yes?
> Yes, something like TestPipeline that buffers up the pipelines and then executes them on class teardown (details TBD).
> A lighter-weight fake, like using something in-process sharing a Java interface (versus today a locally running
> service sharing an RPC interface) is still much better than a mock.
> +1
> 
> Kenn
> On Mon, Jan 21, 2019 at 7:17 AM Jean-Baptiste Onofré <jb...@nanthrax.net> wrote:
> 
> Hi,
> it makes sense to use an embedded backend when:
> 1. it's possible to easily embed the backend
> 2. the backend is "predictable".
> If it's easy to embed and the backend behavior is predictable, then it makes sense. In other cases, we can fall back
> to a mock.
> Regards
> JB
> On 21/01/2019 10:07, Etienne Chauchot wrote:
> Hi guys,
> Lately I have been fixing various Elasticsearch flakiness issues in the UTests by introducing timeouts and countdown
> latches, forcing refreshes, and decreasing the embedded cluster size.
> These flakiness issues are due to the embedded Elasticsearch not coping well with the Jenkins overload. Still, IMHO
> having an embedded backend for UTests is a lot better than mocks. Even if embedded backends are less tolerant to
> load, I prefer UTests that are 100% representative of the real backend, with countermeasures added to protect
> against Jenkins overload.
> WDYT ?
> Etienne
> 
> 
> --
> Jean-Baptiste Onofré
> jbonofre@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com

Re: [DISCUSSION] UTests and embedded backends

Posted by Robert Bradshaw <ro...@google.com>.
On Mon, Jan 21, 2019 at 10:42 PM Kenneth Knowles <kl...@google.com> wrote:
>
> Robert - you meant this as a mostly-automatic thing that we would engineer, yes?

Yes, something like TestPipeline that buffers up the pipelines and
then executes them on class teardown (details TBD).
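The buffering idea could be sketched, hypothetically (the thread leaves the details TBD, and this is not a Beam API), with plain Runnables standing in for pipelines: each test registers its pipeline instead of running it, and one shared execution happens at class teardown.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: buffer "pipelines" during tests, run them all once on teardown. */
public class BufferedRunner {
  private final List<Runnable> buffered = new ArrayList<>();
  private int runs = 0;

  // Called by each test instead of running its pipeline immediately.
  public void submit(Runnable pipeline) {
    buffered.add(pipeline);
  }

  // Called once from class teardown (e.g. an @AfterClass hook): one shared execution.
  public void runAll() {
    runs++;
    for (Runnable p : buffered) {
      p.run();
    }
    buffered.clear();
  }

  public int executionCount() {
    return runs;
  }
}
```

The point of the design is that N tests pay for one backend round trip instead of N, at the cost of deferring failures to teardown time.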

> A lighter-weight fake, like using something in-process sharing a Java interface (versus today a locally running service sharing an RPC interface) is still much better than a mock.

+1

>
> Kenn
>
> On Mon, Jan 21, 2019 at 7:17 AM Jean-Baptiste Onofré <jb...@nanthrax.net> wrote:
>>
>> Hi,
>>
>> it makes sense to use an embedded backend when:
>>
>> 1. it's possible to easily embed the backend
>> 2. the backend is "predictable".
>>
>> If it's easy to embed and the backend behavior is predictable, then it
>> makes sense.
>> In other cases, we can fall back to a mock.
>>
>> Regards
>> JB
>>
>> On 21/01/2019 10:07, Etienne Chauchot wrote:
>> > Hi guys,
>> >
>> > Lately I have been fixing various Elasticsearch flakiness issues in the
>> > UTests by introducing timeouts and countdown latches, forcing refreshes,
>> > and decreasing the embedded cluster size.
>> >
>> > These flakiness issues are due to the embedded Elasticsearch not coping
>> > well with the Jenkins overload. Still, IMHO having an embedded backend
>> > for UTests is a lot better than mocks. Even if embedded backends are
>> > less tolerant to load, I prefer UTests that are 100% representative of
>> > the real backend, with countermeasures added to protect against Jenkins overload.
>> >
>> > WDYT ?
>> >
>> > Etienne
>> >
>> >
>>
>> --
>> Jean-Baptiste Onofré
>> jbonofre@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com

Re: [DISCUSSION] UTests and embedded backends

Posted by Kenneth Knowles <kl...@google.com>.
Robert - you meant this as a mostly-automatic thing that we would engineer,
yes?

A lighter-weight fake, like using something in-process sharing a Java
interface (versus today a locally running service sharing an RPC interface)
is still much better than a mock.
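The fake-over-mock distinction could be illustrated like this (a hypothetical sketch; the interface and class names are made up, not Beam or backend APIs): the production code depends only on a plain Java interface, and the test supplies an in-process implementation with real behavior instead of a mock with scripted expectations.

```java
import java.util.HashMap;
import java.util.Map;

// The production code depends only on this interface, not on any client or RPC stub.
interface DocumentStore {
  void put(String id, String doc);
  String get(String id);
}

// In-process fake: real (if simplified) behavior behind the same Java interface.
// Unlike a mock, there are no canned expectations to keep in sync with the backend.
class InMemoryDocumentStore implements DocumentStore {
  private final Map<String, String> docs = new HashMap<>();

  @Override
  public void put(String id, String doc) {
    docs.put(id, doc);
  }

  @Override
  public String get(String id) {
    return docs.get(id); // null when absent, like a real lookup miss
  }
}
```

Because the fake actually stores and returns data, a test exercises real read-after-write behavior rather than merely verifying that certain calls were made.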

Kenn

On Mon, Jan 21, 2019 at 7:17 AM Jean-Baptiste Onofré <jb...@nanthrax.net>
wrote:

> Hi,
>
> it makes sense to use an embedded backend when:
>
> 1. it's possible to easily embed the backend
> 2. the backend is "predictable".
>
> If it's easy to embed and the backend behavior is predictable, then it
> makes sense.
> In other cases, we can fall back to a mock.
>
> Regards
> JB
>
> On 21/01/2019 10:07, Etienne Chauchot wrote:
> > Hi guys,
> >
> > Lately I have been fixing various Elasticsearch flakiness issues in the
> > UTests by introducing timeouts and countdown latches, forcing refreshes,
> > and decreasing the embedded cluster size.
> >
> > These flakiness issues are due to the embedded Elasticsearch not coping
> > well with the Jenkins overload. Still, IMHO having an embedded backend
> > for UTests is a lot better than mocks. Even if embedded backends are
> > less tolerant to load, I prefer UTests that are 100% representative of
> > the real backend, with countermeasures added to protect against Jenkins overload.
> >
> > WDYT ?
> >
> > Etienne
> >
> >
>
> --
> Jean-Baptiste Onofré
> jbonofre@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>

Re: [DISCUSSION] UTests and embedded backends

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Hi,

it makes sense to use an embedded backend when:

1. it's possible to easily embed the backend
2. the backend is "predictable".

If it's easy to embed and the backend behavior is predictable, then it
makes sense.
In other cases, we can fall back to a mock.

Regards
JB

On 21/01/2019 10:07, Etienne Chauchot wrote:
> Hi guys,
> 
> Lately I have been fixing various Elasticsearch flakiness issues in the
> UTests by introducing timeouts and countdown latches, forcing refreshes,
> and decreasing the embedded cluster size.
> 
> These flakiness issues are due to the embedded Elasticsearch not coping
> well with the Jenkins overload. Still, IMHO having an embedded backend
> for UTests is a lot better than mocks. Even if embedded backends are
> less tolerant to load, I prefer UTests that are 100% representative of
> the real backend, with countermeasures added to protect against Jenkins overload.
> 
> WDYT ?
> 
> Etienne
> 
> 

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

Re: [DISCUSSION] UTests and embedded backends

Posted by Etienne Chauchot <ec...@apache.org>.
Thanks Robert for your answer. Grouping tests is a good idea, thanks for the reminder. I'll use that if new flakiness
shows up and I have no countermeasures left :)
Etienne
On Monday, January 21, 2019 at 12:39 +0100, Robert Bradshaw wrote:
> I am of the same opinion; this is the approach we're taking for Flink as well. Various mitigations (e.g. capping the
> parallelism at 2 rather than the default of num cores) have helped.
> Several times the idea has been proposed to group unit tests together for "expensive" backends. E.g. for
> self-contained tests one can create a single pipeline that contains all the tests with their asserts, and then run
> that once to amortize the overhead (which is quite significant when you're only manipulating literally bytes of
> data). Only on failure would it exercise them individually (either sequentially, or via a binary search).
> On Mon, Jan 21, 2019 at 10:07 AM Etienne Chauchot <ec...@apache.org> wrote:
> 
> Hi guys,
> Lately I have been fixing various Elasticsearch flakiness issues in the UTests by introducing timeouts and countdown
> latches, forcing refreshes, and decreasing the embedded cluster size.
> These flakiness issues are due to the embedded Elasticsearch not coping well with the Jenkins overload. Still, IMHO
> having an embedded backend for UTests is a lot better than mocks. Even if embedded backends are less tolerant to
> load, I prefer UTests that are 100% representative of the real backend, with countermeasures added to protect
> against Jenkins overload.
> WDYT ?
> Etienne
> 

Re: [DISCUSSION] UTests and embedded backends

Posted by Robert Bradshaw <ro...@google.com>.
I am of the same opinion; this is the approach we're taking for Flink
as well. Various mitigations (e.g. capping the parallelism at 2 rather
than the default of num cores) have helped.

Several times the idea has been proposed to group unit tests together
for "expensive" backends. E.g. for self-contained tests one can create
a single pipeline that contains all the tests with their asserts, and
then run that once to amortize the overhead (which is quite
significant when you're only manipulating literally bytes of data).
Only on failure would it exercise them individually (either
sequentially, or via a binary search).
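The amortize-then-bisect strategy could be sketched like this (a hypothetical illustration, independent of Beam, with plain Runnables standing in for per-test pipeline assertions; it assumes the checks are independent and repeatable, since bisection re-runs them):

```java
import java.util.List;

/** Hypothetical sketch: run grouped checks once; on failure, bisect to isolate the culprit. */
public class GroupedChecks {
  // Runs all checks in one pass (amortizing expensive setup); returns -1 on
  // success, or the index of the first failing check found via binary search.
  static int runBatch(List<Runnable> checks) {
    try {
      for (Runnable c : checks) {
        c.run();
      }
      return -1; // the common, cheap path: everything passed in one run
    } catch (AssertionError e) {
      // Something failed; now spend extra runs narrowing it down.
      return bisect(checks, 0, checks.size());
    }
  }

  // Binary search for a failing check in the half-open range [lo, hi).
  static int bisect(List<Runnable> checks, int lo, int hi) {
    if (hi - lo == 1) {
      return lo;
    }
    int mid = (lo + hi) / 2;
    if (fails(checks.subList(lo, mid))) {
      return bisect(checks, lo, mid);
    }
    return bisect(checks, mid, hi);
  }

  static boolean fails(List<Runnable> checks) {
    try {
      for (Runnable c : checks) {
        c.run();
      }
      return false;
    } catch (AssertionError e) {
      return true;
    }
  }
}
```

The design trade-off matches Robert's description: one execution in the common case, and roughly log2(N) extra partial runs only when something breaks.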

On Mon, Jan 21, 2019 at 10:07 AM Etienne Chauchot <ec...@apache.org> wrote:
>
> Hi guys,
>
> Lately I have been fixing various Elasticsearch flakiness issues in the UTests by introducing timeouts and countdown latches, forcing refreshes, and decreasing the embedded cluster size.
> 
> These flakiness issues are due to the embedded Elasticsearch not coping well with the Jenkins overload. Still, IMHO having an embedded backend for UTests is a lot better than mocks. Even if embedded backends are less tolerant to load, I prefer UTests that are 100% representative of the real backend, with countermeasures added to protect against Jenkins overload.
>
> WDYT ?
>
> Etienne
>
>