Posted to dev@beam.apache.org by Miguel Anzo Palomo <mi...@wizeline.com> on 2021/10/05 19:21:09 UTC

Flaky test in JmsIO help

Hi, I've been investigating why this issue
<https://issues.apache.org/jira/browse/BEAM-8453> (a flaky test in JmsIO)
is happening. The logs in this example
<https://ci-beam.apache.org/job/beam_PreCommit_Java_Commit/18732/testReport/org.apache.beam.sdk.io.jms/JmsIOTest/testCheckpointMarkSafety/>
indicate that the problem is a NullPointerException, specifically in this
receiveNoWait() operation
<https://github.com/apache/beam/blob/9a4cdfba601bae9165928d1a4df8035785b4c871/sdks/java/io/jms/src/main/java/org/apache/beam/sdk/io/jms/JmsIO.java#L503>
reached from this
<https://github.com/apache/beam/blob/9a4cdfba601bae9165928d1a4df8035785b4c871/sdks/java/io/jms/src/test/java/org/apache/beam/sdk/io/jms/JmsIOTest.java#L386>
line of the test. The only way I can see a NullPointerException being
thrown there is if the consumer is already closed at that point, and the
only way I have been able to reproduce a NullPointerException locally at
the lines mentioned in the log is by closing the consumer before that read
operation.
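
To illustrate the failure mode described above, here is a minimal sketch.
It assumes the reader drops its consumer reference when it is closed, so a
later read dereferences null. FakeConsumer and FakeReader are hypothetical
stand-ins written for this example; they are not the JmsIO or javax.jms
classes, and the real code path may differ.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Entry point: closes the reader, then reads again to trigger the NPE. */
class ClosedConsumerSketch {
  /** Returns true if reading after close() throws a NullPointerException. */
  static boolean advanceAfterCloseThrowsNpe() {
    FakeReader reader = new FakeReader(new FakeConsumer("m1", "m2"));
    reader.advance();   // first read succeeds while the consumer is open
    reader.close();     // assumption: close() nulls the consumer reference
    try {
      reader.advance(); // dereferences the null consumer
      return false;
    } catch (NullPointerException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println("NPE after close: " + advanceAfterCloseThrowsNpe());
  }
}

/** Hypothetical consumer: hands out queued messages, or null when empty. */
class FakeConsumer {
  private final Queue<String> messages = new ArrayDeque<>();

  FakeConsumer(String... initial) {
    for (String m : initial) {
      messages.add(m);
    }
  }

  String receiveNoWait() {
    return messages.poll();
  }
}

/** Hypothetical reader: no null check before calling receiveNoWait(). */
class FakeReader {
  private FakeConsumer consumer;

  FakeReader(FakeConsumer consumer) {
    this.consumer = consumer;
  }

  boolean advance() {
    String message = consumer.receiveNoWait();
    return message != null;
  }

  void close() {
    consumer = null;
  }
}
```

In this sketch the first advance() succeeds and the second one, issued after
close(), throws, which matches the local reproduction described above.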

My current theory is that some intermittent problem with ActiveMQ caused
the test to fail at that moment. Is there a way to confirm whether that
was the case? So far I know of only two instances of the flake: this
<https://ci-beam.apache.org/job/beam_PreCommit_Java_Commit/18732/> one on
August 23, and another
<https://ci-beam.apache.org/job/beam_PreCommit_Java_Cron/4447/> one on
September 9 (that Jenkins link is no longer available). From what I’ve
been told, the job runs on the cloud compute instance
apache-ci-beam-jenkins.

Thanks

-- 

Miguel Angel Anzo Palomo | WIZELINE

Software Engineer

miguel.anzo@wizeline.com

Remote Office

-- 
*This email and its contents (including any attachments) are being sent to
you on the condition of confidentiality and may be protected by legal
privilege. Access to this email by anyone other than the intended recipient
is unauthorized. If you are not the intended recipient, please immediately
notify the sender by replying to this message and delete the material
immediately from your system. Any further use, dissemination, distribution
or reproduction of this email is strictly prohibited. Further, no
representation is made with respect to any content contained in this email.*

Re: Flaky test in JmsIO help

Posted by Miguel Anzo Palomo <mi...@wizeline.com>.

Thanks for the observations. I’m still looking at the issue. As Alexey
mentioned, the flake happens very rarely; I haven’t found regular
occurrences. I'm looking into the possibility that it is a race condition,
as suggested, but the error seems to happen when the first messages are
initially read, before the second thread is started.

On Wed, Oct 6, 2021 at 12:04 PM Alexey Romanenko <ar...@gmail.com>
wrote:

> Looking at this test (“testCheckpointMarkSafety()”), I’m not sure that
> it’s thread-safe to use the same instance of JmsIO.UnboundedJmsReader in
> another thread. Probably, it may cause some race conditions there but seems
> it happens quite rarely.
>
> —
> Alexey
>
>
>
> On 5 Oct 2021, at 21:24, JB Onofré <jb...@nanthrax.net> wrote:
>
> Hi
>
> I will take a look. That’s probably a race condition with the broker
> service.
>
> Regards
> JB

-- 

Miguel Angel Anzo Palomo | WIZELINE

Software Engineer

miguel.anzo@wizeline.com

Remote Office


Re: Flaky test in JmsIO help

Posted by Alexey Romanenko <ar...@gmail.com>.

Looking at this test (“testCheckpointMarkSafety()”), I’m not sure that it’s thread-safe to use the same instance of JmsIO.UnboundedJmsReader in another thread. It may cause race conditions there, although that seems to happen quite rarely.
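
As an illustration of the concern, here is a minimal sketch of one conventional way to make a shared reader tolerate a concurrent close: synchronize both operations and check for the closed state before reading. GuardedReader is a hypothetical class written for this example; it is not JmsIO.UnboundedJmsReader and is not a proposed fix for the test.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Entry point: one thread reads in a loop while another closes the reader. */
class GuardedReaderSketch {
  /** Returns how many advance() calls threw while close() ran concurrently. */
  static int failuresWhenClosingConcurrently() throws InterruptedException {
    GuardedReader reader = new GuardedReader();
    AtomicInteger failures = new AtomicInteger();

    Thread readerThread = new Thread(() -> {
      for (int i = 0; i < 100_000; i++) {
        try {
          reader.advance();
        } catch (RuntimeException e) {
          failures.incrementAndGet();
        }
      }
    });
    Thread closerThread = new Thread(reader::close);

    readerThread.start();
    closerThread.start();
    readerThread.join();
    closerThread.join();
    return failures.get();
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("failures: " + failuresWhenClosingConcurrently());
  }
}

/** Hypothetical reader whose advance() and close() share one lock. */
class GuardedReader {
  // Stands in for the underlying consumer; null means "closed".
  private StringBuilder consumer = new StringBuilder();

  /** Returns false instead of throwing once the reader has been closed. */
  synchronized boolean advance() {
    if (consumer == null) {
      return false;
    }
    consumer.setLength(0); // stands in for a receiveNoWait() call
    return true;
  }

  synchronized void close() {
    consumer = null;
  }
}
```

Because the null check and the read happen under the same lock as close(), the reading thread can never observe the consumer disappearing between the check and the use, so the failure count stays at zero.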

—
Alexey



> On 5 Oct 2021, at 21:24, JB Onofré <jb...@nanthrax.net> wrote:
> 
> Hi
> 
> I will take a look. That’s probably a race condition with the broker service. 
> 
> Regards 
> JB

Re: Flaky test in JmsIO help

Posted by JB Onofré <jb...@nanthrax.net>.

Hi

I will take a look. That’s probably a race condition with the broker service. 

Regards 
JB
