Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2021/04/08 12:23:58 UTC

[GitHub] [pulsar] abhilashmandaliya opened a new issue #10173: Tests of class NullValueTest are randomly failing

abhilashmandaliya opened a new issue #10173:
URL: https://github.com/apache/pulsar/issues/10173


   **Describe the bug**
   Tests of class org.apache.pulsar.broker.service.NullValueTest are randomly failing. I tried running them from IntelliJ IDEA. The test output seems nondeterministic. I am not sure what is wrong.
   
   **To Reproduce**
   Steps to reproduce the behavior:
   1. Run the tests from the IDE. Running them from the CLI reproduces the same behaviour.
   
   **Expected behavior**
   A deterministic success or failure output of the tests.
   
   **Screenshots**
   ![image](https://user-images.githubusercontent.com/11377931/114025824-2e56ff80-9893-11eb-94be-80872e5a1ec1.png)
   ![image](https://user-images.githubusercontent.com/11377931/114025895-43cc2980-9893-11eb-8801-5717ccaf6986.png)
   
   As the screenshots show, the two test runs produce different test outputs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] MarvinCai commented on issue #10173: Tests of class NullValueTest are randomly failing

Posted by GitBox <gi...@apache.org>.
MarvinCai commented on issue #10173:
URL: https://github.com/apache/pulsar/issues/10173#issuecomment-817250495


   You're right, I see the same behavior. I checked with the admin peek command that the messages are in the correct order in the single partition, yet some messages somehow arrive at the consumer out of their original order.





[GitHub] [pulsar] linlinnn edited a comment on issue #10173: Tests of class NullValueTest are randomly failing

Posted by GitBox <gi...@apache.org>.
linlinnn edited a comment on issue #10173:
URL: https://github.com/apache/pulsar/issues/10173#issuecomment-816263982


   When I change to a non-partitioned topic, it always works. It seems something breaks the order of the message sequence when consuming a partitioned topic, even when the number of partitions is 1.





[GitHub] [pulsar] linlinnn edited a comment on issue #10173: Tests of class NullValueTest are randomly failing

Posted by GitBox <gi...@apache.org>.
linlinnn edited a comment on issue #10173:
URL: https://github.com/apache/pulsar/issues/10173#issuecomment-816025121


   The cause of this issue is that when the key is null and the routing mode is SinglePartition, the partitioned producer randomly picks one single partition.





[GitHub] [pulsar] linlinnn commented on issue #10173: Tests of class NullValueTest are randomly failing

Posted by GitBox <gi...@apache.org>.
linlinnn commented on issue #10173:
URL: https://github.com/apache/pulsar/issues/10173#issuecomment-816460077


   Hmm. I have not been able to spot any bug yet :( @codelipenghui @eolivelli Please take a look at this issue.





[GitHub] [pulsar] linlinnn edited a comment on issue #10173: Tests of class NullValueTest are randomly failing

Posted by GitBox <gi...@apache.org>.
linlinnn edited a comment on issue #10173:
URL: https://github.com/apache/pulsar/issues/10173#issuecomment-816263982


   When I change to a non-partitioned topic, it always works. It seems something breaks the order of the message sequence when consuming a partitioned topic.





[GitHub] [pulsar] codelipenghui closed issue #10173: Tests of class NullValueTest are randomly failing

Posted by GitBox <gi...@apache.org>.
codelipenghui closed issue #10173:
URL: https://github.com/apache/pulsar/issues/10173


   





[GitHub] [pulsar] linlinnn removed a comment on issue #10173: Tests of class NullValueTest are randomly failing

Posted by GitBox <gi...@apache.org>.
linlinnn removed a comment on issue #10173:
URL: https://github.com/apache/pulsar/issues/10173#issuecomment-817900005


   I think another cause is that when there is a pending receive, we should respond with the head of the incomingMessages queue first instead of returning the current message, and then process the message on the same thread.





[GitHub] [pulsar] MarvinCai edited a comment on issue #10173: Tests of class NullValueTest are randomly failing

Posted by GitBox <gi...@apache.org>.
MarvinCai edited a comment on issue #10173:
URL: https://github.com/apache/pulsar/issues/10173#issuecomment-817875498


   So the future returned by `consumer.receiveAsync()` can be completed by a `pulsar-external-listener` thread, which causes the following `thenAccept` to also be executed on that `pulsar-external-listener` thread. This happens if the `incomingMessages` queue is empty when `consumer.receiveAsync()` is called.
   I think we can fix this issue by using `thenAcceptAsync` and providing a target executor for it.
   Might need a second opinion on this.
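
The threading behaviour described above can be reproduced with plain `java.util.concurrent`, without any Pulsar classes. This is only a sketch under assumptions: the class name `CallbackThreadDemo`, the method `callbackThread`, and the thread names `external-listener` and `client-internal` are stand-ins invented for illustration, not the client's real executors. It shows that a callback registered with `thenAccept` on a not-yet-completed future runs on whichever thread completes it, while `thenAcceptAsync` with an explicit executor runs on that executor.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class CallbackThreadDemo {

    /** Returns the name of the thread that ran the callback. */
    static String callbackThread(boolean useAsync) {
        // Hypothetical stand-ins for the two executors discussed in the thread.
        ExecutorService listener =
                Executors.newSingleThreadExecutor(r -> new Thread(r, "external-listener"));
        ExecutorService internal =
                Executors.newSingleThreadExecutor(r -> new Thread(r, "client-internal"));
        try {
            // Models a receiveAsync() future that is still pending (empty queue).
            CompletableFuture<String> pending = new CompletableFuture<>();
            CompletableFuture<String> ranOn = new CompletableFuture<>();

            if (useAsync) {
                // The callback is always handed to the target executor.
                pending.thenAcceptAsync(
                        msg -> ranOn.complete(Thread.currentThread().getName()), internal);
            } else {
                // The callback runs on the thread that completes the future.
                pending.thenAccept(msg -> ranOn.complete(Thread.currentThread().getName()));
            }

            // The listener thread completes the future after the callback is registered.
            listener.execute(() -> pending.complete("msg-0"));
            return ranOn.get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            listener.shutdown();
            internal.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("thenAccept      -> " + callbackThread(false));
        System.out.println("thenAcceptAsync -> " + callbackThread(true));
    }
}
```

With this setup, `callbackThread(false)` reports `external-listener` and `callbackThread(true)` reports `client-internal`. Whether the same change is the right fix inside the Pulsar client is a separate question that this sketch does not answer.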





[GitHub] [pulsar] MarvinCai commented on issue #10173: Tests of class NullValueTest are randomly failing

Posted by GitBox <gi...@apache.org>.
MarvinCai commented on issue #10173:
URL: https://github.com/apache/pulsar/issues/10173#issuecomment-818317352


   @linlinnn's response via email:
   `I think another cause is that when there is a pending receive, we should respond with the head of the incomingMessages queue first instead of returning the current message, and then process the message on the same thread.`
   





[GitHub] [pulsar] MarvinCai edited a comment on issue #10173: Tests of class NullValueTest are randomly failing

Posted by GitBox <gi...@apache.org>.
MarvinCai edited a comment on issue #10173:
URL: https://github.com/apache/pulsar/issues/10173#issuecomment-817875498


   So the future returned by `consumer.receiveAsync()` can be completed by a `pulsar-external-listener` thread, which causes the following `thenAccept` to also be executed on that `pulsar-external-listener` thread. This happens if the `incomingMessages` queue is empty when `consumer.receiveAsync()` is called.
   I think we can fix this issue by using `thenAcceptAsync` and providing a target executor for it.
   Could use a second opinion on this.





[GitHub] [pulsar] MarvinCai commented on issue #10173: Tests of class NullValueTest are randomly failing

Posted by GitBox <gi...@apache.org>.
MarvinCai commented on issue #10173:
URL: https://github.com/apache/pulsar/issues/10173#issuecomment-817742076


   From what I observed, the messages are still in order when they arrive at the Consumer, but they sometimes get out of order when processed by the MultiTopicsConsumerImpl holding the Consumer instance. I suspect the messageReceived callback is executed on different threads, causing messages to be processed out of order.





[GitHub] [pulsar] linlinnn commented on issue #10173: Tests of class NullValueTest are randomly failing

Posted by GitBox <gi...@apache.org>.
linlinnn commented on issue #10173:
URL: https://github.com/apache/pulsar/issues/10173#issuecomment-817137335


   @MarvinCai Yes, I agree with that. I tried what you suggested earlier, but still randomly got the wrong order with a partitioned topic that has only one partition, or with custom routing that always returns 0.





[GitHub] [pulsar] linlinnn removed a comment on issue #10173: Tests of class NullValueTest are randomly failing

Posted by GitBox <gi...@apache.org>.
linlinnn removed a comment on issue #10173:
URL: https://github.com/apache/pulsar/issues/10173#issuecomment-816025121


   The cause of this issue is that when the key is null and the routing mode is SinglePartition, the partitioned producer randomly picks one single partition.





[GitHub] [pulsar] MarvinCai commented on issue #10173: Tests of class NullValueTest are randomly failing

Posted by GitBox <gi...@apache.org>.
MarvinCai commented on issue #10173:
URL: https://github.com/apache/pulsar/issues/10173#issuecomment-817134399


   I don't think the test should expect a fixed message order anyway; a partitioned topic can only guarantee order within a single partition, right? And by default, if the key is null, round-robin routing is used, so I think the test case is just making a wrong assumption about message ordering.
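
To illustrate the ordering argument above, here is a small model in plain Java with no Pulsar classes. It is a sketch under assumptions: `PartitionOrderDemo` and its methods are hypothetical names, partitions are modelled as simple lists, and the consumer is assumed to drain one partition after another, which is only one of many possible interleavings. Real round-robin routing in the client (for example with batching) may distribute messages differently.

```java
import java.util.ArrayList;
import java.util.List;

final class PartitionOrderDemo {

    /** Routes message ids 0..count-1 round-robin across the given number of partitions. */
    static List<List<Integer>> routeRoundRobin(int count, int partitions) {
        List<List<Integer>> result = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            result.add(new ArrayList<>());
        }
        for (int id = 0; id < count; id++) {
            result.get(id % partitions).add(id);
        }
        return result;
    }

    /** One possible consumer view: read every partition to the end, one after another. */
    static List<Integer> drainPartitionByPartition(List<List<Integer>> partitions) {
        List<Integer> seen = new ArrayList<>();
        for (List<Integer> partition : partitions) {
            seen.addAll(partition);
        }
        return seen;
    }

    /** True if the ids are in non-decreasing order, i.e. the original send order. */
    static boolean isSorted(List<Integer> ids) {
        for (int i = 1; i < ids.size(); i++) {
            if (ids.get(i - 1) > ids.get(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<List<Integer>> two = routeRoundRobin(6, 2);
        System.out.println("partitions: " + two);
        System.out.println("consumer sees: " + drainPartitionByPartition(two));
        System.out.println("one partition sorted: "
                + isSorted(drainPartitionByPartition(routeRoundRobin(6, 1))));
    }
}
```

With two partitions each list stays ordered (`[0, 2, 4]` and `[1, 3, 5]`), but the combined view `[0, 2, 4, 1, 3, 5]` is not in send order. With a single partition the model always preserves order, which is why the reports of reordering with one partition elsewhere in this thread point to a different cause.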





[GitHub] [pulsar] linlinnn commented on issue #10173: Tests of class NullValueTest are randomly failing

Posted by GitBox <gi...@apache.org>.
linlinnn commented on issue #10173:
URL: https://github.com/apache/pulsar/issues/10173#issuecomment-816263982


   When I change to a non-partitioned topic, it always works. Something breaks the order of the message sequence when consuming a partitioned topic.





[GitHub] [pulsar] abhilashmandaliya commented on issue #10173: Tests of class NullValueTest are randomly failing

Posted by GitBox <gi...@apache.org>.
abhilashmandaliya commented on issue #10173:
URL: https://github.com/apache/pulsar/issues/10173#issuecomment-816339614


   > When I change to a non-partitioned topic, it always works. It seems something breaks the order of the message sequence when consuming a partitioned topic, even when the number of partitions is 1.
   
   Exactly. And this seems to be a critical issue, then.





[GitHub] [pulsar] MarvinCai commented on issue #10173: Tests of class NullValueTest are randomly failing

Posted by GitBox <gi...@apache.org>.
MarvinCai commented on issue #10173:
URL: https://github.com/apache/pulsar/issues/10173#issuecomment-817794781


   From this line of code ([ref](https://github.com/apache/pulsar/blob/master/pulsar-client/src/main/java/org/apache/pulsar/client/impl/MultiTopicsConsumerImpl.java#L261)), `receiveMessageFromConsumer` should only be executed by a thread from the `pulsar-client-internal` executor, which should guarantee ordering, yet I was getting log lines like:
   `21:00:56.338 [pulsar-external-listener-39-1:org.apache.pulsar.client.impl.MultiTopicsConsumerImpl@272] INFO  org.apache.pulsar.client.impl.MultiTopicsConsumerImpl - [persistent://prop/ns-abc/null-value-test-0][test] Received message from topics-consumer 3:29:0:0`
   





[GitHub] [pulsar] linlinnn commented on issue #10173: Tests of class NullValueTest are randomly failing

Posted by GitBox <gi...@apache.org>.
linlinnn commented on issue #10173:
URL: https://github.com/apache/pulsar/issues/10173#issuecomment-816025121


   The cause of this issue is that when the key is null and the routing mode is SinglePartition, the partitioned producer randomly picks one single partition.





[GitHub] [pulsar] linlinnn commented on issue #10173: Tests of class NullValueTest are randomly failing

Posted by GitBox <gi...@apache.org>.
linlinnn commented on issue #10173:
URL: https://github.com/apache/pulsar/issues/10173#issuecomment-817900005


   I think another cause is that when there is a pending receive, we should respond with the head of the incomingMessages queue first instead of returning the current message, and then process the message on the same thread.





[GitHub] [pulsar] MarvinCai commented on issue #10173: Tests of class NullValueTest are randomly failing

Posted by GitBox <gi...@apache.org>.
MarvinCai commented on issue #10173:
URL: https://github.com/apache/pulsar/issues/10173#issuecomment-817875498


   So the future returned by `consumer.receiveAsync()` can be completed by a `pulsar-external-listener` thread, which causes the following `thenAccept` to also be executed on that `pulsar-external-listener` thread. This happens if the `incomingMessages` queue is empty when `consumer.receiveAsync()` is called.
   I think we can fix this issue by using `thenAcceptAsync` and providing a target executor for it.
   Might need a second opinion on this.





[GitHub] [pulsar] MarvinCai edited a comment on issue #10173: Tests of class NullValueTest are randomly failing

Posted by GitBox <gi...@apache.org>.
MarvinCai edited a comment on issue #10173:
URL: https://github.com/apache/pulsar/issues/10173#issuecomment-817134399


   I don't think the test should expect a fixed message order anyway; a partitioned topic can only guarantee order within a single partition, right? And by default, if the key is null, round-robin routing is used, so it seems to me the test case is just making a wrong assumption about message ordering.





[GitHub] [pulsar] linlinnn edited a comment on issue #10173: Tests of class NullValueTest are randomly failing

Posted by GitBox <gi...@apache.org>.
linlinnn edited a comment on issue #10173:
URL: https://github.com/apache/pulsar/issues/10173#issuecomment-816460077


   Hmm. I have not been able to spot any bug yet :( @codelipenghui @eolivelli @lhotari Please take a look at this issue.





[GitHub] [pulsar] MarvinCai edited a comment on issue #10173: Tests of class NullValueTest are randomly failing

Posted by GitBox <gi...@apache.org>.
MarvinCai edited a comment on issue #10173:
URL: https://github.com/apache/pulsar/issues/10173#issuecomment-817742076


   From what I observed, the messages are still in order when they arrive at the Consumer, but they sometimes get out of order when processed by the MultiTopicsConsumerImpl holding the Consumer instance. I suspect the messageReceived callback is executed on different threads, causing messages to be processed out of order.
   That is because I can see the log `Received message from topics-consumer .....` coming from both the `pulsar-external-listener` and `pulsar-client-internal` executors.
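
The suspicion above, that handling messages on two different threads can reorder them, can be shown with a small stand-alone model. This is a sketch, not Pulsar code: `HandlerThreadOrderDemo` and `process` are hypothetical names, and the delay on the first thread is forced with a latch purely to make the race deterministic. In a real run the reordering would depend on scheduling and would only happen occasionally.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class HandlerThreadOrderDemo {

    /** Handles message 0 and then message 1, on one executor or on two, and returns the handled order. */
    static List<Integer> process(boolean sameExecutor) {
        ExecutorService first = Executors.newSingleThreadExecutor();
        ExecutorService second = sameExecutor ? first : Executors.newSingleThreadExecutor();
        List<Integer> processed = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch message1Handled = new CountDownLatch(1);
        try {
            // Message 0 is submitted first.
            Future<Void> f0 = first.submit(() -> {
                if (!sameExecutor) {
                    // Forced delay: models the first thread being slower than the second.
                    message1Handled.await(5, TimeUnit.SECONDS);
                }
                processed.add(0);
                return null;
            });
            // Message 1 is submitted second.
            Future<Void> f1 = second.submit(() -> {
                processed.add(1);
                message1Handled.countDown();
                return null;
            });
            f0.get(5, TimeUnit.SECONDS);
            f1.get(5, TimeUnit.SECONDS);
            return new ArrayList<>(processed);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            first.shutdown();
            if (second != first) {
                second.shutdown();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("single executor: " + process(true));
        System.out.println("two executors:   " + process(false));
    }
}
```

A single-thread executor runs tasks in submission order, so `process(true)` yields `[0, 1]`. With two executors and the forced delay, `process(false)` yields `[1, 0]`, even though the messages were submitted in order.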

