You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2021/12/12 05:09:36 UTC

[GitHub] [pulsar] codelipenghui opened a new pull request #13245: Make sure the client can get response when encounter producer busy exception.

codelipenghui opened a new pull request #13245:
URL: https://github.com/apache/pulsar/pull/13245


   When the producer with the same producer ID and same connection,
   Of course, this usually doesn't happen. When cherry-picking #12846,
   the test using the same producer ID https://github.com/apache/pulsar/pull/12846/files#diff-b73845cd7706e03b89122a21b3e60c36c95ce6a0c7dd4574f8eda3fc67dd02b1R867-R876
   And after we apply the change https://github.com/apache/pulsar/pull/8685/files#diff-1e0e8195fb5ec5a6d79acbc7d859c025a9b711f94e6ab37c94439e99b3202e84R1161
   Only one request can get a response when encountering exceptions.
   
   The fix is to make sure the broker doesn't miss the response to the client,
   we should make sure one request can get one response, otherwise the client
   side might get blocked.
   
   ### Documentation
   
   Check the box below and label this PR (if you have committer privilege).
   
   Need to update docs? 
   
   - [ ] `doc-required` 
     
     (If you need help on updating docs, create a doc issue)
     
   - [x] `no-need-doc` 
     
     (Please explain why)
     
   - [ ] `doc` 
     
     (If this PR contains doc changes)
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] codelipenghui commented on pull request #13245: Make sure the client can get response when encounter producer busy exception.

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on pull request #13245:
URL: https://github.com/apache/pulsar/pull/13245#issuecomment-991840775


   /pulsarbot run-failure-checks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] wolfstudy edited a comment on pull request #13245: Make sure the client can get response when encounter producer busy exception.

Posted by GitBox <gi...@apache.org>.
wolfstudy edited a comment on pull request #13245:
URL: https://github.com/apache/pulsar/pull/13245#issuecomment-1008922786


   Thanks @codelipenghui 
   This problem is very much like a problem encountered by the Go SDK community recently, and all the phenomena seem to match the description in this pr.
   
   Through tcpdump, it was found that Go SDK sent ProducerCommand to the broker every time it reconnected, but the broker did not reply any Response to the client side, which caused the Go SDK to fail to reconnect until the entire client service was restarted.
   
   The broker version: branch-2.7.2
   go SDK version: new master
   
   And the issue refer to: https://github.com/apache/pulsar-client-go/issues/697


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] codelipenghui commented on pull request #13245: Make sure the client can get response when encounter producer busy exception.

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on pull request #13245:
URL: https://github.com/apache/pulsar/pull/13245#issuecomment-1002037718


   @merlimat @315157973 @hangc0276 Please help check this one.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] michaeljmarshall edited a comment on pull request #13245: Make sure the client can get response when encounter producer busy exception.

Posted by GitBox <gi...@apache.org>.
michaeljmarshall edited a comment on pull request #13245:
URL: https://github.com/apache/pulsar/pull/13245#issuecomment-991842394


   Note that when I ran the test for branch-2.7, `ServerCnxTest`, I saw the following stack trace:
   
   ```
   [ERROR] Tests run: 43, Failures: 1, Errors: 0, Skipped: 1, Time elapsed: 25.786 s <<< FAILURE! - in org.apache.pulsar.broker.service.ServerCnxTest
   [ERROR] testCreateProducerTimeoutThenCreateSameNamedProducerShouldFail(org.apache.pulsar.broker.service.ServerCnxTest)  Time elapsed: 0.025 s  <<< FAILURE!
   java.lang.AssertionError: expected [4] but found [1]
   	at org.testng.Assert.fail(Assert.java:99)
   	at org.testng.Assert.failNotEquals(Assert.java:1037)
   	at org.testng.Assert.assertEqualsImpl(Assert.java:140)
   	at org.testng.Assert.assertEquals(Assert.java:122)
   	at org.testng.Assert.assertEquals(Assert.java:797)
   	at org.testng.Assert.assertEquals(Assert.java:807)
   	at org.apache.pulsar.broker.service.ServerCnxTest.testCreateProducerTimeoutThenCreateSameNamedProducerShouldFail(ServerCnxTest.java:878)
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.lang.reflect.Method.invoke(Method.java:498)
   	at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:132)
   	at org.testng.internal.InvokeMethodRunnable.runOne(InvokeMethodRunnable.java:45)
   	at org.testng.internal.InvokeMethodRunnable.call(InvokeMethodRunnable.java:73)
   	at org.testng.internal.InvokeMethodRunnable.call(InvokeMethodRunnable.java:11)
   	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	at java.lang.Thread.run(Thread.java:748)
   ```
   
   This happens because in `branch-2.7`, we respond even if the producer's future was already created. The test I wrote does not expect a response for request 1, but the 2.7 broker is sending a response. Here is the failing line in branch-2.7:
   
   https://github.com/apache/pulsar/blob/4f19b8622d5d0d17973399883e492b839fc30f0a/pulsar-broker/src/test/java/org/apache/pulsar/broker/service/ServerCnxTest.java#L883
   
   Edit: note that the line numbers are slightly off in the stack trace because I resolved cherry-pick conflicts a little differently. Here is the commit I had used to get the above stack trace: https://github.com/michaeljmarshall/pulsar/commit/358da85be37c5241499b4bb4f69631877c3eeb97


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] michaeljmarshall edited a comment on pull request #13245: Make sure the client can get response when encounter producer busy exception.

Posted by GitBox <gi...@apache.org>.
michaeljmarshall edited a comment on pull request #13245:
URL: https://github.com/apache/pulsar/pull/13245#issuecomment-1004371158






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] wolfstudy commented on pull request #13245: Make sure the client can get response when encounter producer busy exception.

Posted by GitBox <gi...@apache.org>.
wolfstudy commented on pull request #13245:
URL: https://github.com/apache/pulsar/pull/13245#issuecomment-1008922786


   Thanks @codelipenghui 
   This problem is very much like a problem encountered by the Go SDK community recently, and all the phenomena seem to match the description in this pr.
   
   Through tcpdump, it was found that Go SDK sent ProducerCommand to the broker every time it reconnected, but the broker did not reply any Response to the client side, which caused the Go SDK to fail to reconnect until the entire client service was restarted.
   
   And the issue refer to: https://github.com/apache/pulsar-client-go/issues/697


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] codelipenghui closed pull request #13245: Make sure the client can get response when encounter producer busy exception.

Posted by GitBox <gi...@apache.org>.
codelipenghui closed pull request #13245:
URL: https://github.com/apache/pulsar/pull/13245


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] codelipenghui closed pull request #13245: Make sure the client can get response when encounter producer busy exception.

Posted by GitBox <gi...@apache.org>.
codelipenghui closed pull request #13245:
URL: https://github.com/apache/pulsar/pull/13245


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] michaeljmarshall commented on pull request #13245: Make sure the client can get response when encounter producer busy exception.

Posted by GitBox <gi...@apache.org>.
michaeljmarshall commented on pull request #13245:
URL: https://github.com/apache/pulsar/pull/13245#issuecomment-1010693291


   @wolfstudy - note that a 2.7.2 broker has the behavior that @codelipenghui is proposing here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] michaeljmarshall commented on pull request #13245: Make sure the client can get response when encounter producer busy exception.

Posted by GitBox <gi...@apache.org>.
michaeljmarshall commented on pull request #13245:
URL: https://github.com/apache/pulsar/pull/13245#issuecomment-1004371158


   > Makes sense to me.
   > 
   > @michaeljmarshall ?
   
   @eolivelli - did you see my above comment https://github.com/apache/pulsar/pull/13245#pullrequestreview-829587801? I am opposed to this change because it makes the broker send a response to a client request that is known to be timed out. The broker knows this because the client completed the `producerFuture` by sending a `CloseProducer` command before the producer was created. The net effect of this change is that the client will log a meaningless error message for every timeout out producer.
   
   In my opinion, the core contract in this `ServerCnx` file is that the code that completes the `producerFuture` must also determine whether to send a response to the client. This change removes that contract and technically introduces a chance for the broker to respond to the same client request multiple times.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] michaeljmarshall commented on pull request #13245: Make sure the client can get response when encounter producer busy exception.

Posted by GitBox <gi...@apache.org>.
michaeljmarshall commented on pull request #13245:
URL: https://github.com/apache/pulsar/pull/13245#issuecomment-991842394


   Note that when I ran the test for branch-2.7, `ServerCnxTest`, I saw the following stack trace:
   
   ```
   [ERROR] Tests run: 43, Failures: 1, Errors: 0, Skipped: 1, Time elapsed: 25.786 s <<< FAILURE! - in org.apache.pulsar.broker.service.ServerCnxTest
   [ERROR] testCreateProducerTimeoutThenCreateSameNamedProducerShouldFail(org.apache.pulsar.broker.service.ServerCnxTest)  Time elapsed: 0.025 s  <<< FAILURE!
   java.lang.AssertionError: expected [4] but found [1]
   	at org.testng.Assert.fail(Assert.java:99)
   	at org.testng.Assert.failNotEquals(Assert.java:1037)
   	at org.testng.Assert.assertEqualsImpl(Assert.java:140)
   	at org.testng.Assert.assertEquals(Assert.java:122)
   	at org.testng.Assert.assertEquals(Assert.java:797)
   	at org.testng.Assert.assertEquals(Assert.java:807)
   	at org.apache.pulsar.broker.service.ServerCnxTest.testCreateProducerTimeoutThenCreateSameNamedProducerShouldFail(ServerCnxTest.java:878)
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.lang.reflect.Method.invoke(Method.java:498)
   	at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:132)
   	at org.testng.internal.InvokeMethodRunnable.runOne(InvokeMethodRunnable.java:45)
   	at org.testng.internal.InvokeMethodRunnable.call(InvokeMethodRunnable.java:73)
   	at org.testng.internal.InvokeMethodRunnable.call(InvokeMethodRunnable.java:11)
   	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	at java.lang.Thread.run(Thread.java:748)
   ```
   
   This happens because in `branch-2.7`, we respond even if the producer's future was already created. The test I wrote does not expect a response for request 1, but the 2.7 broker is sending a response. Here is the failing line in branch-2.7:
   
   https://github.com/apache/pulsar/blob/4f19b8622d5d0d17973399883e492b839fc30f0a/pulsar-broker/src/test/java/org/apache/pulsar/broker/service/ServerCnxTest.java#L883


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] michaeljmarshall removed a comment on pull request #13245: Make sure the client can get response when encounter producer busy exception.

Posted by GitBox <gi...@apache.org>.
michaeljmarshall removed a comment on pull request #13245:
URL: https://github.com/apache/pulsar/pull/13245#issuecomment-991842394


   Note that when I ran the test for branch-2.7, `ServerCnxTest`, I saw the following stack trace:
   
   ```
   [ERROR] Tests run: 43, Failures: 1, Errors: 0, Skipped: 1, Time elapsed: 25.786 s <<< FAILURE! - in org.apache.pulsar.broker.service.ServerCnxTest
   [ERROR] testCreateProducerTimeoutThenCreateSameNamedProducerShouldFail(org.apache.pulsar.broker.service.ServerCnxTest)  Time elapsed: 0.025 s  <<< FAILURE!
   java.lang.AssertionError: expected [4] but found [1]
   	at org.testng.Assert.fail(Assert.java:99)
   	at org.testng.Assert.failNotEquals(Assert.java:1037)
   	at org.testng.Assert.assertEqualsImpl(Assert.java:140)
   	at org.testng.Assert.assertEquals(Assert.java:122)
   	at org.testng.Assert.assertEquals(Assert.java:797)
   	at org.testng.Assert.assertEquals(Assert.java:807)
   	at org.apache.pulsar.broker.service.ServerCnxTest.testCreateProducerTimeoutThenCreateSameNamedProducerShouldFail(ServerCnxTest.java:878)
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.lang.reflect.Method.invoke(Method.java:498)
   	at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:132)
   	at org.testng.internal.InvokeMethodRunnable.runOne(InvokeMethodRunnable.java:45)
   	at org.testng.internal.InvokeMethodRunnable.call(InvokeMethodRunnable.java:73)
   	at org.testng.internal.InvokeMethodRunnable.call(InvokeMethodRunnable.java:11)
   	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	at java.lang.Thread.run(Thread.java:748)
   ```
   
   This happens because in `branch-2.7`, we respond even if the producer's future was already created. The test I wrote does not expect a response for request 1, but the 2.7 broker is sending a response. Here is the failing line in branch-2.7:
   
   https://github.com/apache/pulsar/blob/4f19b8622d5d0d17973399883e492b839fc30f0a/pulsar-broker/src/test/java/org/apache/pulsar/broker/service/ServerCnxTest.java#L883
   
   Edit: note that the line numbers are slightly off in the stack trace because I resolved cherry-pick conflicts a little differently. Here is the commit I had used to get the above stack trace: https://github.com/michaeljmarshall/pulsar/commit/358da85be37c5241499b4bb4f69631877c3eeb97


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org