You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2021/04/27 14:58:19 UTC

[GitHub] [pulsar] LiPengze97 opened a new issue #10406: Message loss in non-persistent geo-replication at high sending throughput

LiPengze97 opened a new issue #10406:
URL: https://github.com/apache/pulsar/issues/10406


   **Describe the bug**
   I set up five single-broker Pulsar clusters in five regions, Clemson, Wisconsin, Massachusetts, and two in Utah (Utah1, Utah2). The publisher is in Utah1, and the subscribers are in each of the rest clusters.
   For my experiment purpose, I prefer **non-persistent** topics.
   I want to use the geo-replication of Pulsar, and I found that when the sending throughput is high (like 1k message/sec, message size 10k, totally 10k message), only the subscriber in Utah2, which has the highest bandwidth from the Utah1, can receive all the published messages (not always, some times it also receives only a part). For the rest of the subscribers in other clusters, they are only able to receive part of the messages.
   I use Java client at first. I got the bug and I thought it was my code's fault. So I use the perf tool, and the problem is still there.(see picture below, the records received are all not 10000, but the producer says 
   `08:25:20.996 [Thread-1] INFO  org.apache.pulsar.testclient.PerformanceProducer - Aggregated throughput stats --- 10000 records sent --- 639.113 msg/s --- 49.931 Mbit/s`)
   
   I set the parameters below in broker.conf to 10000(higher or equal to the total number of message than I want to publish)
   ```
   maxConcurrentNonPersistentMessagePerConnection
   
   replicationProducerQueueSize
   
   maxPendingPublishdRequestsPerConnection
   ```
   and I also set the `receiverQueueSize(10000)` in my Consumer client code.
   
   I think this is not a bug, but a feature, because the sending bandwidth is much higher than the bandwidth on WAN. In [Issue 451](https://github.com/apache/pulsar/issues/451), there seems some discussion about the sending rate of the publisher. But I don't know what sending rate is reasonable under the WAN environment. Could you offer me some suggestions on what the sending bandwidth should be?
   
   
   The network information is:
   
   |       | Utah2      | Wisconsin | Clemson  | MIT     |
   | ----- | ---------- | --------- | -------- | ------- |
   | Utah1 | 1174.74125 | 3.2375    | 52.99125 | 16.5625 |
   
   bandwidth(Mbyte/s)
   
   |       | Utah2 | Wisconsin | Clemson | MIT    |
   | ----- | ----- | --------- | ------- | ------ |
   | Utah1 | 0.061 | 35.612    | 50.918  | 48.083 |
   
   latency(ms)
   
   **To Reproduce**
   Steps to reproduce the behavior:
   1. Go to root path of the pulsar
   2. run `bin/pulsar-perf produce non-persistent://my-tenant/my-namespace/wa -bm 0 -s 10240 -r 1000 -m 10000` at publisher
   3. run `bin/pulsar-perf consume non-persistent://my-tenant/my-namespace/wa` at subscriber
   4. See the output of the consumer, and the "bug" can be seen.
   
   **Expected behavior**
   All the non-persistent messages are received correctly.
   
   **Screenshots**
   ![image](https://user-images.githubusercontent.com/18279506/116259418-5d80d280-a7a8-11eb-8e18-2547f4cdc5d3.png)
   
   **Desktop (please complete the following information):**
    - OS: [e.g. iOS]
   Ubuntu 18.04
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] rdhabalia commented on issue #10406: Message loss in non-persistent geo-replication at high sending throughput

Posted by GitBox <gi...@apache.org>.
rdhabalia commented on issue #10406:
URL: https://github.com/apache/pulsar/issues/10406#issuecomment-828894619


   yes, non-persistent topic doesn't persist messages so, it doesn't give message delivery guarantee. it's used for usecases which only care about the latest data and don't care about past data for example live video streaming. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] wmccarley commented on issue #10406: Message loss in non-persistent geo-replication at high sending throughput

Posted by GitBox <gi...@apache.org>.
wmccarley commented on issue #10406:
URL: https://github.com/apache/pulsar/issues/10406#issuecomment-829291112


   We use a lot of high-throughput non-persistent topics on our system for streaming telemetry, it's useful for use cases where the overhead of tracking message acknowledgement and redelivery is just not worth it.
   
   1. If you can spare the CPU cycles compressing the data will dramatically reduce the bits on the wire (obviously that's the point) which may be useful if the latency is high.
   2. You mention using Ubuntu... Most linux distributions have absolutely terrible default TCP settings and you have to change them if you want to get reasonable high-throughput performance. There are dozens of kernel settings but the ones I always end up changing are the rx/txqueuelen, net.core.rmem_max & net.core.wmem_max, net.ipv4.tcp_rmem & net.ipv4.tcp_wmem, ec..
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] codelipenghui commented on issue #10406: Message loss in non-persistent geo-replication at high sending throughput

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on issue #10406:
URL: https://github.com/apache/pulsar/issues/10406#issuecomment-1058890985


   The issue had no activity for 30 days, mark with Stale label.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] LiPengze97 commented on issue #10406: Message loss in non-persistent geo-replication at high sending throughput

Posted by GitBox <gi...@apache.org>.
LiPengze97 commented on issue #10406:
URL: https://github.com/apache/pulsar/issues/10406#issuecomment-829311445


   > yes, non-persistent topic doesn't persist messages so, it doesn't give message delivery guarantee. it's used for usecases which only care about the latest data and don't care about past data for example live video streaming.
   
   Get confirmed from the author of the pulsar is indeed useful, thanks a lot!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] LiPengze97 commented on issue #10406: Message loss in non-persistent geo-replication at high sending throughput

Posted by GitBox <gi...@apache.org>.
LiPengze97 commented on issue #10406:
URL: https://github.com/apache/pulsar/issues/10406#issuecomment-829310879


   > We use a lot of high-throughput non-persistent topics on our system for streaming telemetry, it's useful for use cases where the overhead of tracking message acknowledgement and redelivery is just not worth it.
   > 
   > 1. If you can spare the CPU cycles compressing the data will dramatically reduce the bits on the wire (obviously that's the point) which may be useful if the latency is high.
   > 2. You mention using Ubuntu... Most linux distributions have absolutely terrible default TCP settings and you have to change them if you want to get reasonable high-throughput performance. There are dozens of kernel settings but the ones I always end up changing are the rx/txqueuelen, net.core.rmem_max & net.core.wmem_max, net.ipv4.tcp_rmem & net.ipv4.tcp_wmem, ec..
   
   Thanks for your useful suggestions, I will try it!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] LiPengze97 commented on issue #10406: Message loss in non-persistent geo-replication at high sending throughput

Posted by GitBox <gi...@apache.org>.
LiPengze97 commented on issue #10406:
URL: https://github.com/apache/pulsar/issues/10406#issuecomment-828886188


   Taking a deep view into the source code, in[ NonPersistentReplicator.java](https://github.com/apache/pulsar/blob/6704f12104219611164aa2bb5bbdfc929613f1bf/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/nonpersistent/NonPersistentReplicator.java#L123), if the netty connection is not writable, the non-persistent messages are just dropped silently. I think the connection is not writable when the load is too heavy for it, am I right?
   Is 'dropping the non-persistent messages' the common design for the non-persistent topics in the pub-sub system when the load on the connection is heavy?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org