Posted to gitbox@activemq.apache.org by GitBox <gi...@apache.org> on 2021/05/04 16:19:37 UTC
[GitHub] [activemq-artemis] franz1981 opened a new pull request #3566: ARTEMIS-3282 Expose replication response batching tuning
franz1981 opened a new pull request #3566:
URL: https://github.com/apache/activemq-artemis/pull/3566
https://issues.apache.org/jira/browse/ARTEMIS-3282
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [activemq-artemis] franz1981 commented on pull request #3566: ARTEMIS-3282 Expose replication response batching tuning
franz1981 commented on pull request #3566:
URL: https://github.com/apache/activemq-artemis/pull/3566#issuecomment-832470852
I believe this can be further improved too, just need to think twice how :)
[GitHub] [activemq-artemis] franz1981 commented on pull request #3566: ARTEMIS-3282 Expose replication response batching tuning
franz1981 commented on pull request #3566:
URL: https://github.com/apache/activemq-artemis/pull/3566#issuecomment-845190747
I think this one is in good shape now and it's also fixing the bug of sync'ing on the replica, @clebertsuconic
[GitHub] [activemq-artemis] gtully commented on a change in pull request #3566: ARTEMIS-3282 Expose replication response batching tuning
gtully commented on a change in pull request #3566:
URL: https://github.com/apache/activemq-artemis/pull/3566#discussion_r636791950
##########
File path: artemis-server/src/main/java/org/apache/activemq/artemis/core/replication/ReplicationEndpoint.java
##########
@@ -221,6 +221,8 @@ public void handlePacket(final Packet packet) {
handleCommitRollback((ReplicationCommitMessage) packet);
break;
case PacketImpl.REPLICATION_PAGE_WRITE:
+ // potential blocking I/O operation! flush existing packets to save long tail latency
+ endOfBatch();
Review comment:
ok, I understand better, the batching is limited by the consumed buffers, before the next read there will always be a batch end.
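The mechanism described above (accumulate responses while a read batch is being drained, then flush before the next read or before any blocking operation) can be sketched roughly as follows. This is an illustrative sketch only; `ResponseBatcher`, `maxBatchBytes`, and the 0 / -1 / positive-limit semantics are taken from the discussion, not from the actual Artemis code:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of flush-on-end-of-read-batch response batching.
// Responses are queued while packets from one socket read are handled;
// endOfBatch() (called before the next read, or before a blocking op)
// flushes whatever accumulated, so the sender never waits on a timeout.
class ResponseBatcher {
    private final List<byte[]> pending = new ArrayList<>();
    private int pendingBytes = 0;
    private final int maxBatchBytes; // hypothetical maxReplicaResponseBatchBytes

    int flushed = 0; // responses written to the channel (stub for illustration)

    ResponseBatcher(int maxBatchBytes) {
        this.maxBatchBytes = maxBatchBytes;
    }

    void send(byte[] response) {
        if (maxBatchBytes == 0) {           // 0: no batching, flush immediately
            flush(List.of(response));
            return;
        }
        pending.add(response);
        pendingBytes += response.length;
        // positive limit: flush once the accumulated bytes reach it
        if (maxBatchBytes > 0 && pendingBytes >= maxBatchBytes) {
            endOfBatch();
        }
        // -1: flush only at end of read batch (endOfBatch called externally)
    }

    void endOfBatch() {
        if (!pending.isEmpty()) {
            flush(new ArrayList<>(pending));
            pending.clear();
            pendingBytes = 0;
        }
    }

    private void flush(List<byte[]> batch) {
        flushed += batch.size(); // real code would write + flush the channel
    }
}
```

Because `endOfBatch()` always runs before the next read, the batch size is naturally bounded by what one read consumed, which is why no explicit timeout is needed in this scheme.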
[GitHub] [activemq-artemis] franz1981 commented on pull request #3566: ARTEMIS-3282 Expose replication response batching tuning
franz1981 commented on pull request #3566:
URL: https://github.com/apache/activemq-artemis/pull/3566#issuecomment-833834162
Preliminary tests using paging don't show such a regression anyway.
No batching:
```
**************
RUN 1 EndToEnd Throughput: 10626 ops/sec
**************
EndToEnd SERVICE-TIME Latencies distribution in MICROSECONDS
mean 1522.66
min 129.54
50.00% 1376.26
90.00% 1507.33
99.00% 5734.40
99.90% 10747.90
99.99% 18350.08
max 26214.40
count 160000
```
batching:
```
**************
RUN 1 EndToEnd Throughput: 11202 ops/sec
**************
EndToEnd SERVICE-TIME Latencies distribution in MICROSECONDS
mean 1446.31
min 111.10
50.00% 1376.26
90.00% 1482.75
99.00% 3784.70
99.90% 8224.77
99.99% 14024.70
max 17563.65
count 160000
```
Same scenario: 16 producers/consumers sending 100-byte persistent msgs to 16 queues with replication, against a broker that's already paging due to
```
./artemis producer --message-count 22000 --message-size 100000
```
During the whole load test the broker is paging over the 16 queues (distinct from TEST), and batching still seems beneficial
[GitHub] [activemq-artemis] franz1981 commented on a change in pull request #3566: ARTEMIS-3282 Expose replication response batching tuning
franz1981 commented on a change in pull request #3566:
URL: https://github.com/apache/activemq-artemis/pull/3566#discussion_r630973519
##########
File path: artemis-server/src/main/java/org/apache/activemq/artemis/core/replication/ReplicationEndpoint.java
##########
@@ -221,6 +221,8 @@ public void handlePacket(final Packet packet) {
handleCommitRollback((ReplicationCommitMessage) packet);
break;
case PacketImpl.REPLICATION_PAGE_WRITE:
+ // potential blocking I/O operation! flush existing packets to save long tail latency
+ endOfBatch();
Review comment:
> I really don't see how a user can choose a value for maxReplicaResponseBatchBytes
It can be 0 for users that care about 99.XXX percentile latencies and have configured kernel-bypass drivers.
It could be the MTU size for users that know it; it should be -1 for "common" users (right now I've chosen 1500, which is the typical MTU size).
I think it's a very low-level config really, and it remains to be seen how useful it can be: I'm still in the process of validating it before dropping the "draft" status of the PR.
> I think it has to be automatic or automagical, based on some limit on what can be read.
From the point of view of network utilization and memory usage, just using -1 or 1500 is already a huge step forward compared with the "previous" (pre-https://issues.apache.org/jira/browse/ARTEMIS-2877) behaviour.
Size of Ethernet frame - 24 bytes
Size of IPv4 header (without any options) - 20 bytes
Size of TCP header (without any options) - 20 bytes
Total size of an Ethernet frame carrying an IP packet with an empty TCP segment - 24 + 20 + 20 = 64 bytes
When the batch size is > MTU, the TCP packets are going to be fragmented, but that's fine because it amortizes the syscall cost while maximizing network usage too.
Sending responses one by one instead means a ~3X data overhead for each response sent, which hurts both latencies and CPU/network usage.
> I wonder if something similar based on confirmation-window could work here
No idea; IIRC the replication packet flow doesn't obey any of the other cluster-connection channel flow rules (no duplicate checks, no confirmation window or anything similar). @clebertsuconic can you confirm?
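The back-of-envelope framing arithmetic above can be checked with a quick sketch. The per-frame overhead constants are the ones quoted in the comment; the ~32-byte response payload and the idealized packing (responses filling frames up to the MTU, ignoring TCP coalescing details) are assumptions for illustration:

```java
// Per-frame overhead quoted above: Ethernet 24 + IPv4 20 + TCP 20 = 64 bytes
final class WireOverhead {
    static final int FRAME_OVERHEAD = 24 + 20 + 20;

    // bytes on the wire when each response travels in its own frame:
    // a ~32-byte response costs 96 bytes, i.e. ~3X the useful data
    static long oneByOne(int responseBytes, int count) {
        return (long) (FRAME_OVERHEAD + responseBytes) * count;
    }

    // bytes on the wire when responses share frames up to the MTU payload
    static long batched(int responseBytes, int count, int mtu) {
        long payload = (long) responseBytes * count;
        long frames = (payload + mtu - 1) / mtu; // ceiling division
        return payload + frames * FRAME_OVERHEAD;
    }
}
```

For 100 responses of 32 bytes, one-by-one costs (64 + 32) * 100 = 9600 bytes on the wire, while batching at a 1500-byte MTU costs 3200 + 3 * 64 = 3392 bytes, which is roughly the ~3X saving described above.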
[GitHub] [activemq-artemis] franz1981 commented on pull request #3566: ARTEMIS-3282 Expose replication response batching tuning
franz1981 commented on pull request #3566:
URL: https://github.com/apache/activemq-artemis/pull/3566#issuecomment-832632624
Results show that, with a long enough stream of packet types, `switch` performs slightly better than the `if` cascade:
```
Benchmark (mostSeenProb) (size) Mode Cnt Score Error Units
SwitchVsIfCascadeBenchmark.testIf 40 8192 avgt 16 21.344 ± 1.075 ns/op
SwitchVsIfCascadeBenchmark.testIf 60 8192 avgt 16 17.265 ± 0.091 ns/op
SwitchVsIfCascadeBenchmark.testIf 80 8192 avgt 16 11.571 ± 0.046 ns/op
SwitchVsIfCascadeBenchmark.testSwitch 40 8192 avgt 16 19.416 ± 0.166 ns/op
SwitchVsIfCascadeBenchmark.testSwitch 60 8192 avgt 16 15.342 ± 0.074 ns/op
SwitchVsIfCascadeBenchmark.testSwitch 80 8192 avgt 16 10.942 ± 0.013 ns/op
```
So there's no point in using the `if` version, given that it's also a bit less readable.
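The two dispatch styles being compared are, in essence, the following (the packet-type constants and handler strings are illustrative, not the real `PacketImpl` values):

```java
final class Dispatch {
    static final byte COMMIT = 1, PAGE_WRITE = 2, APPEND = 3;

    // switch-based dispatch: the JIT can emit a tableswitch/lookupswitch,
    // so cost is roughly uniform across packet types
    static String bySwitch(byte type) {
        switch (type) {
            case COMMIT:     return "commit";
            case PAGE_WRITE: return "pageWrite";
            case APPEND:     return "append";
            default:         return "unknown";
        }
    }

    // if-cascade dispatch: a chain of compare-and-branch, so cost depends
    // on where in the chain the most frequent type sits
    static String byIf(byte type) {
        if (type == COMMIT) return "commit";
        if (type == PAGE_WRITE) return "pageWrite";
        if (type == APPEND) return "append";
        return "unknown";
    }
}
```

That order dependence matches the benchmark above: as `mostSeenProb` grows (one type dominating), the gap between the two narrows, but the `switch` stays ahead on a mixed stream.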
[GitHub] [activemq-artemis] franz1981 closed pull request #3566: ARTEMIS-3282 Expose replication response batching tuning
franz1981 closed pull request #3566:
URL: https://github.com/apache/activemq-artemis/pull/3566
[GitHub] [activemq-artemis] franz1981 commented on pull request #3566: ARTEMIS-3282 Expose replication response batching tuning
franz1981 commented on pull request #3566:
URL: https://github.com/apache/activemq-artemis/pull/3566#issuecomment-832072034
This can be relevant for you, @michaelandrepearce, because with super-fast kernel-bypass drivers batching can cause some tail-latency outliers; this tuning can prevent that (by setting the batch size to 0) and can be used to improve performance too.
I still need to perform some tests to see whether the new default (~ the typical MTU, i.e. 1500 bytes) is good enough for most users.
[GitHub] [activemq-artemis] gtully commented on a change in pull request #3566: ARTEMIS-3282 Expose replication response batching tuning
gtully commented on a change in pull request #3566:
URL: https://github.com/apache/activemq-artemis/pull/3566#discussion_r631215271
##########
File path: artemis-server/src/main/java/org/apache/activemq/artemis/core/replication/ReplicationEndpoint.java
##########
@@ -221,6 +221,8 @@ public void handlePacket(final Packet packet) {
handleCommitRollback((ReplicationCommitMessage) packet);
break;
case PacketImpl.REPLICATION_PAGE_WRITE:
+ // potential blocking I/O operation! flush existing packets to save long tail latency
+ endOfBatch();
Review comment:
For sure batching is good; what I am struggling with is the use case for setting a limit on accumulation: because the other end is waiting, there also needs to be some timeout in case the limit is never reached. Batching under low utilisation seems unfair on the sender in this case, and the sender is blocking. I need to peek more to see what -1 does :-) thanks for the detail.
[GitHub] [activemq-artemis] franz1981 commented on a change in pull request #3566: ARTEMIS-3282 Expose replication response batching tuning
franz1981 commented on a change in pull request #3566:
URL: https://github.com/apache/activemq-artemis/pull/3566#discussion_r630803743
##########
File path: artemis-server/src/main/java/org/apache/activemq/artemis/core/replication/ReplicationEndpoint.java
##########
@@ -221,6 +221,8 @@ public void handlePacket(final Packet packet) {
handleCommitRollback((ReplicationCommitMessage) packet);
break;
case PacketImpl.REPLICATION_PAGE_WRITE:
+ // potential blocking I/O operation! flush existing packets to save long tail latency
+ endOfBatch();
Review comment:
@clebertsuconic @gtully
I'm not yet sure about it: a page write can be a very fast operation depending on the amount of data to be written, i.e. it's really just the cost of copying some buffer X times. Flushing any existing batch is just to ensure no further delay affects the overall replication response time, but if the batch is too small, poor network utilization will still hurt scalability and average latencies.
[GitHub] [activemq-artemis] franz1981 commented on pull request #3566: ARTEMIS-3282 Expose replication response batching tuning
franz1981 commented on pull request #3566:
URL: https://github.com/apache/activemq-artemis/pull/3566#issuecomment-857460000
I've decided NOT to expose the response batch size, but to prevent the sync on paging/large messages from happening instead; I'm going to create a separate PR to address this.
[GitHub] [activemq-artemis] franz1981 commented on pull request #3566: ARTEMIS-3282 Expose replication response batching tuning
franz1981 commented on pull request #3566:
URL: https://github.com/apache/activemq-artemis/pull/3566#issuecomment-833881055
The last commit isn't giving the nice speedup I was expecting:
```
**************
RUN 1 EndToEnd Throughput: 11155 ops/sec
**************
EndToEnd SERVICE-TIME Latencies distribution in MICROSECONDS
mean 1438.52
min 116.74
50.00% 1376.26
90.00% 1482.75
99.00% 3424.26
99.90% 7143.42
99.99% 15400.96
max 17825.79
count 160000
```
Latencies are on par with normal batching + fdatasync, although the box used has a very fast disk and the amount of paged data wasn't enough to make the fdatasync long enough to matter.
[GitHub] [activemq-artemis] gtully commented on a change in pull request #3566: ARTEMIS-3282 Expose replication response batching tuning
gtully commented on a change in pull request #3566:
URL: https://github.com/apache/activemq-artemis/pull/3566#discussion_r630953529
##########
File path: artemis-server/src/main/java/org/apache/activemq/artemis/core/replication/ReplicationEndpoint.java
##########
@@ -221,6 +221,8 @@ public void handlePacket(final Packet packet) {
handleCommitRollback((ReplicationCommitMessage) packet);
break;
case PacketImpl.REPLICATION_PAGE_WRITE:
+ // potential blocking I/O operation! flush existing packets to save long tail latency
+ endOfBatch();
Review comment:
I think if blocking ops are removed, there is no need to eagerly flush the batch here. However, I really don't see how a user can choose a value for maxReplicaResponseBatchBytes. I think it has to be automatic or automagical, based on some limit on what can be read.
5.x has optimiseAck on by default, at 70% of the prefetch for openwire consumers. That covers the auto-ack case and only sends an actual ack every X non-persistent messages. I wonder if something similar based on confirmation-window could work here. Though the endpoint probably has no way of knowing what is set on the other end, I guess. Is there already a relationship between the confirmation window and responses?
[GitHub] [activemq-artemis] franz1981 edited a comment on pull request #3566: ARTEMIS-3282 Expose replication response batching tuning
franz1981 edited a comment on pull request #3566:
URL: https://github.com/apache/activemq-artemis/pull/3566#issuecomment-833811477
I was expecting that exposing the batch size would give a good speedup with kernel-bypass drivers, but it seems not: I see instead that the current implementation can hide a subtle perf regression with paging/large messages.
I'm going to perform some tests to verify it, but it seems related to the fact that, on the backup, large message writes and page writes are still happening on the Netty thread, and given that they perform synchronous operations on file channels (some form of FileChannel::write) they can delay for some time any already-batched responses to be sent back to the master broker.
[GitHub] [activemq-artemis] franz1981 edited a comment on pull request #3566: ARTEMIS-3282 Expose replication response batching tuning
franz1981 edited a comment on pull request #3566:
URL: https://github.com/apache/activemq-artemis/pull/3566#issuecomment-833839211
Tests have shown something interesting going on on the backup side, although it's not a regression:
![image](https://user-images.githubusercontent.com/13125299/117361311-2aca8e80-aeba-11eb-83fd-9000719ba939.png)
In violet there are calls to `FileChannel::write` due to `handlePageWriteEvent` and `FileChannelImpl::force` due to `handlePageEvent`. The latter doesn't seem right, considering what we deliver with common journal writes, i.e. writes are performed asynchronously and we send the response back before hitting the journal. @clebertsuconic wdyt?
I'm thinking of preventing the `fdatasync` due to paging from happening in the Netty event loop....
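Moving the `force()` off the event loop could look roughly like this. This is only a sketch of the idea being floated, not the fix the author ultimately implemented; `PageSyncOffload` and its methods are hypothetical names, and it assumes a single dedicated thread is enough to preserve sync/response ordering:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: keep FileChannel::force (the fdatasync) off the Netty event loop
// by handing it to a dedicated single-threaded executor, so already-batched
// replication responses aren't stalled behind a slow disk sync.
final class PageSyncOffload {
    private final ExecutorService syncExecutor = Executors.newSingleThreadExecutor();

    void pageWrite(FileChannel channel, Runnable sendResponse) {
        // the write itself already happened on the caller; only the sync is deferred
        syncExecutor.execute(() -> {
            try {
                channel.force(false); // fdatasync-like: flush data, not metadata
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            sendResponse.run(); // respond after durability, off the event loop
        });
    }

    void shutdown() {
        syncExecutor.shutdown();
    }
}
```

A single-threaded executor keeps syncs (and therefore responses) in submission order, which is the property the replication channel would need.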