Posted to dev@pulsar.apache.org by Yubiao Feng <yu...@streamnative.io.INVALID> on 2024/03/28 17:48:53 UTC
[DISCUSS] Cherry-pick #21667 that compressed /metrics responses into stable branches
Hi all
Our cluster encountered an issue: brokers were killed because the HTTP API
for the health check did not work.
After analysis, I found the following root cause:
The /metrics/ endpoint is very slow (a single request takes about `20s`)
because the metrics content is too large (more than `160M`), and requests to
this endpoint are serialized. In other words, the broker handles them one at a
time. Pending requests use up all available connections, so the health check
cannot be handled.
```
curl http://127.0.0.1:8080/metrics/ -D c -o output.txt
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  164M    0  164M    0     0  7808k      0 --:--:--  0:00:21 --:--:-- 48.9M

HTTP/1.1 200 OK
Date: Wed, 27 Mar 2024 12:25:21 GMT
broker-address: workflows-broker-5.workflows-broker-headless.o-o2kq3.svc.cluster.local
Content-Type: text/plain;charset=utf-8
Transfer-Encoding: chunked
Server: Jetty(9.4.51.v20230217)
```
PR #21667 can solve the issue above, so I want to cherry-pick it into the
stable branches.
- branch-2.11
- branch-3.0
- branch-3.2
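To illustrate why compressing the response helps, here is a minimal sketch (not taken from PR #21667) that builds Prometheus-style text and measures how much gzip shrinks it. The metric name, labels, and topic names are made up for the example; real broker output differs, so the ratio is only indicative.

```python
import gzip


def build_sample_metrics(topic_count: int) -> bytes:
    """Build Prometheus-style text with one series per hypothetical topic."""
    lines = ["# TYPE demo_bytes_in_total counter"]
    for i in range(topic_count):
        # Metric and label names are invented for this sketch.
        lines.append(
            'demo_bytes_in_total{cluster="demo",namespace="public/default",'
            f'topic="persistent://public/default/topic-{i}"}} {i * 1024}'
        )
    return ("\n".join(lines) + "\n").encode("utf-8")


def compression_ratio(payload: bytes) -> float:
    """Return uncompressed size divided by gzip-compressed size."""
    compressed = gzip.compress(payload)
    return len(payload) / len(compressed)


if __name__ == "__main__":
    payload = build_sample_metrics(10_000)
    print(f"raw={len(payload)} bytes, ratio={compression_ratio(payload):.1f}x")
```

Metrics text repeats the same metric names and label keys on every line, which is the kind of input gzip handles well, so a smaller response body means each serialized request occupies a connection for less time.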
Thanks
Yubiao Feng
Re: [DISCUSS] Cherry-pick #21667 that compressed /metrics responses into stable branches
Posted by Lari Hotari <lh...@apache.org>.
I'm fine with going ahead and cherry-picking the change in PR 21667 to all maintenance branches.
The only concern I have is about having a way to selectively disable Gzip compression if it causes a regression for some users.
I started an experiment in draft PR https://github.com/apache/pulsar/pull/22370.
It turned into a larger PR. We should split out a separate PR for the part that adds httpServerGzipCompressionExcludedPaths to the configuration, so that potentially problematic paths can be excluded from Gzip compression if any regressions occur.
https://github.com/apache/pulsar/pull/22370/files#diff-cc761e782083f37db72cd91684fee07b931c188dd93333397c62b0a4c45a657eR345
Before the next set of releases we should consider adding this change.
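As a rough illustration of the excluded-paths idea, the sketch below shows one way a server could decide whether to compress a response. It is not the code from PR 22370 and not Jetty's matching logic: the function name `should_gzip`, the prefix-matching rule, and the example paths are all assumptions made for this example.

```python
from typing import Iterable, Optional


def should_gzip(path: str, accept_encoding: Optional[str],
                excluded_paths: Iterable[str]) -> bool:
    """Return True when a response for `path` may be gzip-compressed.

    Hypothetical rule: compress only if the client advertises gzip and the
    path is neither equal to nor nested under an excluded prefix.
    Simplification: quality values such as "gzip;q=0" are ignored.
    """
    if not accept_encoding:
        return False
    offered = {token.split(";")[0].strip().lower()
               for token in accept_encoding.split(",")}
    if "gzip" not in offered:
        return False
    for prefix in excluded_paths:
        base = prefix.rstrip("/")
        if path == base or path.startswith(base + "/"):
            return False
    return True


if __name__ == "__main__":
    print(should_gzip("/metrics/", "gzip, deflate", []))
    print(should_gzip("/metrics/", "gzip", ["/metrics"]))
```

With an empty exclusion list every gzip-capable client gets a compressed response; adding a path to the list restores the uncompressed behavior for that path only, which is the escape hatch described above.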
Another useful addition would be to enable gzip compression by default for pulsar-admin. The current experiment in PR 22370 includes that.
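For context, a client only receives a compressed response if it asks for one. The sketch below shows the general client-side pattern using Python's standard library; it is not how pulsar-admin is implemented, and the helper names `build_request`, `decode_body`, and `fetch` are hypothetical.

```python
import gzip
import urllib.request
from typing import Optional


def build_request(url: str) -> urllib.request.Request:
    """Create a GET request that advertises gzip support to the server."""
    return urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})


def decode_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Decompress the body only if the server marked it as gzip-encoded."""
    if content_encoding and content_encoding.strip().lower() == "gzip":
        return gzip.decompress(body)
    return body


def fetch(url: str) -> bytes:
    """Fetch a URL, transparently handling a gzip-encoded response."""
    with urllib.request.urlopen(build_request(url)) as response:
        encoding = response.headers.get("Content-Encoding")
        return decode_body(response.read(), encoding)
```

Calling `fetch("http://127.0.0.1:8080/metrics/")` against a broker like the one in the original report would return the decompressed metrics text whether or not the server chose to compress it.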
-Lari