Posted to dev@pulsar.apache.org by Yubiao Feng <yu...@streamnative.io.INVALID> on 2024/03/28 17:48:53 UTC

[DISCUSS] Cherry-pick #21667 that compressed /metrics responses into stable branches

Hi all

Our cluster encountered an issue: brokers were killed because the HTTP API
for the health check did not work.

After some analysis, I found the following root cause:

The /metrics/ endpoint is very slow (a request takes about `20s`) because the
metrics payload is very large (more than `160M`), and requests to this endpoint
are serialized. In other words, the broker handles them one at a time. Pending
requests end up occupying all available connections, so the health check
request cannot be handled.

```
curl http://127.0.0.1:8080/metrics/ -D c -o output.txt
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  164M    0  164M    0     0  7808k      0 --:--:--  0:00:21 --:--:-- 48.9M

HTTP/1.1 200 OK
Date: Wed, 27 Mar 2024 12:25:21 GMT
broker-address: workflows-broker-5.workflows-broker-headless.o-o2kq3.svc.cluster.local
Content-Type: text/plain;charset=utf-8
Transfer-Encoding: chunked
Server: Jetty(9.4.51.v20230217)
```

PR #21667, which compresses /metrics responses, can solve the issue above, so
I would like to cherry-pick it into the following stable branches:
- branch-2.11
- branch-3.0
- branch-3.2
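
For a rough sense of why compressing a large metrics response helps, here is a minimal sketch that gzips a synthetic Prometheus-style text payload and compares sizes. The metric lines, label values, and helper names (`synthetic_metrics`, `gzip_ratio`) are made up for illustration; this is not the code from PR #21667, and real broker output will compress differently.

```python
import gzip


def synthetic_metrics(num_topics: int) -> bytes:
    """Build Prometheus-style text with two sample lines per topic (made-up data)."""
    lines = []
    for i in range(num_topics):
        # Label values are illustrative, not taken from a real cluster.
        labels = (
            f'cluster="demo",namespace="public/default",'
            f'topic="persistent://public/default/topic-{i}"'
        )
        lines.append(f"pulsar_msg_backlog{{{labels}}} {i % 7}")
        lines.append(f"pulsar_rate_in{{{labels}}} {i % 13}.0")
    return ("\n".join(lines) + "\n").encode("utf-8")


def gzip_ratio(payload: bytes) -> float:
    """Return compressed size divided by raw size (smaller is better)."""
    return len(gzip.compress(payload)) / len(payload)


if __name__ == "__main__":
    payload = synthetic_metrics(10_000)
    compressed = gzip.compress(payload)
    print(f"raw bytes:  {len(payload)}")
    print(f"gzip bytes: {len(compressed)}")
    print(f"ratio:      {gzip_ratio(payload):.3f}")
```

Metrics text repeats the same metric names and label keys on every line, which is why it tends to compress well. The sketch only measures payload size; it does not model the time the broker spends generating the metrics.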


Thanks
Yubiao Feng

Re: [DISCUSS] Cherry-pick #21667 that compressed /metrics responses into stable branches

Posted by Lari Hotari <lh...@apache.org>.
I'm fine with going ahead and cherry-picking the change in PR 21667 to all maintenance branches.

The only concern I have is about having a way to selectively disable Gzip compression if it causes a regression for some users. 

I started an experiment in draft PR https://github.com/apache/pulsar/pull/22370.
It turned into a larger PR. We should split out a separate PR for the part that adds httpServerGzipCompressionExcludedPaths to the configuration, so that potentially problematic paths can be excluded from Gzip compression if any regressions occur.
https://github.com/apache/pulsar/pull/22370/files#diff-cc761e782083f37db72cd91684fee07b931c188dd93333397c62b0a4c45a657eR345
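
To illustrate the kind of path exclusion described above, here is a minimal sketch of a decision function. It is hypothetical: the function name `should_gzip`, the prefix-matching rule, and the simplified `Accept-Encoding` parsing are assumptions made for the example, not the behavior of httpServerGzipCompressionExcludedPaths in PR 22370.

```python
from typing import Iterable


def should_gzip(path: str, accept_encoding: str, excluded_paths: Iterable[str]) -> bool:
    """Decide whether a response for `path` should be gzip-compressed.

    Illustrative only: an excluded path matches the path itself and anything
    below it. q-values in Accept-Encoding (e.g. "gzip;q=0") are ignored.
    """
    # The client must advertise gzip support.
    encodings = [
        token.split(";")[0].strip().lower()
        for token in accept_encoding.split(",")
    ]
    if "gzip" not in encodings:
        return False

    # Normalize trailing slashes so "/metrics" and "/metrics/" compare equal.
    normalized = path.rstrip("/") or "/"
    for excluded in excluded_paths:
        prefix = excluded.rstrip("/") or "/"
        if prefix == "/":
            return False  # excluding the root excludes every path
        if normalized == prefix or normalized.startswith(prefix + "/"):
            return False
    return True


if __name__ == "__main__":
    excluded = ["/metrics"]
    for p in ["/metrics/", "/admin/v2/clusters"]:
        print(p, should_gzip(p, "gzip, deflate", excluded))
```

With an exclusion list like this, an operator who hits a regression on one endpoint could turn compression off for that endpoint only, while other paths keep it.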

Before the next set of releases we should consider adding this change.
Another useful addition would be to enable gzip compression by default for pulsar-admin. The current experiment in PR 22370 includes that.

-Lari
