Posted to users@pulsar.apache.org by tamer Abdlatif <ta...@gmail.com> on 2022/03/23 12:48:16 UTC

Broker freeze for communications in v 2.7.4

Hi everyone,


We saw strange behaviour: the broker stopped accepting connections and
clients started receiving "Connection Already Closed" or "Topic not
available" exceptions.

The broker Java process itself is up and running, but curl requests to its
HTTP ports, such as the broker metrics endpoint, stop returning anything.

It only works again after we restart the broker. However, I was able to
reproduce the issue with a simple Java program that keeps looping, opening a
new socket to the broker port without closing the socket on each iteration.
After a while the clients start to get those "Connection Already Closed"
exceptions.
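
A minimal sketch of such a reproduction program might look like the
following. It is not the exact program used: the class name, the method name
and the command-line arguments are made up for illustration, and port 6650 is
only assumed to be the broker's binary port.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reproduction sketch: opens TCP connections and never closes them.
public class SocketLeakRepro {

    // Opens up to `count` sockets to host:port and returns them still open.
    // Stops early at the first connection that is refused or times out.
    static List<Socket> openWithoutClosing(String host, int port, int count) {
        List<Socket> open = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            Socket socket = new Socket();
            try {
                socket.connect(new InetSocketAddress(host, port), 5_000);
                open.add(socket); // deliberately kept open: this is the "leak"
            } catch (IOException e) {
                System.err.println("connect #" + i + " failed: " + e.getMessage());
                try {
                    socket.close();
                } catch (IOException ignored) {
                    // nothing useful to do for a socket that never connected
                }
                break;
            }
        }
        return open;
    }

    public static void main(String[] args) throws IOException {
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 6650; // assumed port
        int count = args.length > 2 ? Integer.parseInt(args[2]) : 10_000;

        List<Socket> open = openWithoutClosing(host, port, count);
        System.out.println("holding " + open.size() + " open sockets");
        if (open.isEmpty()) {
            return; // nothing connected, nothing to hold
        }
        System.in.read(); // keep the sockets open until Enter is pressed
    }
}
```

Running it with the broker host and port as arguments and leaving it waiting
keeps the connections open, which is the condition under which the other
clients started to fail.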

So it looks like a connection leak, since the connections are kept alive.

Would you please advise how to fix that issue?

Thanks and best regards
Tamer

Re: Broker freeze for communications in v 2.7.4

Posted by tamer Abdlatif <ta...@gmail.com>.
Thanks Lari,

I opened issue #14826.

Thanks and best regards,
Tamer


Re: Broker freeze for communications in v 2.7.4

Posted by Lari Hotari <lh...@apache.org>.
Hi,

Thank you for the problem report. Have you already filed an issue in 
https://github.com/apache/pulsar/issues ? If not, I think it will be helpful for tracking the issue.

When there are such issues where the broker seems to freeze, it is helpful to get a thread dump from the frozen broker. That helps verify whether there's a thread deadlock.
You can get a thread dump by executing "jstack -l [PID]". In k8s, the PID is 1 for the Pulsar Java process.
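
For illustration only: the kind of lock and deadlock information that "jstack -l" reports is also exposed inside a JVM through the standard java.lang.management API. The sketch below is not part of Pulsar, and the class and method names are hypothetical. It inspects only the JVM it runs in, so it does not replace running jstack against the frozen broker.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper showing an in-process thread dump and deadlock check.
public class InProcessThreadDump {

    // Returns the ids of threads that the JVM reports as deadlocked,
    // or an empty array when it reports none.
    static long[] deadlockedThreadIds() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads(); // null means no deadlock found
        return ids == null ? new long[0] : ids;
    }

    // Builds a textual dump of all live threads, including locked monitors
    // and ownable synchronizers. ThreadInfo.toString() may shorten deep stacks.
    static String dumpAllThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
            out.append(info);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + deadlockedThreadIds().length);
        System.out.print(dumpAllThreads());
    }
}
```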

I have created a script for automating collection of diagnostics information from Pulsar pods such as broker pods:
https://github.com/lhotari/pulsar-contributor-toolbox/blob/master/scripts/collect_jvm_diagnostics_from_pod.sh
In addition to multiple thread dumps, the script collects a heap dump. A heap dump contains the memory contents, which might include sensitive information, so it usually shouldn't be shared publicly.
The script is just an example of how diagnostics collection can be automated.

It would be helpful to provide a thread dump in the GitHub issue.
I'd recommend uploading thread dumps to https://gist.github.com/ and providing a link in the GitHub issue.
Thread dumps stored in https://gist.github.com/ can be analysed with the https://jstack.review web tool by prefixing the thread dump URL with "https://jstack.review?".
For example, I have a thread dump at https://gist.github.com/lhotari/28f71311f9dccc7dd2c0ef267b0242b1 . 
I can analyse the thread dump with the url https://jstack.review?https://gist.github.com/lhotari/28f71311f9dccc7dd2c0ef267b0242b1 .

Other questions:
What type of deployment do you have? Is it k8s? Is it deployed with the Apache Pulsar Helm chart? Do you access the Pulsar Broker via the Pulsar Proxy?

-Lari
