Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2023/04/04 03:36:03 UTC
[GitHub] [pulsar] yuripean created a discussion: Pulsar Broker node hangs, and Flink cannot consume the corresponding data.

GitHub user yuripean created a discussion: Pulsar Broker node hangs, and Flink cannot consume the corresponding data.

Background:
Pulsar version 2.10
The following is deployed in our business environment.
Server	Hardware configuration	Deployed components
ddx001	32C/128G/5.5T SSD	Zookeeper/Bookkeeper/Pulsar Broker
ddx002	32C/128G/5.5T SSD	Zookeeper/Bookkeeper/Pulsar Broker
ddx003	32C/128G/5.5T SSD	Zookeeper/Bookkeeper/Pulsar Broker

On our machines, we have deployed Debezium to synchronize data from Mongo. Debezium writes the data from Mongo to the Pulsar mongo/hamster namespace, and the Flink program consumes the data in the bigdata/dwd namespace. During operation, when a single broker hangs, its ports 6650 and 8080 remain open, but Debezium is unable to write data to that broker. As a result, downstream Flink cannot consume any data.
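
The difference between "the port is open" and "the broker is responding" can be checked with a small probe. The following is a minimal sketch, not part of our deployment: the function names are hypothetical, and the default path /admin/v2/brokers/health is an assumption about the broker's admin REST health endpoint that should be verified for the Pulsar version in use.

```python
import http.client
import socket
import urllib.error
import urllib.request


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def broker_responds(host, http_port=8080, path="/admin/v2/brokers/health", timeout=5.0):
    """Return True only if an HTTP GET is answered with status 200 in time.

    The default path is an assumption; adjust it for your deployment.
    """
    url = f"http://{host}:{http_port}{path}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Refused connection, timeout, or malformed/absent reply: not healthy.
        return False
```

A hung broker of the kind described here would be expected to pass port_is_open("ddx001", 6650) while failing broker_responds("ddx001"), since the listening socket still accepts connections but requests are never answered.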

After we restarted the broker on ddx001 at 22:00 in the evening, Debezium was able to write data to Pulsar again.
<img width="1575" alt="image" src="https://user-images.githubusercontent.com/48741896/229680046-582fee02-f41d-4d83-b56a-72de355a381d.png">
Flink consumed data for a while in the morning but then became unable to consume again.
<img width="1587" alt="image" src="https://user-images.githubusercontent.com/48741896/229680228-1f0ec9c4-932f-4fce-afd2-f8cf86bd5782.png">

The Debezium logs show the following errors:

2023-03-30T17:17:22,907+0800 [pulsar-client-io-1-10] WARN org.apache.pulsar.client.impl.ClientCnx - [id: 0x21142b91, L:/10.1.62.3:56732 - R:10.1.62.1/10.1.62.1:6650] request timeout {'durationMs': '30000', 'reqId':'2994521255180757909', 'remote':'10.1.62.1/10.1.62.1:6650', 'local':'/10.1.62.3:56732'}
2023-03-30T17:17:22,907+0800 [pulsar-client-io-1-10] ERROR org.apache.pulsar.client.impl.ProducerImpl - [persistent://mongo/hamster/dbservermongo.hamster.ClassSummary10] [pulsar-cluster-ddx-73-29] Failed to create producer: request timeout {'durationMs': '30000', 'reqId':'2994521255180757911', 'remote':'10.1.62.1/10.1.62.1:6650', 'local':'/10.1.62.3:56732'}
2023-03-30T17:17:22,907+0800 [pulsar-client-io-1-10] WARN org.apache.pulsar.client.impl.ConnectionHandler - [persistent://mongo/hamster/dbservermongo.hamster.ClassSummary10] [pulsar-cluster-ddx-73-29] Could not get connection to broker: request timeout {'durationMs': '30000', 'reqId':'2994521255180757911', 'remote':'10.1.62.1/10.1.62.1:6650', 'local':'/10.1.62.3:56732'} -- Will try again in 56.137 s
2023-03-30T17:17:22,907+0800 [pulsar-client-io-1-10] WARN org.apache.pulsar.client.impl.ClientCnx - [id: 0x21142b91, L:/10.1.62.3:56732 - R:10.1.62.1/10.1.62.1:6650] request timeout {'durationMs': '30000', 'reqId':'2994521255180757911', 'remote':'10.1.62.1/10.1.62.1:6650', 'local':'/10.1.62.3:56732'}
2023-03-30T17:17:22,907+0800 [pulsar-client-io-1-10] ERROR org.apache.pulsar.client.impl.ProducerImpl - [persistent://mongo/hamster/dbservermongo.hamster.ClassRating1] [pulsar-cluster-ddx-73-66] Failed to create producer: request timeout {'durationMs': '30000', 'reqId':'2994521255180757913', 'remote':'10.1.62.1/10.1.62.1:6650', 'local':'/10.1.62.3:56732'}
2023-03-30T17:17:22,907+0800 [pulsar-client-io-1-10] WARN org.apache.pulsar.client.impl.ConnectionHandler - [persistent://mongo/hamster/dbservermongo.hamster.ClassRating1] [pulsar-cluster-ddx-73-66] Could not get connection to broker: request timeout {'durationMs': '30000', 'reqId':'2994521255180757913', 'remote':'10.1.62.1/10.1.62.1:6650', 'local':'/10.1.62.3:56732'} -- Will try again in 57.424 s
2023-03-30T17:17:22,907+0800 [pulsar-client-io-1-10] WARN org.apache.pulsar.client.impl.ClientCnx - [id: 0x21142b91, L:/10.1.62.3:56732 - R:10.1.62.1/10.1.62.1:6650] request timeout {'durationMs': '30000', 'reqId':'2994521255180757913', 'remote':'10.1.62.1/10.1.62.1:6650', 'local':'/10.1.62.3:56732'}
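
To confirm that every timeout in a log excerpt like the one above targets the same broker, the lines can be tallied by their 'remote' field. This is a rough sketch for triage only: the function name summarize_timeouts is hypothetical, and the patterns assume the log layout shown here (they accept both straight and typographic quotes, since pasted logs are often reformatted).

```python
import re
from collections import Counter

# Character class covering straight and typographic single quotes.
_Q = "['‘’]"
REMOTE_RE = re.compile(rf"{_Q}remote{_Q}\s*:\s*{_Q}([^'‘’]+){_Q}")
TOPIC_RE = re.compile(r"\[(persistent://[^\]]+)\]")


def summarize_timeouts(lines):
    """Count 'request timeout' log lines per remote broker and collect topics."""
    per_broker = Counter()
    topics = set()
    for line in lines:
        if "request timeout" not in line:
            continue
        remote = REMOTE_RE.search(line)
        if remote:
            per_broker[remote.group(1)] += 1
        topic = TOPIC_RE.search(line)
        if topic:
            topics.add(topic.group(1))
    return per_broker, topics
```

If the resulting counter has a single key, as it would for the excerpt above (10.1.62.1/10.1.62.1:6650), the failures are confined to one broker rather than spread across the cluster.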

During our testing, the broker on ddx001 could not accept data while ddx002 and ddx003 were functioning normally. After we restarted the broker on ddx001, Debezium was able to write data to it again.


Q1: Why can Debezium only write to a fixed broker?
Q2: Why does the broker hang after starting Flink, but the corresponding port of the broker is still active?
Q3: Debezium cannot automatically switch to other nodes during daily maintenance, and the connector keeps restarting.
Q4: A Debezium connector runs on ddx001. The machine crashed at 10 am, and after the connector switched to another node, ddx003, at 11 am, Debezium did not pick up the MySQL data from the interruption between 10 am and 11 am, leaving gaps in the message stream.





GitHub link: https://github.com/apache/pulsar/discussions/20004
