You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2021/09/01 03:14:15 UTC
[GitHub] [pulsar] BewareMyPower commented on issue #10721: (2.7.2) Python client blocks forever in "create_producer" during a broker failure, even if "operation_timeout_seconds" is set
BewareMyPower commented on issue #10721:
URL: https://github.com/apache/pulsar/issues/10721#issuecomment-909838082
I still cannot reproduce the bug. For the first log file, it looks like the reconnection logs only happened for `send`, instead of `create_producer`. I ran your code with a little change that adds the timeout for send
```python
rv = cl.create_producer('foobar', send_timeout_millis=1000)
```
Here's the output:
```
Sent: 2000
Sent: 3000
2021-09-01 11:02:40.385 INFO [0x7000048d7000] HandlerBase:142 | [persistent://public/default/foobar, standalone-6-0] Schedule reconnection in 0.1 s
2021-09-01 11:02:40.394 ERROR [0x7000048d7000] ClientConnection:572 | [127.0.0.1:59743 -> 127.0.0.1:6650] Read failed: End of file
2021-09-01 11:02:40.394 INFO [0x7000048d7000] ClientConnection:1495 | [127.0.0.1:59743 -> 127.0.0.1:6650] Connection closed
2021-09-01 11:02:40.487 INFO [0x7000048d7000] HandlerBase:64 | [persistent://public/default/foobar, standalone-6-0] Getting connection from pool
2021-09-01 11:02:40.487 INFO [0x7000048d7000] ConnectionPool:86 | Deleting stale connection from pool for pulsar://localhost:6650 use_count: 1 @ 0x7fae5580c000
2021-09-01 11:02:40.487 INFO [0x7000048d7000] ClientConnection:181 | [<none> -> pulsar://localhost:6650] Create ClientConnection, timeout=10000
2021-09-01 11:02:40.487 INFO [0x7000048d7000] ConnectionPool:96 | Created connection for pulsar://localhost:6650
2021-09-01 11:02:40.489 ERROR [0x7000048d7000] ClientConnection:448 | [<none> -> pulsar://localhost:6650] Failed to establish connection: Connection refused
2021-09-01 11:02:40.489 INFO [0x7000048d7000] ClientConnection:1495 | [<none> -> pulsar://localhost:6650] Connection closed
2021-09-01 11:02:40.489 INFO [0x7000048d7000] HandlerBase:142 | [persistent://public/default/foobar, standalone-6-0] Schedule reconnection in 0.182 s
2021-09-01 11:02:40.676 INFO [0x7000048d7000] HandlerBase:64 | [persistent://public/default/foobar, standalone-6-0] Getting connection from pool
2021-09-01 11:02:40.676 INFO [0x7000048d7000] ConnectionPool:86 | Deleting stale connection from pool for pulsar://localhost:6650 use_count: 1 @ 0x7fae54852800
2021-09-01 11:02:40.676 INFO [0x7000048d7000] ClientConnection:181 | [<none> -> pulsar://localhost:6650] Create ClientConnection, timeout=10000
2021-09-01 11:02:40.676 INFO [0x7000048d7000] ConnectionPool:96 | Created connection for pulsar://localhost:6650
2021-09-01 11:02:40.678 ERROR [0x7000048d7000] ClientConnection:448 | [<none> -> pulsar://localhost:6650] Failed to establish connection: Connection refused
2021-09-01 11:02:40.678 INFO [0x7000048d7000] ClientConnection:1495 | [<none> -> pulsar://localhost:6650] Connection closed
2021-09-01 11:02:40.678 INFO [0x7000048d7000] HandlerBase:142 | [persistent://public/default/foobar, standalone-6-0] Schedule reconnection in 0.372 s
2021-09-01 11:02:41.052 INFO [0x7000048d7000] HandlerBase:64 | [persistent://public/default/foobar, standalone-6-0] Getting connection from pool
2021-09-01 11:02:41.052 INFO [0x7000048d7000] ConnectionPool:86 | Deleting stale connection from pool for pulsar://localhost:6650 use_count: 1 @ 0x7fae5384e800
2021-09-01 11:02:41.052 INFO [0x7000048d7000] ClientConnection:181 | [<none> -> pulsar://localhost:6650] Create ClientConnection, timeout=10000
2021-09-01 11:02:41.052 INFO [0x7000048d7000] ConnectionPool:96 | Created connection for pulsar://localhost:6650
2021-09-01 11:02:41.054 ERROR [0x7000048d7000] ClientConnection:448 | [<none> -> pulsar://localhost:6650] Failed to establish connection: Connection refused
2021-09-01 11:02:41.054 INFO [0x7000048d7000] ClientConnection:1495 | [<none> -> pulsar://localhost:6650] Connection closed
2021-09-01 11:02:41.054 INFO [0x7000048d7000] HandlerBase:142 | [persistent://public/default/foobar, standalone-6-0] Schedule reconnection in 0.224 s
2021-09-01 11:02:41.281 INFO [0x7000048d7000] HandlerBase:64 | [persistent://public/default/foobar, standalone-6-0] Getting connection from pool
2021-09-01 11:02:41.281 INFO [0x7000048d7000] ConnectionPool:86 | Deleting stale connection from pool for pulsar://localhost:6650 use_count: 1 @ 0x7fae5305ce00
2021-09-01 11:02:41.281 INFO [0x7000048d7000] ClientConnection:181 | [<none> -> pulsar://localhost:6650] Create ClientConnection, timeout=10000
2021-09-01 11:02:41.281 INFO [0x7000048d7000] ConnectionPool:96 | Created connection for pulsar://localhost:6650
2021-09-01 11:02:41.283 ERROR [0x7000048d7000] ClientConnection:448 | [<none> -> pulsar://localhost:6650] Failed to establish connection: Connection refused
2021-09-01 11:02:41.283 INFO [0x7000048d7000] ClientConnection:1495 | [<none> -> pulsar://localhost:6650] Connection closed
2021-09-01 11:02:41.283 INFO [0x7000048d7000] HandlerBase:142 | [persistent://public/default/foobar, standalone-6-0] Schedule reconnection in 1.568 s
Got exception: Pulsar error: TimeOut
2021-09-01 11:02:41.390 INFO [0x118051e00] ClientConnection:181 | [<none> -> pulsar://localhost:6650] Create ClientConnection, timeout=10000
2021-09-01 11:02:41.390 INFO [0x118051e00] ConnectionPool:96 | Created connection for pulsar://localhost:6650
2021-09-01 11:02:41.392 ERROR [0x7000049dd000] ClientConnection:448 | [<none> -> pulsar://localhost:6650] Failed to establish connection: Connection refused
2021-09-01 11:02:41.392 INFO [0x7000049dd000] ClientConnection:1495 | [<none> -> pulsar://localhost:6650] Connection closed
2021-09-01 11:02:41.392 ERROR [0x7000049dd000] ClientImpl:188 | Error Checking/Getting Partition Metadata while creating producer on persistent://public/default/foobar -- ConnectError
error creating producer: Pulsar error: ConnectError
```
The reconnection related logs should appear before the `Got exception: Pulsar error: TimeOut` line, but it appeared after this line in your logs. It might be caused by the stdout not synchronized. During `create_producer`, there should be no logs printed by `HandlerBase`.
I suspected there's something wrong with the phase to establish connection because your logs don't contain `Error Checking/Getting Partition Metadata`. `create_producer` performs topic lookup at first, so if the connection cannot be established, the topic lookup will fail.
Since you built source from master, you can add connection timeout and try again.
```python
cl = pulsar.Client(
'pulsar://localhost:6650',
operation_timeout_seconds=1,
connection_timeout_ms=1000,
)
```
> once against an already SIGSTOPped broker.
Regarding to this case, could you enable debug logs and try again?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org