You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2021/09/01 03:14:15 UTC

[GitHub] [pulsar] BewareMyPower commented on issue #10721: (2.7.2) Python client blocks forever in "create_producer" during a broker failure, even if "operation_timeout_seconds" is set

BewareMyPower commented on issue #10721:
URL: https://github.com/apache/pulsar/issues/10721#issuecomment-909838082


   I still cannot reproduce the bug. For the first log file, it looks like the reconnection logs only happened for `send`, instead of `create_producer`. I ran your code with a little change that adds the timeout for send
   
   ```python
           rv = cl.create_producer('foobar', send_timeout_millis=1000)
   ```
   
   Here's the output:
   
   ```
   Sent: 2000
   Sent: 3000
   2021-09-01 11:02:40.385 INFO  [0x7000048d7000] HandlerBase:142 | [persistent://public/default/foobar, standalone-6-0] Schedule reconnection in 0.1 s
   2021-09-01 11:02:40.394 ERROR [0x7000048d7000] ClientConnection:572 | [127.0.0.1:59743 -> 127.0.0.1:6650] Read failed: End of file
   2021-09-01 11:02:40.394 INFO  [0x7000048d7000] ClientConnection:1495 | [127.0.0.1:59743 -> 127.0.0.1:6650] Connection closed
   2021-09-01 11:02:40.487 INFO  [0x7000048d7000] HandlerBase:64 | [persistent://public/default/foobar, standalone-6-0] Getting connection from pool
   2021-09-01 11:02:40.487 INFO  [0x7000048d7000] ConnectionPool:86 | Deleting stale connection from pool for pulsar://localhost:6650 use_count: 1 @ 0x7fae5580c000
   2021-09-01 11:02:40.487 INFO  [0x7000048d7000] ClientConnection:181 | [<none> -> pulsar://localhost:6650] Create ClientConnection, timeout=10000
   2021-09-01 11:02:40.487 INFO  [0x7000048d7000] ConnectionPool:96 | Created connection for pulsar://localhost:6650
   2021-09-01 11:02:40.489 ERROR [0x7000048d7000] ClientConnection:448 | [<none> -> pulsar://localhost:6650] Failed to establish connection: Connection refused
   2021-09-01 11:02:40.489 INFO  [0x7000048d7000] ClientConnection:1495 | [<none> -> pulsar://localhost:6650] Connection closed
   2021-09-01 11:02:40.489 INFO  [0x7000048d7000] HandlerBase:142 | [persistent://public/default/foobar, standalone-6-0] Schedule reconnection in 0.182 s
   2021-09-01 11:02:40.676 INFO  [0x7000048d7000] HandlerBase:64 | [persistent://public/default/foobar, standalone-6-0] Getting connection from pool
   2021-09-01 11:02:40.676 INFO  [0x7000048d7000] ConnectionPool:86 | Deleting stale connection from pool for pulsar://localhost:6650 use_count: 1 @ 0x7fae54852800
   2021-09-01 11:02:40.676 INFO  [0x7000048d7000] ClientConnection:181 | [<none> -> pulsar://localhost:6650] Create ClientConnection, timeout=10000
   2021-09-01 11:02:40.676 INFO  [0x7000048d7000] ConnectionPool:96 | Created connection for pulsar://localhost:6650
   2021-09-01 11:02:40.678 ERROR [0x7000048d7000] ClientConnection:448 | [<none> -> pulsar://localhost:6650] Failed to establish connection: Connection refused
   2021-09-01 11:02:40.678 INFO  [0x7000048d7000] ClientConnection:1495 | [<none> -> pulsar://localhost:6650] Connection closed
   2021-09-01 11:02:40.678 INFO  [0x7000048d7000] HandlerBase:142 | [persistent://public/default/foobar, standalone-6-0] Schedule reconnection in 0.372 s
   2021-09-01 11:02:41.052 INFO  [0x7000048d7000] HandlerBase:64 | [persistent://public/default/foobar, standalone-6-0] Getting connection from pool
   2021-09-01 11:02:41.052 INFO  [0x7000048d7000] ConnectionPool:86 | Deleting stale connection from pool for pulsar://localhost:6650 use_count: 1 @ 0x7fae5384e800
   2021-09-01 11:02:41.052 INFO  [0x7000048d7000] ClientConnection:181 | [<none> -> pulsar://localhost:6650] Create ClientConnection, timeout=10000
   2021-09-01 11:02:41.052 INFO  [0x7000048d7000] ConnectionPool:96 | Created connection for pulsar://localhost:6650
   2021-09-01 11:02:41.054 ERROR [0x7000048d7000] ClientConnection:448 | [<none> -> pulsar://localhost:6650] Failed to establish connection: Connection refused
   2021-09-01 11:02:41.054 INFO  [0x7000048d7000] ClientConnection:1495 | [<none> -> pulsar://localhost:6650] Connection closed
   2021-09-01 11:02:41.054 INFO  [0x7000048d7000] HandlerBase:142 | [persistent://public/default/foobar, standalone-6-0] Schedule reconnection in 0.224 s
   2021-09-01 11:02:41.281 INFO  [0x7000048d7000] HandlerBase:64 | [persistent://public/default/foobar, standalone-6-0] Getting connection from pool
   2021-09-01 11:02:41.281 INFO  [0x7000048d7000] ConnectionPool:86 | Deleting stale connection from pool for pulsar://localhost:6650 use_count: 1 @ 0x7fae5305ce00
   2021-09-01 11:02:41.281 INFO  [0x7000048d7000] ClientConnection:181 | [<none> -> pulsar://localhost:6650] Create ClientConnection, timeout=10000
   2021-09-01 11:02:41.281 INFO  [0x7000048d7000] ConnectionPool:96 | Created connection for pulsar://localhost:6650
   2021-09-01 11:02:41.283 ERROR [0x7000048d7000] ClientConnection:448 | [<none> -> pulsar://localhost:6650] Failed to establish connection: Connection refused
   2021-09-01 11:02:41.283 INFO  [0x7000048d7000] ClientConnection:1495 | [<none> -> pulsar://localhost:6650] Connection closed
   2021-09-01 11:02:41.283 INFO  [0x7000048d7000] HandlerBase:142 | [persistent://public/default/foobar, standalone-6-0] Schedule reconnection in 1.568 s
   Got exception: Pulsar error: TimeOut
   2021-09-01 11:02:41.390 INFO  [0x118051e00] ClientConnection:181 | [<none> -> pulsar://localhost:6650] Create ClientConnection, timeout=10000
   2021-09-01 11:02:41.390 INFO  [0x118051e00] ConnectionPool:96 | Created connection for pulsar://localhost:6650
   2021-09-01 11:02:41.392 ERROR [0x7000049dd000] ClientConnection:448 | [<none> -> pulsar://localhost:6650] Failed to establish connection: Connection refused
   2021-09-01 11:02:41.392 INFO  [0x7000049dd000] ClientConnection:1495 | [<none> -> pulsar://localhost:6650] Connection closed
   2021-09-01 11:02:41.392 ERROR [0x7000049dd000] ClientImpl:188 | Error Checking/Getting Partition Metadata while creating producer on persistent://public/default/foobar -- ConnectError
   error creating producer: Pulsar error: ConnectError
   ```
   
   The reconnection related logs should appear before the `Got exception: Pulsar error: TimeOut` line, but it appeared after this line in your logs. It might be caused by the stdout not synchronized. During `create_producer`, there should be no logs printed by `HandlerBase`.
   
   I suspected there's something wrong with the phase to establish connection because your logs don't contain `Error Checking/Getting Partition Metadata`. `create_producer` performs topic lookup at first, so if the connection cannot be established, the topic lookup will fail.
   
   Since you built source from master, you can add connection timeout and try again.
   
   ```python
           cl = pulsar.Client(
               'pulsar://localhost:6650',
               operation_timeout_seconds=1,
               connection_timeout_ms=1000,
           )
   ```
   
   > once against an already SIGSTOPped broker.
   
   Regarding to this case, could you enable debug logs and try again? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org