You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2021/11/08 15:19:52 UTC

[GitHub] [pulsar-client-go] bschofield opened a new issue #662: newPartitionProducer() occasionally hangs, waiting for a ready connection

bschofield opened a new issue #662:
URL: https://github.com/apache/pulsar-client-go/issues/662


   #### Expected behavior
   
   When creating a partitioned producer, producers for all partitions should be created.
   
   #### Actual behavior
   
   I am writing to a topic with six partitions. Very occasionally, I experience a process which is stuck because the producer for one of the partitions is not creating.
   
   I recently had a process in this state, and I attached _delve_ to it in order to see exactly where it was stuck. Looking first at the main goroutine, then we can see that it is stuck in _internalCreatePartitionsProducers()_:
   
   ```
   (dlv) bt
    0  0x00000000004425b6 in runtime.gopark
       at /usr/local/go/src/runtime/proc.go:367
    1  0x0000000000411bec in runtime.chanrecv
       at /usr/local/go/src/runtime/chan.go:576
    2  0x0000000000411658 in runtime.chanrecv2
       at /usr/local/go/src/runtime/chan.go:444
    3  0x000000000095e9ec in github.com/apache/pulsar-client-go/pulsar.(*producer).internalCreatePartitionsProducers
       at /go/pkg/mod/github.com/apache/pulsar-client-go@v0.7.0-candidate-1.0.20211104025721-d80a722ac1ab/pulsar/producer_impl.go:230
    4  0x000000000095dfaf in github.com/apache/pulsar-client-go/pulsar.newProducer
       at /go/pkg/mod/github.com/apache/pulsar-client-go@v0.7.0-candidate-1.0.20211104025721-d80a722ac1ab/pulsar/producer_impl.go:127
    5  0x0000000000945ba9 in github.com/apache/pulsar-client-go/pulsar.(*client).CreateProducer
       at /go/pkg/mod/github.com/apache/pulsar-client-go@v0.7.0-candidate-1.0.20211104025721-d80a722ac1ab/pulsar/client_impl.go:154
   [...]
   ```
   
   Line 230 of _producer_impl.go_ is waiting for a channel read:
   
   ```
   pe, ok := <-c
   ```
   
   The goroutine on the other end of this channel is launched at line 219 of _producer_impl.go_, and the issue lies in the call to _newPartitionProducer()_ on line 220. The full stack trace for that other goroutine is:
   
   ```
   (dlv) bt
    0  0x00000000004425b6 in runtime.gopark
       at /usr/local/go/src/runtime/proc.go:367
    1  0x000000000046c3dd in runtime.goparkunlock
       at /usr/local/go/src/runtime/proc.go:372
    2  0x000000000046c3dd in sync.runtime_notifyListWait
       at /usr/local/go/src/runtime/sema.go:513
    3  0x0000000000477fac in sync.(*Cond).Wait
       at /usr/local/go/src/sync/cond.go:56
    4  0x00000000008e33cd in github.com/apache/pulsar-client-go/pulsar/internal.(*connection).waitUntilReady
       at /go/pkg/mod/github.com/apache/pulsar-client-go@v0.7.0-candidate-1.0.20211104025721-d80a722ac1ab/pulsar/internal/connection.go:339
    5  0x00000000008e8668 in github.com/apache/pulsar-client-go/pulsar/internal.(*connectionPool).GetConnection
       at /go/pkg/mod/github.com/apache/pulsar-client-go@v0.7.0-candidate-1.0.20211104025721-d80a722ac1ab/pulsar/internal/connection_pool.go:108
    6  0x00000000008f37ed in github.com/apache/pulsar-client-go/pulsar/internal.(*rpcClient).Request
       at /go/pkg/mod/github.com/apache/pulsar-client-go@v0.7.0-candidate-1.0.20211104025721-d80a722ac1ab/pulsar/internal/rpc_client.go:120
    7  0x0000000000960d63 in github.com/apache/pulsar-client-go/pulsar.(*partitionProducer).grabCnx
       at /go/pkg/mod/github.com/apache/pulsar-client-go@v0.7.0-candidate-1.0.20211104025721-d80a722ac1ab/pulsar/producer_partition.go:228
    8  0x00000000009602fa in github.com/apache/pulsar-client-go/pulsar.newPartitionProducer
       at /go/pkg/mod/github.com/apache/pulsar-client-go@v0.7.0-candidate-1.0.20211104025721-d80a722ac1ab/pulsar/producer_partition.go:161
    9  0x000000000095edc9 in github.com/apache/pulsar-client-go/pulsar.(*producer).internalCreatePartitionsProducers.func1
       at /go/pkg/mod/github.com/apache/pulsar-client-go@v0.7.0-candidate-1.0.20211104025721-d80a722ac1ab/pulsar/producer_impl.go:220
   10  0x000000000095ed55 in github.com/apache/pulsar-client-go/pulsar.(*producer).internalCreatePartitionsProducers·dwrap·38
       at /go/pkg/mod/github.com/apache/pulsar-client-go@v0.7.0-candidate-1.0.20211104025721-d80a722ac1ab/pulsar/producer_impl.go:226
   11  0x000000000046ff01 in runtime.goexit
       at /usr/local/go/src/runtime/asm_amd64.s:1581
   ```
   
   For some reason, the call to _waitUntilReady()_ in _connection.go_ never completes.
   
   The other five partitions seem to have created fine: I see creation log messages for all five, and in _delve_ I can see five running copies of _failTimeoutMessages_ and _runEventsLoop_ goroutines.
   
   #### Steps to reproduce
   
   This is not an easily bug to reproduce. It seems to occur around once per 10,000 invocations of the code in question, which means it happens around once per week for me.
   
   It may be relevant that there are many new connections being made to the broker, at the time the bug occurs -- could a packet loss somewhere cause this, perhaps?
   
   I do not know how to debug this further. However, if there is debug code I can insert, please let me know and I will do so.
   
   #### System configuration
   
   Pulsar version: 2.8.1
   Client version: 0.7.0-candidate-1.0.20211104025721-d80a722ac1ab
   
   The connection here is direct to the brokers, not via a proxy.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar-client-go] bschofield edited a comment on issue #662: newPartitionProducer() occasionally hangs, waiting for a ready connection

Posted by GitBox <gi...@apache.org>.
bschofield edited a comment on issue #662:
URL: https://github.com/apache/pulsar-client-go/issues/662#issuecomment-963286629


   In this code:
   
   ```
   	for c.getState() != connectionReady {
   		c.log.Debugf("Wait until connection is ready state=%s", c.getState().String())
   		if c.getState() == connectionClosed {
   			return errors.New("connection error")
   		}
   		// wait for a new connection state change
   		c.cond.Wait()
   	}
   ```
   
   If the state of the connection changes after the for statement, but before `c.cond.Wait()` is called, what prevents a race condition?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar-client-go] bschofield commented on issue #662: newPartitionProducer() occasionally hangs, waiting for a ready connection

Posted by GitBox <gi...@apache.org>.
bschofield commented on issue #662:
URL: https://github.com/apache/pulsar-client-go/issues/662#issuecomment-963286629


   In this code:
   
   ```
   	for c.getState() != connectionReady {
   		c.log.Debugf("Wait until connection is ready state=%s", c.getState().String())
   		if c.getState() == connectionClosed {
   			return errors.New("connection error")
   		}
   		// wait for a new connection state change
   		c.cond.Wait()
   	}
   ```
   
   If the state of the connection is changed to ready after the for statement, but before `c.cond.Wait()` is called, what prevents a race condition?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar-client-go] cckellogg closed issue #662: newPartitionProducer() occasionally hangs, waiting for a ready connection

Posted by GitBox <gi...@apache.org>.
cckellogg closed issue #662:
URL: https://github.com/apache/pulsar-client-go/issues/662


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar-client-go] bschofield edited a comment on issue #662: newPartitionProducer() occasionally hangs, waiting for a ready connection

Posted by GitBox <gi...@apache.org>.
bschofield edited a comment on issue #662:
URL: https://github.com/apache/pulsar-client-go/issues/662#issuecomment-963286629


   In this code:
   
   ```
   	for c.getState() != connectionReady {
   		c.log.Debugf("Wait until connection is ready state=%s", c.getState().String())
   		if c.getState() == connectionClosed {
   			return errors.New("connection error")
   		}
   		// wait for a new connection state change
   		c.cond.Wait()
   	}
   ```
   
   If the state of the connection changes after the for statement, but before `c.cond.Wait()` is called, what prevents a race condition? It looks to me like the `cond.Broadcast()` could happen before the `cond.Wait()` registers the listener. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org