Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2020/04/10 12:01:33 UTC

[GitHub] [pulsar] cdbartholomew opened a new issue #6710: Health check fails when running Docker images

cdbartholomew opened a new issue #6710: Health check fails when running Docker images
URL: https://github.com/apache/pulsar/issues/6710
 
 
   **Describe the bug**
   When requesting the health check endpoint for a broker, it always fails when running inside a Docker image:
   
   ```
   curl localhost:8080/admin/v2/brokers/health
   
    --- An unexpected error occurred in the server ---
   
   Message: org.apache.pulsar.client.api.PulsarClientException$TimeoutException: 5 lookup request timedout after ms 30000
   
   Stacktrace:
   
   java.util.concurrent.CompletionException: org.apache.pulsar.client.api.PulsarClientException$TimeoutException: 5 lookup request timedout after ms 30000
   	at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
   	at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
   	at java.util.concurrent.CompletableFuture.biRelay(CompletableFuture.java:1284)
   	at java.util.concurrent.CompletableFuture$BiRelay.tryFire(CompletableFuture.java:1270)
   	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
   	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
   	at org.apache.pulsar.client.impl.ProducerImpl.lambda$connectionOpened$14(ProducerImpl.java:1205)
   	at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
   	at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
   	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
   	at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
   	at org.apache.pulsar.client.impl.ClientCnx.checkRequestTimeout(ClientCnx.java:1026)
   	at org.apache.pulsar.client.impl.ClientCnx.lambda$channelActive$0(ClientCnx.java:187)
   	at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
   	at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:176)
   	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
   	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
   	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384)
   	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
   	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
   	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
   	at java.lang.Thread.run(Thread.java:748)
   Caused by: org.apache.pulsar.client.api.PulsarClientException$TimeoutException: 5 lookup request timedout after ms 30000
   	at org.apache.pulsar.client.impl.ClientCnx.checkRequestTimeout(ClientCnx.java:1025)
   	... 10 more
   ```
   
   This problem does not happen when running the broker in standalone mode.
   
   This issue is present in master and in the v2.5.1-candidate-2 tag. In an identical setup running 2.5.0, this does not happen.
   
   **To Reproduce**
   
   1. Build Docker images like this:
   
   ```
   git checkout master
   mvn install -DskipTests
   cd docker
   ./build.sh
   ```
   
   2. Run the image in Kubernetes on minikube.
   
   3. From inside the broker container, call the health endpoint:
   
   curl localhost:8080/admin/v2/brokers/health
   
   **Expected behavior**
   Return HTTP 200.
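
For scripted checks, the expectation above can be expressed as a small probe. This is only a sketch: the URL is the one used in this report, while the function name `broker_is_healthy`, the 35-second timeout (chosen to sit just above the 30000 ms lookup timeout in the stack trace), and the injectable `opener` parameter are assumptions made for illustration.

```python
import urllib.error
import urllib.request

# Endpoint from this report; host and port assume a call from inside the broker container.
HEALTH_URL = "http://localhost:8080/admin/v2/brokers/health"


def broker_is_healthy(url=HEALTH_URL, timeout=35.0, opener=urllib.request.urlopen):
    """Return True only if the health endpoint answers with HTTP 200.

    `opener` is injectable (hypothetical parameter) so the function can be
    exercised without a running broker.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        # urlopen raises HTTPError for error responses such as the failure shown above.
        return False
    except OSError:
        # Connection refused, DNS failure, or socket timeout.
        return False
```

Calling `broker_is_healthy()` with no arguments performs the same request as the `curl` command, but returns a boolean instead of printing the response body.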
   
   **Additional context**
   
   Every time the endpoint is called, it leaves a hanging producer on the topic:
   
   ```
   bin/pulsar-admin topics stats persistent://pulsar/pulsar/10.32.0.6:8080/healthcheck
   {
     "msgRateIn" : 0.0,
     "msgThroughputIn" : 0.0,
     "msgRateOut" : 0.0,
     "msgThroughputOut" : 0.0,
     "averageMsgSize" : 0.0,
     "storageSize" : 0,
     "backlogSize" : 0,
     "publishers" : [ {
       "msgRateIn" : 0.0,
       "msgThroughputIn" : 0.0,
       "averageMsgSize" : 0.0,
       "producerId" : 0,
       "metadata" : { },
       "producerName" : "pulsar-0-2",
       "connectedSince" : "2020-04-10T11:20:20.001Z",
       "clientVersion" : "2.5.1",
       "address" : "/10.32.0.6:56852"
     }, {
       "msgRateIn" : 0.0,
       "msgThroughputIn" : 0.0,
       "averageMsgSize" : 0.0,
       "producerId" : 2,
       "metadata" : { },
       "producerName" : "pulsar-0-4",
       "connectedSince" : "2020-04-10T11:35:42.63Z",
       "clientVersion" : "2.5.1",
       "address" : "/10.32.0.6:56852"
     }, {
       "msgRateIn" : 0.0,
       "msgThroughputIn" : 0.0,
       "averageMsgSize" : 0.0,
       "producerId" : 1,
       "metadata" : { },
       "producerName" : "pulsar-0-3",
       "connectedSince" : "2020-04-10T11:34:27.901Z",
       "clientVersion" : "2.5.1",
       "address" : "/10.32.0.6:56852"
     } ],
     "subscriptions" : { },
     "replication" : { },
     "deduplicationStatus" : "Disabled",
     "bytesInCounter" : 0,
     "msgInCounter" : 0
   }
   ```
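
To track how many producers accumulate on the healthcheck topic across calls, the JSON above can be summarized with a few lines of Python. This is a sketch under assumptions: `summarize_publishers` is a hypothetical helper, and `SAMPLE` is a trimmed copy of the stats output that keeps only the fields the helper reads.

```python
import json


def summarize_publishers(stats_json):
    """Return (publisher count, sorted producer names) from topic stats JSON."""
    stats = json.loads(stats_json)
    publishers = stats.get("publishers", [])
    names = sorted(p.get("producerName", "<unnamed>") for p in publishers)
    return len(publishers), names


# Trimmed copy of the stats output above (illustrative, not the full document).
SAMPLE = """
{
  "publishers" : [
    { "producerId" : 0, "producerName" : "pulsar-0-2" },
    { "producerId" : 2, "producerName" : "pulsar-0-4" },
    { "producerId" : 1, "producerName" : "pulsar-0-3" }
  ]
}
"""

count, names = summarize_publishers(SAMPLE)
print(count, names)  # 3 ['pulsar-0-2', 'pulsar-0-3', 'pulsar-0-4']
```

If the count grows by one after each health check request, the producers created by the check are not being closed.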

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [pulsar] cdbartholomew commented on issue #6710: Health check fails when running Docker images

Posted by GitBox <gi...@apache.org>.
cdbartholomew commented on issue #6710: Health check fails when running Docker images
URL: https://github.com/apache/pulsar/issues/6710#issuecomment-612284242
 
 
   It turns out the build server had its Java version upgraded to 11. Once we reverted to Java 8, the problem went away. My apologies for the noise. Closing.
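
Since the cause was the Java version on the build machine, a pre-build guard could catch this earlier. The sketch below only parses version text of the kind `java -version` prints (on stderr); `java_major_version` is a hypothetical helper, and the version strings in the comments are illustrative examples rather than values from this report.

```python
import re


def java_major_version(version_output):
    """Extract the major Java version from `java -version` style output.

    Handles the legacy "1.8.0_x" scheme (major is 8) and the newer
    "11.0.x" scheme (major is 11). Returns None if no version is found.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first, second = match.group(1), match.group(2)
    if first == "1" and second is not None:
        # Legacy scheme: the real major version is the second component.
        return int(second)
    return int(first)


# Illustrative inputs:
#   java_major_version('openjdk version "1.8.0_242"')  -> 8
#   java_major_version('openjdk version "11.0.6"')     -> 11
```

A build script could feed it the captured stderr of `java -version` and stop when the result is not the expected major version.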


[GitHub] [pulsar] cdbartholomew commented on issue #6710: Health check fails when running Docker images

Posted by GitBox <gi...@apache.org>.
cdbartholomew commented on issue #6710: Health check fails when running Docker images
URL: https://github.com/apache/pulsar/issues/6710#issuecomment-612008850
 
 
   I can confirm this issue also occurs when running in a real Kubernetes cluster, not just minikube.


[GitHub] [pulsar] cdbartholomew commented on issue #6710: Health check fails when running Docker images

Posted by GitBox <gi...@apache.org>.
cdbartholomew commented on issue #6710: Health check fails when running Docker images
URL: https://github.com/apache/pulsar/issues/6710#issuecomment-612058362
 
 
   There appears to be a much larger issue here: connecting a producer with a schema times out (I am testing with the 2.5.1 candidate). I ran the following Python code from a bastion host that connects to the broker:
   
   ```
   import pulsar
   
   from pulsar.schema import *
   
   class Test(Record):
       id = Integer(required=True)
       user_id = Integer(required=False)
   
   service_url = 'pulsar://pulsar-broker:6650'
   
   
   client = pulsar.Client(service_url)
   
   noSchemaProducer = client.create_producer(
                       topic='persistent://public/default/no_schema')
   
   schemaProducer = client.create_producer(
                       topic='persistent://public/default/with_schema',
                       schema=AvroSchema(Test) )
   
   
   client.close()
   ```
   It gives this result:
   
   ```
   python3 schema.py 
   2020-04-10 14:30:18.245 INFO  ConnectionPool:85 | Created connection for pulsar://pulsar-broker:6650
   2020-04-10 14:30:18.247 INFO  ClientConnection:330 | [10.32.0.7:40630 -> 10.96.121.170:6650] Connected to broker
   2020-04-10 14:30:18.253 INFO  HandlerBase:53 | [persistent://public/default/no_schema, ] Getting connection from pool
   2020-04-10 14:30:18.254 INFO  ConnectionPool:85 | Created connection for pulsar://10.32.0.15:6650
   2020-04-10 14:30:18.254 INFO  ClientConnection:330 | [10.32.0.7:53392 -> 10.32.0.15:6650] Connected to broker
   2020-04-10 14:30:18.258 INFO  ProducerImpl:151 | [persistent://public/default/no_schema, ] Created producer on broker [10.32.0.7:53392 -> 10.32.0.15:6650] 
   2020-04-10 14:30:18.261 INFO  HandlerBase:53 | [persistent://public/default/with_schema, ] Getting connection from pool
   2020-04-10 14:30:48.263 ERROR ProducerImpl:219 | [persistent://public/default/with_schema, ] Failed to create producer: TimeOut
   2020-04-10 14:30:48.263 INFO  ProducerImpl:474 | Producer - [persistent://public/default/with_schema, ] , [batching  = off]
   Traceback (most recent call last):
     File "schema.py", line 19, in <module>
       schema=AvroSchema(Test) )
     File "/usr/local/lib/python3.7/dist-packages/pulsar/__init__.py", line 524, in create_producer
       p._producer = self._client.create_producer(topic, conf)
   Exception: Pulsar error: TimeOut
   2020-04-10 14:30:48.266 INFO  ClientConnection:1349 | [10.32.0.7:53392 -> 10.32.0.15:6650] Connection closed
   2020-04-10 14:30:48.266 INFO  ClientConnection:235 | [10.32.0.7:53392 -> 10.32.0.15:6650] Destroyed connection
   2020-04-10 14:30:48.267 INFO  ClientConnection:1349 | [10.32.0.7:40630 -> 10.96.121.170:6650] Connection closed
   2020-04-10 14:30:48.267 INFO  ProducerImpl:474 | Producer - [persistent://public/default/no_schema, pulsar-18-5] , [batching  = off]
   2020-04-10 14:30:48.267 INFO  ClientConnection:235 | [10.32.0.7:40630 -> 10.96.121.170:6650] Destroyed connection
   ```
   The first producer without schema is able to connect without issue. The second producer with a schema times out. This same Python code runs fine against 2.5.0.
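
To make the comparison between the two producers explicit, each creation attempt can be timed and its error captured instead of letting the first failure abort the script. This is a sketch: `try_create_producer` is a hypothetical helper, and it relies only on `create_producer` and `Producer.close()` from the Python client and on the plain `Exception` seen in the traceback above.

```python
import time


def try_create_producer(client, topic, **kwargs):
    """Attempt client.create_producer and return (ok, seconds, error message)."""
    start = time.monotonic()
    try:
        producer = client.create_producer(topic=topic, **kwargs)
    except Exception as exc:
        # The traceback above shows the client raising a plain Exception on timeout.
        return False, time.monotonic() - start, str(exc)
    producer.close()
    return True, time.monotonic() - start, None
```

With the `client` and `Test` record from the script above, calling `try_create_producer(client, 'persistent://public/default/with_schema', schema=AvroSchema(Test))` would report the failure and how long it took, while the same call without `schema` would be expected to succeed quickly.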
   
   
   


[GitHub] [pulsar] cdbartholomew closed issue #6710: Health check fails when running Docker images

Posted by GitBox <gi...@apache.org>.
cdbartholomew closed issue #6710: Health check fails when running Docker images
URL: https://github.com/apache/pulsar/issues/6710
 
 
   
