Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/03/18 18:36:55 UTC

[GitHub] [pulsar] michaeljmarshall opened a new pull request #14750: Set function channel to idle to prevent DNS resolution of deleted pod

michaeljmarshall opened a new pull request #14750:
URL: https://github.com/apache/pulsar/pull/14750


   ### Motivation
   
   When running the Kubernetes runtime and deleting a function, it is possible to observe the following error log:
   
   ```
   15:36:17.025 [worker-scheduler-0] INFO  org.apache.pulsar.functions.utils.Actions - Sucessfully completed action [ Deleting statefulset for function ds/default/gluon-revolut-ord-rsp-assembler ]
   15:36:17.405 [grpc-default-executor-307] WARN  io.grpc.internal.ManagedChannelImpl - [Channel<99>: (pf-ds-default-gluon-revolut-ord-rsp-assembler-0.pf-ds-default-gluon-revolut-ord-rsp-assembler.pulsar.svc.cluster.local:9093)] Failed to resolve name. status=Status{code=UNAVAILABLE, description=Unable to resolve host pf-ds-default-gluon-revolut-ord-rsp-assembler-0.pf-ds-default-gluon-revolut-ord-rsp-assembler.pulsar.svc.cluster.local, cause=java.lang.RuntimeException: java.net.UnknownHostException: pf-ds-default-gluon-revolut-ord-rsp-assembler-0.pf-ds-default-gluon-revolut-ord-rsp-assembler.pulsar.svc.cluster.local: Name or service not known
   	at io.grpc.internal.DnsNameResolver.resolveAll(DnsNameResolver.java:399)
   	at io.grpc.internal.DnsNameResolver$Resolve.resolveInternal(DnsNameResolver.java:269)
   	at io.grpc.internal.DnsNameResolver$Resolve.run(DnsNameResolver.java:225)
   	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
   	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
   	at java.base/java.lang.Thread.run(Thread.java:834)
   Caused by: java.net.UnknownHostException: pf-ds-default-gluon-revolut-ord-rsp-assembler-0.pf-ds-default-gluon-revolut-ord-rsp-assembler.pulsar.svc.cluster.local: Name or service not known
   	at java.base/java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
   	at java.base/java.net.InetAddress$PlatformNameService.lookupAllHostAddr(InetAddress.java:929)
   	at java.base/java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1519)
   	at java.base/java.net.InetAddress$NameServiceAddresses.get(InetAddress.java:848)
   	at java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1509)
   	at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1368)
   	at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1302)
   	at io.grpc.internal.DnsNameResolver$JdkAddressResolver.resolveAddress(DnsNameResolver.java:624)
   	at io.grpc.internal.DnsNameResolver.resolveAll(DnsNameResolver.java:367)
   	... 5 more
   }
   ```
   
   This error occurs because the `ManagedChannel` targets the StatefulSet pods, and the function worker deletes those pods before shutting down the `ManagedChannel`. Note that pod deletion itself does not trigger the gRPC client to connect to the functions; the error comes from the way gRPC handles DNS, which it re-resolves frequently.
   
   There are two possible solutions: shut down the managed channel before deleting the pods, or set the channel to idle, which prevents any new DNS resolution (as long as there are no new connections). Given that the StatefulSet or the Service could fail to get deleted, it seems simpler to just set the channel to idle and then shut it down after successfully deleting the function.
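   
   The ordering described above can be sketched roughly as follows. This is not the code from this PR: `IdleableChannel`, `FunctionTeardown`, and `RecordingChannel` are hypothetical names that stand in for gRPC's `ManagedChannel` (whose `enterIdle()` and `shutdown()` methods do exist) so that the example compiles without any dependencies, and `deleteResources` is an assumed callback representing the StatefulSet and Service deletion.
   
   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.function.BooleanSupplier;

   /** Hypothetical stand-in for the two ManagedChannel methods used during teardown. */
   interface IdleableChannel {
       /** Tears down connections and the name resolver until the next RPC is issued. */
       void enterIdle();

       /** Releases the channel permanently. */
       void shutdown();
   }

   /** Hypothetical helper; the name and signature are illustrative only. */
   final class FunctionTeardown {
       private FunctionTeardown() {}

       /** Returns true if the resources were deleted and the channels were shut down. */
       static boolean stop(List<? extends IdleableChannel> channels, BooleanSupplier deleteResources) {
           // 1. Idle first, so each channel's name resolver is torn down
           //    before the pod hostnames disappear from DNS.
           for (IdleableChannel channel : channels) {
               channel.enterIdle();
           }
           // 2. Delete the StatefulSet and Service; this step may fail.
           boolean deleted = deleteResources.getAsBoolean();
           // 3. Shut the channels down only once deletion has succeeded.
           if (deleted) {
               for (IdleableChannel channel : channels) {
                   channel.shutdown();
               }
           }
           return deleted;
       }
   }

   /** Test double that records the order in which channel methods are called. */
   final class RecordingChannel implements IdleableChannel {
       private final String name;
       private final List<String> events;

       RecordingChannel(String name, List<String> events) {
           this.name = name;
           this.events = events;
       }

       @Override public void enterIdle() { events.add(name + ":idle"); }
       @Override public void shutdown() { events.add(name + ":shutdown"); }
   }
   ```
   
   If deletion fails, the channels stay idle rather than shut down, so a later retry or a new RPC can still use them.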
   
   ### Modifications
   
   * Set all channels to idle before deleting the function pods in the K8s function runtime
   * Fix some copy/pasted log lines that were a bit confusing
   * Fix typos of `sucess`.
   
   ### Verifying this change
   
   This is a trivial change that will not affect the logic of function deletion. It just ensures graceful shutdown and avoids benign errors that might otherwise confuse users.
   
   ### Does this pull request potentially affect one of the following parts:
   
   This is a backwards compatible change.
   
   ### Documentation
   
   - [x] `no-need-doc` 
     
   This is an internal cleanup.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] codelipenghui merged pull request #14750: Set function channel to idle to prevent DNS resolution of deleted pod

Posted by GitBox <gi...@apache.org>.
codelipenghui merged pull request #14750:
URL: https://github.com/apache/pulsar/pull/14750


   





[GitHub] [pulsar] codelipenghui commented on pull request #14750: Set function channel to idle to prevent DNS resolution of deleted pod

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on pull request #14750:
URL: https://github.com/apache/pulsar/pull/14750#issuecomment-1073449441


   @freeznet @nlu90 Please help review this PR.

