Posted to commits@heron.apache.org by GitBox <gi...@apache.org> on 2020/06/18 17:27:05 UTC

[GitHub] [incubator-heron] windhamwong opened a new issue #3542: Kubernetes additional work required for deploying topology

windhamwong opened a new issue #3542:
URL: https://github.com/apache/incubator-heron/issues/3542


   Kubernetes version: 1.18.2+k3s1
   I am trying to deploy `heron-api-examples.jar` as suggested in the docs. After a lot of troubleshooting, I found that the created pods (`acking-0`, `acking-1`, `acking-2`) need to connect to each other. `acking-0` is responsible for launching the servers for the stream manager (6001/TCP), the tmaster controller (6002/TCP) and the tmaster state (6003/TCP). As k8s requires a `Service` to expose ports so that external clients or other containers can connect, we need additional code in the Heron k8s source to generate these services as well. The following k8s YAML has been tested and works for me.
   
   I will keep testing, as the topology is still not behaving as the demo describes. I will come back and add more later.
   
   ```
   apiVersion: v1
   kind: Service
   metadata:
     name: acking-0
   spec:
     selector:
       statefulset.kubernetes.io/pod-name: acking-0
     ports:
       - name: stream-manager
         protocol: TCP
         port: 6001
         targetPort: 6001
       - name: tmaster-controller
         protocol: TCP
         port: 6002
         targetPort: 6002
       - name: tmaster-state
         protocol: TCP
         port: 6003
         targetPort: 6003
       - name: heron-shell
         protocol: TCP
         port: 6004
         targetPort: 6004
   ```
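
A manifest like the one above would be needed once per pod. As a rough sketch of how that could be generated (not how the Heron scheduler does it), the following Python builds the same Service for each pod index and wraps the results in a `List` object that `kubectl apply -f` accepts as JSON. The names `HERON_PORTS`, `pod_service_manifest` and `topology_service_list` are hypothetical; the ports and the selector label are taken from the YAML above.

```python
import json

# Ports copied from the manifest above. The names in this sketch are
# hypothetical and not part of Heron.
HERON_PORTS = {
    "stream-manager": 6001,
    "tmaster-controller": 6002,
    "tmaster-state": 6003,
    "heron-shell": 6004,
}


def pod_service_manifest(topology: str, index: int) -> dict:
    """Build a Service that selects exactly one StatefulSet pod."""
    pod_name = f"{topology}-{index}"
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": pod_name},
        "spec": {
            # Label that Kubernetes sets on every StatefulSet pod.
            "selector": {"statefulset.kubernetes.io/pod-name": pod_name},
            "ports": [
                {"name": name, "protocol": "TCP", "port": port, "targetPort": port}
                for name, port in HERON_PORTS.items()
            ],
        },
    }


def topology_service_list(topology: str, replicas: int) -> dict:
    """Wrap one Service per pod in a v1 List so it can be applied in one go."""
    return {
        "apiVersion": "v1",
        "kind": "List",
        "items": [pod_service_manifest(topology, i) for i in range(replicas)],
    }


if __name__ == "__main__":
    # Example: the three pods mentioned in this issue.
    print(json.dumps(topology_service_list("acking", 3), indent=2))
```

The output could be saved to a file and applied with `kubectl apply -f <file>`.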


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-heron] nicknezis commented on issue #3542: Kubernetes additional work required for deploying topology

Posted by GitBox <gi...@apache.org>.
nicknezis commented on issue #3542:
URL: https://github.com/apache/incubator-heron/issues/3542#issuecomment-650497830


   I've discussed this with @windhamwong and have come up with a proposed approach to handle Heron topologies in Kubernetes. We found that it does work, but there are some edge cases that can cause the topology StatefulSet to fail.
   
   1. TMaster looks for `/.dockerenv` to determine whether it is running in a container. I have found environments in which the pod does not have this file (e.g. Kind and K3s). [Tmaster code](https://github.com/apache/incubator-heron/blob/cc815d85305dc0b665a2ccb42113cf7a49b1eb0a/heron/executor/src/python/heron_executor.py#L232)
   2. If TMaster does find `/.dockerenv`, it will try to use the `HOST` environment variable. I have found some cases in which the Pod does not have this set (e.g. Kind).
   3. If both of these work, then the TMaster and Stmgr processes will use the pod's IP address. If either fails, then the `socket.gethostname()` call will return the pod name, which is not registered in the Kubernetes cluster DNS.
   4. To enable the use of the hostname, we need to have a Headless Service registered.
   
   The proposal:
   1. Update the Kubernetes Scheduler code to create a matching Headless Service for each topology created.
   2. Update the Kubernetes Scheduler code to add a custom ENV variable on the StatefulSet (e.g. `HERON_HOSTNAME`).
   3. Update the TMaster logic that checks for `/.dockerenv` to first check for the `HERON_HOSTNAME` variable instead.
   
   If we make these changes, this issue would be resolved.
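
The lookup order described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the change that was made in Heron: the function `resolve_heron_host` and its parameters are hypothetical, `HERON_HOSTNAME` is the variable name suggested in the proposal, and the fallback steps follow the behaviour described in points 1 to 3.

```python
import os
import socket


def resolve_heron_host(environ=None, dockerenv_path="/.dockerenv"):
    """Pick the host name to advertise (hypothetical helper, not Heron code)."""
    env = os.environ if environ is None else environ

    # Proposed step: an explicit override set by the scheduler wins.
    explicit = env.get("HERON_HOSTNAME")
    if explicit:
        return explicit

    # Behaviour described in this thread: inside a container
    # (marked by /.dockerenv), use the HOST variable if it is set.
    if os.path.isfile(dockerenv_path) and env.get("HOST"):
        return env["HOST"]

    # Fallback: the local host name, which is the pod name in Kubernetes.
    return socket.gethostname()


if __name__ == "__main__":
    print(resolve_heron_host())
```

Passing the environment and the marker path as parameters keeps the sketch easy to exercise outside a container.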





[GitHub] [incubator-heron] nicknezis commented on issue #3542: Kubernetes additional work required for deploying topology

Posted by GitBox <gi...@apache.org>.
nicknezis commented on issue #3542:
URL: https://github.com/apache/incubator-heron/issues/3542#issuecomment-658298595


   This issue should now be resolved with the fix in #3550 





[GitHub] [incubator-heron] nicknezis commented on issue #3542: Kubernetes additional work required for deploying topology

Posted by GitBox <gi...@apache.org>.
nicknezis commented on issue #3542:
URL: https://github.com/apache/incubator-heron/issues/3542#issuecomment-647085583


   Is this an issue with newer Kubernetes? How can I best reproduce the issue you are seeing?
   
   Kubernetes services are not needed for exposing ports, but are used for Service Discovery and for providing a stable IP, as opposed to the Pod IP, which will change on pod restart. I believe that, with the way topologies are currently run as StatefulSets in Kubernetes, each pod has a specific name which can be referenced by the other pods in the StatefulSet. It might still be a good idea to create the Service, but I don't think this should be causing the Topology pods to fail to connect.





[GitHub] [incubator-heron] windhamwong commented on issue #3542: Kubernetes additional work required for deploying topology

Posted by GitBox <gi...@apache.org>.
windhamwong commented on issue #3542:
URL: https://github.com/apache/incubator-heron/issues/3542#issuecomment-646201924


   Current situation:
   Heron UI shows only the topology layout, but no details of config, counts, or instance logs/info. The bolt seems to be working and is logging:
   ```
   [2020-06-18 17:27:28 +0000] [STDOUT] stdout: Bolt processed 64980000 tuples in 525055 ms
   [2020-06-18 17:27:28 +0000] [STDOUT] stdout: Bolt processed 64990000 tuples in 525067 ms
   ```
   
   I guess both the Heron docs for k8s cluster setup and the k8s source require a lot of work to support the latest k8s version.





[GitHub] [incubator-heron] nicknezis closed issue #3542: Kubernetes additional work required for deploying topology

Posted by GitBox <gi...@apache.org>.
nicknezis closed issue #3542:
URL: https://github.com/apache/incubator-heron/issues/3542


   





[GitHub] [incubator-heron] windhamwong commented on issue #3542: Kubernetes additional work required for deploying topology

Posted by GitBox <gi...@apache.org>.
windhamwong commented on issue #3542:
URL: https://github.com/apache/incubator-heron/issues/3542#issuecomment-649713583


   I believe this is an issue with newer K8s, as a StatefulSet now requires a Service to work properly. I tried to resolve the target pod without a Service, but it never works, either with just the pod name or with the full domain, `<pod>.<namespace>.svc.cluster.local`. Please correct me if I am wrong here.
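
For reference, Kubernetes gives StatefulSet pods a stable DNS name of the form `<pod>.<service>.<namespace>.svc.cluster.local`, where `<service>` is the governing headless Service, so the name includes a Service component. A minimal sketch for checking resolution from inside a pod is shown below; the helper names are hypothetical, and the Service name `acking` and namespace `default` are assumptions made only for the example.

```python
import socket


def statefulset_pod_fqdn(pod, service, namespace, cluster_domain="cluster.local"):
    """Build the DNS name of a StatefulSet pod behind a headless Service."""
    return f"{pod}.{service}.{namespace}.svc.{cluster_domain}"


def can_resolve(hostname):
    """Return True if the name resolves through the local resolver."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


if __name__ == "__main__":
    # "acking" (Service) and "default" (namespace) are assumed values.
    name = statefulset_pod_fqdn("acking-0", "acking", "default")
    print(name, "resolves" if can_resolve(name) else "does not resolve")
```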

