You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@submarine.apache.org by GitBox <gi...@apache.org> on 2021/10/06 13:50:10 UTC

[GitHub] [submarine] noidname01 opened a new pull request #767: SUBMARINE-1021. Experiment Watcher

noidname01 opened a new pull request #767:
URL: https://github.com/apache/submarine/pull/767


   ### What is this PR for?
   
   Use k8s java client to build watchers of TFJobs and PytorchJobs, logging status when the experiment status change.
   [Watcher examples](https://github.com/kubernetes-client/java/blob/master/examples/examples-release-12/src/main/java/io/kubernetes/client/examples/WatchExample.java)
   
   We will create a websocket connection between server and workbench, and modify the frontend logic of workbench in the following PRs.
   
   ### What type of PR is it?
   [Feature]
   
   ### Todos
   
   None
   
   ### What is the Jira issue?
   
   https://issues.apache.org/jira/projects/SUBMARINE/issues/SUBMARINE-1021
   
   ### How should this be tested?
   
   It will run with the initializing of k8s submitter, then keep watching the experiments.
   You can see the log when status of experiment changing.
   
   ### Screenshots (if appropriate)
   
   
   https://user-images.githubusercontent.com/55401762/136215425-5dcf4c15-3810-42b5-9511-ceb00b781405.mp4
   
   
   
   ### Questions:
   * Do the license files need updating? No
   * Are there breaking changes for older versions? No
   * Does this need new documentation? No
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@submarine.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [submarine] asfgit closed pull request #767: SUBMARINE-1021. Experiment Watcher

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #767:
URL: https://github.com/apache/submarine/pull/767


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@submarine.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [submarine] noidname01 commented on a change in pull request #767: SUBMARINE-1021. Experiment Watcher

Posted by GitBox <gi...@apache.org>.
noidname01 commented on a change in pull request #767:
URL: https://github.com/apache/submarine/pull/767#discussion_r725608709



##########
File path: submarine-server/server-submitter/submitter-k8s/src/main/java/org/apache/submarine/server/submitter/k8s/K8sSubmitter.java
##########
@@ -133,7 +146,13 @@ public void initialize(SubmarineConfiguration conf) {
       appsV1Api = new AppsV1Api();
     }
 
-    client.setDebugging(true);
+    try {
+      watchExperiment();
+    } catch (Exception e){
+      LOG.error("Experiment watch failed. " + e.getMessage(), e);
+    }
+
+    // client.setDebugging(true);

Review comment:
       I commented this line is because Watch in Java k8s client library seems not to support debugging mode, it made the watcher failed.
   
   I'll just remove it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@submarine.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [submarine] pingsutw commented on a change in pull request #767: SUBMARINE-1021. Experiment Watcher

Posted by GitBox <gi...@apache.org>.
pingsutw commented on a change in pull request #767:
URL: https://github.com/apache/submarine/pull/767#discussion_r725445727



##########
File path: submarine-server/server-submitter/submitter-k8s/src/main/java/org/apache/submarine/server/submitter/k8s/K8sSubmitter.java
##########
@@ -565,6 +584,79 @@ public ServeResponse deleteServe(ServeRequest spec)
     }
   }
 
+  public void watchExperiment() throws ApiException{

Review comment:
       Could we add a unit test or integration test for it?

##########
File path: submarine-server/server-submitter/submitter-k8s/src/main/java/org/apache/submarine/server/submitter/k8s/K8sSubmitter.java
##########
@@ -133,7 +146,13 @@ public void initialize(SubmarineConfiguration conf) {
       appsV1Api = new AppsV1Api();
     }
 
-    client.setDebugging(true);
+    try {
+      watchExperiment();
+    } catch (Exception e){
+      LOG.error("Experiment watch failed. " + e.getMessage(), e);
+    }
+
+    // client.setDebugging(true);

Review comment:
       Remove?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@submarine.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [submarine] pingsutw commented on a change in pull request #767: SUBMARINE-1021. Experiment Watcher

Posted by GitBox <gi...@apache.org>.
pingsutw commented on a change in pull request #767:
URL: https://github.com/apache/submarine/pull/767#discussion_r729932448



##########
File path: submarine-server/server-submitter/submitter-k8s/src/main/java/org/apache/submarine/server/submitter/k8s/K8sSubmitter.java
##########
@@ -565,6 +583,79 @@ public ServeResponse deleteServe(ServeRequest spec)
     }
   }
 
+  public void watchExperiment() throws ApiException{
+
+    Watch<MLJob> watchTF = Watch.createWatch(
+              client,
+              api.listNamespacedCustomObjectCall(
+                      TFJob.CRD_TF_GROUP_V1,
+                      TFJob.CRD_TF_VERSION_V1,
+                      getServerNamespace(),
+                      TFJob.CRD_TF_PLURAL_V1,
+                      "true",
+                      null,
+                      null,
+                      null,
+                      null,
+                      Boolean.TRUE,
+                      null,
+                      null
+              ),
+              new TypeToken<Watch.Response<MLJob>>() {}.getType()
+      );
+
+    Watch<MLJob> watchPytorch = Watch.createWatch(
+            client,
+            api.listNamespacedCustomObjectCall(
+                    PyTorchJob.CRD_PYTORCH_GROUP_V1,
+                    PyTorchJob.CRD_PYTORCH_VERSION_V1,
+                    getServerNamespace(),
+                    PyTorchJob.CRD_PYTORCH_PLURAL_V1,
+                    "true",
+                    null,
+                    null,
+                    null,
+                    null,
+                    Boolean.TRUE,
+                    null,
+                    null
+            ),
+            new TypeToken<Watch.Response<MLJob>>() {}.getType()
+    );
+
+    ExecutorService experimentThread = Executors.newFixedThreadPool(2);
+
+    experimentThread.execute(new Runnable() {
+        @Override
+        public void run() {
+            try {
+              LOG.info("Start watching on TFJobs...");
+              for (Watch.Response<MLJob> experiment : watchTF) {
+                LOG.info("{}", experiment.object.getStatus());
+              }
+            } finally {
+              LOG.info("WATCH TFJob END");
+              throw new RuntimeException();
+            }
+        }
+    });
+
+    experimentThread.execute(new Runnable() {
+        @Override
+        public void run() {
+            try {
+              LOG.info("Start watching on PytorchJobs...");
+              for (Watch.Response<MLJob> experiment : watchPytorch) {
+                LOG.info("{}", experiment.object.getStatus());
+              }
+            } finally {
+              LOG.info("WATCH PytorchJob END");
+              throw new RuntimeException();

Review comment:
       ```suggestion
                 watchPytorch.close();
                 throw new RuntimeException();
   ```

##########
File path: submarine-server/server-submitter/submitter-k8s/src/main/java/org/apache/submarine/server/submitter/k8s/K8sSubmitter.java
##########
@@ -565,6 +583,79 @@ public ServeResponse deleteServe(ServeRequest spec)
     }
   }
 
+  public void watchExperiment() throws ApiException{
+
+    Watch<MLJob> watchTF = Watch.createWatch(
+              client,
+              api.listNamespacedCustomObjectCall(
+                      TFJob.CRD_TF_GROUP_V1,
+                      TFJob.CRD_TF_VERSION_V1,
+                      getServerNamespace(),
+                      TFJob.CRD_TF_PLURAL_V1,
+                      "true",
+                      null,
+                      null,
+                      null,
+                      null,
+                      Boolean.TRUE,
+                      null,
+                      null
+              ),
+              new TypeToken<Watch.Response<MLJob>>() {}.getType()
+      );
+
+    Watch<MLJob> watchPytorch = Watch.createWatch(
+            client,
+            api.listNamespacedCustomObjectCall(
+                    PyTorchJob.CRD_PYTORCH_GROUP_V1,
+                    PyTorchJob.CRD_PYTORCH_VERSION_V1,
+                    getServerNamespace(),
+                    PyTorchJob.CRD_PYTORCH_PLURAL_V1,
+                    "true",
+                    null,
+                    null,
+                    null,
+                    null,
+                    Boolean.TRUE,
+                    null,
+                    null
+            ),
+            new TypeToken<Watch.Response<MLJob>>() {}.getType()
+    );
+
+    ExecutorService experimentThread = Executors.newFixedThreadPool(2);
+
+    experimentThread.execute(new Runnable() {
+        @Override
+        public void run() {
+            try {
+              LOG.info("Start watching on TFJobs...");
+              for (Watch.Response<MLJob> experiment : watchTF) {
+                LOG.info("{}", experiment.object.getStatus());
+              }
+            } finally {
+              LOG.info("WATCH TFJob END");
+              throw new RuntimeException();

Review comment:
       ```suggestion
                 watchTF.close();
                 throw new RuntimeException();
   ```

##########
File path: submarine-server/server-submitter/submitter-k8s/src/main/java/org/apache/submarine/server/submitter/k8s/K8sSubmitter.java
##########
@@ -565,6 +584,79 @@ public ServeResponse deleteServe(ServeRequest spec)
     }
   }
 
+  public void watchExperiment() throws ApiException{

Review comment:
       okay 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@submarine.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org