Posted to dev@submarine.apache.org by ka...@apache.org on 2021/07/04 13:11:37 UTC

[submarine] branch master updated: SUBMARINE-863. A time lag between the frontend and backend of MLflow

This is an automated email from the ASF dual-hosted git repository.

kaihsun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/submarine.git


The following commit(s) were added to refs/heads/master by this push:
     new c66a8c6  SUBMARINE-863. A time lag between the frontend and backend of MLflow
c66a8c6 is described below

commit c66a8c6e2ae92fcbc07571a1e7ee7f0991532c2e
Author: noidname01 <ti...@gmail.com>
AuthorDate: Sun Jul 4 20:52:34 2021 +0800

    SUBMARINE-863. A time lag between the frontend and backend of MLflow
    
    ### What is this PR for?
    There is a time lag between the frontend and backend of MLflow.
    Specifically, the "MLflow" button on the experiment page becomes clickable before the MLflow server has finished launching, so clicking it shows "Bad gateway". The button only works once MLflow has launched successfully.
    The "TensorBoard" button has the same problem.
    
    To solve the problem, we add readiness probes to the MLflow and TensorBoard pods so that Kubernetes reports whether each pod is ready.
    Details are [here](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/).
    The frontend then keeps polling the status and only enables the button once the service is ready.
    
    ### What type of PR is it?
    [ Improvement ]
    
    ### Todos
    
    ### What is the Jira issue?
    
    https://issues.apache.org/jira/projects/SUBMARINE/issues/SUBMARINE-863
    
    ### How should this be tested?
    
    ```bash
    # Step1 Launch the frontend
    # At ./submarine
    cd submarine-workbench/workbench-web
    npm install
    npm run start
    
    # Step2 Run Submarine and Port-Forward
    # At ./submarine
    helm install submarine ./helm-charts/submarine
    kubectl port-forward --address 0.0.0.0 service/submarine-traefik 32080:80
    
    # Step3 Open localhost:4200 and check that the MLflow button is loading and that localhost:32080/mlflow returns Service Unavailable
    
    # Step4 Wait for the MLflow button to become clickable, then check localhost:32080/mlflow again; it should display normally
    # To test again, delete the MLflow pod.
    ```
    
    ### Screenshots (if appropriate)
    
    https://user-images.githubusercontent.com/55401762/124378566-9aa39c80-dce4-11eb-828d-0bf825418af3.mp4
    
    ### Questions:
    * Do the license files need updating? No
    * Are there breaking changes for older versions? No
    * Does this need new documentation? No
    
    Author: noidname01 <ti...@gmail.com>
    
    Signed-off-by: Kai-Hsun Chen <ka...@apache.org>
    
    Closes #642 from noidname01/SUBMARINE-863 and squashes the following commits:
    
    8db87672 [noidname01] do the same thing on tensorboard
    7d2187e6 [noidname01] add readiness probe and modify the frontend update
---
 helm-charts/submarine/templates/submarine-mlflow.yaml               | 5 +++++
 helm-charts/submarine/templates/submarine-tensorboard.yaml          | 6 +++++-
 .../experiment/experiment-home/experiment-home.component.ts         | 6 ++++--
 3 files changed, 14 insertions(+), 3 deletions(-)
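
The readiness probes added below differ per service: MLflow waits `initialDelaySeconds: 60` before its first TCP check on port 5000, while TensorBoard is checked on port 6006 every 10 seconds from the start (the default `initialDelaySeconds` is 0). Because MLflow cannot be reported ready earlier than about 60 seconds after its container starts, a 50-second frontend wait would presumably always expire first. This is consistent with the second argument of `getMlflowInfo` being raised from 50000 to 100000, assuming that argument is the polling timeout in milliseconds. The sketch below is an idealized model under stated assumptions: no probe jitter, probe timeouts ignored, the default `successThreshold` of 1, and the experiment page opened at the moment the container starts. `ProbeConfig`, `firstReadyTime` and `readyBeforeTimeout` are hypothetical helpers, not Submarine or Kubernetes APIs.

```typescript
// Idealized model of a Kubernetes readiness probe schedule (assumption:
// no jitter, probe timeout ignored, successThreshold = 1, the default).
interface ProbeConfig {
  initialDelaySeconds: number; // Kubernetes default: 0
  periodSeconds: number; // Kubernetes default: 10
}

// Hypothetical helper: time (seconds after container start) of the first probe
// that can succeed, given when the server starts listening on its port.
function firstReadyTime(probe: ProbeConfig, listeningAtSeconds: number): number {
  const { initialDelaySeconds, periodSeconds } = probe;
  if (listeningAtSeconds <= initialDelaySeconds) {
    return initialDelaySeconds; // the very first probe already succeeds
  }
  // Probes run at initialDelay, initialDelay + period, initialDelay + 2 * period, ...
  const k = Math.ceil((listeningAtSeconds - initialDelaySeconds) / periodSeconds);
  return initialDelaySeconds + k * periodSeconds;
}

// Hypothetical helper: true if the frontend is still polling when the pod turns Ready
// (assumption: the page is opened when the container starts).
function readyBeforeTimeout(probe: ProbeConfig, listeningAtSeconds: number, frontendTimeoutMs: number): boolean {
  return firstReadyTime(probe, listeningAtSeconds) * 1000 <= frontendTimeoutMs;
}

// Values taken from the Helm templates in this commit.
const mlflowProbe: ProbeConfig = { initialDelaySeconds: 60, periodSeconds: 10 };
const tensorboardProbe: ProbeConfig = { initialDelaySeconds: 0, periodSeconds: 10 };

console.log(firstReadyTime(mlflowProbe, 5)); // 60
console.log(readyBeforeTimeout(mlflowProbe, 5, 50000)); // false
console.log(readyBeforeTimeout(mlflowProbe, 5, 100000)); // true
console.log(firstReadyTime(tensorboardProbe, 12)); // 20
```

Under this model, a 100-second wait leaves room for roughly four more probe periods after MLflow's initial delay; if MLflow took longer than that to start listening, the button would still time out.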

diff --git a/helm-charts/submarine/templates/submarine-mlflow.yaml b/helm-charts/submarine/templates/submarine-mlflow.yaml
index 8dd4868..0caaac4 100644
--- a/helm-charts/submarine/templates/submarine-mlflow.yaml
+++ b/helm-charts/submarine/templates/submarine-mlflow.yaml
@@ -72,6 +72,11 @@ spec:
           - mountPath: "/logs"
             name: "volume"
             subPath: "submarine-mlflow"
+        readinessProbe:
+          tcpSocket:
+            port: 5000
+          initialDelaySeconds: 60
+          periodSeconds: 10
       volumes:
         - name: "volume"
           persistentVolumeClaim:
diff --git a/helm-charts/submarine/templates/submarine-tensorboard.yaml b/helm-charts/submarine/templates/submarine-tensorboard.yaml
index ff32ee1..f03490a 100644
--- a/helm-charts/submarine/templates/submarine-tensorboard.yaml
+++ b/helm-charts/submarine/templates/submarine-tensorboard.yaml
@@ -75,6 +75,10 @@ spec:
           - mountPath: "/logs"
             name: "volume"
             subPath: "submarine-tensorboard"
+        readinessProbe:
+          tcpSocket:
+            port: 6006
+          periodSeconds: 10
       volumes:
         - name: "volume"
           persistentVolumeClaim:
@@ -105,4 +109,4 @@ spec:
     services:
     - kind: Service
       name: submarine-tensorboard-service
-      port: 8080
\ No newline at end of file
+      port: 8080
diff --git a/submarine-workbench/workbench-web/src/app/pages/workbench/experiment/experiment-home/experiment-home.component.ts b/submarine-workbench/workbench-web/src/app/pages/workbench/experiment/experiment-home/experiment-home.component.ts
index 603e6f9..60ccc59 100644
--- a/submarine-workbench/workbench-web/src/app/pages/workbench/experiment/experiment-home/experiment-home.component.ts
+++ b/submarine-workbench/workbench-web/src/app/pages/workbench/experiment/experiment-home/experiment-home.component.ts
@@ -23,7 +23,7 @@ import { ExperimentFormService } from '@submarine/services/experiment.form.servi
 import { ExperimentService } from '@submarine/services/experiment.service';
 import { NzMessageService } from 'ng-zorro-antd';
 import { interval } from 'rxjs';
-import { filter, mergeMap, take, tap, timeout } from 'rxjs/operators';
+import { filter, mergeMap, take, tap, timeout, retryWhen } from 'rxjs/operators';
 import { ExperimentFormComponent } from './experiment-form/experiment-form.component';
 
 @Component({
@@ -72,7 +72,7 @@ export class ExperimentHomeComponent implements OnInit {
 
     this.experimentService.emitInfo(null);
     this.getTensorboardInfo(1000, 50000);
-    this.getMlflowInfo(1000, 50000);
+    this.getMlflowInfo(1000, 100000);
   }
 
   fetchExperimentList() {
@@ -156,6 +156,7 @@ export class ExperimentHomeComponent implements OnInit {
     interval(period)
       .pipe(
         mergeMap(() => this.experimentService.getTensorboardInfo()), // map interval observable to tensorboardInfo observable
+        retryWhen((error) => error), //  retry to get tensorboardInfo
         tap((x) => console.log(x)), // monitoring the process
         filter((res) => res.available), // only emit the success ones
         take(1), // if succeed, stop emitting new value from source observable
@@ -174,6 +175,7 @@ export class ExperimentHomeComponent implements OnInit {
     interval(period)
       .pipe(
         mergeMap(() => this.experimentService.getMlflowInfo()),
+        retryWhen((error) => error),
         tap((x) => console.log(x)),
         filter((res) => res.available),
         take(1),
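
The `retryWhen((error) => error)` lines above make the polling tolerant of failed requests: the notifier re-emits each error, which resubscribes to `interval(period).pipe(mergeMap(...))`, so a request that fails while MLflow or TensorBoard is still starting (presumably with "Bad gateway") restarts polling instead of terminating the stream. Without it, the first failed request would error out the whole pipeline and the button would never become clickable. Below is a minimal plain-Promise sketch of the same behaviour, written without RxJS so it is self-contained; `pollUntilAvailable`, `fetchInfo`, `ServiceInfo` and `fakeFetch` are hypothetical names, not part of the Submarine codebase.

```typescript
// Minimal sketch (assumption: plain Promises instead of RxJS) of the polling
// pipeline used by getTensorboardInfo/getMlflowInfo in this component.
interface ServiceInfo {
  available: boolean; // mirrors the `res.available` field filtered on above
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: polls `fetchInfo` every `periodMs` until it reports
// `available`, treating failed requests as "not ready yet".
async function pollUntilAvailable(
  fetchInfo: () => Promise<ServiceInfo>,
  periodMs: number,
  timeoutMs: number,
): Promise<ServiceInfo> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    await sleep(periodMs); // like interval(period): first request after one period
    try {
      const info = await fetchInfo();
      console.log(info); // like tap: monitor the process
      if (info.available) {
        return info; // like filter + take(1): stop at the first available response
      }
    } catch {
      // like retryWhen((error) => error): a failed request does not end
      // polling; simply try again after the next period
    }
  }
  throw new Error(`service not available within ${timeoutMs} ms`); // like timeout
}

// Usage with a hypothetical fake request: fails twice, then not ready, then ready.
let calls = 0;
const fakeFetch = async (): Promise<ServiceInfo> => {
  calls += 1;
  if (calls <= 2) throw new Error("Bad gateway");
  return { available: calls >= 4 };
};
pollUntilAvailable(fakeFetch, 10, 1000).then((info) =>
  console.log("ready after", calls, "requests", info),
);
```

One design difference from the RxJS version: `mergeMap` can start a new request while a slow one is still in flight, whereas this sketch awaits each request before sleeping again, so requests never overlap. It also checks the deadline only between requests, whereas the RxJS `timeout` operator can fire while a request is still pending.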
