Posted to dev@submarine.apache.org by ka...@apache.org on 2021/07/04 13:11:37 UTC
[submarine] branch master updated: SUBMARINE-863. A time lag
between the frontend and backend of MLflow
This is an automated email from the ASF dual-hosted git repository.
kaihsun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/submarine.git
The following commit(s) were added to refs/heads/master by this push:
new c66a8c6 SUBMARINE-863. A time lag between the frontend and backend of MLflow
c66a8c6 is described below
commit c66a8c6e2ae92fcbc07571a1e7ee7f0991532c2e
Author: noidname01 <ti...@gmail.com>
AuthorDate: Sun Jul 4 20:52:34 2021 +0800
SUBMARINE-863. A time lag between the frontend and backend of MLflow
### What is this PR for?
There is a time lag between the frontend and backend of MLflow.
To elaborate, the "MLflow" button on the experiment page becomes clickable before the MLflow server has finished launching, so clicking it at that point shows "Bad gateway". Once MLflow has launched successfully, the button works as expected.
The "TensorBoard" button has the same problem.
To solve this, we add a readiness probe to each pod so that we can check whether it is ready.
Details are [here](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/).
The frontend then subscribes to that status and keeps updating.
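The frontend half of this idea (keep polling, treat an error as "not ready yet", stop on the first success or on timeout) can be sketched in plain TypeScript as follows. This is an illustration only, not the code in this commit, which uses the RxJS operators `interval`, `mergeMap`, `retryWhen`, `filter`, `take` and `timeout`. The names `ServiceInfo`, `sleep` and `pollUntilAvailable` are hypothetical; only the `available` field is taken from the diff.

```typescript
// Hypothetical shape of the status payload; only `available` appears in the diff.
interface ServiceInfo {
  available: boolean;
}

// Resolve after `ms` milliseconds.
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

/**
 * Calls `fetchInfo` repeatedly, waiting `periodMs` between attempts, until it
 * reports `available: true`. A rejected call (for example a gateway error
 * while the pod is not ready) is treated as "not available yet" and retried.
 * Rejects once `timeoutMs` has elapsed without a success.
 */
async function pollUntilAvailable(
  fetchInfo: () => Promise<ServiceInfo>,
  periodMs: number,
  timeoutMs: number
): Promise<ServiceInfo> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    try {
      const info = await fetchInfo();
      if (info.available) {
        return info; // first success ends the polling
      }
    } catch {
      // Swallow the error and try again after the next pause.
    }
    await sleep(periodMs);
  }
  throw new Error(`service not available after ${timeoutMs} ms`);
}
```

Under these assumptions, a call such as `pollUntilAvailable(() => getInfo(), 1000, 100000)` would mirror the period and timeout passed to `getMlflowInfo` in this change, where `getInfo` stands in for whatever request returns the service status.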
### What type of PR is it?
[ Improvement ]
### Todos
### What is the Jira issue?
https://issues.apache.org/jira/projects/SUBMARINE/issues/SUBMARINE-863
### How should this be tested?
```bash
# Step1 Launch the frontend
# At ./submarine
cd submarine-workbench/workbench-web
npm install
npm run start
# Step2 Run Submarine and Port-Forward
# At ./submarine
helm install submarine ./helm-charts/submarine
kubectl port-forward --address 0.0.0.0 service/submarine-traefik 32080:80
# Step3 Open localhost:4200 and check that the MLflow button is still loading and that localhost:32080/mlflow returns Service Unavailable
# Step4 Wait for the MLflow button to become clickable, then check localhost:32080/mlflow again; it should display normally
# If you want to test again, delete the mlflow pod.
```
### Screenshots (if appropriate)
https://user-images.githubusercontent.com/55401762/124378566-9aa39c80-dce4-11eb-828d-0bf825418af3.mp4
### Questions:
* Do the license files need updating? No
* Are there breaking changes for older versions? No
* Does this need new documentation? No
Author: noidname01 <ti...@gmail.com>
Signed-off-by: Kai-Hsun Chen <ka...@apache.org>
Closes #642 from noidname01/SUBMARINE-863 and squashes the following commits:
8db87672 [noidname01] do the same thing on tensorboard
7d2187e6 [noidname01] add readiness probe and modify the frontend update
---
helm-charts/submarine/templates/submarine-mlflow.yaml | 5 +++++
helm-charts/submarine/templates/submarine-tensorboard.yaml | 6 +++++-
.../experiment/experiment-home/experiment-home.component.ts | 6 ++++--
3 files changed, 14 insertions(+), 3 deletions(-)
diff --git a/helm-charts/submarine/templates/submarine-mlflow.yaml b/helm-charts/submarine/templates/submarine-mlflow.yaml
index 8dd4868..0caaac4 100644
--- a/helm-charts/submarine/templates/submarine-mlflow.yaml
+++ b/helm-charts/submarine/templates/submarine-mlflow.yaml
@@ -72,6 +72,11 @@ spec:
- mountPath: "/logs"
name: "volume"
subPath: "submarine-mlflow"
+ readinessProbe:
+ tcpSocket:
+ port: 5000
+ initialDelaySeconds: 60
+ periodSeconds: 10
volumes:
- name: "volume"
persistentVolumeClaim:
diff --git a/helm-charts/submarine/templates/submarine-tensorboard.yaml b/helm-charts/submarine/templates/submarine-tensorboard.yaml
index ff32ee1..f03490a 100644
--- a/helm-charts/submarine/templates/submarine-tensorboard.yaml
+++ b/helm-charts/submarine/templates/submarine-tensorboard.yaml
@@ -75,6 +75,10 @@ spec:
- mountPath: "/logs"
name: "volume"
subPath: "submarine-tensorboard"
+ readinessProbe:
+ tcpSocket:
+ port: 6006
+ periodSeconds: 10
volumes:
- name: "volume"
persistentVolumeClaim:
@@ -105,4 +109,4 @@ spec:
services:
- kind: Service
name: submarine-tensorboard-service
- port: 8080
\ No newline at end of file
+ port: 8080
diff --git a/submarine-workbench/workbench-web/src/app/pages/workbench/experiment/experiment-home/experiment-home.component.ts b/submarine-workbench/workbench-web/src/app/pages/workbench/experiment/experiment-home/experiment-home.component.ts
index 603e6f9..60ccc59 100644
--- a/submarine-workbench/workbench-web/src/app/pages/workbench/experiment/experiment-home/experiment-home.component.ts
+++ b/submarine-workbench/workbench-web/src/app/pages/workbench/experiment/experiment-home/experiment-home.component.ts
@@ -23,7 +23,7 @@ import { ExperimentFormService } from '@submarine/services/experiment.form.servi
import { ExperimentService } from '@submarine/services/experiment.service';
import { NzMessageService } from 'ng-zorro-antd';
import { interval } from 'rxjs';
-import { filter, mergeMap, take, tap, timeout } from 'rxjs/operators';
+import { filter, mergeMap, take, tap, timeout, retryWhen } from 'rxjs/operators';
import { ExperimentFormComponent } from './experiment-form/experiment-form.component';
@Component({
@@ -72,7 +72,7 @@ export class ExperimentHomeComponent implements OnInit {
this.experimentService.emitInfo(null);
this.getTensorboardInfo(1000, 50000);
- this.getMlflowInfo(1000, 50000);
+ this.getMlflowInfo(1000, 100000);
}
fetchExperimentList() {
@@ -156,6 +156,7 @@ export class ExperimentHomeComponent implements OnInit {
interval(period)
.pipe(
mergeMap(() => this.experimentService.getTensorboardInfo()), // map interval observable to tensorboardInfo observable
+ retryWhen((error) => error), // retry to get tensorboardInfo
tap((x) => console.log(x)), // monitoring the process
filter((res) => res.available), // only emit the success ones
take(1), // if succeed, stop emitting new value from source observable
@@ -174,6 +175,7 @@ export class ExperimentHomeComponent implements OnInit {
interval(period)
.pipe(
mergeMap(() => this.experimentService.getMlflowInfo()),
+ retryWhen((error) => error),
tap((x) => console.log(x)),
filter((res) => res.available),
take(1),
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@submarine.apache.org
For additional commands, e-mail: dev-help@submarine.apache.org