Posted to dev@submarine.apache.org by "cdmikechen (Jira)" <ji...@apache.org> on 2023/04/01 10:12:00 UTC

[jira] [Updated] (SUBMARINE-1378) The current state of the experiment should be further refined

     [ https://issues.apache.org/jira/browse/SUBMARINE-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

cdmikechen updated SUBMARINE-1378:
----------------------------------
    Issue Type: Improvement  (was: Bug)

> The current state of the experiment should be further refined
> -------------------------------------------------------------
>
>                 Key: SUBMARINE-1378
>                 URL: https://issues.apache.org/jira/browse/SUBMARINE-1378
>             Project: Apache Submarine
>          Issue Type: Improvement
>          Components: experiment
>            Reporter: cdmikechen
>            Priority: Major
>
> In some failure cases (e.g. the image cannot be downloaded), Submarine does not track the actual task status, and the experiment is always shown as running.
> For example, when an image cannot be pulled, the actual job status is as follows.
> {code}
> status:
>   conditions:
>     - lastProbeTime: '2023-04-01T03:50:53Z'
>       reason: PodInitializing
>       type: Waiting
>     - lastProbeTime: '2023-04-01T03:50:39Z'
>       message: >-
>         rpc error: code = Unknown desc = error pulling image configuration: Get
>         "https://production.cloudflare.docker.com/registry-v2/docker/registry/v2/blobs/sha256/5c/5ccab874feb97b32099f72978f97c8e7d129fbe7577464ad49b43f58f693ca90/data?verify=1680324025-7lKdJkTa1waOdofNoPtnsjwv%2FIQ%3D":
>         EOF
>       reason: ErrImagePull
>       type: Waiting
>     - lastProbeTime: '2023-04-01T03:49:58Z'
>       message: >-
>         Back-off pulling image
>         "apache/submarine:jupyter-notebook-0.8.0-SNAPSHOT"
>       reason: ImagePullBackOff
>       type: Waiting
>     - lastProbeTime: '2023-04-01T03:49:57Z'
>       message: >-
>         rpc error: code = Unknown desc = Error response from daemon: Head
>         "https://registry-1.docker.io/v2/apache/submarine/manifests/jupyter-notebook-0.8.0-SNAPSHOT":
>         Get
>         "https://auth.docker.io/token?scope=repository%3Aapache%2Fsubmarine%3Apull&service=registry.docker.io":
>         EOF
>       reason: ErrImagePull
>       type: Waiting
>     - lastProbeTime: '2023-04-01T03:49:54Z'
>       reason: PodInitializing
>       type: Waiting
>   containerState:
>     waiting:
>       reason: PodInitializing
>   readyReplicas: 0
> {code}
> Therefore, the experiment status should be refined to reflect cases like this.
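
One way such a refinement could work is sketched below. This is only an illustration, not Submarine's implementation: the function name `refine_status`, the constant `IMAGE_PULL_REASONS`, and the labels "ImagePullFailed" and "Initializing" are hypothetical. The input is assumed to be a dict shaped like the status shown above, and only the `reason` and `containerState` fields from that output are used.

```python
from typing import Any, Dict

# Waiting reasons from the status above that indicate an image pull problem.
IMAGE_PULL_REASONS = {"ErrImagePull", "ImagePullBackOff"}


def refine_status(status: Dict[str, Any], default: str = "Running") -> str:
    """Derive a more specific experiment status from a job status dict.

    The returned labels are hypothetical; they are not Submarine's actual states.
    """
    container_state = status.get("containerState") or {}
    # Only refine while the container is still waiting; otherwise keep the default.
    if "waiting" not in container_state:
        return default
    reasons = {c.get("reason") for c in status.get("conditions") or []}
    # Any image pull error seen while the container is still waiting is surfaced.
    if reasons & IMAGE_PULL_REASONS:
        return "ImagePullFailed"
    return "Initializing"


# Abbreviated version of the status from the issue description.
example = {
    "conditions": [
        {"lastProbeTime": "2023-04-01T03:50:53Z", "reason": "PodInitializing", "type": "Waiting"},
        {"lastProbeTime": "2023-04-01T03:50:39Z", "reason": "ErrImagePull", "type": "Waiting"},
        {"lastProbeTime": "2023-04-01T03:49:58Z", "reason": "ImagePullBackOff", "type": "Waiting"},
    ],
    "containerState": {"waiting": {"reason": "PodInitializing"}},
    "readyReplicas": 0,
}

print(refine_status(example))
```

The sketch checks the container state first so that a pod which eventually pulled its image and started is not reported as failed because of older conditions.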



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@submarine.apache.org
For additional commands, e-mail: dev-help@submarine.apache.org