Posted to dev@submarine.apache.org by "cdmikechen (Jira)" <ji...@apache.org> on 2023/04/01 10:10:00 UTC

[jira] [Created] (SUBMARINE-1378) The current state of the experiment should be further refined

cdmikechen created SUBMARINE-1378:
-------------------------------------

             Summary: The current state of the experiment should be further refined
                 Key: SUBMARINE-1378
                 URL: https://issues.apache.org/jira/browse/SUBMARINE-1378
             Project: Apache Submarine
          Issue Type: Bug
          Components: experiment
            Reporter: cdmikechen


In some failure cases (e.g. the image cannot be downloaded), Submarine does not pick up the actual task status, and the experiment is always shown as running.

For example, when an image cannot be pulled, the actual job status is as follows.
{code}
status:
  conditions:
    - lastProbeTime: '2023-04-01T03:50:53Z'
      reason: PodInitializing
      type: Waiting
    - lastProbeTime: '2023-04-01T03:50:39Z'
      message: >-
        rpc error: code = Unknown desc = error pulling image configuration: Get
        "https://production.cloudflare.docker.com/registry-v2/docker/registry/v2/blobs/sha256/5c/5ccab874feb97b32099f72978f97c8e7d129fbe7577464ad49b43f58f693ca90/data?verify=1680324025-7lKdJkTa1waOdofNoPtnsjwv%2FIQ%3D":
        EOF
      reason: ErrImagePull
      type: Waiting
    - lastProbeTime: '2023-04-01T03:49:58Z'
      message: >-
        Back-off pulling image
        "apache/submarine:jupyter-notebook-0.8.0-SNAPSHOT"
      reason: ImagePullBackOff
      type: Waiting
    - lastProbeTime: '2023-04-01T03:49:57Z'
      message: >-
        rpc error: code = Unknown desc = Error response from daemon: Head
        "https://registry-1.docker.io/v2/apache/submarine/manifests/jupyter-notebook-0.8.0-SNAPSHOT":
        Get
        "https://auth.docker.io/token?scope=repository%3Aapache%2Fsubmarine%3Apull&service=registry.docker.io":
        EOF
      reason: ErrImagePull
      type: Waiting
    - lastProbeTime: '2023-04-01T03:49:54Z'
      reason: PodInitializing
      type: Waiting
  containerState:
    waiting:
      reason: PodInitializing
  readyReplicas: 0
{code}

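Therefore, we should refine the experiment status further, so that failures such as ErrImagePull and ImagePullBackOff are reflected instead of a plain running state.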
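
As a rough illustration of what a finer-grained status could look like, here is a minimal sketch that maps a job status like the one above to a more specific state. It is not Submarine's actual code: the function name refine_status, the state names ("Running", "ImagePullFailed", "Starting", "Unknown") and the rule of checking readyReplicas first are all assumptions made for this example.

```python
# Hypothetical sketch only: none of these names come from Submarine itself.

# Waiting reasons (as seen in the status above) that mean the image pull failed.
IMAGE_PULL_FAILURES = {"ErrImagePull", "ImagePullBackOff"}


def refine_status(status: dict) -> str:
    """Derive a more specific state string from a job status dict."""
    # If at least one replica is ready, treat the job as running.
    if (status.get("readyReplicas") or 0) > 0:
        return "Running"

    # Otherwise look through all recorded conditions for an image pull failure,
    # because the most recent condition may just say PodInitializing.
    reasons = [c.get("reason") for c in (status.get("conditions") or [])]
    if any(r in IMAGE_PULL_FAILURES for r in reasons):
        return "ImagePullFailed"

    # No known failure, but the container is still waiting.
    if "waiting" in (status.get("containerState") or {}):
        return "Starting"

    return "Unknown"


# Trimmed-down version of the status from this issue.
example = {
    "conditions": [
        {"reason": "PodInitializing", "type": "Waiting"},
        {"reason": "ErrImagePull", "type": "Waiting"},
        {"reason": "ImagePullBackOff", "type": "Waiting"},
    ],
    "containerState": {"waiting": {"reason": "PodInitializing"}},
    "readyReplicas": 0,
}
print(refine_status(example))  # prints "ImagePullFailed"
```

The sketch scans every condition rather than only the latest one, since in the status above the newest entry is PodInitializing while the pull errors sit further down the list.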



