Posted to issues@flink.apache.org by "Daren Wong (Jira)" <ji...@apache.org> on 2022/10/20 18:01:00 UTC

[jira] [Updated] (FLINK-29708) Enrich Flink Kubernetes Operator CRD error field

     [ https://issues.apache.org/jira/browse/FLINK-29708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daren Wong updated FLINK-29708:
-------------------------------
    Description: 
h1. Problem Statement:

The FlinkDeployment and FlinkSessionJob CRDs have a CommonStatus error field of type String. Currently, this field stores various errors, such as:

1. CR validation error
2. Missing SessionJob error / missing JobManager deployment error
3. Unknown Job error
4. DeploymentFailedException
5. Reconciliation errors, such as a RestClientException raised by Flink internals (e.g. FlinkRest and FlinkRuntime)

Storing each error only as a plain string is insufficient. The field also needs exception metadata so that the operator can handle each error accordingly. For example, it would be very useful to know the HttpResponseStatus code of a RestClientException.
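A plain string is lossy partly because, by the time an error reaches the status field, a RestClientException may be wrapped inside other exceptions. The Java sketch below assumes a stand-in exception type, `StatusCarryingException`, which is hypothetical and is not Flink's `RestClientException`. It walks the cause chain to recover the HTTP status code as structured data, so nothing has to be parsed back out of a message string.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Optional;
import java.util.Set;

/**
 * Hypothetical sketch: recover an HTTP status code from an error that may be
 * wrapped in other exceptions before the operator writes it to the CR status.
 */
class HttpStatusLookupSketch {

    /** Stand-in for a REST client exception carrying an HTTP status (NOT Flink's class). */
    static class StatusCarryingException extends RuntimeException {
        final int httpResponseCode;

        StatusCarryingException(String message, int httpResponseCode) {
            super(message);
            this.httpResponseCode = httpResponseCode;
        }
    }

    /** Walks the cause chain and returns the first HTTP status code found, if any. */
    static Optional<Integer> findHttpResponseCode(Throwable error) {
        // Identity-based set guards against (rare) cyclic cause chains.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable cur = error; cur != null && seen.add(cur); cur = cur.getCause()) {
            if (cur instanceof StatusCarryingException) {
                return Optional.of(((StatusCarryingException) cur).httpResponseCode);
            }
        }
        return Optional.empty();
    }
}
```

When the status code is available as a number, the operator could, for instance, tell a 4xx client error apart from a transient 5xx failure without string matching.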

h1. Proposed Solution:

1. The error field should store a JSON object containing exception metadata. For example:

{
    "operatorErrorType": "JobManagerNotFoundException",
    "message": "JobManager with leadership ID: 1234 was not found",
    "stackTrace": "JobManager lost connection at ....", 
    "httpResponseCode": 400
}

2. Inclusion of the stackTrace field can be enabled or disabled via a spec change.
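The following minimal Java sketch shows the proposed format. It builds the JSON by hand using only the JDK; a real implementation would more likely use a JSON library such as Jackson. The class `ErrorJsonSketch` and the example `JobManagerNotFoundException` are hypothetical. The `includeStackTrace` flag is also hypothetical: it stands in for the spec setting from item 2, whose actual name is not specified.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

/**
 * Hypothetical sketch of the proposed error JSON. Field names follow the example
 * above; the class, method and flag names are illustrative, not the operator's API.
 */
class ErrorJsonSketch {

    /** Illustrative exception type matching the example's operatorErrorType. */
    static class JobManagerNotFoundException extends RuntimeException {
        JobManagerNotFoundException(String message) {
            super(message);
        }
    }

    /** Escapes a value and wraps it in quotes so it is a valid JSON string. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the hex escape form.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    /** Full stack trace of t as text, as printed by printStackTrace. */
    static String stackTraceOf(Throwable t) {
        StringWriter sw = new StringWriter();
        try (PrintWriter pw = new PrintWriter(sw)) {
            t.printStackTrace(pw);
        }
        return sw.toString();
    }

    /**
     * Serializes t into the error JSON. httpResponseCode may be null because only
     * some errors (e.g. REST client failures) carry one; includeStackTrace models
     * the proposed on/off switch for the stackTrace field.
     */
    static String toJson(Throwable t, Integer httpResponseCode, boolean includeStackTrace) {
        String message = t.getMessage() == null ? "" : t.getMessage();
        StringBuilder sb = new StringBuilder("{");
        sb.append("\"operatorErrorType\":").append(quote(t.getClass().getSimpleName()));
        sb.append(",\"message\":").append(quote(message));
        if (includeStackTrace) {
            sb.append(",\"stackTrace\":").append(quote(stackTraceOf(t)));
        }
        if (httpResponseCode != null) {
            sb.append(",\"httpResponseCode\":").append(httpResponseCode);
        }
        return sb.append('}').toString();
    }
}
```

For example, `toJson(new JobManagerNotFoundException("JobManager with leadership ID: 1234 was not found"), 400, false)` yields `{"operatorErrorType":"JobManagerNotFoundException","message":"JobManager with leadership ID: 1234 was not found","httpResponseCode":400}`, with no stackTrace key. Leaving the stack trace off by default keeps the status compact, and a full trace can be many lines long.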


> Enrich Flink Kubernetes Operator CRD error field
> ------------------------------------------------
>
>                 Key: FLINK-29708
>                 URL: https://issues.apache.org/jira/browse/FLINK-29708
>             Project: Flink
>          Issue Type: Improvement
>          Components: Kubernetes Operator
>    Affects Versions: kubernetes-operator-1.3.0
>            Reporter: Daren Wong
>            Priority: Major
>             Fix For: kubernetes-operator-1.3.0
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)