Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2020/05/28 15:44:47 UTC

[GitHub] [flink] zentol opened a new pull request #12392: [FLINK-17765] Trim stack traces

zentol opened a new pull request #12392:
URL: https://github.com/apache/flink/pull/12392


   This PR removes all stack trace elements from exceptions returned by the REST API or raised during job submission on the client.
   
   The full exception is still logged; the change only affects the user-facing components (REST API / CLI).
   
   The following is the error a user would receive when attempting to submit a job with a non-existent savepoint:
   
   ```
    The program finished with the following exception:
   
   org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: java.util.concurrent.ExecutionException: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.
           at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:302)
           at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:198)
           at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:162)
           at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:689)
           at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:227)
           at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:906)
           at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:982)
           at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
           at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:982)
   Caused by: java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.
           at org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:304)
           at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1807)
           at org.apache.flink.client.program.StreamContextEnvironment.executeAsync(StreamContextEnvironment.java:104)
           at org.apache.flink.client.program.StreamContextEnvironment.execute(StreamContextEnvironment.java:71)
           at org.apache.flink.streaming.examples.wordcount.WordCount.main(WordCount.java:101)
           at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
           at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.base/java.lang.reflect.Method.invoke(Method.java:566)
           at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:288)
           ... 8 more
   Caused by: java.util.concurrent.ExecutionException: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.
   Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.
   Caused by: org.apache.flink.runtime.rest.util.RestClientException: [Internal server error., <Exception on server side:
   org.apache.flink.runtime.client.JobSubmissionException: Failed to submit job.
   Caused by: java.lang.RuntimeException: org.apache.flink.runtime.client.JobExecutionException: Could not set up JobManager
   Caused by: org.apache.flink.runtime.client.JobExecutionException: Could not set up JobManager
   Caused by: java.io.FileNotFoundException: Cannot find checkpoint or savepoint file/directory 'ashudasd' on file system 'file'.
   ```
   
   This is intentionally not implemented for the entire CLI (e.g., exceptions during the job), because there the stack trace might reference user code, and trimming it could remove valuable information.
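
The trimming described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the class `StackTraceTrimmer` and its `trim` method are hypothetical names, and the only JDK call relied on is `Throwable#setStackTrace`. It clears the frames of an exception, its causes, and its suppressed exceptions while keeping the messages and the cause chain intact.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

/** Hypothetical helper for illustration; not the class introduced by the PR. */
final class StackTraceTrimmer {

    private static final StackTraceElement[] EMPTY = new StackTraceElement[0];

    private StackTraceTrimmer() {}

    /**
     * Clears the stack trace of the given throwable, of every cause and of
     * every suppressed exception, then returns the same (mutated) instance.
     */
    static <T extends Throwable> T trim(T throwable) {
        // Identity-based set guards against cyclic cause chains.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        trimRecursively(throwable, seen);
        return throwable;
    }

    private static void trimRecursively(Throwable t, Set<Throwable> seen) {
        if (t == null || !seen.add(t)) {
            return;
        }
        t.setStackTrace(EMPTY);
        trimRecursively(t.getCause(), seen);
        for (Throwable suppressed : t.getSuppressed()) {
            trimRecursively(suppressed, seen);
        }
    }
}
```

Because the sketch mutates the exception in place, the full exception would have to be logged before `trim` is called. Printing a trimmed exception with `printStackTrace` yields only the exception line and the `Caused by:` lines, similar to the lower part of the output shown above.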
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] zentol closed pull request #12392: [FLINK-17765] Trim stack traces

Posted by GitBox <gi...@apache.org>.
zentol closed pull request #12392:
URL: https://github.com/apache/flink/pull/12392


   





[GitHub] [flink] flinkbot commented on pull request #12392: [FLINK-17765] Trim stack traces

Posted by GitBox <gi...@apache.org>.
flinkbot commented on pull request #12392:
URL: https://github.com/apache/flink/pull/12392#issuecomment-635431170


   Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress of the review.
   
   
   ## Automated Checks
   Last check on commit 465f0ea5684ea5721fe76bbfa4aaf3a57bb7a982 (Thu May 28 15:46:22 UTC 2020)
   
   **Warnings:**
    * No documentation files were touched! Remember to keep the Flink docs up to date!
   
   
   <sub>Mention the bot in a comment to re-run the automated checks.</sub>
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process.<details>
    The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`)
    - `@flinkbot approve all` to approve all aspects
    - `@flinkbot approve-until architecture` to approve everything until `architecture`
    - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention
    - `@flinkbot disapprove architecture` to remove an approval you gave earlier
   </details>





[GitHub] [flink] zentol commented on pull request #12392: [FLINK-17765] Trim stack traces

Posted by GitBox <gi...@apache.org>.
zentol commented on pull request #12392:
URL: https://github.com/apache/flink/pull/12392#issuecomment-639454320


   This approach is too intrusive to be merged this late, and it should include a switch to disable the trimming for debugging purposes. Moving this feature to 1.12 in FLINK-18159.





[GitHub] [flink] flinkbot commented on pull request #12392: [FLINK-17765] Trim stack traces

Posted by GitBox <gi...@apache.org>.
flinkbot commented on pull request #12392:
URL: https://github.com/apache/flink/pull/12392#issuecomment-635460867


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "465f0ea5684ea5721fe76bbfa4aaf3a57bb7a982",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "465f0ea5684ea5721fe76bbfa4aaf3a57bb7a982",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 465f0ea5684ea5721fe76bbfa4aaf3a57bb7a982 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #12392: [FLINK-17765] Trim stack traces

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12392:
URL: https://github.com/apache/flink/pull/12392#issuecomment-635460867


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "465f0ea5684ea5721fe76bbfa4aaf3a57bb7a982",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2393",
       "triggerID" : "465f0ea5684ea5721fe76bbfa4aaf3a57bb7a982",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 465f0ea5684ea5721fe76bbfa4aaf3a57bb7a982 Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2393) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] zentol commented on pull request #12392: [FLINK-17765] Trim stack traces

Posted by GitBox <gi...@apache.org>.
zentol commented on pull request #12392:
URL: https://github.com/apache/flink/pull/12392#issuecomment-639095073


   > I think the PR goes a bit in a different direction than I intended it to go initially.
   
   Yes, that is true, but I feel like this approach is more universal. It pretty much _solves_ the problem.
   You can have a chain of 20 exceptions and the result would still be readable.
   
   We can (and should) go around and remove some of the re-wrapping, but my concern is that these fixes are very limited in scope. You get rid of some small thing here and there, but the exception trace can still blow up. And going through all the code paths in search of small optimization opportunities sounds rather time-intensive; time which we don't have right now.
   
   > Trying to filter out StackTraceElement coming from CompletableFuture since they add often only noise.
   
   We can _probably_ do this; we also have to account for Java 11 since the stack trace element will likely have a `java.base/` prefix (but this isn't too difficult). I guess this would be less invasive than the current approach (but also less powerful).
   Or do you want this to be done everywhere (also in logs)? They are also mostly noise in the logs after all. But then we're basically talking about stripping execution exceptions wherever possible, which has similar problems to the re-wrapping stuff. Lots of places to check, tiny individual payoff, easy to re-introduce.
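
As a rough sketch of the filtering idea discussed here (not code from this PR; the class `CompletableFutureFrameFilter` and its methods are hypothetical): when working with `StackTraceElement` objects, `getClassName()` returns the class name without the module, so the `java.base/` prefix only matters for traces that were already rendered to text. The second method shows one way to handle such rendered lines by dropping everything up to the last `/` before the class name.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

/** Hypothetical helper for illustration only. */
final class CompletableFutureFrameFilter {

    private static final String CF = "java.util.concurrent.CompletableFuture";

    private CompletableFutureFrameFilter() {}

    /** True for frames of CompletableFuture itself or one of its nested classes. */
    static boolean isCompletableFutureFrame(StackTraceElement frame) {
        String className = frame.getClassName(); // never contains the module name
        return className.equals(CF) || className.startsWith(CF + "$");
    }

    /** Removes CompletableFuture frames from the throwable and all of its causes. */
    static void strip(Throwable throwable) {
        // Identity-based set guards against cyclic cause chains.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable t = throwable; t != null && seen.add(t); t = t.getCause()) {
            StackTraceElement[] kept = Arrays.stream(t.getStackTrace())
                    .filter(frame -> !isCompletableFutureFrame(frame))
                    .toArray(StackTraceElement[]::new);
            t.setStackTrace(kept);
        }
    }

    /** Same check for an already formatted line such as "\tat java.base/java.util...". */
    static boolean isCompletableFutureLine(String line) {
        String s = line.trim();
        if (!s.startsWith("at ")) {
            return false; // header or "Caused by:" line, not a frame
        }
        s = s.substring(3);
        int paren = s.indexOf('(');
        String qualified = paren >= 0 ? s.substring(0, paren) : s;
        // Drop an optional "loader/module/" prefix as printed on Java 9+.
        String name = qualified.substring(qualified.lastIndexOf('/') + 1);
        return name.startsWith(CF + ".") || name.startsWith(CF + "$");
    }
}
```

The object-based variant behaves the same on Java 8 and Java 11; only the text-based variant needs the prefix handling.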
   
   > Hiding the client side stack traces.
   
   You do want client-side stack traces to some degree, specifically when user code is involved. In the example above in the PR description, this line should be present in all cases, because the user should know which line of their code resulted in an error:
   
   ```
   at org.apache.flink.streaming.examples.wordcount.WordCount.main(WordCount.java:101)
   ```
   
   This makes it a bit tricky with regard to where you draw the line; hence why I was conservative on the client-side.
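
One possible way to draw that line, sketched under a strong assumption: the caller knows the fully qualified name of the user's entry class (here it would be `org.apache.flink.streaming.examples.wordcount.WordCount`). The helper `UserFrameRetainer` is hypothetical and keeps only frames from that class and its nested classes. Frames from other user classes would be dropped by this simple rule, which is exactly the kind of trade-off mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only. */
final class UserFrameRetainer {

    private UserFrameRetainer() {}

    /**
     * Returns the frames that belong to the given entry class or to one of
     * its nested/anonymous classes, preserving their original order.
     */
    static StackTraceElement[] retainUserFrames(
            StackTraceElement[] frames, String entryClassName) {
        List<StackTraceElement> kept = new ArrayList<>();
        for (StackTraceElement frame : frames) {
            String className = frame.getClassName();
            if (className.equals(entryClassName)
                    || className.startsWith(entryClassName + "$")) {
                kept.add(frame);
            }
        }
        return kept.toArray(new StackTraceElement[0]);
    }
}
```

Applied to the second exception in the PR description, this would leave only the `WordCount.main(WordCount.java:101)` frame.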
   
   The reconstructing part... well, it would be neat, but this could only work if there were a well-defined format for stack traces, and I don't think there is. We currently format exceptions as a single string, and rebuilding the stack trace elements could fail in weird ways if you have unusual exceptions (e.g., containing blank lines). In fact, our current client exceptions are an excellent example of being unusual.





[GitHub] [flink] flinkbot edited a comment on pull request #12392: [FLINK-17765] Trim stack traces

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #12392:
URL: https://github.com/apache/flink/pull/12392#issuecomment-635460867


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "465f0ea5684ea5721fe76bbfa4aaf3a57bb7a982",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2393",
       "triggerID" : "465f0ea5684ea5721fe76bbfa4aaf3a57bb7a982",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 465f0ea5684ea5721fe76bbfa4aaf3a57bb7a982 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=2393) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run travis` re-run the last Travis build
    - `@flinkbot run azure` re-run the last Azure build
   </details>

