Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2021/10/29 13:36:54 UTC

[GitHub] [flink] zentol opened a new pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

zentol opened a new pull request #17606:
URL: https://github.com/apache/flink/pull/17606


   Fixes an issue where errors during the deserialization of RPC messages were silently ignored. We now forward the error to the returned future instead.
   
   @tillrohrmann Could you take a look?
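A minimal sketch of the failure mode described above, using plain `CompletableFuture`. It assumes a hypothetical `deserialize` stand-in for `deserializeValueIfNeeded` that always throws. If the `whenComplete` callback throws, the exception only reaches the future returned by `whenComplete`. Nobody holds that future, so the future handed back to the caller never completes. The fixed variant catches the error and forwards it. This is one way to do it, not necessarily the code in the PR.

```java
import java.util.concurrent.CompletableFuture;

public class SwallowedErrorDemo {

    // Hypothetical stand-in for deserializeValueIfNeeded that always fails.
    static Object deserialize(Object raw) {
        throw new IllegalStateException("cannot deserialize " + raw);
    }

    // Old shape: if deserialize throws, the exception is captured by the future
    // returned from whenComplete (which is discarded), so 'result' never completes.
    static CompletableFuture<Object> oldShape(CompletableFuture<Object> ask) {
        CompletableFuture<Object> result = new CompletableFuture<>();
        ask.whenComplete(
                (value, failure) -> {
                    if (failure != null) {
                        result.completeExceptionally(failure);
                    } else {
                        result.complete(deserialize(value));
                    }
                });
        return result;
    }

    // Fixed shape: a deserialization error is forwarded to 'result'.
    static CompletableFuture<Object> fixedShape(CompletableFuture<Object> ask) {
        CompletableFuture<Object> result = new CompletableFuture<>();
        ask.whenComplete(
                (value, failure) -> {
                    if (failure != null) {
                        result.completeExceptionally(failure);
                        return;
                    }
                    try {
                        result.complete(deserialize(value));
                    } catch (Throwable t) {
                        result.completeExceptionally(t);
                    }
                });
        return result;
    }

    public static void main(String[] args) {
        CompletableFuture<Object> ask1 = new CompletableFuture<>();
        CompletableFuture<Object> before = oldShape(ask1);
        ask1.complete("payload");
        System.out.println(before.isDone()); // false: the error was swallowed

        CompletableFuture<Object> ask2 = new CompletableFuture<>();
        CompletableFuture<Object> after = fixedShape(ask2);
        ask2.complete("payload");
        System.out.println(after.isCompletedExceptionally()); // true
    }
}
```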


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] zentol commented on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
zentol commented on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-957180521


   @flinkbot run azure


[GitHub] [flink] flinkbot edited a comment on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-954752065


   ## CI report:
   
   * ade8b47510aaf11753d52b1283588366107245a0 Azure: [CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25786) 
   * a9ce74387e3be7c9fa8e4309886c9b86825c3aea Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25804) 
   * 61532812701d75c0ce669f42ec47f5de4a588b70 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25808) 
   * a9bc844925ae7e67e5574db7c765650b0c85fc4b UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


[GitHub] [flink] zentol edited a comment on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
zentol edited a comment on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-957620163






[GitHub] [flink] flinkbot edited a comment on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-954752065


   ## CI report:
   
   * 308144b47092d73d777530ce5d0a11c85d7ab5c6 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25741) 
   * ade8b47510aaf11753d52b1283588366107245a0 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25786) 
   * a9ce74387e3be7c9fa8e4309886c9b86825c3aea UNKNOWN
   

[GitHub] [flink] flinkbot edited a comment on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-954752065


   ## CI report:
   
   * a9ce74387e3be7c9fa8e4309886c9b86825c3aea Azure: [CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25804) 
   * 61532812701d75c0ce669f42ec47f5de4a588b70 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25808) 
   * a9bc844925ae7e67e5574db7c765650b0c85fc4b Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25814) 
   

[GitHub] [flink] flinkbot edited a comment on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-954752065


   ## CI report:
   
   * 308144b47092d73d777530ce5d0a11c85d7ab5c6 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25741) 
   * ade8b47510aaf11753d52b1283588366107245a0 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25786) 
   * a9ce74387e3be7c9fa8e4309886c9b86825c3aea Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25804) 
   

[GitHub] [flink] tillrohrmann commented on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
tillrohrmann commented on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-957309354







[GitHub] [flink] flinkbot edited a comment on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-954752065


   ## CI report:
   
   * 308144b47092d73d777530ce5d0a11c85d7ab5c6 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25741) 
   

[GitHub] [flink] zentol commented on a change in pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
zentol commented on a change in pull request #17606:
URL: https://github.com/apache/flink/pull/17606#discussion_r739241181



##########
File path: flink-rpc/flink-rpc-akka/src/main/java/org/apache/flink/runtime/rpc/akka/AkkaInvocationHandler.java
##########
@@ -240,17 +241,20 @@ private Object invokeRpc(Method method, Object[] args) throws Exception {
             final CompletableFuture<?> resultFuture = ask(rpcInvocation, futureTimeout);
 
             final CompletableFuture<Object> completableFuture = new CompletableFuture<>();
-            resultFuture.whenComplete(
-                    (resultValue, failure) -> {
-                        if (failure != null) {
-                            completableFuture.completeExceptionally(
-                                    resolveTimeoutException(
-                                            failure, callStackCapture, address, rpcInvocation));
-                        } else {
-                            completableFuture.complete(
-                                    deserializeValueIfNeeded(resultValue, method));
-                        }
-                    });
+            FutureUtils.forward(

Review comment:
       I was wondering whether we need the forwarding at all, or whether we could just do:
   
   ```
   final CompletableFuture<Object> completableFuture = 
     ask(rpcInvocation, futureTimeout)
       .thenApply(...)
       .exceptionally(...);
   ```
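A self-contained sketch of the chaining suggested above. It uses hypothetical stand-ins `deserialize` and `resolveFailure` in place of `deserializeValueIfNeeded` and `resolveTimeoutException`, and the real Flink signatures are not reproduced. Two JDK details matter when filling in the `...`. First, an exception thrown inside `thenApply` reaches `exceptionally` wrapped in a `CompletionException`. Second, `exceptionally` has to throw to keep the future failed, because returning a value would complete it normally.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeoutException;

public class ChainedRpcSketch {

    // Hypothetical stand-in for deserializeValueIfNeeded.
    static Object deserialize(Object raw) {
        if ("corrupt".equals(raw)) {
            throw new IllegalStateException("cannot deserialize " + raw);
        }
        return "deserialized:" + raw;
    }

    // Hypothetical stand-in for resolveTimeoutException.
    static Throwable resolveFailure(Throwable failure) {
        return new RuntimeException("rpc failed", failure);
    }

    // Dependent stages wrap failures in CompletionException; unwrap one layer.
    static Throwable unwrap(Throwable t) {
        return (t instanceof CompletionException && t.getCause() != null) ? t.getCause() : t;
    }

    // Chained variant: no separately created future, no forwarding.
    static CompletableFuture<Object> invoke(CompletableFuture<Object> askFuture) {
        return askFuture
                .thenApply(ChainedRpcSketch::deserialize)
                .exceptionally(
                        failure -> {
                            // exceptionally() must return a value or throw; throwing keeps
                            // the returned future failed instead of "recovering" it.
                            throw new CompletionException(resolveFailure(unwrap(failure)));
                        });
    }

    public static void main(String[] args) {
        CompletableFuture<Object> ok = new CompletableFuture<>();
        CompletableFuture<Object> okResult = invoke(ok);
        ok.complete("payload");
        System.out.println(okResult.join()); // deserialized:payload

        CompletableFuture<Object> bad = new CompletableFuture<>();
        CompletableFuture<Object> badResult = invoke(bad);
        bad.complete("corrupt");
        // The deserialization error surfaces as the cause of the resolved failure.
        Throwable resolved = unwrap(badResult.handle((v, t) -> t).join());
        System.out.println(resolved.getCause() instanceof IllegalStateException); // true

        CompletableFuture<Object> timedOut = new CompletableFuture<>();
        CompletableFuture<Object> timedOutResult = invoke(timedOut);
        timedOut.completeExceptionally(new TimeoutException("ask timed out"));
        Throwable resolvedTimeout = unwrap(timedOutResult.handle((v, t) -> t).join());
        System.out.println(resolvedTimeout.getCause() instanceof TimeoutException); // true
    }
}
```

Both ask failures and deserialization failures end up in the same `exceptionally` branch here, so whatever resolver runs there sees both kinds of error.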
   





[GitHub] [flink] flinkbot commented on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
flinkbot commented on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-954753157


   Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress of the review.
   
   
   ## Automated Checks
   Last check on commit d65fff260e684b231835a04583b3e9d600ae96d6 (Fri Oct 29 13:40:00 UTC 2021)
   
   **Warnings:**
    * No documentation files were touched! Remember to keep the Flink docs up to date!
   
   
   <sub>Mention the bot in a comment to re-run the automated checks.</sub>
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full explanation of the review process. The bot tracks the review progress through labels, which are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot approve description` to approve one or more aspects (aspects: `description`, `consensus`, `architecture` and `quality`)
    - `@flinkbot approve all` to approve all aspects
    - `@flinkbot approve-until architecture` to approve everything until `architecture`
    - `@flinkbot attention @username1 [@username2 ..]` to require somebody's attention
    - `@flinkbot disapprove architecture` to remove an approval you gave earlier
   </details>



[GitHub] [flink] flinkbot edited a comment on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-954752065


   ## CI report:
   
   * d65fff260e684b231835a04583b3e9d600ae96d6 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25645) 
   

[GitHub] [flink] zentol commented on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
zentol commented on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-956207150


   That test failure is driving me crazy.
   
   With the original code, the test passes:
   ```
               resultFuture.whenComplete(
                       (resultValue, failure) -> {
                           if (failure != null) {
                               completableFuture.completeExceptionally(
                                       resolveTimeoutException(
                                               failure, callStackCapture, address, rpcInvocation));
                           } else {
                               completableFuture.complete(
                                       deserializeValueIfNeeded(resultValue, method));
                           }
                       });
   ```
   If I move the deserialization into a separate `thenApply` stage, it fails.
   ```
               resultFuture
                       .thenApply(resultValue -> deserializeValueIfNeeded(resultValue, method))
                       .whenComplete(
                               (resultValue, failure) -> {
                                   if (failure != null) {
                                       completableFuture.completeExceptionally(
                                               resolveTimeoutException(
                                                       failure,
                                                       callStackCapture,
                                                       address,
                                                       rpcInvocation));
                                   } else {
                                       completableFuture.complete(resultValue);
                                   }
                               });
   ```
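One plausible mechanism for the different behavior (zentol raises the same suspicion later in the thread) is that a stage downstream of `thenApply` does not see the original exception: when the source future fails, `CompletableFuture` completes the dependent stage with a `CompletionException` whose cause is the original failure. If `resolveTimeoutException` inspects the failure's type (an assumption, since its body isn't shown here), it would then see a different type. The minimal sketch below uses a hypothetical class and a `TimeoutException` as a stand-in RPC failure; it is not Flink code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeoutException;

public class ThenApplyWrappingDemo {

    // Returns the failure a whenComplete callback observes, with or without a thenApply stage in between.
    static Throwable observedFailure(boolean withThenApply) {
        CompletableFuture<String> resultFuture = new CompletableFuture<>();
        CompletableFuture<String> stage =
                withThenApply ? resultFuture.thenApply(String::toUpperCase) : resultFuture;
        Throwable[] seen = new Throwable[1];
        stage.whenComplete((value, failure) -> seen[0] = failure);
        // Simulate an RPC that times out; dependents run synchronously in this thread.
        resultFuture.completeExceptionally(new TimeoutException("ask timed out"));
        return seen[0];
    }

    public static void main(String[] args) {
        // Directly on the source future: the raw TimeoutException.
        System.out.println(observedFailure(false).getClass().getSimpleName());
        // Behind thenApply: a CompletionException wrapping the TimeoutException.
        Throwable wrapped = observedFailure(true);
        System.out.println(wrapped.getClass().getSimpleName());
        System.out.println(wrapped.getCause().getClass().getSimpleName());
    }
}
```

The design point is that `whenComplete` on the original future passes the failure through unchanged, while every dependent stage created by `thenApply` (or similar) re-wraps it, so code that branches on exception types has to unwrap first.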
   





[GitHub] [flink] zentol edited a comment on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
zentol edited a comment on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-956207150


   That test failure is driving me crazy.
   
   With the original code the test passes.
   ```
               resultFuture
                       .whenComplete(
                               (resultValue, failure) -> {
                                   if (failure != null) {
                                       completableFuture.completeExceptionally(
                                               resolveTimeoutException(
                                                       failure, callStackCapture, address, rpcInvocation));
                                   } else {
                                       completableFuture.complete(
                                               deserializeValueIfNeeded(resultValue, method));
                                   }
                               });
   ```
   If I move the deserialization into a separate `thenApply` stage, it fails.
   ```
               resultFuture
                       .thenApply(resultValue -> deserializeValueIfNeeded(resultValue, method))
                       .whenComplete(
                               (resultValue, failure) -> {
                                   if (failure != null) {
                                       completableFuture.completeExceptionally(
                                               resolveTimeoutException(
                                                       failure, callStackCapture, address, rpcInvocation));
                                   } else {
                                       completableFuture.complete(resultValue);
                                   }
                               });
   ```
   Sure, I could explain it if `deserializeValueIfNeeded` throws an exception, but that's not the case. 💢 
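For context on the bug this PR fixes: in the original pattern, if `deserializeValueIfNeeded` does throw inside the `whenComplete` callback, the exception only completes the future returned by `whenComplete` (which nobody holds on to), and `completableFuture` is never completed at all. The sketch below reproduces that with a hypothetical `deserialize` helper standing in for `deserializeValueIfNeeded`; it illustrates the `CompletableFuture` semantics and is not the actual Flink implementation.

```java
import java.util.concurrent.CompletableFuture;

public class WhenCompleteSwallowDemo {

    // Hypothetical stand-in for deserializeValueIfNeeded: fails on a "corrupt" payload.
    static String deserialize(String payload) {
        if (payload.startsWith("corrupt")) {
            throw new IllegalStateException("cannot deserialize " + payload);
        }
        return payload.toUpperCase();
    }

    // Mirrors the original pattern: deserialization runs inside the whenComplete callback.
    static CompletableFuture<String> callOriginal(CompletableFuture<String> resultFuture) {
        CompletableFuture<String> completableFuture = new CompletableFuture<>();
        resultFuture.whenComplete(
                (resultValue, failure) -> {
                    if (failure != null) {
                        completableFuture.completeExceptionally(failure);
                    } else {
                        // If deserialize throws, the exception only fails the discarded future
                        // returned by whenComplete; completableFuture stays incomplete.
                        completableFuture.complete(deserialize(resultValue));
                    }
                });
        return completableFuture;
    }

    public static void main(String[] args) {
        System.out.println(callOriginal(CompletableFuture.completedFuture("hello")).join()); // HELLO
        CompletableFuture<String> bad = callOriginal(CompletableFuture.completedFuture("corrupt-bytes"));
        System.out.println(bad.isDone()); // false: the error was silently dropped
    }
}
```

A caller waiting on `bad` would hang until some outer timeout fires, which matches the "silently ignored" symptom described in the PR.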





[GitHub] [flink] flinkbot edited a comment on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-954752065


   ## CI report:
   
   * 308144b47092d73d777530ce5d0a11c85d7ab5c6 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25741) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-954752065


   ## CI report:
   
   * 308144b47092d73d777530ce5d0a11c85d7ab5c6 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25741) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] zentol commented on a change in pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
zentol commented on a change in pull request #17606:
URL: https://github.com/apache/flink/pull/17606#discussion_r740943042



##########
File path: flink-tests/src/test/java/org/apache/flink/test/recovery/TaskManagerProcessFailureBatchRecoveryITCase.java
##########
@@ -67,7 +67,7 @@ public void testTaskManagerFailure(Configuration configuration, final File coord
         ExecutionEnvironment env =
                 ExecutionEnvironment.createRemoteEnvironment("localhost", 1337, configuration);
         env.setParallelism(PARALLELISM);
-        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(1, 1500L));
+        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(10, 1500L));

Review comment:
       I can also reproduce it locally where there are no processing gaps.







[GitHub] [flink] tillrohrmann edited a comment on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
tillrohrmann edited a comment on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-957663355









[GitHub] [flink] tillrohrmann commented on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
tillrohrmann commented on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-957738541


   The advantage of increasing the restart attempts instead of the delay is that the test will, on average, run faster. With an increased delay, every restart waits that much longer, so the test's runtime grows by at least the amount you've added to the delay.
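The trade-off can be made concrete with a little arithmetic. The sketch below is a simplified model (an assumption: it counts only the time spent waiting in fixed-delay restarts and ignores redeployment time). The `(10, 1500L)` values come from the diff in this PR, and the 5000 ms delay is the "~5 seconds" figure suggested elsewhere in the thread; the class and method names are hypothetical.

```java
public class RestartDelayCost {

    /**
     * Minimum time (ms) spent waiting in fixed-delay restarts when the job needs
     * {@code restartsNeeded} restarts. Returns -1 if the attempts run out first,
     * in which case the job would fail instead of recovering.
     */
    static long minDelayMs(int maxAttempts, long delayMs, int restartsNeeded) {
        if (restartsNeeded > maxAttempts) {
            return -1;
        }
        return restartsNeeded * delayMs;
    }

    public static void main(String[] args) {
        // More attempts, short delay: only the restarts actually used cost time.
        System.out.println(minDelayMs(10, 1500L, 1)); // 1500
        System.out.println(minDelayMs(10, 1500L, 2)); // 3000
        // One attempt, longer delay: the full delay is always paid, and a second failure is fatal.
        System.out.println(minDelayMs(1, 5000L, 1)); // 5000
        System.out.println(minDelayMs(1, 5000L, 2)); // -1
    }
}
```

With at most one or two restarts the difference is a couple of seconds either way, which is why the thread later settles on the option with cleaner logs.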





[GitHub] [flink] tillrohrmann commented on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
tillrohrmann commented on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-957309354









[GitHub] [flink] zentol edited a comment on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
zentol edited a comment on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-956207150


   That test failure is driving me crazy.
   
   With the original code the test passes.
   ```
               resultFuture
                       .whenComplete(
                               (resultValue, failure) -> {
                                   if (failure != null) {
                                       completableFuture.completeExceptionally(
                                               resolveTimeoutException(
                                                       failure, callStackCapture, address, rpcInvocation));
                                   } else {
                                       completableFuture.complete(
                                               deserializeValueIfNeeded(resultValue, method));
                                   }
                               });
   ```
   If I move the deserialization into a separate `thenApply` stage, it fails.
   ```
               resultFuture
                       .thenApply(resultValue -> deserializeValueIfNeeded(resultValue, method))
                       .whenComplete(
                               (resultValue, failure) -> {
                                   if (failure != null) {
                                       completableFuture.completeExceptionally(
                                               resolveTimeoutException(
                                                       failure,
                                                       callStackCapture,
                                                       address,
                                                       rpcInvocation));
                                   } else {
                                       completableFuture.complete(resultValue);
                                   }
                               });
   ```
   





[GitHub] [flink] zentol commented on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
zentol commented on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-956300547


   It may be that the test is simply not stable.
   What could be happening is that the job fails once because the TM loss is noticed by another TM, and then fails again when the JM later attempts to deploy a task to the lost TM.
   
   It's suspicious how _consistently_ the test failed when I switched between the PR and current master, but I also ran into the issue once without any changes...





[GitHub] [flink] flinkbot edited a comment on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-954752065


   ## CI report:
   
   * 308144b47092d73d777530ce5d0a11c85d7ab5c6 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25741) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] zentol commented on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
zentol commented on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-955057857


   hmm...the `TaskManagerProcessFailureBatchRecoveryITCase` is failing pretty reliably now. I'm wondering if this issue was masking another failure as well...





[GitHub] [flink] tillrohrmann commented on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
tillrohrmann commented on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-957309354


   From the logs of the failed test run it looks as if we aren't able to detect the dead TM via the lost heartbeats that should be reported to the dead letter queue. From the logs it also looks as if we aren't experiencing a long processing gap which can usually explain a situation like this.





[GitHub] [flink] zentol commented on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
zentol commented on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-957678218


   > This has never been a problem though.
   
   Isn't that because previously we still waited for the heartbeat to time out (10 seconds in this test)?
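The difference between the two detection paths can be sketched as a tiny heartbeat-target model: without fail-fast, a dead target is only declared dead once the heartbeat timeout (10 s in this test) elapses; if heartbeat RPC failures are forwarded and recognized as "unreachable", detection can happen almost immediately. This is a hypothetical illustration, not Flink's `HeartbeatMonitorImpl`; the class, the `UnreachableException` type, and the method names are all assumptions.

```java
public class HeartbeatTargetSketch {

    // Hypothetical stand-in for an "unreachable recipient" RPC failure.
    static class UnreachableException extends RuntimeException {
        UnreachableException(String msg) {
            super(msg);
        }
    }

    private final long timeoutMs;
    private long lastHeartbeatMs;
    private boolean unreachable;

    HeartbeatTargetSketch(long timeoutMs, long nowMs) {
        this.timeoutMs = timeoutMs;
        this.lastHeartbeatMs = nowMs;
    }

    void reportHeartbeat(long nowMs) {
        lastHeartbeatMs = nowMs;
    }

    // Fail fast only when the RPC failure says the target is unreachable.
    void reportRpcFailure(Throwable failure) {
        if (failure instanceof UnreachableException) {
            unreachable = true;
        }
    }

    boolean isDead(long nowMs) {
        return unreachable || nowMs - lastHeartbeatMs >= timeoutMs;
    }

    public static void main(String[] args) {
        HeartbeatTargetSketch timeoutOnly = new HeartbeatTargetSketch(10_000, 0);
        System.out.println(timeoutOnly.isDead(2_000)); // false: still within the 10 s timeout
        System.out.println(timeoutOnly.isDead(10_000)); // true: timed out

        HeartbeatTargetSketch failFast = new HeartbeatTargetSketch(10_000, 0);
        failFast.reportRpcFailure(new UnreachableException("connection refused"));
        System.out.println(failFast.isDead(2_000)); // true: detected without waiting for the timeout
    }
}
```

In this model, faster detection is exactly what shifts the timing relative to the 1.5 s restart delay.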





[GitHub] [flink] tillrohrmann commented on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
tillrohrmann commented on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-957663355


   I don't know exactly how fast Akka can detect it. I would assume that this can happen quite quickly once the TCP connection is terminated.





[GitHub] [flink] flinkbot edited a comment on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-954752065


   ## CI report:
   
   * ade8b47510aaf11753d52b1283588366107245a0 Azure: [CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25786) 
   * a9ce74387e3be7c9fa8e4309886c9b86825c3aea Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25804) 
   * 61532812701d75c0ce669f42ec47f5de4a588b70 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25808) 
   * a9bc844925ae7e67e5574db7c765650b0c85fc4b Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25814) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] tillrohrmann commented on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
tillrohrmann commented on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-959076412


   > I don't have a strong opinion on this. A single restart results in cleaner logs.
   > If we're aiming to do at most 1-2 restarts then the difference will be negligible either way (±2s)🤷 The test already needs 30-50 seconds.
   
   Fair enough, then let's go with a single restart and an extended delay.





[GitHub] [flink] zentol merged pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
zentol merged pull request #17606:
URL: https://github.com/apache/flink/pull/17606


   





[GitHub] [flink] zentol commented on a change in pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
zentol commented on a change in pull request #17606:
URL: https://github.com/apache/flink/pull/17606#discussion_r740963885



##########
File path: flink-tests/src/test/java/org/apache/flink/test/recovery/TaskManagerProcessFailureBatchRecoveryITCase.java
##########
@@ -67,7 +67,7 @@ public void testTaskManagerFailure(Configuration configuration, final File coord
         ExecutionEnvironment env =
                 ExecutionEnvironment.createRemoteEnvironment("localhost", 1337, configuration);
         env.setParallelism(PARALLELISM);
-        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(1, 1500L));
+        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(10, 1500L));

Review comment:
       I'm wondering if the exception the HeartbeatMonitorImpl sees could be wrapped in a CompletionException...
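If that suspicion holds, any `instanceof` check on the failure would miss the original type, because a failure observed downstream of a `thenApply` stage arrives wrapped in a `CompletionException`. The sketch below shows the naive check failing and a check after unwrapping succeeding. `UnreachableException` and `strip` are hypothetical local stand-ins; Flink ships a similar helper, `ExceptionUtils.stripCompletionException`, but this block deliberately avoids depending on it.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

public class UnwrapBeforeInstanceofSketch {

    // Hypothetical stand-in for an "unreachable recipient" failure type.
    static class UnreachableException extends RuntimeException {
        UnreachableException(String msg) {
            super(msg);
        }
    }

    // Strips any CompletionException layers to reach the underlying cause.
    static Throwable strip(Throwable t) {
        while (t instanceof CompletionException && t.getCause() != null) {
            t = t.getCause();
        }
        return t;
    }

    // Returns {naive check, check after stripping} for the failure seen downstream of thenApply.
    static boolean[] checks() {
        CompletableFuture<String> rpc = new CompletableFuture<>();
        boolean[] result = new boolean[2];
        rpc.thenApply(v -> v)
                .whenComplete(
                        (v, failure) -> {
                            result[0] = failure instanceof UnreachableException;
                            result[1] = strip(failure) instanceof UnreachableException;
                        });
        rpc.completeExceptionally(new UnreachableException("recipient gone"));
        return result;
    }

    public static void main(String[] args) {
        boolean[] r = checks();
        System.out.println(r[0]); // false: the failure arrives wrapped in a CompletionException
        System.out.println(r[1]); // true: unwrapping restores the original type
    }
}
```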







[GitHub] [flink] zentol edited a comment on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
zentol edited a comment on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-957620163


   How long does it actually take for Akka to consider something to be unreachable?
   
   It takes quite a while for this to show up after the TM has been killed:
   ```
   441053 [flink-akka.actor.default-dispatcher-23] WARN  akka.remote.transport.netty.NettyTransport [] - Remote connection to [null] failed with java.net.ConnectException: Connection refused: no further information: /127.0.0.1:58498
   441056 [flink-akka.actor.default-dispatcher-23] WARN  akka.remote.ReliableDeliverySupervisor [] - Association with remote system [akka.tcp://flink@127.0.0.1:58498] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@127.0.0.1:58498]] Caused by: [java.net.ConnectException: Connection refused: no further information: /127.0.0.1:58498]
   ```
   
   After that we immediately throw `RecipientUnreachableExceptions`.





[GitHub] [flink] zentol edited a comment on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
zentol edited a comment on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-957620163


   How long does it actually take for Akka to consider something to be unreachable?
   
   It takes quite a while for this to show up:
   ```
   441053 [flink-akka.actor.default-dispatcher-23] WARN  akka.remote.transport.netty.NettyTransport [] - Remote connection to [null] failed with java.net.ConnectException: Connection refused: no further information: /127.0.0.1:58498
   441056 [flink-akka.actor.default-dispatcher-23] WARN  akka.remote.ReliableDeliverySupervisor [] - Association with remote system [akka.tcp://flink@127.0.0.1:58498] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@127.0.0.1:58498]] Caused by: [java.net.ConnectException: Connection refused: no further information: /127.0.0.1:58498]
   ```
   
   After that we immediately throw `RecipientUnreachableExceptions`.





[GitHub] [flink] zentol commented on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
zentol commented on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-957635293


   So yeah, increasing the restart delay to ~5 seconds would probably do the trick.





[GitHub] [flink] zentol commented on a change in pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
zentol commented on a change in pull request #17606:
URL: https://github.com/apache/flink/pull/17606#discussion_r740932666



##########
File path: flink-tests/src/test/java/org/apache/flink/test/recovery/TaskManagerProcessFailureBatchRecoveryITCase.java
##########
@@ -67,7 +67,7 @@ public void testTaskManagerFailure(Configuration configuration, final File coord
         ExecutionEnvironment env =
                 ExecutionEnvironment.createRemoteEnvironment("localhost", 1337, configuration);
         env.setParallelism(PARALLELISM);
-        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(1, 1500L));
+        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(10, 1500L));

Review comment:
       The issue depends on when the first heartbeat is sent after the TM went down (the TM would then be considered unreachable).
   
   If it happens before the restart, then we properly remove the TM and don't use it later on.
   If it does not happen before the restart, then the job fails a second time later on, while attempting to deploy tasks to the same TM.
   
   I think this change only slightly shifts the timings so that this happens more often; it can already happen on the current master.
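A toy timing model can make this race concrete. It is a hypothetical sketch, not Flink code, and it assumes that the JobMaster sends heartbeat requests at a fixed interval starting at t = 0, that the first request sent after the TaskManager dies fails at once and gets the TM removed, and that tasks are redeployed exactly one restart delay after the kill (the cancelTask RPCs fail immediately, so nothing else delays the restart). The numbers in the checks are illustrative, not taken from the test.

```java
/**
 * Toy model (hypothetical, not Flink code) of the heartbeat vs. restart race.
 * Assumptions: heartbeat requests go out every intervalMs starting at t = 0, the first
 * request after the kill fails at once and removes the TM, and tasks are redeployed
 * exactly restartDelayMs after the kill.
 */
final class HeartbeatRaceModel {

    private HeartbeatRaceModel() {}

    /** Send time of the first heartbeat request strictly after the TaskManager was killed. */
    static long firstRequestAfterKill(long killMs, long intervalMs) {
        if (killMs < 0 || intervalMs <= 0) {
            throw new IllegalArgumentException("killMs must be >= 0 and intervalMs > 0");
        }
        return (killMs / intervalMs + 1) * intervalMs;
    }

    /** True if redeployment happens before the dead TM is removed, i.e. a second restart follows. */
    static boolean secondRestartExpected(long killMs, long intervalMs, long restartDelayMs) {
        long removedAt = firstRequestAfterKill(killMs, intervalMs);
        long redeployAt = killMs + restartDelayMs;
        return removedAt > redeployAt;
    }
}
```

Under these assumptions the outcome depends only on whether the restart delay covers the gap until the next heartbeat request: a longer delay shrinks the window for a second restart, but only closes it once the delay reaches the heartbeat interval.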

Review comment:
       - JM sends heartbeat request to TM1 (transmitted, but may not get a response)
   - TM1 is killed
   - job restarts (not delayed by missing TM1 because cancelTask RPCs fail immediately)
   - JM sends heartbeat request to TM1 (not transmitted)
   - job restarts a second time

Review comment:
       I can also reproduce it locally, where there are no processing gaps.

Review comment:
       The restart delay is a fair point, I'll check the logs again.

Review comment:
       It looks like the failed heartbeats are being ignored while we are waiting for the restart delay:
   
   ```
   // 64589-0ef458 is the killed TM
   
   // this is the last heartbeat
   17532 o.a.f.r.jobmaster.JobMaster [] - Received heartbeat from 127.0.0.1:64589-0ef458.
   ...
   <start job restart>
   <cancel slot requests>
   <cleanup partitions>
   <various failed cancelTask RPCs>
   17740 <reduce resource requirements to 0>
   ...
   17440... JM idling, sending heartbeat requests
   19212 o.a.f.r.jobmaster.JobMaster [] - Archive local failure causing attempt 05bcf9159a5a301d2f7b6566111235da to fail
   ...
   19213  o.a.f.r.executiongraph.ExecutionGraph [] - Job Flink Java Job at Mon Nov 01 15:42:30 CET 2021 (4daf5dcbf65f7cd384ac228ad72ab5c6) switched from state RESTARTING to RUNNING.
   19777  o.a.f.r.jobmaster.JobMaster [] - TaskManager with id 127.0.0.1:64589-0ef458 is no longer reachable.
   19777 o.a.f.r.jobmaster.JobMaster [] - Disconnect TaskExecutor 127.0.0.1:64589-0ef458 because: TaskManager with id 127.0.0.1:64589-0ef458 is no longer reachable.
   ```
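The timestamps in the excerpt can be turned into two intervals. Assuming the leading numbers are milliseconds since the process started (the excerpt does not say so), the job was back in RUNNING 564 ms before the killed TaskManager was disconnected, and 2245 ms passed between its last heartbeat and the disconnect. Only the three timestamps quoted above are used:

```java
/**
 * Interval arithmetic on the log excerpt above, assuming the leading numbers are
 * milliseconds since process start. Only timestamps quoted in the excerpt are used.
 */
final class RestartLogTimings {

    static final long LAST_HEARTBEAT_MS = 17_532L; // last heartbeat from 127.0.0.1:64589-0ef458
    static final long RUNNING_AGAIN_MS = 19_213L;  // job switched from RESTARTING to RUNNING
    static final long UNREACHABLE_MS = 19_777L;    // TM reported as no longer reachable

    private RestartLogTimings() {}

    /** How long the job was RUNNING again before the dead TaskManager was disconnected. */
    static long runningBeforeDisconnectMs() {
        return UNREACHABLE_MS - RUNNING_AGAIN_MS;
    }

    /** Time between the last received heartbeat and the "no longer reachable" verdict. */
    static long lastHeartbeatToDisconnectMs() {
        return UNREACHABLE_MS - LAST_HEARTBEAT_MS;
    }
}
```

For this run, redeployment therefore raced ahead of the disconnect by roughly half a second.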

Review comment:
       I'm wondering if the exception the HeartbeatMonitorImpl sees could be wrapped in a CompletionException...
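The suspicion matches standard `CompletableFuture` semantics: a callback attached directly to the failed future sees the original exception, but a callback attached to a dependent stage (for example one created with `thenApply`) sees it wrapped in a `CompletionException`. The sketch only demonstrates this JDK behavior; `RecipientUnreachable` is a hypothetical stand-in for Flink's `RecipientUnreachableException`, and whether the RPC layer hands the heartbeat code a dependent stage is not shown here.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

/** Demonstrates where CompletableFuture wraps failures in a CompletionException. */
final class CompletionWrappingDemo {

    /** Hypothetical stand-in for Flink's RecipientUnreachableException. */
    static final class RecipientUnreachable extends Exception {
        RecipientUnreachable(String message) {
            super(message);
        }
    }

    private CompletionWrappingDemo() {}

    /** Failure as seen by a callback registered on the future that was failed. */
    static Throwable seenOnFailedFuture() {
        CompletableFuture<String> rpcResult = new CompletableFuture<>();
        AtomicReference<Throwable> seen = new AtomicReference<>();
        rpcResult.whenComplete((ignored, failure) -> seen.set(failure));
        // Callbacks registered before completion run synchronously in the completing thread.
        rpcResult.completeExceptionally(new RecipientUnreachable("TM gone"));
        return seen.get();
    }

    /** Failure as seen by a callback registered on a stage derived from that future. */
    static Throwable seenOnDependentStage() {
        CompletableFuture<String> rpcResult = new CompletableFuture<>();
        AtomicReference<Throwable> seen = new AtomicReference<>();
        rpcResult
                .thenApply(String::length) // dependent stage: propagates the failure wrapped
                .whenComplete((ignored, failure) -> seen.set(failure));
        rpcResult.completeExceptionally(new RecipientUnreachable("TM gone"));
        return seen.get();
    }
}
```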

Review comment:
       💢 
   ```
   21440 o.a.f.r.j.JobMaster [] - #handleHeartbeatRpcFailure exception
   java.util.concurrent.CompletionException: org.apache.flink.runtime.rpc.exceptions.RecipientUnreachableException
   ```

Review comment:
       I suppose stripping the `CompletionException` in the `HeartbeatManagerImpl` should be done in any case, because it is so easy to introduce bugs like this.
   
   I'm curious, though, whether we should revert the `AkkaInvocationHandler` to again do the manual forwarding of the result...
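Stripping the wrapper before classifying a failure makes the check independent of how the future was composed. A minimal sketch, assuming a stand-in `RecipientUnreachable` class and a hypothetical `isUnreachable` helper. Flink's `ExceptionUtils.stripCompletionException` serves a similar purpose; the version here is a self-contained re-implementation, not Flink's code:

```java
import java.util.concurrent.CompletionException;

/** Hypothetical helper showing how a heartbeat failure could be classified after unwrapping. */
final class HeartbeatFailureClassifier {

    /** Hypothetical stand-in for Flink's RecipientUnreachableException. */
    static final class RecipientUnreachable extends Exception {
        RecipientUnreachable(String message) {
            super(message);
        }
    }

    private HeartbeatFailureClassifier() {}

    /** Peels off CompletionException layers that carry a cause and returns the innermost one. */
    static Throwable stripCompletionException(Throwable throwable) {
        Throwable current = throwable;
        while (current instanceof CompletionException && current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    /** True if the (possibly wrapped) failure says the target can no longer be reached. */
    static boolean isUnreachable(Throwable failure) {
        return stripCompletionException(failure) instanceof RecipientUnreachable;
    }
}
```

A plain `instanceof` check on the wrapped failure shown in the log above would miss the unreachable cause; after stripping, both the wrapped and the unwrapped forms are recognized.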




[GitHub] [flink] flinkbot edited a comment on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-954752065


   ## CI report:
   
   * ade8b47510aaf11753d52b1283588366107245a0 Azure: [CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25786) 
   * a9ce74387e3be7c9fa8e4309886c9b86825c3aea Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25804) 
   * 61532812701d75c0ce669f42ec47f5de4a588b70 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>


[GitHub] [flink] tillrohrmann edited a comment on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
tillrohrmann edited a comment on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-957696510


   Maybe. You could enable debug logs and check whether we send heartbeat requests after the TM died. If these requests do not fail, then it is most likely because Akka hasn't yet detected that the connection is dead. Maybe this is also something we can configure in Akka.


[GitHub] [flink] flinkbot edited a comment on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-954752065


   ## CI report:
   
   * ade8b47510aaf11753d52b1283588366107245a0 Azure: [CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25786) 
   * a9ce74387e3be7c9fa8e4309886c9b86825c3aea Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25804) 
   * 61532812701d75c0ce669f42ec47f5de4a588b70 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25808) 
   

[GitHub] [flink] flinkbot edited a comment on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-954752065


   ## CI report:
   
   * ade8b47510aaf11753d52b1283588366107245a0 Azure: [CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25786) 
   * a9ce74387e3be7c9fa8e4309886c9b86825c3aea Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25804) 
   * 61532812701d75c0ce669f42ec47f5de4a588b70 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25808) 
   * a9bc844925ae7e67e5574db7c765650b0c85fc4b UNKNOWN
   

[GitHub] [flink] flinkbot edited a comment on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-954752065


   ## CI report:
   
   * ade8b47510aaf11753d52b1283588366107245a0 Azure: [CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25786) 
   * a9ce74387e3be7c9fa8e4309886c9b86825c3aea Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25804) 
   * 61532812701d75c0ce669f42ec47f5de4a588b70 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25808) 
   

[GitHub] [flink] flinkbot edited a comment on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-954752065


   ## CI report:
   
   * d65fff260e684b231835a04583b3e9d600ae96d6 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25645) 
   

[GitHub] [flink] flinkbot edited a comment on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-954752065


   ## CI report:
   
   * d65fff260e684b231835a04583b3e9d600ae96d6 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25645) 
   * 3652a3b41d02f84d54924930705ae710d7f7cccc Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25657) 
   

[GitHub] [flink] tillrohrmann commented on a change in pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
tillrohrmann commented on a change in pull request #17606:
URL: https://github.com/apache/flink/pull/17606#discussion_r740944848



##########
File path: flink-tests/src/test/java/org/apache/flink/test/recovery/TaskManagerProcessFailureBatchRecoveryITCase.java
##########
@@ -67,7 +67,7 @@ public void testTaskManagerFailure(Configuration configuration, final File coord
         ExecutionEnvironment env =
                 ExecutionEnvironment.createRemoteEnvironment("localhost", 1337, configuration);
         env.setParallelism(PARALLELISM);
-        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(1, 1500L));
+        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(10, 1500L));

Review comment:
       But maybe the error margin is still too small and we should either increase the number of restart attempts or the restart delay.
   
   For reference, I tried to harden the test case via #17107.







[GitHub] [flink] flinkbot edited a comment on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-954752065


   ## CI report:
   
   * 3652a3b41d02f84d54924930705ae710d7f7cccc Azure: [CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25657) 
   








[GitHub] [flink] flinkbot edited a comment on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-954752065


   ## CI report:
   
   * a9bc844925ae7e67e5574db7c765650b0c85fc4b Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25814) 
   





[GitHub] [flink] tillrohrmann commented on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
tillrohrmann commented on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-959076412


   > I don't have a strong opinion on this. A single restart results in cleaner logs.
   > If we're aiming to do at most 1-2 restarts, then the difference will be negligible either way (±2 s) 🤷. The test already needs 30-50 seconds.
   
   Fair enough, then let's go with a single restart and an extended delay.





[GitHub] [flink] tillrohrmann commented on a change in pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
tillrohrmann commented on a change in pull request #17606:
URL: https://github.com/apache/flink/pull/17606#discussion_r740942116



##########
File path: flink-tests/src/test/java/org/apache/flink/test/recovery/TaskManagerProcessFailureBatchRecoveryITCase.java
##########
@@ -67,7 +67,7 @@ public void testTaskManagerFailure(Configuration configuration, final File coord
         ExecutionEnvironment env =
                 ExecutionEnvironment.createRemoteEnvironment("localhost", 1337, configuration);
         env.setParallelism(PARALLELISM);
-        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(1, 1500L));
+        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(10, 1500L));

Review comment:
       For this test, we have the heartbeat interval set to 200 ms and the restart delay set to 1.5 s. Hence multiple heartbeats should be sent during the restart delay. Moreover, we mark TMs as unreachable after a single lost message. Hence, I can only think of a processing gap on the test machine to explain this situation. But the logs suggest otherwise :-(
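The timing argument above can be made concrete with a small sketch. It assumes (this is not stated in the test) that heartbeat requests go out on a fixed schedule, the first one a full interval after the restart delay begins; the class and method names are hypothetical.

```java
public class HeartbeatWindow {

    // Counts heartbeat requests sent strictly inside a restart-delay window,
    // assuming requests at interval, 2*interval, ... after the window opens.
    static long heartbeatsDuringDelay(long restartDelayMs, long heartbeatIntervalMs) {
        if (restartDelayMs <= 0 || heartbeatIntervalMs <= 0) {
            throw new IllegalArgumentException("delay and interval must be positive");
        }
        // Largest k with k * interval < restartDelayMs.
        return (restartDelayMs - 1) / heartbeatIntervalMs;
    }

    public static void main(String[] args) {
        long restartDelayMs = 1500;     // fixedDelayRestart(..., 1500L)
        long heartbeatIntervalMs = 200; // heartbeat interval used by the test
        System.out.println(heartbeatsDuringDelay(restartDelayMs, heartbeatIntervalMs)); // 7
    }
}
```

Under these assumptions, seven requests fall inside one restart delay. If a single lost request already marks the TM as unreachable, that leaves little room for the dead TM to go unnoticed on heartbeat timing alone.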







[GitHub] [flink] zentol commented on a change in pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
zentol commented on a change in pull request #17606:
URL: https://github.com/apache/flink/pull/17606#discussion_r740944174



##########
File path: flink-tests/src/test/java/org/apache/flink/test/recovery/TaskManagerProcessFailureBatchRecoveryITCase.java
##########
@@ -67,7 +67,7 @@ public void testTaskManagerFailure(Configuration configuration, final File coord
         ExecutionEnvironment env =
                 ExecutionEnvironment.createRemoteEnvironment("localhost", 1337, configuration);
         env.setParallelism(PARALLELISM);
-        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(1, 1500L));
+        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(10, 1500L));

Review comment:
       The restart delay is a fair point; I'll check the logs again.







[GitHub] [flink] tillrohrmann commented on a change in pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
tillrohrmann commented on a change in pull request #17606:
URL: https://github.com/apache/flink/pull/17606#discussion_r741094093



##########
File path: flink-tests/src/test/java/org/apache/flink/test/recovery/TaskManagerProcessFailureBatchRecoveryITCase.java
##########
@@ -67,7 +67,7 @@ public void testTaskManagerFailure(Configuration configuration, final File coord
         ExecutionEnvironment env =
                 ExecutionEnvironment.createRemoteEnvironment("localhost", 1337, configuration);
         env.setParallelism(PARALLELISM);
-        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(1, 1500L));
+        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(10, 1500L));

Review comment:
       Hmm, by removing the forwarding we do indeed change the exception type for consumers of the returned future. If we don't want to accept this risk, then I think it is probably safer to revert this change.
   
   In general, though, consumers of futures should handle the `CompletionException` case (e.g. by stripping it or by using `ExceptionUtils.containsCause`).
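The exception-type change can be illustrated with plain `java.util.concurrent`. A stage derived from a failed future (here via `thenApply`) reports its failure wrapped in a `CompletionException`, so an `instanceof` check on the original type fails unless the wrapper is stripped or the cause chain is searched. The `strip` and `hasCause` helpers and the `RecipientUnreachable` class are hypothetical stand-ins written for this sketch; they are not Flink's `ExceptionUtils` or its `RecipientUnreachableException`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

public class CompletionExceptionHandling {

    // Hypothetical stand-in for Flink's RecipientUnreachableException.
    static class RecipientUnreachable extends Exception {
        RecipientUnreachable(String msg) {
            super(msg);
        }
    }

    // Removes any number of CompletionException wrappers.
    static Throwable strip(Throwable t) {
        Throwable current = t;
        while (current instanceof CompletionException && current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    // Walks the cause chain looking for an instance of the given type.
    static boolean hasCause(Throwable t, Class<? extends Throwable> type) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (type.isInstance(c)) {
                return true;
            }
        }
        return false;
    }

    // Returns the failure that a consumer of a *derived* stage observes.
    static Throwable observedFailure() {
        CompletableFuture<Object> rpc = new CompletableFuture<>();
        rpc.completeExceptionally(new RecipientUnreachable("TM unreachable"));
        // thenApply creates a dependent stage; its failure is a CompletionException.
        CompletableFuture<Object> derived = rpc.thenApply(v -> v);
        Throwable[] seen = new Throwable[1];
        derived.whenComplete((v, failure) -> seen[0] = failure); // runs immediately
        return seen[0];
    }

    public static void main(String[] args) {
        Throwable failure = observedFailure();
        System.out.println(failure instanceof RecipientUnreachable);        // false
        System.out.println(strip(failure) instanceof RecipientUnreachable); // true
        System.out.println(hasCause(failure, RecipientUnreachable.class));  // true
    }
}
```

This matches the shape of the `CompletionException: ...RecipientUnreachableException` log line reported in this review: a handler that only checks the top-level type would not recognize the wrapped failure.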







[GitHub] [flink] zentol commented on a change in pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
zentol commented on a change in pull request #17606:
URL: https://github.com/apache/flink/pull/17606#discussion_r740935146



##########
File path: flink-tests/src/test/java/org/apache/flink/test/recovery/TaskManagerProcessFailureBatchRecoveryITCase.java
##########
@@ -67,7 +67,7 @@ public void testTaskManagerFailure(Configuration configuration, final File coord
         ExecutionEnvironment env =
                 ExecutionEnvironment.createRemoteEnvironment("localhost", 1337, configuration);
         env.setParallelism(PARALLELISM);
-        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(1, 1500L));
+        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(10, 1500L));

Review comment:
       - JM sends heartbeat request to TM1 (transmitted, but may not get a response)
   - TM1 is killed
   - job restarts (not delayed by missing TM1 because cancelTask RPCs fail immediately)
   - JM sends heartbeat request to TM1 (not transmitted)
   - job restarts second time
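The sequence above implies two failovers for a single TM kill. The minimal model below assumes only that a fixed-delay strategy permits at most N restarts. Its class is hypothetical and is not Flink's restart-strategy implementation. It shows why `fixedDelayRestart(1, 1500L)` would fail the job under this timeline while `fixedDelayRestart(10, 1500L)` survives.

```java
import java.util.List;

public class RestartBudget {

    // Hypothetical model of a fixed-delay restart strategy: at most maxAttempts restarts.
    static final class FixedDelayBudget {
        private final int maxAttempts;
        private int attempts;

        FixedDelayBudget(int maxAttempts) {
            this.maxAttempts = maxAttempts;
        }

        // Records a failure; returns true if the job may still restart.
        boolean onFailure() {
            attempts++;
            return attempts <= maxAttempts;
        }
    }

    // Replays a list of failures and reports whether the job survives all of them.
    static boolean survives(int maxAttempts, List<String> failures) {
        FixedDelayBudget budget = new FixedDelayBudget(maxAttempts);
        for (String failure : failures) {
            if (!budget.onFailure()) {
                return false; // budget exhausted: the job fails terminally
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> failures = List.of(
                "cancelTask RPCs to killed TM1 fail -> first restart",
                "heartbeat to TM1 not transmitted -> second restart");
        System.out.println(survives(1, failures));  // false
        System.out.println(survives(10, failures)); // true
    }
}
```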







[GitHub] [flink] flinkbot edited a comment on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-954752065


   ## CI report:
   
   * ade8b47510aaf11753d52b1283588366107245a0 Azure: [CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25786) 
   * a9ce74387e3be7c9fa8e4309886c9b86825c3aea Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25804) 
   * 61532812701d75c0ce669f42ec47f5de4a588b70 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25808) 
   





[GitHub] [flink] flinkbot edited a comment on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-954752065


   ## CI report:
   
   * 3652a3b41d02f84d54924930705ae710d7f7cccc Azure: [CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25657) 
   * 308144b47092d73d777530ce5d0a11c85d7ab5c6 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25741) 
   





[GitHub] [flink] tillrohrmann commented on a change in pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
tillrohrmann commented on a change in pull request #17606:
URL: https://github.com/apache/flink/pull/17606#discussion_r740889079



##########
File path: flink-rpc/flink-rpc-akka/src/main/java/org/apache/flink/runtime/rpc/akka/AkkaInvocationHandler.java
##########
@@ -240,17 +241,20 @@ private Object invokeRpc(Method method, Object[] args) throws Exception {
             final CompletableFuture<?> resultFuture = ask(rpcInvocation, futureTimeout);
 
             final CompletableFuture<Object> completableFuture = new CompletableFuture<>();
-            resultFuture.whenComplete(
-                    (resultValue, failure) -> {
-                        if (failure != null) {
-                            completableFuture.completeExceptionally(
-                                    resolveTimeoutException(
-                                            failure, callStackCapture, address, rpcInvocation));
-                        } else {
-                            completableFuture.complete(
-                                    deserializeValueIfNeeded(resultValue, method));
-                        }
-                    });
+            FutureUtils.forward(

Review comment:
       I think you are right. We don't need the `forward` here.
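Here is a minimal sketch of the bug this PR targets, assuming `deserializeValueIfNeeded` can throw. In the old `whenComplete` callback, an exception thrown while completing `completableFuture` only fails the stage returned by `whenComplete`. That stage is discarded, so the caller's future never completes. The names below (`deserialize`, `oldPattern`, `newPattern`) are hypothetical stand-ins, not the code that was merged.

```java
import java.util.concurrent.CompletableFuture;

public class ForwardDeserializationErrors {

    // Hypothetical stand-in for deserializeValueIfNeeded: fails on a "corrupt" payload.
    static Object deserialize(Object raw) {
        if ("corrupt".equals(raw)) {
            throw new IllegalStateException("Could not deserialize RPC response");
        }
        return raw;
    }

    // Old pattern: if deserialize throws, the exception lands on the discarded
    // stage returned by whenComplete, and 'result' is never completed.
    static CompletableFuture<Object> oldPattern(CompletableFuture<?> response) {
        CompletableFuture<Object> result = new CompletableFuture<>();
        response.whenComplete(
                (value, failure) -> {
                    if (failure != null) {
                        result.completeExceptionally(failure);
                    } else {
                        result.complete(deserialize(value));
                    }
                });
        return result;
    }

    // Fixed pattern: deserialization failures are caught and forwarded to 'result'.
    static CompletableFuture<Object> newPattern(CompletableFuture<?> response) {
        CompletableFuture<Object> result = new CompletableFuture<>();
        response.whenComplete(
                (value, failure) -> {
                    if (failure != null) {
                        result.completeExceptionally(failure);
                    } else {
                        try {
                            result.complete(deserialize(value));
                        } catch (Throwable t) {
                            result.completeExceptionally(t);
                        }
                    }
                });
        return result;
    }

    public static void main(String[] args) {
        CompletableFuture<String> corrupt = CompletableFuture.completedFuture("corrupt");
        System.out.println(oldPattern(corrupt).isDone());                   // false
        System.out.println(newPattern(corrupt).isCompletedExceptionally()); // true
    }
}
```

An alternative with the same effect is `response.thenApply(...)`, which fails the derived future if the function throws. It does, however, wrap the failure in a `CompletionException`, which is the exception-type concern raised in this review.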

##########
File path: flink-tests/src/test/java/org/apache/flink/test/recovery/TaskManagerProcessFailureBatchRecoveryITCase.java
##########
@@ -67,7 +67,7 @@ public void testTaskManagerFailure(Configuration configuration, final File coord
         ExecutionEnvironment env =
                 ExecutionEnvironment.createRemoteEnvironment("localhost", 1337, configuration);
         env.setParallelism(PARALLELISM);
-        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(1, 1500L));
+        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(10, 1500L));

Review comment:
       I don't fully understand why this change is now required. Can we explain why this is the case?







[GitHub] [flink] zentol commented on a change in pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
zentol commented on a change in pull request #17606:
URL: https://github.com/apache/flink/pull/17606#discussion_r740969750



##########
File path: flink-tests/src/test/java/org/apache/flink/test/recovery/TaskManagerProcessFailureBatchRecoveryITCase.java
##########
@@ -67,7 +67,7 @@ public void testTaskManagerFailure(Configuration configuration, final File coord
         ExecutionEnvironment env =
                 ExecutionEnvironment.createRemoteEnvironment("localhost", 1337, configuration);
         env.setParallelism(PARALLELISM);
-        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(1, 1500L));
+        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(10, 1500L));

Review comment:
       💢 
   ```
   21440 o.a.f.r.j.JobMaster [] - #handleHeartbeatRpcFailure exception
   java.util.concurrent.CompletionException: org.apache.flink.runtime.rpc.exceptions.RecipientUnreachableException
   ```







[GitHub] [flink] tillrohrmann commented on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
tillrohrmann commented on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-957696510


   Maybe. You could enable debug logs and check whether we send heartbeat requests after the TM died. If these requests do not fail, then it is most likely because Akka hasn't yet detected that the connection is dead.





[GitHub] [flink] zentol commented on a change in pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
zentol commented on a change in pull request #17606:
URL: https://github.com/apache/flink/pull/17606#discussion_r740935146



##########
File path: flink-tests/src/test/java/org/apache/flink/test/recovery/TaskManagerProcessFailureBatchRecoveryITCase.java
##########
@@ -67,7 +67,7 @@ public void testTaskManagerFailure(Configuration configuration, final File coord
         ExecutionEnvironment env =
                 ExecutionEnvironment.createRemoteEnvironment("localhost", 1337, configuration);
         env.setParallelism(PARALLELISM);
-        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(1, 1500L));
+        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(10, 1500L));

Review comment:
       - JM sends heartbeat to TM1 (transmitted, but may not get a response)
   - TM1 is killed
   - job restarts (not delayed by missing TM1 because cancelTask RPCs fail immediately)
   - JM sends heartbeat to TM1 (not transmitted)
   - job restarts second time







[GitHub] [flink] zentol commented on a change in pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
zentol commented on a change in pull request #17606:
URL: https://github.com/apache/flink/pull/17606#discussion_r740957930



##########
File path: flink-tests/src/test/java/org/apache/flink/test/recovery/TaskManagerProcessFailureBatchRecoveryITCase.java
##########
@@ -67,7 +67,7 @@ public void testTaskManagerFailure(Configuration configuration, final File coord
         ExecutionEnvironment env =
                 ExecutionEnvironment.createRemoteEnvironment("localhost", 1337, configuration);
         env.setParallelism(PARALLELISM);
-        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(1, 1500L));
+        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(10, 1500L));

Review comment:
       It looks like the failed heartbeats are being ignored while we are waiting for the restart delay:
   
   ```
   // 64589-0ef458 is the killed TM
   
   // this is the last heartbeat
   17532 o.a.f.r.jobmaster.JobMaster [] - Received heartbeat from 127.0.0.1:64589-0ef458.
   ...
   <start job restart>
   <cancel slot requests>
   <cleanup partitions>
   <various failed cancelTask RPCs>
   17740 <reduce resource requirements to 0>
   ...
   17440... JM idling, sending heartbeat requests
   19212 o.a.f.r.jobmaster.JobMaster [] - Archive local failure causing attempt 05bcf9159a5a301d2f7b6566111235da to fail
   ...
   19213  o.a.f.r.executiongraph.ExecutionGraph [] - Job Flink Java Job at Mon Nov 01 15:42:30 CET 2021 (4daf5dcbf65f7cd384ac228ad72ab5c6) switched from state RESTARTING to RUNNING.
   19777  o.a.f.r.jobmaster.JobMaster [] - TaskManager with id 127.0.0.1:64589-0ef458 is no longer reachable.
   19777 o.a.f.r.jobmaster.JobMaster [] - Disconnect TaskExecutor 127.0.0.1:64589-0ef458 because: TaskManager with id 127.0.0.1:64589-0ef458 is no longer reachable.
   ```







[GitHub] [flink] flinkbot edited a comment on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-954752065


   ## CI report:
   
   * 308144b47092d73d777530ce5d0a11c85d7ab5c6 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25741) 
   * ade8b47510aaf11753d52b1283588366107245a0 UNKNOWN
   





[GitHub] [flink] flinkbot edited a comment on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-954752065


   ## CI report:
   
   * d65fff260e684b231835a04583b3e9d600ae96d6 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25645) 
   * 3652a3b41d02f84d54924930705ae710d7f7cccc UNKNOWN
   



[GitHub] [flink] flinkbot edited a comment on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-954752065


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "d65fff260e684b231835a04583b3e9d600ae96d6",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25645",
       "triggerID" : "d65fff260e684b231835a04583b3e9d600ae96d6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3652a3b41d02f84d54924930705ae710d7f7cccc",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25657",
       "triggerID" : "3652a3b41d02f84d54924930705ae710d7f7cccc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3652a3b41d02f84d54924930705ae710d7f7cccc",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25657",
       "triggerID" : "955045722",
       "triggerType" : "MANUAL"
     } ]
   }-->
   ## CI report:
   
   * d65fff260e684b231835a04583b3e9d600ae96d6 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25645) 
   * 3652a3b41d02f84d54924930705ae710d7f7cccc Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25657) 
   



[GitHub] [flink] flinkbot edited a comment on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-954752065









[GitHub] [flink] flinkbot edited a comment on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-954752065


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "d65fff260e684b231835a04583b3e9d600ae96d6",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25645",
       "triggerID" : "d65fff260e684b231835a04583b3e9d600ae96d6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3652a3b41d02f84d54924930705ae710d7f7cccc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25657",
       "triggerID" : "3652a3b41d02f84d54924930705ae710d7f7cccc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3652a3b41d02f84d54924930705ae710d7f7cccc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25657",
       "triggerID" : "955045722",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "308144b47092d73d777530ce5d0a11c85d7ab5c6",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25741",
       "triggerID" : "308144b47092d73d777530ce5d0a11c85d7ab5c6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "308144b47092d73d777530ce5d0a11c85d7ab5c6",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25741",
       "triggerID" : "957180521",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "ade8b47510aaf11753d52b1283588366107245a0",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25786",
       "triggerID" : "ade8b47510aaf11753d52b1283588366107245a0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9ce74387e3be7c9fa8e4309886c9b86825c3aea",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25804",
       "triggerID" : "a9ce74387e3be7c9fa8e4309886c9b86825c3aea",
       "triggerType" : "PUSH"
     }, {
       "hash" : "61532812701d75c0ce669f42ec47f5de4a588b70",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25808",
       "triggerID" : "61532812701d75c0ce669f42ec47f5de4a588b70",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * ade8b47510aaf11753d52b1283588366107245a0 Azure: [CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25786) 
   * a9ce74387e3be7c9fa8e4309886c9b86825c3aea Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25804) 
   * 61532812701d75c0ce669f42ec47f5de4a588b70 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25808) 
   



[GitHub] [flink] zentol edited a comment on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
zentol edited a comment on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-956207150


   That test failure is driving me crazy.
   
   With the original code, the test passes:
   ```
               resultFuture
                       .whenComplete(
                               (resultValue, failure) -> {
                                   if (failure != null) {
                                       completableFuture.completeExceptionally(
                                               resolveTimeoutException(
                                                       failure, callStackCapture, address, rpcInvocation));
                                   } else {
                                       completableFuture.complete(
                                               deserializeValueIfNeeded(resultValue, method));
                                   }
                               });
   ```
   If I move the deserialization into a separate `thenApply`, it fails:
   ```
               resultFuture
                       .thenApply(resultValue -> deserializeValueIfNeeded(resultValue, method))
                       .whenComplete(
                               (resultValue, failure) -> {
                                   if (failure != null) {
                                       completableFuture.completeExceptionally(
                                               resolveTimeoutException(
                                                       failure,
                                                       callStackCapture,
                                                       address,
                                                       rpcInvocation));
                                   } else {
                                       completableFuture.complete(resultValue);
                                   }
                               });
   ```
   Sure, I could explain it if `deserializeValueIfNeeded` threw an exception, but that's not the case. 💢 
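   One plausible explanation, offered as a hypothesis rather than a confirmed diagnosis: when `resultFuture` fails, the stage created by `thenApply` completes with a `CompletionException` that wraps the original failure, whereas a `whenComplete` attached directly to `resultFuture` sees the failure unwrapped. Any code that checks the failure's type (for example, for a timeout) would then stop matching. The standalone sketch below uses only `java.util.concurrent`; the `TimeoutException` stands in for an RPC failure, and the class and method names are made up for illustration.

   ```java
   import java.util.concurrent.CompletableFuture;
   import java.util.concurrent.TimeoutException;
   import java.util.concurrent.atomic.AtomicReference;

   final class CompletionExceptionWrapping {

       /**
        * Fails a source future and returns the throwable seen by a whenComplete callback,
        * attached either directly to the source or behind an extra thenApply stage.
        */
       static Throwable observedFailure(boolean withThenApply) {
           CompletableFuture<Object> resultFuture = new CompletableFuture<>();
           AtomicReference<Throwable> seen = new AtomicReference<>();

           CompletableFuture<Object> stage =
                   withThenApply ? resultFuture.thenApply(value -> value) : resultFuture;
           stage.whenComplete((value, failure) -> seen.set(failure));

           // The callbacks registered above run synchronously on this thread when we complete.
           resultFuture.completeExceptionally(new TimeoutException("ask timed out"));
           return seen.get();
       }

       public static void main(String[] args) {
           // Direct: the raw TimeoutException.
           System.out.println(observedFailure(false));
           // Chained: a CompletionException whose cause is the TimeoutException.
           System.out.println(observedFailure(true));
       }
   }
   ```

   Only the failure path differs: on success, `thenApply` passes its result through unchanged, which matches the observation that the test only breaks in the failure case.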





[GitHub] [flink] flinkbot edited a comment on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-954752065


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "d65fff260e684b231835a04583b3e9d600ae96d6",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25645",
       "triggerID" : "d65fff260e684b231835a04583b3e9d600ae96d6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3652a3b41d02f84d54924930705ae710d7f7cccc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25657",
       "triggerID" : "3652a3b41d02f84d54924930705ae710d7f7cccc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3652a3b41d02f84d54924930705ae710d7f7cccc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25657",
       "triggerID" : "955045722",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "308144b47092d73d777530ce5d0a11c85d7ab5c6",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25741",
       "triggerID" : "308144b47092d73d777530ce5d0a11c85d7ab5c6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "308144b47092d73d777530ce5d0a11c85d7ab5c6",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25741",
       "triggerID" : "957180521",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "ade8b47510aaf11753d52b1283588366107245a0",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25786",
       "triggerID" : "ade8b47510aaf11753d52b1283588366107245a0",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 308144b47092d73d777530ce5d0a11c85d7ab5c6 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25741) 
   * ade8b47510aaf11753d52b1283588366107245a0 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25786) 
   



[GitHub] [flink] flinkbot commented on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
flinkbot commented on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-954752065


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "d65fff260e684b231835a04583b3e9d600ae96d6",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "d65fff260e684b231835a04583b3e9d600ae96d6",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * d65fff260e684b231835a04583b3e9d600ae96d6 UNKNOWN
   



[GitHub] [flink] zentol commented on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
zentol commented on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-955045722


   @flinkbot run azure





[GitHub] [flink] zentol commented on a change in pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
zentol commented on a change in pull request #17606:
URL: https://github.com/apache/flink/pull/17606#discussion_r741036859



##########
File path: flink-tests/src/test/java/org/apache/flink/test/recovery/TaskManagerProcessFailureBatchRecoveryITCase.java
##########
@@ -67,7 +67,7 @@ public void testTaskManagerFailure(Configuration configuration, final File coord
         ExecutionEnvironment env =
                 ExecutionEnvironment.createRemoteEnvironment("localhost", 1337, configuration);
         env.setParallelism(PARALLELISM);
-        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(1, 1500L));
+        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(10, 1500L));

Review comment:
       I suppose stripping the `CompletionException` in the `HeartbeatManagerImpl` should be done in any case because it is so easy to introduce bugs like this.
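   A minimal sketch of the stripping idea as a standalone helper. Flink's `ExceptionUtils` already offers a `stripCompletionException` utility for this purpose; the version below is an independent re-implementation, and `isTimeout` is a hypothetical stand-in for whatever type check the heartbeat code performs on a failed RPC.

   ```java
   import java.util.concurrent.CompletionException;
   import java.util.concurrent.TimeoutException;

   final class CompletionExceptionStripping {

       /** Unwraps (possibly nested) CompletionExceptions down to the underlying cause. */
       static Throwable stripCompletionException(Throwable throwable) {
           Throwable current = throwable;
           while (current instanceof CompletionException && current.getCause() != null) {
               current = current.getCause();
           }
           return current;
       }

       /** Hypothetical check a heartbeat handler might apply to a failed RPC. */
       static boolean isTimeout(Throwable rpcFailure) {
           // Classify the unwrapped cause, so wrapped and unwrapped failures behave the same.
           return stripCompletionException(rpcFailure) instanceof TimeoutException;
       }
   }
   ```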
   
   I'm curious, though, whether we should revert the `AkkaInvocationHandler` to manually forwarding the result again...
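   For comparison, a sketch of what manual forwarding could look like if it also caught deserialization errors. This is an assumption about one possible shape, not the code in the PR: `forward` and its `deserializer` argument are hypothetical stand-ins for the handler's internals and `deserializeValueIfNeeded`. Failures of the source future are passed through unwrapped, and an exception thrown while deserializing fails the returned future instead of being lost inside the `whenComplete` callback.

   ```java
   import java.util.concurrent.CompletableFuture;
   import java.util.function.Function;

   final class ManualForwarding {

       /** Hypothetical helper: forwards resultFuture's outcome, deserializing on success. */
       static <T, R> CompletableFuture<R> forward(
               CompletableFuture<T> resultFuture, Function<? super T, ? extends R> deserializer) {
           CompletableFuture<R> completableFuture = new CompletableFuture<>();
           resultFuture.whenComplete(
                   (resultValue, failure) -> {
                       if (failure != null) {
                           // Forward the original failure as-is, without CompletionException wrapping.
                           completableFuture.completeExceptionally(failure);
                       } else {
                           try {
                               completableFuture.complete(deserializer.apply(resultValue));
                           } catch (Throwable t) {
                               // Without this catch, the error would only fail the (ignored)
                               // future returned by whenComplete, and completableFuture would hang.
                               completableFuture.completeExceptionally(t);
                           }
                       }
                   });
           return completableFuture;
       }
   }
   ```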







[GitHub] [flink] flinkbot edited a comment on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-954752065


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "d65fff260e684b231835a04583b3e9d600ae96d6",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25645",
       "triggerID" : "d65fff260e684b231835a04583b3e9d600ae96d6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3652a3b41d02f84d54924930705ae710d7f7cccc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25657",
       "triggerID" : "3652a3b41d02f84d54924930705ae710d7f7cccc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3652a3b41d02f84d54924930705ae710d7f7cccc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25657",
       "triggerID" : "955045722",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "308144b47092d73d777530ce5d0a11c85d7ab5c6",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25741",
       "triggerID" : "308144b47092d73d777530ce5d0a11c85d7ab5c6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "308144b47092d73d777530ce5d0a11c85d7ab5c6",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25741",
       "triggerID" : "957180521",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "ade8b47510aaf11753d52b1283588366107245a0",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25786",
       "triggerID" : "ade8b47510aaf11753d52b1283588366107245a0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9ce74387e3be7c9fa8e4309886c9b86825c3aea",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25804",
       "triggerID" : "a9ce74387e3be7c9fa8e4309886c9b86825c3aea",
       "triggerType" : "PUSH"
     }, {
       "hash" : "61532812701d75c0ce669f42ec47f5de4a588b70",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25808",
       "triggerID" : "61532812701d75c0ce669f42ec47f5de4a588b70",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9bc844925ae7e67e5574db7c765650b0c85fc4b",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "a9bc844925ae7e67e5574db7c765650b0c85fc4b",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * ade8b47510aaf11753d52b1283588366107245a0 Azure: [CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25786) 
   * a9ce74387e3be7c9fa8e4309886c9b86825c3aea Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25804) 
   * 61532812701d75c0ce669f42ec47f5de4a588b70 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25808) 
   * a9bc844925ae7e67e5574db7c765650b0c85fc4b UNKNOWN
   



[GitHub] [flink] flinkbot edited a comment on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-954752065


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "d65fff260e684b231835a04583b3e9d600ae96d6",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25645",
       "triggerID" : "d65fff260e684b231835a04583b3e9d600ae96d6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3652a3b41d02f84d54924930705ae710d7f7cccc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25657",
       "triggerID" : "3652a3b41d02f84d54924930705ae710d7f7cccc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3652a3b41d02f84d54924930705ae710d7f7cccc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25657",
       "triggerID" : "955045722",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "308144b47092d73d777530ce5d0a11c85d7ab5c6",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25741",
       "triggerID" : "308144b47092d73d777530ce5d0a11c85d7ab5c6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "308144b47092d73d777530ce5d0a11c85d7ab5c6",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25741",
       "triggerID" : "957180521",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "ade8b47510aaf11753d52b1283588366107245a0",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25786",
       "triggerID" : "ade8b47510aaf11753d52b1283588366107245a0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9ce74387e3be7c9fa8e4309886c9b86825c3aea",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25804",
       "triggerID" : "a9ce74387e3be7c9fa8e4309886c9b86825c3aea",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * ade8b47510aaf11753d52b1283588366107245a0 Azure: [CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25786) 
   * a9ce74387e3be7c9fa8e4309886c9b86825c3aea Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25804) 
   



[GitHub] [flink] zentol commented on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
zentol commented on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-957620163


   How long does it actually take for Akka to consider something to be unreachable?
   
   It takes quite a while for this to show up:
   ```
   441053 [flink-akka.actor.default-dispatcher-23] WARN  akka.remote.transport.netty.NettyTransport [] - Remote connection to [null] failed with java.net.ConnectException: Connection refused: no further information: /127.0.0.1:58498
   441056 [flink-akka.actor.default-dispatcher-23] WARN  akka.remote.ReliableDeliverySupervisor [] - Association with remote system [akka.tcp://flink@127.0.0.1:58498] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@127.0.0.1:58498]] Caused by: [java.net.ConnectException: Connection refused: no further information: /127.0.0.1:58498]
   ```
   
   After that, we immediately throw `RecipientUnreachableException`s.





[GitHub] [flink] zentol edited a comment on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
zentol edited a comment on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-957620163


   How long does it actually take for Akka to consider something to be unreachable?
   
   It takes quite a while for this to show up:
   ```
   441053 [flink-akka.actor.default-dispatcher-23] WARN  akka.remote.transport.netty.NettyTransport [] - Remote connection to [null] failed with java.net.ConnectException: Connection refused: no further information: /127.0.0.1:58498
   441056 [flink-akka.actor.default-dispatcher-23] WARN  akka.remote.ReliableDeliverySupervisor [] - Association with remote system [akka.tcp://flink@127.0.0.1:58498] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@127.0.0.1:58498]] Caused by: [java.net.ConnectException: Connection refused: no further information: /127.0.0.1:58498]
   ```
   
   After that, we immediately throw `RecipientUnreachableException`s.





[GitHub] [flink] zentol commented on a change in pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
zentol commented on a change in pull request #17606:
URL: https://github.com/apache/flink/pull/17606#discussion_r740932666



##########
File path: flink-tests/src/test/java/org/apache/flink/test/recovery/TaskManagerProcessFailureBatchRecoveryITCase.java
##########
@@ -67,7 +67,7 @@ public void testTaskManagerFailure(Configuration configuration, final File coord
         ExecutionEnvironment env =
                 ExecutionEnvironment.createRemoteEnvironment("localhost", 1337, configuration);
         env.setParallelism(PARALLELISM);
-        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(1, 1500L));
+        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(10, 1500L));

Review comment:
       The issue depends on when the first heartbeat is sent after the TM went down.
   
   If it happens before the restart, then we properly remove the TM and don't use it later on.
   If it does not happen before the restart, the job fails a second time later on, while attempting to deploy tasks to the same TM.
   
   I think this change only slightly shifts the timings, making this more common; it can already happen on the current master.
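   A toy model of why the attempt count matters, assuming the scenario described above: the job fails once when the TM dies and, if the TM has not yet been removed, a second time when deploying to it. With a fixed-delay strategy each failure consumes one restart attempt, so a budget of 1 is exhausted by the second failure while a budget of 10 is not. `RestartBudget` and its methods are hypothetical and only approximate Flink's fixed-delay restart strategy.

   ```java
   final class RestartBudget {
       private final int maxAttempts;
       private final long delayMillis;
       private int attemptsUsed;

       RestartBudget(int maxAttempts, long delayMillis) {
           this.maxAttempts = maxAttempts;
           this.delayMillis = delayMillis;
       }

       /** Returns the delay before the next restart, or -1 if no attempts are left. */
       long onFailure() {
           if (attemptsUsed >= maxAttempts) {
               return -1;
           }
           attemptsUsed++;
           return delayMillis;
       }

       /** Returns whether a job survives the given number of consecutive failures. */
       static boolean survives(int maxAttempts, long delayMillis, int failures) {
           RestartBudget budget = new RestartBudget(maxAttempts, delayMillis);
           for (int i = 0; i < failures; i++) {
               if (budget.onFailure() < 0) {
                   return false;
               }
           }
           return true;
       }
   }
   ```

   In this model, `survives(1, 1500L, 2)` is false and `survives(10, 1500L, 2)` is true, mirroring the change from `fixedDelayRestart(1, 1500L)` to `fixedDelayRestart(10, 1500L)`.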







[GitHub] [flink] tillrohrmann commented on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
tillrohrmann commented on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-959076412









[GitHub] [flink] flinkbot edited a comment on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-954752065


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "d65fff260e684b231835a04583b3e9d600ae96d6",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25645",
       "triggerID" : "d65fff260e684b231835a04583b3e9d600ae96d6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3652a3b41d02f84d54924930705ae710d7f7cccc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25657",
       "triggerID" : "3652a3b41d02f84d54924930705ae710d7f7cccc",
       "triggerType" : "PUSH"
     }, {
       "hash" : "3652a3b41d02f84d54924930705ae710d7f7cccc",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25657",
       "triggerID" : "955045722",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "308144b47092d73d777530ce5d0a11c85d7ab5c6",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25741",
       "triggerID" : "308144b47092d73d777530ce5d0a11c85d7ab5c6",
       "triggerType" : "PUSH"
     }, {
       "hash" : "308144b47092d73d777530ce5d0a11c85d7ab5c6",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25741",
       "triggerID" : "957180521",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "ade8b47510aaf11753d52b1283588366107245a0",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25786",
       "triggerID" : "ade8b47510aaf11753d52b1283588366107245a0",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a9ce74387e3be7c9fa8e4309886c9b86825c3aea",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25804",
       "triggerID" : "a9ce74387e3be7c9fa8e4309886c9b86825c3aea",
       "triggerType" : "PUSH"
     }, {
       "hash" : "61532812701d75c0ce669f42ec47f5de4a588b70",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25808",
       "triggerID" : "61532812701d75c0ce669f42ec47f5de4a588b70",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * ade8b47510aaf11753d52b1283588366107245a0 Azure: [CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25786) 
   * a9ce74387e3be7c9fa8e4309886c9b86825c3aea Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25804) 
   * 61532812701d75c0ce669f42ec47f5de4a588b70 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25808) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-954752065


   ## CI report:
   
   * 3652a3b41d02f84d54924930705ae710d7f7cccc Azure: [CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25657) 
   * 308144b47092d73d777530ce5d0a11c85d7ab5c6 UNKNOWN
   





[GitHub] [flink] zentol commented on a change in pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
zentol commented on a change in pull request #17606:
URL: https://github.com/apache/flink/pull/17606#discussion_r740932666



##########
File path: flink-tests/src/test/java/org/apache/flink/test/recovery/TaskManagerProcessFailureBatchRecoveryITCase.java
##########
@@ -67,7 +67,7 @@ public void testTaskManagerFailure(Configuration configuration, final File coord
         ExecutionEnvironment env =
                 ExecutionEnvironment.createRemoteEnvironment("localhost", 1337, configuration);
         env.setParallelism(PARALLELISM);
-        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(1, 1500L));
+        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(10, 1500L));

Review comment:
       The issue depends on when the first heartbeat is sent after the TM went down (at which point the TM would be considered unreachable).
   
   If it happens before the restart, then we properly remove the TM and don't use it later on.
   If it does not happen before the restart, then the job fails a second time while attempting to deploy tasks to the same TM.
   
   I think this change only slightly shifts the timings so that this case becomes more common; it can already happen on the current master.







[GitHub] [flink] zentol commented on a change in pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
zentol commented on a change in pull request #17606:
URL: https://github.com/apache/flink/pull/17606#discussion_r740935146



##########
File path: flink-tests/src/test/java/org/apache/flink/test/recovery/TaskManagerProcessFailureBatchRecoveryITCase.java
##########
@@ -67,7 +67,7 @@ public void testTaskManagerFailure(Configuration configuration, final File coord
         ExecutionEnvironment env =
                 ExecutionEnvironment.createRemoteEnvironment("localhost", 1337, configuration);
         env.setParallelism(PARALLELISM);
-        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(1, 1500L));
+        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(10, 1500L));

Review comment:
       - JM sends a heartbeat to TM1 (it may still be transmitted, but may not get a response)
   - TM1 is killed
   - job restarts (not delayed by the missing TM1, because the cancelTask RPCs fail immediately)
   - JM sends a heartbeat to TM1 (cannot be transmitted)
   - job restarts a second time
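The sequence above can be condensed into a toy calculation. The sketch below is a simplified model, not Flink code, and `RestartRaceModel` is a hypothetical name. It makes three assumptions: the TM dies at t=0, the JobMaster only removes it after a detection latency `detectionMs`, and any restart that happens before that point redeploys to the dead TM and fails again immediately.

```java
/**
 * Toy model of the restart race (hypothetical, not Flink code).
 * Assumptions: the TM dies at t = 0, the JobMaster removes it at
 * t = detectionMs, and a restart that happens before that point
 * redeploys to the dead TM and fails again immediately.
 */
final class RestartRaceModel {

    /** Number of restart attempts a fixed-delay strategy must allow. */
    static int restartsNeeded(long detectionMs, long restartDelayMs) {
        if (detectionMs < 0 || restartDelayMs <= 0) {
            throw new IllegalArgumentException("need detectionMs >= 0 and restartDelayMs > 0");
        }
        int restarts = 0;
        long now = 0; // first failure: the TM is killed
        do {
            now += restartDelayMs; // fixed delay before the next attempt
            restarts++;
            // if the TM is still registered at this point, deployment fails again
        } while (now < detectionMs);
        return restarts;
    }

    public static void main(String[] args) {
        // ~2s detection vs. 1.5s delay: the first restart still hits the dead TM
        System.out.println(restartsNeeded(2000, 1500));
        // detection well within the delay: a single restart is enough
        System.out.println(restartsNeeded(200, 1500));
    }
}
```

Under these assumptions, a single attempt with a 1.5s delay only covers detection latencies of up to 1.5s. A slower detection, such as the ~2s observed later in this thread, needs a second attempt, which `fixedDelayRestart(10, 1500L)` allows.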







[GitHub] [flink] tillrohrmann edited a comment on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
tillrohrmann edited a comment on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-957663355


   I don't know exactly how fast Akka can detect that the TM is gone. I would assume that it happens quite quickly once the TCP connection is terminated. This has never been a problem so far, though.





[GitHub] [flink] flinkbot edited a comment on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-954752065


   ## CI report:
   
   * ade8b47510aaf11753d52b1283588366107245a0 Azure: [CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25786) 
   * a9ce74387e3be7c9fa8e4309886c9b86825c3aea Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25804) 
   * 61532812701d75c0ce669f42ec47f5de4a588b70 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25808) 
   * a9bc844925ae7e67e5574db7c765650b0c85fc4b UNKNOWN
   





[GitHub] [flink] tillrohrmann commented on a change in pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
tillrohrmann commented on a change in pull request #17606:
URL: https://github.com/apache/flink/pull/17606#discussion_r740889079



##########
File path: flink-rpc/flink-rpc-akka/src/main/java/org/apache/flink/runtime/rpc/akka/AkkaInvocationHandler.java
##########
@@ -240,17 +241,20 @@ private Object invokeRpc(Method method, Object[] args) throws Exception {
             final CompletableFuture<?> resultFuture = ask(rpcInvocation, futureTimeout);
 
             final CompletableFuture<Object> completableFuture = new CompletableFuture<>();
-            resultFuture.whenComplete(
-                    (resultValue, failure) -> {
-                        if (failure != null) {
-                            completableFuture.completeExceptionally(
-                                    resolveTimeoutException(
-                                            failure, callStackCapture, address, rpcInvocation));
-                        } else {
-                            completableFuture.complete(
-                                    deserializeValueIfNeeded(resultValue, method));
-                        }
-                    });
+            FutureUtils.forward(

Review comment:
       I think you are right. We don't need the `forward` here.
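The failure mode this PR addresses can be reproduced with plain `java.util.concurrent`. In the sketch below, `deserialize` is a hypothetical stand-in for `deserializeValueIfNeeded`, and the class is an illustration rather than the actual `AkkaInvocationHandler` code. An exception thrown inside the `whenComplete` callback only completes the future returned by `whenComplete`, which the old code discarded, so the caller's future never completes. Deriving the result directly from the source future (here via `thenApply`) is one way to surface the error instead.

```java
import java.util.concurrent.CompletableFuture;

final class ForwardingSketch {

    // Hypothetical stand-in for deserializeValueIfNeeded: fails on "bad" payloads.
    static Object deserialize(Object raw) {
        if ("bad".equals(raw)) {
            throw new IllegalStateException("could not deserialize " + raw);
        }
        return raw;
    }

    // Pre-fix pattern: the exception escapes the callback, completes only the
    // (ignored) future returned by whenComplete, and `result` stays incomplete.
    static CompletableFuture<Object> lossy(CompletableFuture<Object> source) {
        CompletableFuture<Object> result = new CompletableFuture<>();
        source.whenComplete(
                (value, failure) -> {
                    if (failure != null) {
                        result.completeExceptionally(failure);
                    } else {
                        result.complete(deserialize(value));
                    }
                });
        return result;
    }

    // One possible fix: derive the result directly from the source future, so a
    // deserialization error completes the returned future exceptionally.
    static CompletableFuture<Object> chained(CompletableFuture<Object> source) {
        return source.thenApply(ForwardingSketch::deserialize);
    }

    public static void main(String[] args) {
        CompletableFuture<Object> bad = CompletableFuture.completedFuture("bad");
        System.out.println(lossy(bad).isDone()); // false: the error was swallowed
        System.out.println(chained(bad).isCompletedExceptionally()); // true
    }
}
```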

##########
File path: flink-tests/src/test/java/org/apache/flink/test/recovery/TaskManagerProcessFailureBatchRecoveryITCase.java
##########
@@ -67,7 +67,7 @@ public void testTaskManagerFailure(Configuration configuration, final File coord
         ExecutionEnvironment env =
                 ExecutionEnvironment.createRemoteEnvironment("localhost", 1337, configuration);
         env.setParallelism(PARALLELISM);
-        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(1, 1500L));
+        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(10, 1500L));

Review comment:
       I don't fully understand why this change is now required. Can we explain why this is the case?

##########
File path: flink-tests/src/test/java/org/apache/flink/test/recovery/TaskManagerProcessFailureBatchRecoveryITCase.java
##########
@@ -67,7 +67,7 @@ public void testTaskManagerFailure(Configuration configuration, final File coord
         ExecutionEnvironment env =
                 ExecutionEnvironment.createRemoteEnvironment("localhost", 1337, configuration);
         env.setParallelism(PARALLELISM);
-        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(1, 1500L));
+        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(10, 1500L));

Review comment:
       For this test, the heartbeat interval is set to 200ms and the restart delay to 1.5s, so multiple heartbeats should be sent during the restart delay. Moreover, we mark TMs as unreachable after a single lost message. Hence, the only explanation I can think of is a processing gap on the test machine. But the logs say otherwise :-(
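The timing argument above can be checked with a few lines of arithmetic. This is a rough sketch under stated assumptions, not Flink code: `HeartbeatArithmetic` and `transportDetectionMs` are hypothetical names. It assumes heartbeats go out exactly every `intervalMs`, and that a TM is marked unreachable as soon as the transport notices a lost heartbeat, which takes `transportDetectionMs`.

```java
/**
 * Back-of-the-envelope check of the timing argument (hypothetical helper,
 * not Flink code). Assumes heartbeats go out every intervalMs and that a TM
 * is marked unreachable as soon as the transport notices a lost heartbeat,
 * which takes transportDetectionMs.
 */
final class HeartbeatArithmetic {

    /** Number of heartbeats sent within a window of the given length. */
    static long heartbeatsWithin(long windowMs, long intervalMs) {
        return windowMs / intervalMs;
    }

    /** Worst case: the TM dies right after a heartbeat, so the next one is intervalMs away. */
    static boolean detectedBeforeRestart(
            long intervalMs, long transportDetectionMs, long restartDelayMs) {
        return intervalMs + transportDetectionMs <= restartDelayMs;
    }

    public static void main(String[] args) {
        System.out.println(heartbeatsWithin(1500, 200)); // 7
        System.out.println(detectedBeforeRestart(200, 0, 1500)); // true
        System.out.println(detectedBeforeRestart(200, 2000, 1500)); // false
    }
}
```

With near-instant detection the TM should be gone long before the restart, which matches the expectation stated above. If the transport needs around 2s to notice the loss, the same check fails.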

##########
File path: flink-tests/src/test/java/org/apache/flink/test/recovery/TaskManagerProcessFailureBatchRecoveryITCase.java
##########
@@ -67,7 +67,7 @@ public void testTaskManagerFailure(Configuration configuration, final File coord
         ExecutionEnvironment env =
                 ExecutionEnvironment.createRemoteEnvironment("localhost", 1337, configuration);
         env.setParallelism(PARALLELISM);
-        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(1, 1500L));
+        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(10, 1500L));

Review comment:
       But maybe the error margin is still too small and we should either increase the number of restart attempts or the restart delay.
   
   For reference, I tried to harden the test case via #17107.

##########
File path: flink-tests/src/test/java/org/apache/flink/test/recovery/TaskManagerProcessFailureBatchRecoveryITCase.java
##########
@@ -67,7 +67,7 @@ public void testTaskManagerFailure(Configuration configuration, final File coord
         ExecutionEnvironment env =
                 ExecutionEnvironment.createRemoteEnvironment("localhost", 1337, configuration);
         env.setParallelism(PARALLELISM);
-        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(1, 1500L));
+        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(10, 1500L));

Review comment:
       Hmm, by removing the forwarding we do indeed change the exception type seen by consumers of the returned future. If we don't want to accept this risk, then I think it is safer to revert this change.
   
   In general, though, consumers of futures should handle the `CompletionException` case (e.g. by stripping it or by using `ExceptionUtils.containsCause`).
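The exception-type change can be illustrated with the JDK alone. When a future is completed via `completeExceptionally(e)`, a `handle` or `whenComplete` callback on it receives `e` itself. When the same exception is thrown inside `thenApply`, callbacks on the resulting future receive it wrapped in a `CompletionException`. In this sketch, `stripCompletion` and `observedFailure` are hypothetical helpers written for illustration; they are not Flink's `ExceptionUtils` code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

final class CompletionExceptionSketch {

    // Hypothetical helper: unwraps nested CompletionExceptions down to the root failure.
    static Throwable stripCompletion(Throwable t) {
        Throwable current = t;
        while (current instanceof CompletionException && current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    // Returns the throwable that a handle() callback observes for the given future.
    static Throwable observedFailure(CompletableFuture<?> future) {
        return future.handle((value, failure) -> failure).join();
    }

    public static void main(String[] args) {
        IllegalStateException error = new IllegalStateException("deserialization failed");

        // Completed directly: callbacks see the original exception.
        CompletableFuture<Object> direct = new CompletableFuture<>();
        direct.completeExceptionally(error);

        // Thrown inside thenApply: callbacks see a CompletionException wrapper.
        CompletableFuture<Object> derived =
                CompletableFuture.completedFuture("payload")
                        .thenApply(
                                raw -> {
                                    throw error;
                                });

        System.out.println(observedFailure(direct).getClass().getSimpleName());
        System.out.println(observedFailure(derived).getClass().getSimpleName());
        System.out.println(stripCompletion(observedFailure(derived)) == error);
    }
}
```

Consumers that unwrap `CompletionException` before inspecting the cause behave the same way in both cases, which is why such handling makes the change of exception type harmless.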







[GitHub] [flink] tillrohrmann commented on a change in pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
tillrohrmann commented on a change in pull request #17606:
URL: https://github.com/apache/flink/pull/17606#discussion_r740942116



##########
File path: flink-tests/src/test/java/org/apache/flink/test/recovery/TaskManagerProcessFailureBatchRecoveryITCase.java
##########
@@ -67,7 +67,7 @@ public void testTaskManagerFailure(Configuration configuration, final File coord
         ExecutionEnvironment env =
                 ExecutionEnvironment.createRemoteEnvironment("localhost", 1337, configuration);
         env.setParallelism(PARALLELISM);
-        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(1, 1500L));
+        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(10, 1500L));

Review comment:
       For this test, the heartbeat interval is set to 200ms and the restart delay to 1.5s, so multiple heartbeats should be sent during the restart delay. Moreover, we mark TMs as unreachable after a single lost message. Hence, the only explanation I can think of is a processing gap on the test machine.







[GitHub] [flink] flinkbot edited a comment on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-954752065


   ## CI report:
   
   * ade8b47510aaf11753d52b1283588366107245a0 Azure: [CANCELED](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25786) 
   * a9ce74387e3be7c9fa8e4309886c9b86825c3aea Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25804) 
   * 61532812701d75c0ce669f42ec47f5de4a588b70 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=25808) 
   





[GitHub] [flink] zentol commented on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
zentol commented on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-957870305


   > You could enable debug logs and check whether we send heartbeat requests after the TM died.
   
   We do still send requests, but Akka does not report the TM as being unreachable.
   
   > Maybe this is also something we can configure in Akka.
   
   I skimmed the Akka config reference and nothing stood out to me that would match the observed  ~2s detection duration. But this could even be an OS-level thing with it being TCP and all.
   
   > The advantage of increasing the restart attempts instead of the delay is that the test will on average run faster.
   
   I don't have a strong opinion on this. A single restart results in cleaner logs.
   If we're aiming for at most 1-2 restarts, the difference will be negligible either way (±2s) 🤷 The test already takes 30-50 seconds.





[GitHub] [flink] zentol commented on a change in pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
zentol commented on a change in pull request #17606:
URL: https://github.com/apache/flink/pull/17606#discussion_r740932666



##########
File path: flink-tests/src/test/java/org/apache/flink/test/recovery/TaskManagerProcessFailureBatchRecoveryITCase.java
##########
@@ -67,7 +67,7 @@ public void testTaskManagerFailure(Configuration configuration, final File coord
         ExecutionEnvironment env =
                 ExecutionEnvironment.createRemoteEnvironment("localhost", 1337, configuration);
         env.setParallelism(PARALLELISM);
-        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(1, 1500L));
+        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(10, 1500L));

Review comment:
       The issue depends on when the first heartbeat is sent after the TM went down.
   
   If it happens before the restart, then we properly remove the TM and don't use it later on.
   If it does not happen before the restart, then the job fails a second time later on, while attempting to deploy tasks to the same TM.
   
   I think this change only slightly shifts the timings so that this happens more often; it can already happen on the current master.

Review comment:
       The issue depends on when the first heartbeat is sent after the TM went down (and would then be considered unreachable).
   
   If it happens before the restart, then we properly remove the TM and don't use it later on.
   If it does not happen before the restart, then the job fails a second time later on, while attempting to deploy tasks to the same TM.
   
   I think this change only slightly shifts the timings so that this happens more often; it can already happen on the current master.

Review comment:
       - JM sends heartbeat to TM1 (could be transmitted, but may not get a response)
   - TM1 is killed
   - job restarts (not delayed by missing TM1 because cancelTask RPCs fail immediately)
   - JM sends heartbeat to TM1 (could not be transmitted)
   - job restarts a second time

Review comment:
       - JM sends heartbeat to TM1 (transmitted, but may not get a response)
   - TM1 is killed
   - job restarts (not delayed by missing TM1 because cancelTask RPCs fail immediately)
   - JM sends heartbeat to TM1 (not transmitted)
   - job restarts a second time

Review comment:
       - JM sends heartbeat request to TM1 (transmitted, but may not get a response)
   - TM1 is killed
   - job restarts (not delayed by missing TM1 because cancelTask RPCs fail immediately)
   - JM sends heartbeat request to TM1 (not transmitted)
   - job restarts a second time

Review comment:
       I can also reproduce it locally, where there are no processing gaps.

Review comment:
       The restart delay is a fair point, I'll check the logs again.

Review comment:
       It looks like the failed heartbeats are being ignored while we are waiting for the restart delay:
   
   ```
   // 64589-0ef458 is the killed TM
   
   // this is the last heartbeat
   17532 o.a.f.r.jobmaster.JobMaster [] - Received heartbeat from 127.0.0.1:64589-0ef458.
   ...
   <start job restart>
   <cancel slot requests>
   <cleanup partitions>
   <various failed cancelTask RPCs>
   17740 <reduce resource requirements to 0>
   ...
   17440... JM idling, sending heartbeat requests
   19212 o.a.f.r.jobmaster.JobMaster [] - Archive local failure causing attempt 05bcf9159a5a301d2f7b6566111235da to fail
   ...
   19213  o.a.f.r.executiongraph.ExecutionGraph [] - Job Flink Java Job at Mon Nov 01 15:42:30 CET 2021 (4daf5dcbf65f7cd384ac228ad72ab5c6) switched from state RESTARTING to RUNNING.
   19777  o.a.f.r.jobmaster.JobMaster [] - TaskManager with id 127.0.0.1:64589-0ef458 is no longer reachable.
   19777 o.a.f.r.jobmaster.JobMaster [] - Disconnect TaskExecutor 127.0.0.1:64589-0ef458 because: TaskManager with id 127.0.0.1:64589-0ef458 is no longer reachable.
   ```

Review comment:
       I'm wondering whether the exception that the `HeartbeatMonitorImpl` sees could be wrapped in a `CompletionException`...

Review comment:
       💢 
   ```
   21440 o.a.f.r.j.JobMaster [] - #handleHeartbeatRpcFailure exception
   java.util.concurrent.CompletionException: org.apache.flink.runtime.rpc.exceptions.RecipientUnreachableException
   ```

Review comment:
       I suppose stripping the `CompletionException` in the `HeartbeatManagerImpl` should be done in any case because it is so easy to introduce bugs like this.
   
   I'm curious though whether we should revert the `AkkaInvocationHandler` back to manually forwarding the result...
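A minimal, self-contained sketch of the pitfall above, assuming a failure handler that checks the exception type directly: a stage derived with `thenApply` stores the source failure wrapped in a `CompletionException`, so an `instanceof` check against the original type fails unless the wrapper is stripped first. `RecipientUnreachableException` here is a local stand-in, not Flink's class, and `stripCompletionException`/`observedFailure` are hypothetical helpers.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

public class CompletionExceptionWrapping {

    // Local stand-in for Flink's RecipientUnreachableException (assumption, not the real class).
    public static class RecipientUnreachableException extends Exception {
        public RecipientUnreachableException(String message) {
            super(message);
        }
    }

    // Removes any CompletionException layers so that type checks see the actual cause.
    public static Throwable stripCompletionException(Throwable throwable) {
        Throwable current = throwable;
        while (current instanceof CompletionException && current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    // Returns the throwable a whenComplete callback receives; assumes `future` is already
    // done, so the callback runs synchronously on the calling thread.
    public static Throwable observedFailure(CompletableFuture<?> future) {
        Throwable[] seen = new Throwable[1];
        future.whenComplete((value, failure) -> seen[0] = failure);
        return seen[0];
    }

    public static void main(String[] args) {
        CompletableFuture<String> rpcResult = new CompletableFuture<>();
        // Dependent stage, as created by thenApply instead of manual forwarding.
        CompletableFuture<Integer> derived = rpcResult.thenApply(String::length);

        rpcResult.completeExceptionally(new RecipientUnreachableException("TM is gone"));

        Throwable direct = observedFailure(rpcResult);
        Throwable viaThenApply = observedFailure(derived);

        System.out.println(direct instanceof RecipientUnreachableException);       // true
        System.out.println(viaThenApply instanceof RecipientUnreachableException); // false
        System.out.println(viaThenApply instanceof CompletionException);           // true
        System.out.println(
                stripCompletionException(viaThenApply) instanceof RecipientUnreachableException); // true
    }
}
```

Stripping at a single choke point such as the heartbeat manager protects every listener behind it; the alternative raised in this thread is to keep forwarding results manually so that the returned future never carries the extra wrapper.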







[GitHub] [flink] tillrohrmann commented on a change in pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
tillrohrmann commented on a change in pull request #17606:
URL: https://github.com/apache/flink/pull/17606#discussion_r740889079



##########
File path: flink-rpc/flink-rpc-akka/src/main/java/org/apache/flink/runtime/rpc/akka/AkkaInvocationHandler.java
##########
@@ -240,17 +241,20 @@ private Object invokeRpc(Method method, Object[] args) throws Exception {
             final CompletableFuture<?> resultFuture = ask(rpcInvocation, futureTimeout);
 
             final CompletableFuture<Object> completableFuture = new CompletableFuture<>();
-            resultFuture.whenComplete(
-                    (resultValue, failure) -> {
-                        if (failure != null) {
-                            completableFuture.completeExceptionally(
-                                    resolveTimeoutException(
-                                            failure, callStackCapture, address, rpcInvocation));
-                        } else {
-                            completableFuture.complete(
-                                    deserializeValueIfNeeded(resultValue, method));
-                        }
-                    });
+            FutureUtils.forward(

Review comment:
       I think you are right. We don't need the `forward` here.
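For context on the removed `whenComplete` block in this hunk: a simplified, self-contained sketch of the failure mode this PR describes, assuming the deserialization step can throw. When it throws inside a `whenComplete` callback, the exception only fails the discarded stage returned by `whenComplete`, and the hand-completed future never completes. The names `ForwardWithDeserialization`, `forwardUnsafe`, and `forwardSafe` are hypothetical; `deserialize` stands in for `deserializeValueIfNeeded`, and neither method is Flink's `FutureUtils.forward` or the actual `AkkaInvocationHandler` code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

public class ForwardWithDeserialization {

    // Simplified version of the removed pattern: if `deserialize` throws inside the
    // whenComplete callback, the exception only fails the (discarded) stage returned
    // by whenComplete, and `target` is never completed.
    public static <T, R> CompletableFuture<R> forwardUnsafe(
            CompletableFuture<T> source, Function<T, R> deserialize) {
        CompletableFuture<R> target = new CompletableFuture<>();
        source.whenComplete(
                (value, failure) -> {
                    if (failure != null) {
                        target.completeExceptionally(failure);
                    } else {
                        target.complete(deserialize.apply(value));
                    }
                });
        return target;
    }

    // Fixed variant: errors thrown by the deserialization step are caught and
    // forwarded to `target`, so callers always observe completion.
    public static <T, R> CompletableFuture<R> forwardSafe(
            CompletableFuture<T> source, Function<T, R> deserialize) {
        CompletableFuture<R> target = new CompletableFuture<>();
        source.whenComplete(
                (value, failure) -> {
                    if (failure != null) {
                        target.completeExceptionally(failure);
                        return;
                    }
                    try {
                        target.complete(deserialize.apply(value));
                    } catch (Throwable t) {
                        target.completeExceptionally(t);
                    }
                });
        return target;
    }

    public static void main(String[] args) {
        Function<String, Integer> failingDeserializer =
                s -> {
                    throw new IllegalStateException("cannot deserialize " + s);
                };

        CompletableFuture<Integer> unsafe =
                forwardUnsafe(CompletableFuture.completedFuture("payload"), failingDeserializer);
        CompletableFuture<Integer> safe =
                forwardSafe(CompletableFuture.completedFuture("payload"), failingDeserializer);

        System.out.println("unsafe done: " + unsafe.isDone()); // prints "unsafe done: false"
        System.out.println("safe failed: " + safe.isCompletedExceptionally()); // prints "safe failed: true"
    }
}
```

Because `forwardSafe` completes `target` with the raw throwable instead of deriving it via `thenApply`, consumers do not see an extra `CompletionException` layer; that exception-type difference is what the rest of this thread discusses.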

##########
File path: flink-tests/src/test/java/org/apache/flink/test/recovery/TaskManagerProcessFailureBatchRecoveryITCase.java
##########
@@ -67,7 +67,7 @@ public void testTaskManagerFailure(Configuration configuration, final File coord
         ExecutionEnvironment env =
                 ExecutionEnvironment.createRemoteEnvironment("localhost", 1337, configuration);
         env.setParallelism(PARALLELISM);
-        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(1, 1500L));
+        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(10, 1500L));

Review comment:
       I don't fully understand why this change is now required. Can we explain why this is the case?

Review comment:
       For this test, we have the heartbeat interval set to 200ms and the restart delay set to 1.5s. Hence several heartbeats should be sent during the restart delay. Moreover, we mark TMs as unreachable after a single lost message. Hence, the only explanation I can think of is a processing gap on the test machine.

Review comment:
       For this test, we have the heartbeat interval set to 200ms and the restart delay set to 1.5s. Hence several heartbeats should be sent during the restart delay. Moreover, we mark TMs as unreachable after a single lost message. Hence, the only explanation I can think of is a processing gap on the test machine. But the logs say otherwise :-(

Review comment:
       But maybe the error margin is still too small and we should either increase the number of restart attempts or the restart delay.
   
   For reference, I tried to harden the test case via #17107.

Review comment:
       Hmm, by removing the forwarding we do indeed change the exception type for consumers of the returned future. If we don't want to accept this risk, then I think it is probably safer to revert this change.
   
   In general, consumers of futures should handle the `CompletionException` case though (e.g. via stripping or using `ExceptionUtils.containsCause`).
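A sketch of the consumer-side alternative mentioned above: instead of stripping exactly one wrapper type, search the whole cause chain for the type of interest. `findCause` and `containsCause` here are self-contained hypothetical helpers, not Flink's `ExceptionUtils` methods; the cycle guard uses an identity set because cause chains are compared by reference.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeoutException;

public class CauseChainCheck {

    // Returns the first throwable in the cause chain (including `throwable` itself)
    // that is an instance of `type`; the identity set stops on cyclic cause chains.
    public static <T extends Throwable> Optional<T> findCause(Throwable throwable, Class<T> type) {
        Set<Throwable> visited = Collections.newSetFromMap(new IdentityHashMap<>());
        Throwable current = throwable;
        while (current != null && visited.add(current)) {
            if (type.isInstance(current)) {
                return Optional.of(type.cast(current));
            }
            current = current.getCause();
        }
        return Optional.empty();
    }

    // Convenience check in the spirit of the "containsCause"-style handling described above.
    public static boolean containsCause(Throwable throwable, Class<? extends Throwable> type) {
        return findCause(throwable, type).isPresent();
    }

    public static void main(String[] args) {
        Throwable root = new TimeoutException("heartbeat timed out");
        Throwable wrapped = new CompletionException(new ExecutionException(root));

        System.out.println(wrapped instanceof TimeoutException);           // false
        System.out.println(containsCause(wrapped, TimeoutException.class)); // true
    }
}
```

Searching the chain tolerates any mix of `CompletionException` and `ExecutionException` layers, at the cost of also matching a cause that a deeper component wrapped deliberately.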







[GitHub] [flink] tillrohrmann edited a comment on pull request #17606: [FLINK-24706][rpc] Forward deserialization errors to returned future

Posted by GitBox <gi...@apache.org>.
tillrohrmann edited a comment on pull request #17606:
URL: https://github.com/apache/flink/pull/17606#issuecomment-957663355





