Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2022/03/22 12:09:55 UTC

[GitHub] [flink] XComp opened a new pull request #19199: [FLINK-26797][runtime] Hardens ZKCheckpointIDCounterMultiServersTest

XComp opened a new pull request #19199:
URL: https://github.com/apache/flink/pull/19199


   ## What is the purpose of the change
   
   A ZooKeeper connection issue occurred again and supposedly caused the IDCounter to be incremented once too often.
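   
   A minimal sketch of one way a connection issue could lead to such an extra increment, assuming the counter is updated through a read / conditional-write loop and that the acknowledgement of a successful write gets lost. `LostAckDemo` and `FakeServer` are hypothetical names for an in-memory stand-in; this is not Flink, Curator or ZooKeeper code, and it is not confirmed to be the cause of the test failure.
   
   ```java
   /** In-memory illustration only; names and behaviour are assumptions, not Flink code. */
   public class LostAckDemo {

       /** Hypothetical server holding a value and a version, similar to a versioned node. */
       static final class FakeServer {
           int value = 0;
           int version = 0;

           /** Conditional write: applied only if the caller saw the current version. */
           boolean setIfVersion(int expectedVersion, int newValue) {
               if (expectedVersion != version) {
                   return false;
               }
               value = newValue;
               version++;
               return true;
           }
       }

       /**
        * Client-side getAndIncrement built as a read / conditional-write loop.
        * With dropFirstAck set, the first successful write is treated as failed by the
        * client, which models a connection loss after the server applied the update.
        */
       public static int getAndIncrement(FakeServer server, boolean dropFirstAck) {
           boolean ackDropped = false;
           while (true) {
               int seenValue = server.value;
               int seenVersion = server.version;
               boolean applied = server.setIfVersion(seenVersion, seenValue + 1);
               if (applied && dropFirstAck && !ackDropped) {
                   ackDropped = true; // the client never learns that the write succeeded
                   continue;          // so it retries from a fresh read
               }
               if (applied) {
                   return seenValue;
               }
           }
       }

       /** Returns the server-side value after a single logical getAndIncrement call. */
       public static int valueAfterOneCall(boolean dropFirstAck) {
           FakeServer server = new FakeServer();
           getAndIncrement(server, dropFirstAck);
           return server.value;
       }

       public static void main(String[] args) {
           System.out.println("stable connection: " + valueAfterOneCall(false));
           System.out.println("lost ack:          " + valueAfterOneCall(true));
       }
   }
   ```
   
   In this sketch a single logical call leaves the server-side value at 1 with a stable connection and at 2 when the first acknowledgement is dropped, because the retry increments a value that had already been updated.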
   
   ## Brief change log
   
   * I fixed the issue in the same way it was done in FLINK-26120.
   * Additionally, I extended the CI log4j properties to also log request handling on the ZK server side.
   
   ## Verifying this change
   
   A CI run with a forced ZK test failure was added temporarily to verify the log4j configuration changes.
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): no
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no
     - The serializers: no
     - The runtime per-record code paths (performance sensitive): no
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
     - The S3 file system connector: no
   
   ## Documentation
   
     - Does this pull request introduce a new feature? no
     - If yes, how is the feature documented? not applicable
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [flink] flinkbot edited a comment on pull request #19199: [FLINK-26797][runtime] Hardens ZKCheckpointIDCounterMultiServersTest

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #19199:
URL: https://github.com/apache/flink/pull/19199#issuecomment-1075102084


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "3d3db0de4f092b60f3eda813933655230f2ea999",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=33599",
       "triggerID" : "3d3db0de4f092b60f3eda813933655230f2ea999",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 3d3db0de4f092b60f3eda813933655230f2ea999 Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=33599) 
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>





[GitHub] [flink] flinkbot edited a comment on pull request #19199: [FLINK-26797][runtime] Hardens ZKCheckpointIDCounterMultiServersTest

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #19199:
URL: https://github.com/apache/flink/pull/19199#issuecomment-1075102084


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "3d3db0de4f092b60f3eda813933655230f2ea999",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=33599",
       "triggerID" : "3d3db0de4f092b60f3eda813933655230f2ea999",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8f24707016208c4695c8e35ffa9150111464296a",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "8f24707016208c4695c8e35ffa9150111464296a",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 3d3db0de4f092b60f3eda813933655230f2ea999 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=33599) 
   * 8f24707016208c4695c8e35ffa9150111464296a UNKNOWN
   





[GitHub] [flink] XComp commented on pull request #19199: [FLINK-26797][runtime] Hardens ZKCheckpointIDCounterMultiServersTest

Posted by GitBox <gi...@apache.org>.
XComp commented on pull request #19199:
URL: https://github.com/apache/flink/pull/19199#issuecomment-1085553403


   Thanks, @autophagy. I removed the debug commit and rebased the branch. I'm gonna go ahead and create backport PRs.





[GitHub] [flink] flinkbot edited a comment on pull request #19199: [FLINK-26797][runtime] Hardens ZKCheckpointIDCounterMultiServersTest

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #19199:
URL: https://github.com/apache/flink/pull/19199#issuecomment-1075102084


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "3d3db0de4f092b60f3eda813933655230f2ea999",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=33599",
       "triggerID" : "3d3db0de4f092b60f3eda813933655230f2ea999",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8f24707016208c4695c8e35ffa9150111464296a",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34086",
       "triggerID" : "8f24707016208c4695c8e35ffa9150111464296a",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 8f24707016208c4695c8e35ffa9150111464296a Azure: [SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34086) 
   





[GitHub] [flink] flinkbot edited a comment on pull request #19199: [FLINK-26797][runtime] Hardens ZKCheckpointIDCounterMultiServersTest

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #19199:
URL: https://github.com/apache/flink/pull/19199#issuecomment-1075102084


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "3d3db0de4f092b60f3eda813933655230f2ea999",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=33599",
       "triggerID" : "3d3db0de4f092b60f3eda813933655230f2ea999",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 3d3db0de4f092b60f3eda813933655230f2ea999 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=33599) 
   





[GitHub] [flink] XComp merged pull request #19199: [FLINK-26797][runtime] Hardens ZKCheckpointIDCounterMultiServersTest

Posted by GitBox <gi...@apache.org>.
XComp merged pull request #19199:
URL: https://github.com/apache/flink/pull/19199


   





[GitHub] [flink] flinkbot edited a comment on pull request #19199: [FLINK-26797][runtime] Hardens ZKCheckpointIDCounterMultiServersTest

Posted by GitBox <gi...@apache.org>.
flinkbot edited a comment on pull request #19199:
URL: https://github.com/apache/flink/pull/19199#issuecomment-1075102084


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "3d3db0de4f092b60f3eda813933655230f2ea999",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=33599",
       "triggerID" : "3d3db0de4f092b60f3eda813933655230f2ea999",
       "triggerType" : "PUSH"
     }, {
       "hash" : "8f24707016208c4695c8e35ffa9150111464296a",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34086",
       "triggerID" : "8f24707016208c4695c8e35ffa9150111464296a",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 3d3db0de4f092b60f3eda813933655230f2ea999 Azure: [FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=33599) 
   * 8f24707016208c4695c8e35ffa9150111464296a Azure: [PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=34086) 
   





[GitHub] [flink] flinkbot commented on pull request #19199: [FLINK-26797][runtime] Hardens ZKCheckpointIDCounterMultiServersTest

Posted by GitBox <gi...@apache.org>.
flinkbot commented on pull request #19199:
URL: https://github.com/apache/flink/pull/19199#issuecomment-1075102084


   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "3d3db0de4f092b60f3eda813933655230f2ea999",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "3d3db0de4f092b60f3eda813933655230f2ea999",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 3d3db0de4f092b60f3eda813933655230f2ea999 UNKNOWN
   





[GitHub] [flink] XComp commented on pull request #19199: [FLINK-26797][runtime] Hardens ZKCheckpointIDCounterMultiServersTest

Posted by GitBox <gi...@apache.org>.
XComp commented on pull request #19199:
URL: https://github.com/apache/flink/pull/19199#issuecomment-1075350666


   The CI run failed because of the debug commit I added. The zookeeper-server.log now contains additional debug logs about final request processing, e.g. for the first getAndIncrement call:
   ```
   14:48:43,156 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing request:: sessionid:0x101b66cf7c70000 type:createSession cxid:0x0 zxid:0x1 txntype:-10 reqpath:n/a
   14:48:43,165 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - sessionid:0x101b66cf7c70000 type:createSession cxid:0x0 zxid:0x1 txntype:-10 reqpath:n/a
   14:48:43,188 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing request:: sessionid:0x101b66cf7c70000 type:getData cxid:0x1 zxid:0xfffffffffffffffe txntype:unknown reqpath:/zookeeper/config
   14:48:43,188 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - sessionid:0x101b66cf7c70000 type:getData cxid:0x1 zxid:0xfffffffffffffffe txntype:unknown reqpath:/zookeeper/config
   14:48:43,191 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing request:: sessionid:0x101b66cf7c70000 type:getData cxid:0x2 zxid:0xfffffffffffffffe txntype:unknown reqpath:/zookeeper/config
   14:48:43,191 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - sessionid:0x101b66cf7c70000 type:getData cxid:0x2 zxid:0xfffffffffffffffe txntype:unknown reqpath:/zookeeper/config
   14:48:43,203 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing request:: sessionid:0x101b66cf7c70000 type:exists cxid:0x3 zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink
   14:48:43,203 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - sessionid:0x101b66cf7c70000 type:exists cxid:0x3 zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink
   14:48:43,207 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing request:: sessionid:0x101b66cf7c70000 type:exists cxid:0x4 zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink
   14:48:43,207 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - sessionid:0x101b66cf7c70000 type:exists cxid:0x4 zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink
   14:48:43,235 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing request:: sessionid:0x101b66cf7c70000 type:createContainer cxid:0x5 zxid:0x2 txntype:19 reqpath:n/a
   14:48:43,271 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - sessionid:0x101b66cf7c70000 type:createContainer cxid:0x5 zxid:0x2 txntype:19 reqpath:n/a
   14:48:43,324 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing request:: sessionid:0x101b66cf7c70000 type:createContainer cxid:0x6 zxid:0x3 txntype:-1 reqpath:n/a
   14:48:43,337 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing request:: sessionid:0x101b66cf7c70000 type:exists cxid:0x7 zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default
   14:48:43,337 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - sessionid:0x101b66cf7c70000 type:exists cxid:0x7 zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default
   14:48:43,394 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing request:: sessionid:0x101b66cf7c70000 type:createContainer cxid:0x8 zxid:0x4 txntype:19 reqpath:n/a
   14:48:43,395 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - sessionid:0x101b66cf7c70000 type:createContainer cxid:0x8 zxid:0x4 txntype:19 reqpath:n/a
   14:48:43,395 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing request:: sessionid:0x101b66cf7c70000 type:exists cxid:0x9 zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default
   14:48:43,395 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - sessionid:0x101b66cf7c70000 type:exists cxid:0x9 zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default
   14:48:43,399 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing request:: sessionid:0x101b66cf7c70000 type:getData cxid:0xa zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/checkpoint_id_counter
   14:48:43,399 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - sessionid:0x101b66cf7c70000 type:getData cxid:0xa zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/checkpoint_id_counter
   14:48:43,446 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing request:: sessionid:0x101b66cf7c70000 type:create2 cxid:0xb zxid:0x5 txntype:15 reqpath:n/a
   14:48:43,446 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - sessionid:0x101b66cf7c70000 type:create2 cxid:0xb zxid:0x5 txntype:15 reqpath:n/a
   14:48:43,451 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing request:: sessionid:0x101b66cf7c70000 type:getData cxid:0xc zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/checkpoint_id_counter
   14:48:43,452 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - sessionid:0x101b66cf7c70000 type:getData cxid:0xc zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/checkpoint_id_counter
   14:48:43,573 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing request:: sessionid:0x101b66cf7c70000 type:setData cxid:0xd zxid:0x6 txntype:5 reqpath:n/a
   14:48:43,580 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - sessionid:0x101b66cf7c70000 type:setData cxid:0xd zxid:0x6 txntype:5 reqpath:n/a
   ```
   I agree that it's a bit verbose. But it should give us a way to see what actually happened on the ZooKeeper server.
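   
   A small sketch of how these lines could be scanned when investigating a failure, assuming the log keeps the `Processing request:: sessionid:… type:… cxid:… zxid:…` layout shown above. `ZkRequestLogScan` and `requestsOfType` are hypothetical names, not part of Flink or ZooKeeper. Note that write requests are logged with `reqpath:n/a`, so the lines alone do not confirm which node a `setData` touched.
   
   ```java
   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.List;
   import java.util.regex.Matcher;
   import java.util.regex.Pattern;

   /** Hypothetical helper for reading the FinalRequestProcessor debug lines. */
   public class ZkRequestLogScan {

       // Only the "Processing request::" line is matched, so the second log line
       // that ZooKeeper prints for the same request is not counted twice.
       private static final Pattern REQUEST = Pattern.compile(
               "Processing request:: sessionid:(\\S+) type:(\\S+) cxid:(\\S+) zxid:(\\S+)");

       /** Returns "type cxid zxid" for every processed request of the given type. */
       public static List<String> requestsOfType(List<String> logLines, String type) {
           List<String> hits = new ArrayList<>();
           for (String line : logLines) {
               Matcher m = REQUEST.matcher(line);
               if (m.find() && m.group(2).equals(type)) {
                   hits.add(m.group(2) + " " + m.group(3) + " " + m.group(4));
               }
           }
           return hits;
       }

       public static void main(String[] args) {
           // Three lines copied from the log excerpt above.
           List<String> sample = Arrays.asList(
                   "14:48:43,451 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing request:: sessionid:0x101b66cf7c70000 type:getData cxid:0xc zxid:0xfffffffffffffffe txntype:unknown reqpath:/flink/default/checkpoint_id_counter",
                   "14:48:43,573 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - Processing request:: sessionid:0x101b66cf7c70000 type:setData cxid:0xd zxid:0x6 txntype:5 reqpath:n/a",
                   "14:48:43,580 [        SyncThread:0] DEBUG org.apache.zookeeper.server.FinalRequestProcessor            [] - sessionid:0x101b66cf7c70000 type:setData cxid:0xd zxid:0x6 txntype:5 reqpath:n/a");
           System.out.println(requestsOfType(sample, "setData"));
       }
   }
   ```
   
   Run over the excerpt above, this reports a single `setData` request (cxid 0xd, zxid 0x6); more `setData` entries than expected for one `getAndIncrement` call would be a hint that the counter was written more than once.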

