Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/09/13 23:39:00 UTC

[GitHub] [iceberg] kbendick opened a new pull request #3110: Flink - Ignore test that leads to infinite checkpoint loop and CI timeouts

kbendick opened a new pull request #3110:
URL: https://github.com/apache/iceberg/pull/3110


   This test has a race condition: one of the two disjoint DAGs can finish and close its tasks before the other has finished.
   
   When the task(s) belonging to the terminated DAG are no longer present to participate in checkpointing, every checkpoint attempt is aborted, which leads to an infinite loop of attempting to re-checkpoint.
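
To make the failure mode concrete, here is a minimal toy model of the rule visible in the log lines below: a checkpoint is aborted as soon as any task it has to trigger is not RUNNING. This is only a sketch under stated assumptions. `CheckpointLoopSketch`, `TaskState`, `tryCheckpoint` and `runTimer` are hypothetical names, not Flink's classes; `leftCustomSource` is an assumed name for the still-running task; and `maxAttempts` stands in for the CI timeout, since the real loop has no bound.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical types, NOT Flink's CheckpointCoordinator.
class CheckpointLoopSketch {

  enum TaskState { RUNNING, FINISHED }

  // Mirrors the rule seen in the logs: abort the checkpoint as soon as
  // one of the tasks to trigger is not in state RUNNING.
  static boolean tryCheckpoint(Map<String, TaskState> tasks, List<String> log) {
    for (Map.Entry<String, TaskState> task : tasks.entrySet()) {
      if (task.getValue() != TaskState.RUNNING) {
        log.add("Checkpoint triggering task " + task.getKey()
            + " is not in state RUNNING but " + task.getValue()
            + " instead. Aborting checkpoint.");
        return false;
      }
    }
    return true;
  }

  // Simulates the checkpoint timer firing maxAttempts times and returns
  // how many checkpoints completed. maxAttempts is an assumption that
  // keeps the sketch finite; the real timer keeps firing.
  static int runTimer(Map<String, TaskState> tasks, int maxAttempts, List<String> log) {
    int completed = 0;
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      if (tryCheckpoint(tasks, log)) {
        completed++;
      }
    }
    return completed;
  }

  public static void main(String[] args) {
    Map<String, TaskState> tasks = new LinkedHashMap<>();
    tasks.put("leftCustomSource", TaskState.RUNNING);   // assumed name
    tasks.put("rightCustomSource", TaskState.FINISHED); // finished early
    List<String> log = new ArrayList<>();

    int completed = runTimer(tasks, 4, log);
    System.out.println("completed checkpoints: " + completed);
    log.forEach(System.out::println);
  }
}
```

With one task FINISHED, every attempt aborts and the completed count stays at 0 no matter how large `maxAttempts` is. If the remaining task can only make progress after a completed checkpoint (an assumption about the test source, not something the logs show), the job never terminates.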
   
   Here are some of the logs (visible when passing `-i` to Gradle for info-level logging):
   
   ```
   2021-09-13T08:19:47.7896411Z > Task :iceberg-flink:test
   2021-09-13T08:19:47.7899950Z     [Checkpoint Timer] INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Checkpoint triggering task Source: rightCustomSource -> rightIcebergSink-rightIcebergSink -> rightIcebergSink-IcebergStreamWriter (1/1) of job 437e46445e777ca2231677f60f87496a is not in state RUNNING but FINISHED instead. Aborting checkpoint.
   2021-09-13T08:19:47.7905489Z     [Checkpoint Timer] INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Checkpoint triggering task Source: rightCustomSource -> rightIcebergSink-rightIcebergSink -> rightIcebergSink-IcebergStreamWriter (1/1) of job 437e46445e777ca2231677f60f87496a is not in state RUNNING but FINISHED instead. Aborting checkpoint.
   2021-09-13T08:19:47.7914766Z     [Checkpoint Timer] INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Checkpoint triggering task Source: rightCustomSource -> rightIcebergSink-rightIcebergSink -> rightIcebergSink-IcebergStreamWriter (1/1) of job 437e46445e777ca2231677f60f87496a is not in state RUNNING but FINISHED instead. Aborting checkpoint.
   2021-09-13T08:19:47.7920502Z     [Checkpoint Timer] INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Checkpoint triggering task Source: rightCustomSource -> rightIcebergSink-rightIcebergSink -> rightIcebergSink-IcebergStreamWriter (1/1) of job 437e46445e777ca2231677f60f87496a is not in state RUNNING but FINISHED instead. Aborting checkpoint.
   ```
   
   I attempted to debug this in another PR, which contains some relevant discussion: https://github.com/apache/iceberg/pull/3106
   
   This (temporarily) closes https://github.com/apache/iceberg/issues/3091. We should still fix the `BoundedTestSource`, though this edge case might be fixed come Flink 1.14.
   
   More details and discussion are in the issue, particularly the linked FLIP.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] stevenzwu commented on pull request #3110: Flink - Ignore test that leads to infinite checkpoint loop and CI timeouts

Posted by GitBox <gi...@apache.org>.
stevenzwu commented on pull request #3110:
URL: https://github.com/apache/iceberg/pull/3110#issuecomment-918802895


   @openinx @rdblue can you help merge this PR to avoid test hanging?




[GitHub] [iceberg] RussellSpitzer merged pull request #3110: Flink - Ignore test that leads to infinite checkpoint loop and CI timeouts

Posted by GitBox <gi...@apache.org>.
RussellSpitzer merged pull request #3110:
URL: https://github.com/apache/iceberg/pull/3110


   




[GitHub] [iceberg] kbendick commented on pull request #3110: Flink - Ignore test that leads to infinite checkpoint loop and CI timeouts

Posted by GitBox <gi...@apache.org>.
kbendick commented on pull request #3110:
URL: https://github.com/apache/iceberg/pull/3110#issuecomment-918664589


   cc @stevenzwu @RussellSpitzer @rdblue @nastra since we've located the cause of the CI timeouts, we've decided to just ignore the test for now.




[GitHub] [iceberg] RussellSpitzer commented on pull request #3110: Flink - Ignore test that leads to infinite checkpoint loop and CI timeouts

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on pull request #3110:
URL: https://github.com/apache/iceberg/pull/3110#issuecomment-919168138


   Merged! Thanks everyone for looking into this! Should make CI easier until we get the real fix 

