You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/08/26 21:59:24 UTC

[GitHub] [iceberg] jzhuge opened a new pull request #1389: Add clock to SnapshotProducer

jzhuge opened a new pull request #1389:
URL: https://github.com/apache/iceberg/pull/1389


   Enable unit tests to use the new TestClock, a mutable manual clock.
   Switch TestRemoveSnapshots to use TestClock.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] HeartSaVioR commented on pull request #1389: Add clock to SnapshotProducer

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #1389:
URL: https://github.com/apache/iceberg/pull/1389#issuecomment-690836103






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1389: Add clock to SnapshotProducer

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1389:
URL: https://github.com/apache/iceberg/pull/1389#issuecomment-690758285


   We discussed this last night since there was a thread on the mailing list about using timestamps for incremental table consumption. What we concluded was that even with an easy way to inject timestamps like this, it would still be unreliable. That makes me less inclined to add a `Clock` to `TableOperations` because people may start to use it to inject timestamps, even though there are still commit issues.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1389: Add clock to SnapshotProducer

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1389:
URL: https://github.com/apache/iceberg/pull/1389#issuecomment-681181270


   What's the motivation for this change?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jzhuge closed pull request #1389: Add clock to SnapshotProducer

Posted by GitBox <gi...@apache.org>.
jzhuge closed pull request #1389:
URL: https://github.com/apache/iceberg/pull/1389


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jzhuge commented on pull request #1389: Add clock to SnapshotProducer

Posted by GitBox <gi...@apache.org>.
jzhuge commented on pull request #1389:
URL: https://github.com/apache/iceberg/pull/1389#issuecomment-681321493


   The PR is to enhance the testability of Snapshot, a feature sensitive to timings, especially when expiring snapshots. It injects a clock that can be either system clock in production or a test clock in testing. The PR adds a test clock implementation that is fixed and can be reset or advanced. Please note default Java 8 concrete clock classes like `Clock.fixed` are immutable.
   
   Unit tests no longer have to perform busy wait or sleep like the following:
   ```
     // t1: append and commit
     // busy wait or sleep
     // t2: append and commit
     // busy wait or sleep
     // expire snapshot before current time
   ```
   Instead the tests will look like:
   ```
     // t1: append and commit
     // advance test clock by 1ms
     // t1+1: append and commit
     // advance test clock by 1ms
     // expire snapshot before current time (t1+2)
   ```
   or
   ```
     // set the test clock to t1
     // t1: append and commit
     // set the test clock to t2
     // t2: append and commit
     // expire snapshot before t2+1
   ```
   The unit tests can simulate running in the future, in the past, or in any arbitrary point in time. The tests will run faster. And they will be slightly more reliable as they don't rely on system clock.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jzhuge commented on pull request #1389: Add clock to SnapshotProducer

Posted by GitBox <gi...@apache.org>.
jzhuge commented on pull request #1389:
URL: https://github.com/apache/iceberg/pull/1389#issuecomment-690845933


   Yes @HeartSaVioR, this change has a very limited scope. Unfortunately there is no clean way to ingest the clock other than going through table operations or a bunch of constructors.
   
   Close the PR.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] HeartSaVioR edited a comment on pull request #1389: Add clock to SnapshotProducer

Posted by GitBox <gi...@apache.org>.
HeartSaVioR edited a comment on pull request #1389:
URL: https://github.com/apache/iceberg/pull/1389#issuecomment-690836103


   In general, manual clock like this is widely used for testing and I see it very helpful, but according to the changeset it looks to affect just a couple of tests which looks to be very minor, compared to the impact of the change performed to the source side. Places I see using manual clocks are testing timer/timeout functionality, which would add multi-seconds or minutes which aren't what we really want to actually sleep to wait. Here the test just busy-waits for 1 ms per each, which is trivial to just wait.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] HeartSaVioR edited a comment on pull request #1389: Add clock to SnapshotProducer

Posted by GitBox <gi...@apache.org>.
HeartSaVioR edited a comment on pull request #1389:
URL: https://github.com/apache/iceberg/pull/1389#issuecomment-690836103


   In general, manual clock like this is widely used for testing and I see it very helpful, but according to the changeset it looks to affect just a couple of tests which looks to be very minor, compared to the impact of the change performed to the source side. Places I see using manual clocks are testing timer/timeout functionality, which would add multi-seconds or minutes which aren't what we really want to actually sleep to wait. Here the test just busy-waits for 1 ms per each, which is trivial to just wait.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jzhuge commented on pull request #1389: Add clock to SnapshotProducer

Posted by GitBox <gi...@apache.org>.
jzhuge commented on pull request #1389:
URL: https://github.com/apache/iceberg/pull/1389#issuecomment-690845933


   Yes @HeartSaVioR, this change has a very limited scope. Unfortunately there is no clean way to ingest the clock other than going through table operations or a bunch of constructors.
   
   Close the PR.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jzhuge closed pull request #1389: Add clock to SnapshotProducer

Posted by GitBox <gi...@apache.org>.
jzhuge closed pull request #1389:
URL: https://github.com/apache/iceberg/pull/1389






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jzhuge commented on pull request #1389: Add clock to SnapshotProducer

Posted by GitBox <gi...@apache.org>.
jzhuge commented on pull request #1389:
URL: https://github.com/apache/iceberg/pull/1389#issuecomment-683358653


   Agree that passing the clock through `TableOperations` to `SnapshotProducer.apply` does not make sense.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jzhuge commented on pull request #1389: Add clock to SnapshotProducer

Posted by GitBox <gi...@apache.org>.
jzhuge commented on pull request #1389:
URL: https://github.com/apache/iceberg/pull/1389#issuecomment-690845933


   Yes @HeartSaVioR, this change has a very limited scope. Unfortunately there is no clean way to ingest the clock other than going through table operations or a bunch of constructors.
   
   Close the PR.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1389: Add clock to SnapshotProducer

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1389:
URL: https://github.com/apache/iceberg/pull/1389#issuecomment-683351935


   Thanks, @jzhuge! I think the current implementation is fairly clean and I like how it injects the clock through `TableOperations`. That is a good place to inject table customization and this alters a lot fewer classes now.
   
   I'm not sure that this solution is needed, though. I don't know of any use cases that would use a clock to inject custom timestamp logic, so adding this through `TableOperations` adds unnecessary complexity to an interface that is extended by external libraries. I am also not sure that this is the best option for tests because it isn't obvious what relationship the clock has to the other operations because it is passed in at table creation time through the superclass. The logic isn't very obvious, compared to the approach of calling `waitUntilAfter(snapshot.timestampMillis())`.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] HeartSaVioR edited a comment on pull request #1389: Add clock to SnapshotProducer

Posted by GitBox <gi...@apache.org>.
HeartSaVioR edited a comment on pull request #1389:
URL: https://github.com/apache/iceberg/pull/1389#issuecomment-690836103


   In general, manual clock like this is widely used for testing and I see it very helpful, but according to the changeset it looks to affect just a couple of tests which looks to be very minor, compared to the impact of the change performed to the source side. Places I see using manual clocks are testing timer/timeout functionality, which would add multi-seconds or minutes which aren't what we really want to actually sleep to wait. Here the test just busy-waits for 1 ms per each, which is trivial to just wait.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] HeartSaVioR commented on pull request #1389: Add clock to SnapshotProducer

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #1389:
URL: https://github.com/apache/iceberg/pull/1389#issuecomment-690836103


   In general, manual clock like this is widely used for testing and I see it very helpful, but according to the changeset it looks to affect just a couple of tests which looks to be very minor, compared to the impact of the change performed to the source side. Places I see using manual clocks are testing timeout functionality, which would add multi-seconds or minutes which aren't what we really want to actually sleep to wait. Here the test just busy-waits for 1 ms per each, which is trivial to just wait.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jzhuge commented on pull request #1389: Add clock to SnapshotProducer

Posted by GitBox <gi...@apache.org>.
jzhuge commented on pull request #1389:
URL: https://github.com/apache/iceberg/pull/1389#issuecomment-689894270


   If there is production use case for a manual clock ingested into table operations, I can rename `TestClock` to `ManualClock` and move it from test to prod.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org