You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/08/09 09:54:37 UTC

[GitHub] [iceberg] ajantha-bhat opened a new pull request, #5478: Core: Remove confusing log from RemoveSnapshots

ajantha-bhat opened a new pull request, #5478:
URL: https://github.com/apache/iceberg/pull/5478

   By default, `ExpireSnapshotsSparkAction` cleans the expired files but the log says not cleaning the files. So, this log is confusing the users. 
   
   [ExpireSnapshotsSparkAction] (https://github.com/apache/iceberg/blob/master/spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/actions/ExpireSnapshotsSparkAction.java#L155) calls some code that is common for non-distributed and distributed to expire snapshot job.
   So, update the common code to print the clean up logs only if the files are cleaned up. 
   
    


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on pull request #5478: Core: Remove confusing log from RemoveSnapshots

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on PR #5478:
URL: https://github.com/apache/iceberg/pull/5478#issuecomment-1209607359

   We could just move the log line to the "clean expired" branch of the if statement, and log "Local removal of files enabled. Cleaning up manifest and data files"?
   
   That way you can differentiate between "cleanup disabled" and "empty"?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5478: Core: Fix confusing log from RemoveSnapshots

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5478:
URL: https://github.com/apache/iceberg/pull/5478#discussion_r943956425


##########
core/src/main/java/org/apache/iceberg/RemoveSnapshots.java:
##########
@@ -320,9 +320,8 @@ public void commit() {
     LOG.info("Committed snapshot changes");
 
     if (cleanExpiredFiles) {
+      LOG.info("Cleaning up of expired files is enabled");

Review Comment:
   I think logs are better when using active voice. Here, it would be "Cleaning up expired files (local, incremental)"



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on pull request #5478: Core: Fix confusing log from RemoveSnapshots

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on PR #5478:
URL: https://github.com/apache/iceberg/pull/5478#issuecomment-1241620857

   @rdblue, @RussellSpitzer, @Fokko : Can this PR be merged? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #5478: Core: Remove confusing log from RemoveSnapshots

Posted by GitBox <gi...@apache.org>.
rdblue commented on PR #5478:
URL: https://github.com/apache/iceberg/pull/5478#issuecomment-1209635918

   I would like a lot to distinguish between "didn't try" and "nothing to clean up" -- even when "nothing to clean up" is because there were no expired files.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #5478: Core: Fix confusing log from RemoveSnapshots

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on code in PR #5478:
URL: https://github.com/apache/iceberg/pull/5478#discussion_r995343484


##########
core/src/main/java/org/apache/iceberg/RemoveSnapshots.java:
##########
@@ -320,9 +320,8 @@ public void commit() {
     LOG.info("Committed snapshot changes");
 
     if (cleanExpiredFiles) {
+      LOG.info("Cleaning up of expired files is enabled");

Review Comment:
   True. I have updated it now. Thanks for the review. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue merged pull request #5478: Core: Fix confusing log from RemoveSnapshots

Posted by GitBox <gi...@apache.org>.
rdblue merged PR #5478:
URL: https://github.com/apache/iceberg/pull/5478


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #5478: Core: Remove confusing log from RemoveSnapshots

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on code in PR #5478:
URL: https://github.com/apache/iceberg/pull/5478#discussion_r941141134


##########
core/src/main/java/org/apache/iceberg/RemoveSnapshots.java:
##########
@@ -321,8 +321,6 @@ public void commit() {
 
     if (cleanExpiredFiles) {
       cleanExpiredSnapshots();

Review Comment:
   Inside `cleanExpiredSnapshots()` logs are present, which are enough to know whether files are cleaned or not.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on pull request #5478: Core: Remove confusing log from RemoveSnapshots

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on PR #5478:
URL: https://github.com/apache/iceberg/pull/5478#issuecomment-1209620514

   @RussellSpitzer : Thanks for the suggestion. I added a log at the proper place to identify the empty case.
   
   @rdblue : If the logs are not present, we can assume cleanup is disabled? if it is not disabled, some logs related to clean-up will be printed. For the users who run the ExpireSnapshotAction this log was confusing as mentioned in the description.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #5478: Core: Remove confusing log from RemoveSnapshots

Posted by GitBox <gi...@apache.org>.
rdblue commented on PR #5478:
URL: https://github.com/apache/iceberg/pull/5478#issuecomment-1209594696

   I don't agree with this change. I think it's good to know that this is skipped, rather than not being able to distinguish between missing logs and not trying to remove the files.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on pull request #5478: Core: Fix confusing log from RemoveSnapshots

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on PR #5478:
URL: https://github.com/apache/iceberg/pull/5478#issuecomment-1256435936

   ping @rdblue 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #5478: Core: Fix confusing log from RemoveSnapshots

Posted by GitBox <gi...@apache.org>.
rdblue commented on PR #5478:
URL: https://github.com/apache/iceberg/pull/5478#issuecomment-1279450722

   Thanks, @ajantha-bhat!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on pull request #5478: Core: Fix confusing log from RemoveSnapshots

Posted by GitBox <gi...@apache.org>.
kbendick commented on PR #5478:
URL: https://github.com/apache/iceberg/pull/5478#issuecomment-1209758658

   For the log that is causing confusion for users because the distributed engines (in this case Spark) then go on to do some cleanup, maybe we can update the log to state something similar to we're "skipping clean up during current iteration, cleanup might occur later if this action is run from a distributed processing engine"? Then possibly add logs to Spark as well?
   
   > I would like a lot to distinguish between "didn't try" and "nothing to clean up" -- even when "nothing to clean up" is because there were no expired files.
   
   Maybe we need a flag for "delayedFileCleanup" or just generally "runningViaDistributedAction" and then let Spark / engines control the logs? I don't love it, but throwing it out there as the current logging is admittedly confusing when the Spark action runs (which is the primary way users interact with this action).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #5478: Core: Remove confusing log from RemoveSnapshots

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on code in PR #5478:
URL: https://github.com/apache/iceberg/pull/5478#discussion_r941601488


##########
core/src/main/java/org/apache/iceberg/RemoveSnapshots.java:
##########
@@ -321,8 +321,6 @@ public void commit() {
 
     if (cleanExpiredFiles) {
       cleanExpiredSnapshots();

Review Comment:
   ok. Added.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #5478: Core: Fix confusing log from RemoveSnapshots

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on code in PR #5478:
URL: https://github.com/apache/iceberg/pull/5478#discussion_r944147122


##########
core/src/main/java/org/apache/iceberg/RemoveSnapshots.java:
##########
@@ -320,9 +320,8 @@ public void commit() {
     LOG.info("Committed snapshot changes");
 
     if (cleanExpiredFiles) {
+      LOG.info("Cleaning up of expired files is enabled");

Review Comment:
   updated as per your suggestion. Thanks.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #5478: Core: Remove confusing log from RemoveSnapshots

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on code in PR #5478:
URL: https://github.com/apache/iceberg/pull/5478#discussion_r941587170


##########
core/src/main/java/org/apache/iceberg/RemoveSnapshots.java:
##########
@@ -321,8 +321,6 @@ public void commit() {
 
     if (cleanExpiredFiles) {
       cleanExpiredSnapshots();

Review Comment:
   I agree with @rdblue that we should just add the message right above here, "Local delete of expired files enabled. Cleaning up expired manifests and data files" or something like that



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #5478: Core: Fix confusing log from RemoveSnapshots

Posted by GitBox <gi...@apache.org>.
szehon-ho commented on code in PR #5478:
URL: https://github.com/apache/iceberg/pull/5478#discussion_r995167863


##########
core/src/main/java/org/apache/iceberg/RemoveSnapshots.java:
##########
@@ -320,9 +320,8 @@ public void commit() {
     LOG.info("Committed snapshot changes");
 
     if (cleanExpiredFiles) {
+      LOG.info("Cleaning up of expired files is enabled");

Review Comment:
   I think incremental is not always true, if it is disabled?  What do you think, to add the mode as a variable:
   ```
   String cleanupMode = incrementalCleanup ? "incremental" : "reachable";
   LOG.info("Cleaning up of expired files is enabled (local, %s)", cleanupMode);
   
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on pull request #5478: Core: Fix confusing log from RemoveSnapshots

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on PR #5478:
URL: https://github.com/apache/iceberg/pull/5478#issuecomment-1278955508

   The below test case could be flaky!
   
   `org.apache.iceberg.spark.TestFileRewriteCoordinator > testBinPackRewrite[catalogName = testhive
   `
   https://github.com/apache/iceberg/actions/runs/3249433325/jobs/5331831494#step:6:293


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on pull request #5478: Core: Fix confusing log from RemoveSnapshots

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on PR #5478:
URL: https://github.com/apache/iceberg/pull/5478#issuecomment-1212770220

   > Maybe we need a flag for delayedFileCleanup or just generally runningViaDistributedAction -- and then let Spark / engines control the logs?
   
   @kbendick : I originally had the same thought. But it is making things more complex as we are linking core logic to callers.  
   I think the current changes in the PR is enough to clear the confusion (It will also differentiates "disabled" vs "nothing to clean up") 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org