You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/05/02 18:03:14 UTC

[GitHub] [iceberg] flyrain opened a new pull request, #4681: Core: Log the new metadata location in commit.

flyrain opened a new pull request, #4681:
URL: https://github.com/apache/iceberg/pull/4681

   This pretty useful to figure out which version actually committed to the catalog from the log, especially when we debug a catalog consistency or locking issue.
   cc @aokolnychyi @RussellSpitzer @szehon-ho @karuppayya 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] flyrain commented on pull request #4681: Core: Log the new metadata location in commit.

Posted by GitBox <gi...@apache.org>.
flyrain commented on PR #4681:
URL: https://github.com/apache/iceberg/pull/4681#issuecomment-1116809372

   Added the support for `HadoopTableOperations`. But it is less likely be confusing for Hadoop table since it always write with the latest version number as the file name. One of the pain point for Hive table is that, there are multiple metadata files in the metadata directory, and you don't know which one is generated by a succeeded job, which is not. Hadoop table doesn't have this issue, it can overwrite the file generated by the failed commit. 
   
   It is still valuable though. You can connect the file with the job easily by looking at the log.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] flyrain commented on pull request #4681: Core: Log the new metadata location in commit.

Posted by GitBox <gi...@apache.org>.
flyrain commented on PR #4681:
URL: https://github.com/apache/iceberg/pull/4681#issuecomment-1116412642

   Hi @RussellSpitzer, @rdblue @kbendick, made the change only to `HiveTableOperation`. I can make change to other table operations you think it is necessary. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] flyrain commented on pull request #4681: Core: Log the new metadata location in commit.

Posted by GitBox <gi...@apache.org>.
flyrain commented on PR #4681:
URL: https://github.com/apache/iceberg/pull/4681#issuecomment-1117583352

   Thanks all for the review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue merged pull request #4681: Core: Log the new metadata location in commit.

Posted by GitBox <gi...@apache.org>.
rdblue merged PR #4681:
URL: https://github.com/apache/iceberg/pull/4681


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #4681: Core: Log the new metadata location in commit.

Posted by GitBox <gi...@apache.org>.
rdblue commented on PR #4681:
URL: https://github.com/apache/iceberg/pull/4681#issuecomment-1117511160

   Thanks, @flyrain!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] flyrain commented on pull request #4681: Core: Log the new metadata location in commit.

Posted by GitBox <gi...@apache.org>.
flyrain commented on PR #4681:
URL: https://github.com/apache/iceberg/pull/4681#issuecomment-1115236877

   > I'm not sure we want to force all future table operations to include a literal location (or have to return one). So we may want to have the logging in the operations (HiveTableOperations, HadoopTableOperations, ... ) themselves?
   
   Good point. But I think a table version string will be there even for a future catalog without a metatdata.json file. It can easily support that, with minor word change. like `Successfully committed to table {} with the new metadata location({})` -> `Successfully committed to table {} with the new version ({})`
   
   Hi @kbendick, what do you think from the perspective of rest API catalog?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on pull request #4681: Core: Log the new metadata location in commit.

Posted by GitBox <gi...@apache.org>.
kbendick commented on PR #4681:
URL: https://github.com/apache/iceberg/pull/4681#issuecomment-1115529855

   > > I'm not sure we want to force all future table operations to include a literal location (or have to return one). So we may want to have the logging in the operations (HiveTableOperations, HadoopTableOperations, ... ) themselves?
   > 
   > Good point. But I think a table version string will be there even for a future catalog without a metatdata.json file. It can easily support that, with minor word change. like `Successfully committed to table {} with the new metadata location({})` -> `Successfully committed to table {} with the new version ({})`
   > 
   > Hi @kbendick, what do you think from the perspective of rest API catalog?
   
   For the current REST catalog, a file should be loggable. That said, I do agree that it's probably best not to force catalogs to do so.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] flyrain commented on a diff in pull request #4681: Core: Log the new metadata location in commit.

Posted by GitBox <gi...@apache.org>.
flyrain commented on code in PR #4681:
URL: https://github.com/apache/iceberg/pull/4681#discussion_r863128766


##########
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java:
##########
@@ -311,6 +311,8 @@ protected void doCommit(TableMetadata base, TableMetadata metadata) {
     } finally {
       cleanupMetadataAndUnlock(commitStatus, newMetadataLocation, lockId, tableLevelMutex);
     }
+
+    LOG.info("Committed to table {} with the new metadata location {}", fullName, newMetadataLocation);

Review Comment:
   Instead of putting this log in both line 278 and line 297, I put the log here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #4681: Core: Log the new metadata location in commit.

Posted by GitBox <gi...@apache.org>.
rdblue commented on PR #4681:
URL: https://github.com/apache/iceberg/pull/4681#issuecomment-1115254429

   I agree with @RussellSpitzer that this should be done in the catalog, not generally.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on pull request #4681: Core: Log the new metadata location in commit.

Posted by GitBox <gi...@apache.org>.
kbendick commented on PR #4681:
URL: https://github.com/apache/iceberg/pull/4681#issuecomment-1116854196

   > Added the support for `HadoopTableOperations`. But it is less likely be confusing for Hadoop table since it always write with the latest version number as the file name. One of the pain point for Hive table is that, there are multiple metadata files in the metadata directory, and you don't know which one is generated by a succeeded job, which is not. Hadoop table doesn't have this issue, it can overwrite the file generated by the failed commit.
   > 
   > It is still valuable though. You can connect the file with the job easily by looking at the log.
   
   Ah that makes sense. I'm good with this use case then. If other TableOperations decide it's needed, then we can add it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on pull request #4681: Core: Log the new metadata location in commit.

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on PR #4681:
URL: https://github.com/apache/iceberg/pull/4681#issuecomment-1115209512

   I'm not sure we want to force all future table operations to include a literal location (or have to return one). So we may want to have the logging in the operations (HiveTableOperations, HadoopTableOperations, ... ) themselves?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on pull request #4681: Core: Log the new metadata location in commit.

Posted by GitBox <gi...@apache.org>.
kbendick commented on PR #4681:
URL: https://github.com/apache/iceberg/pull/4681#issuecomment-1116761069

   > Hi @RussellSpitzer, @rdblue @kbendick, made the change only to `HiveTableOperation`. I can make change to other table operations you think it is necessary.
   
   I'm good with just adding the log to `HiveTableOperations` for now. I'd _consider_ adding the log to `HadoopTableOperations` as well, as a lot of testing takes place using that and so investigations might wind up using that.
   
   But realistically, if people who use other table operations find need or value for this log, it can be added as a follow up (particularly by people who make more common use of those table operations).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org