Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/01/10 17:44:29 UTC

[GitHub] [incubator-iceberg] aokolnychyi opened a new issue #733: Collect additional metadata while writing manifests

aokolnychyi opened a new issue #733: Collect additional metadata while writing manifests
URL: https://github.com/apache/incubator-iceberg/issues/733
 
 
   To avoid reading manifests to get stats in #675, we need to collect additional metadata when writing manifests. The minimum required information (the number of added and deleted records) is easy to gather. The main question is how to store it.
   
   **Option 1**
   
   We can store the new metadata the same way we store the number of added/deleted/existing files.
   
   Benefits:
   - We can always retrieve the additional information by reading manifests.
   
   Drawbacks:
   - Affects what we actually store on disk.
   - Potentially increases the metadata size.
   
   **Option 2**
   
   We can introduce a serializable field of type `Map<String, String>` that is not stored on disk. Only instances of `ManifestFile` created via `ManifestWriter` are guaranteed to contain all needed properties.
   
   Benefits:
   - The metadata on disk doesn't change.
   - The metadata size stays the same.
   
   Drawbacks:
   - We cannot retrieve the additional information by reading manifests.
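
To make the trade-off between the two options concrete, here is a minimal Java sketch. It is not taken from the Iceberg codebase: `ManifestEntrySketch`, its field names, and the `toStoredRecord`/`fromStoredRecord` helpers are hypothetical, and a plain map stands in for the record written to disk.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a manifest list entry; this is NOT Iceberg's ManifestFile.
final class ManifestEntrySketch implements Serializable {
  private final String path;
  // Option 1: counts that become part of the stored record (null if absent on disk).
  private final Long addedRows;
  private final Long deletedRows;
  // Option 2: serializable with the object, but never written to the stored record.
  private final HashMap<String, String> writerProps;

  ManifestEntrySketch(String path, Long addedRows, Long deletedRows,
                      Map<String, String> writerProps) {
    this.path = path;
    this.addedRows = addedRows;
    this.deletedRows = deletedRows;
    this.writerProps = new HashMap<>(writerProps);
  }

  // Simulates writing the entry; the field names are made up for illustration.
  Map<String, Object> toStoredRecord(boolean persistCounts) {
    Map<String, Object> record = new LinkedHashMap<>();
    record.put("manifest_path", path);
    if (persistCounts) {
      record.put("added_rows", addedRows);
      record.put("deleted_rows", deletedRows);
    }
    return record;
  }

  // Simulates reading the entry back; writer-only properties cannot be restored.
  static ManifestEntrySketch fromStoredRecord(Map<String, Object> record) {
    return new ManifestEntrySketch(
        (String) record.get("manifest_path"),
        (Long) record.get("added_rows"),
        (Long) record.get("deleted_rows"),
        new HashMap<>());
  }

  Long addedRows() { return addedRows; }
  Long deletedRows() { return deletedRows; }
  Map<String, String> writerProps() { return writerProps; }
}
```

In this sketch, an entry read back with `persistCounts = true` still carries its counts (Option 1), while one read back with `persistCounts = false` has neither counts nor writer properties (Option 2); only the instance created at write time does.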
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] aokolnychyi commented on issue #733: Collect additional metadata while writing manifests

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on issue #733: Collect additional metadata while writing manifests
URL: https://github.com/apache/incubator-iceberg/issues/733#issuecomment-573136588
 
 
   @rdblue, how did you envision future extensions when you wrote `GenericManifestFile`?



[GitHub] [incubator-iceberg] aokolnychyi commented on issue #733: Collect additional metadata while writing manifests

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on issue #733: Collect additional metadata while writing manifests
URL: https://github.com/apache/incubator-iceberg/issues/733#issuecomment-578989012
 
 
   This was done. 



[GitHub] [incubator-iceberg] aokolnychyi closed issue #733: Collect additional metadata while writing manifests

Posted by GitBox <gi...@apache.org>.
aokolnychyi closed issue #733: Collect additional metadata while writing manifests
URL: https://github.com/apache/incubator-iceberg/issues/733
 
 
   



[GitHub] [incubator-iceberg] rdblue commented on issue #733: Collect additional metadata while writing manifests

Posted by GitBox <gi...@apache.org>.
rdblue commented on issue #733: Collect additional metadata while writing manifests
URL: https://github.com/apache/incubator-iceberg/issues/733#issuecomment-574808623
 
 
   Does it make sense for `ManifestFile` to have stats like row-count, added-row-count, and deleted-row-count? I think those are the main ones we care about that are not yet available, since manifests already have file counts. I think it may make sense to add those to `ManifestFile`; three more longs won't take much space in the manifest list.
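
As a rough illustration of gathering such counts at write time, the following Java sketch tallies rows as entries are appended. `RowCountingWriterSketch` and its `Status` enum are hypothetical and are not Iceberg's `ManifestWriter`; treating the total row count as added plus existing rows is also an assumption, since the comment does not define it.

```java
// Hypothetical writer that tallies row counts while entries are appended;
// this is NOT Iceberg's ManifestWriter.
final class RowCountingWriterSketch {
  enum Status { ADDED, EXISTING, DELETED }

  private long addedRows;
  private long existingRows;
  private long deletedRows;

  // Called once per data file entry with that file's record count.
  void append(Status status, long recordCount) {
    if (recordCount < 0) {
      throw new IllegalArgumentException("recordCount must be non-negative");
    }
    switch (status) {
      case ADDED:
        addedRows += recordCount;
        break;
      case EXISTING:
        existingRows += recordCount;
        break;
      case DELETED:
        deletedRows += recordCount;
        break;
    }
  }

  long addedRows() { return addedRows; }
  long deletedRows() { return deletedRows; }

  // Assumption: the total counts rows in files that are still live.
  long liveRows() { return addedRows + existingRows; }
}
```

The three resulting longs could then be attached to the entry for the written manifest, so readers of the manifest list would not need to open the manifest itself to get them.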



[GitHub] [incubator-iceberg] aokolnychyi edited a comment on issue #733: Collect additional metadata while writing manifests

Posted by GitBox <gi...@apache.org>.
aokolnychyi edited a comment on issue #733: Collect additional metadata while writing manifests
URL: https://github.com/apache/incubator-iceberg/issues/733#issuecomment-573136588
 
 
   @rdblue, how did you envision future extensions when you wrote `GenericManifestFile` and what do you think makes sense in this case?
