Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/03/05 20:07:57 UTC

[GitHub] [iceberg] aokolnychyi opened a new issue #2300: Build a utility to infer partitions at a given path

aokolnychyi opened a new issue #2300:
URL: https://github.com/apache/iceberg/issues/2300


   We need to infer partitions to migrate path-based tables to Iceberg. While we could leverage internal code in query engines, such as `InMemoryFileIndex` in Spark, it is better to build our own utility and use it in the Spark, Flink, and other engine modules.
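
   A minimal sketch of what such a utility might look like, assuming Hive-style `key=value` directory names under the table root. The class and method names are hypothetical and not part of Iceberg; escaped values and type inference are deliberately left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not an Iceberg API: infers partition values from a
// Hive-style path such as /warehouse/tbl/year=2021/month=03/file.parquet.
class PartitionPathParser {

  // Returns partition key/value pairs in the order they appear under tableRoot.
  // Values are returned as raw strings; any escaping in the path is left as-is.
  static Map<String, String> inferPartition(String tableRoot, String filePath) {
    // Normalize the root so that /warehouse/tbl does not match /warehouse/tbl2.
    String root = tableRoot.endsWith("/") ? tableRoot : tableRoot + "/";
    if (!filePath.startsWith(root)) {
      throw new IllegalArgumentException(filePath + " is not under " + tableRoot);
    }

    Map<String, String> partition = new LinkedHashMap<>();
    String[] segments = filePath.substring(root.length()).split("/");

    // The last segment is the file name, so only directories are inspected.
    for (int i = 0; i < segments.length - 1; i++) {
      String segment = segments[i];
      int eq = segment.indexOf('=');
      if (eq > 0) {
        partition.put(segment.substring(0, eq), segment.substring(eq + 1));
      }
    }
    return partition;
  }
}
```

   Directories that do not follow the `key=value` pattern are skipped in this sketch; a real utility would need to decide whether to reject them instead.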


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on issue #2300: Build a utility to infer partitions at a given path

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on issue #2300:
URL: https://github.com/apache/iceberg/issues/2300#issuecomment-794719689


   @pvary, I think this logic would only apply to path-based tables that are not tracked in a metastore. Such tables are pretty popular, at least in Spark. Whenever we have a metastore, we should talk to it to figure out the list of partitions and their locations.




[GitHub] [iceberg] pvary commented on issue #2300: Build a utility to infer partitions at a given path

Posted by GitBox <gi...@apache.org>.
pvary commented on issue #2300:
URL: https://github.com/apache/iceberg/issues/2300#issuecomment-792572948


   We are also thinking about migrating Hive tables to Iceberg tables by creating the appropriate metadata files. 
   
   For us it would be better to have an API that the engines can implement. In Hive it is possible to set the location at the partition level, so we might be better off using HMS to get the partition information rather than parsing the path. So I think an API that returns `Pair<Map<PARTITION_KEY, PARTITION_VALUE>, List<PATH>>` might be better.
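
   One possible shape for such an API, as a rough sketch only. `PartitionLister` and `PartitionLocation` are hypothetical names that do not exist in Iceberg, and a small value class stands in for `Pair` so the example needs nothing beyond the JDK.

```java
import java.util.List;
import java.util.Map;

// Hypothetical value class: one partition's key/value pairs plus the
// locations that hold its data (a stand-in for Pair<Map<...>, List<PATH>>).
final class PartitionLocation {
  private final Map<String, String> values;
  private final List<String> paths;

  PartitionLocation(Map<String, String> values, List<String> paths) {
    // Defensive copies keep the object immutable.
    this.values = Map.copyOf(values);
    this.paths = List.copyOf(paths);
  }

  Map<String, String> values() {
    return values;
  }

  List<String> paths() {
    return paths;
  }
}

// Hypothetical interface each engine could implement. A Hive implementation
// might ask HMS, while a path-based one might parse directory names.
interface PartitionLister {
  List<PartitionLocation> listPartitions(String table);
}
```

   With this shape, a partition whose location was set explicitly in the metastore is reported the same way as one discovered from the directory layout.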




[GitHub] [iceberg] aokolnychyi commented on issue #2300: Build a utility to infer partitions at a given path

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on issue #2300:
URL: https://github.com/apache/iceberg/issues/2300#issuecomment-791653194


   I'll give it a try.
   
   FYI, @RussellSpitzer @pvary @openinx @rdblue 

