Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/08/04 19:16:05 UTC

[GitHub] [iceberg] Fokko opened a new pull request, #5439: Python: Add mkdir to the FileIO

Fokko opened a new pull request, #5439:
URL: https://github.com/apache/iceberg/pull/5439

   We need this for Hive to create the `/metadata` directory


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #5439: Python: Add mkdir to the FileIO

Posted by GitBox <gi...@apache.org>.
rdblue commented on PR #5439:
URL: https://github.com/apache/iceberg/pull/5439#issuecomment-1207500016

   > Alright, in that case, it doesn't make sense to implement this in the actual code. I'll create a wrapper for any FileIO that will first create the directory.
   
   I don't think I follow. The expectation is that `OutputStream.create` will create the file and any necessary directory structure above it. That's the default behavior for HDFS, S3, and other object stores, so I don't think that we need a generic wrapper. We may need to ensure that directories exist for some URI schemes in specific implementations (maybe Arrow doesn't do this?) but I don't think we could make a generic wrapper that does it. At least, I don't see what a generic wrapper would call.
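   To make the expectation above concrete, here is a minimal sketch of a local-filesystem output file whose `create` builds any missing parent directories before opening the file, so a caller never needs a separate mkdir. The class name `LocalOutputFile` and its constructor are hypothetical, written only for illustration; this is not PyIceberg's implementation.

```python
import os
from urllib.parse import urlparse


class LocalOutputFile:
    """Hypothetical local-filesystem output file (not the PyIceberg class).

    create() makes any missing parent directories before opening the file,
    mirroring the behavior expected from HDFS and object stores.
    """

    def __init__(self, location: str) -> None:
        self.location = location
        # Accept both "file:///tmp/x" URIs and plain "/tmp/x" paths.
        parsed = urlparse(location)
        self._path = parsed.path if parsed.scheme == "file" else location

    def create(self, overwrite: bool = False):
        parent = os.path.dirname(self._path)
        if parent:
            # Create the directory structure above the file, e.g. ".../metadata/".
            os.makedirs(parent, exist_ok=True)
        # "wb" truncates an existing file; "xb" raises FileExistsError instead.
        return open(self._path, "wb" if overwrite else "xb")
```

   Using `exist_ok=True` keeps repeated writes into the same directory (e.g. successive metadata files) from failing, while the `xb` mode preserves "don't overwrite" semantics for the file itself.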


[GitHub] [iceberg] Fokko commented on pull request #5439: Python: Add mkdir to the FileIO

Posted by GitBox <gi...@apache.org>.
Fokko commented on PR #5439:
URL: https://github.com/apache/iceberg/pull/5439#issuecomment-1206080680

   This is mostly for local testing (against my local file system), but I also think we'll need this in the case of HDFS. We could just stub them out for object stores. 


[GitHub] [iceberg] Fokko commented on pull request #5439: Python: Add mkdir to the FileIO

Posted by GitBox <gi...@apache.org>.
Fokko commented on PR #5439:
URL: https://github.com/apache/iceberg/pull/5439#issuecomment-1207475027

   > Once we put it into the base class, it becomes part of the API though. Seems like if this is specific to the implementation (e.g. hive metastore) we could just use something directly from there without going through FileIO.
   
   It would be relevant for every catalog that creates the table metadata (the REST catalog does this for us).
   
   > HDFS will automatically create directories. If we want to test locally, we should make a LocalFileIO that has the expected behavior.
   
   Alright, in that case, it doesn't make sense to implement this in the actual code. I'll create a wrapper for any FileIO that will first create the directory.


[GitHub] [iceberg] Fokko closed pull request #5439: Python: Add mkdir to the FileIO

Posted by GitBox <gi...@apache.org>.
Fokko closed pull request #5439: Python: Add mkdir to the FileIO
URL: https://github.com/apache/iceberg/pull/5439


[GitHub] [iceberg] danielcweeks commented on pull request #5439: Python: Add mkdir to the FileIO

Posted by GitBox <gi...@apache.org>.
danielcweeks commented on PR #5439:
URL: https://github.com/apache/iceberg/pull/5439#issuecomment-1206769070

   > This is mostly for local testing (against my local file system), but I also think we'll need this in the case of HDFS. We could just stub them out for object stores. I agree that we should leave potentially harmful methods such as `ls` out 👍🏻
   
   Once we put it into the base class, it becomes part of the API though. Seems like if this is specific to the implementation (e.g. hive metastore) we could just use something directly from there without going through FileIO.


[GitHub] [iceberg] rdblue commented on pull request #5439: Python: Add mkdir to the FileIO

Posted by GitBox <gi...@apache.org>.
rdblue commented on PR #5439:
URL: https://github.com/apache/iceberg/pull/5439#issuecomment-1207441506

   HDFS will automatically create directories. If we want to test locally, we should make a LocalFileIO that has the expected behavior.
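   A local FileIO for tests could keep the shared interface free of `mkdir` by treating directory creation as part of writing. The sketch below assumes a deliberately simplified interface (`read`, `write`, `delete`) rather than PyIceberg's actual `FileIO`/`InputFile`/`OutputFile` classes; `FileIO` and `LocalFileIO` here are hypothetical stand-ins.

```python
import os
from abc import ABC, abstractmethod


class FileIO(ABC):
    """Simplified, hypothetical FileIO base class: note there is no mkdir()."""

    @abstractmethod
    def read(self, location: str) -> bytes: ...

    @abstractmethod
    def write(self, location: str, data: bytes, overwrite: bool = False) -> None: ...

    @abstractmethod
    def delete(self, location: str) -> None: ...


class LocalFileIO(FileIO):
    """Hypothetical local implementation for tests.

    Directory creation is an implementation detail of write(), so a catalog
    writing <table>/metadata/*.metadata.json never has to call mkdir itself.
    """

    def read(self, location: str) -> bytes:
        with open(location, "rb") as f:
            return f.read()

    def write(self, location: str, data: bytes, overwrite: bool = False) -> None:
        # Create missing parent directories, like HDFS does on create.
        os.makedirs(os.path.dirname(location) or ".", exist_ok=True)
        # "xb" refuses to clobber an existing file unless overwrite is requested.
        with open(location, "wb" if overwrite else "xb") as f:
            f.write(data)

    def delete(self, location: str) -> None:
        os.remove(location)
```

   Keeping the local-only behavior inside `LocalFileIO` means object-store implementations do not have to stub out a method that has no meaning for them.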
