Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/08/04 16:39:43 UTC

[GitHub] [iceberg] rdblue commented on pull request #5436: Python: Split PyArrowFile into PyArrowInputFile and PyArrowOutputFile

rdblue commented on PR #5436:
URL: https://github.com/apache/iceberg/pull/5436#issuecomment-1205506576

   I don't think that we want to do this. We discussed this when introducing the FileIO API, but `__enter__` and `__exit__` don't fit well. Looks like you hit some of the problems:
   * This introduces `FileAlreadyOpenError` because `InputFile` can be used for only one stream at a time
   * The overwrite option needs to be added to `OutputFile` even though that's an option for the writer to decide, not `FileIO`
   
   I'm all for adding `__enter__` and `__exit__` support though. What about adding it for `InputStream` and `OutputStream`? We could have `__exit__` call `close`. Then usage would look like this:
   
   ```python
   file = io.newOutputFile("s3://bucket/path.parquet")
   with file.create(overwrite=True) as f:
       f.write(...)
   
   with file.toInputFile().open() as f:
       f.read(...)
   ```
   
   I also think that the `close` method should still exist. We just want to make `__exit__` call that method.
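
A minimal sketch of what that could look like on the stream side, assuming a hypothetical `InputStream` base class and an in-memory `InMemoryInputStream` used only for illustration (neither is the actual Iceberg definition). The point is that `__exit__` only delegates to `close`, so callers can still call `close` directly:

```python
import io
from abc import ABC, abstractmethod


class InputStream(ABC):
    """Hypothetical stream base class, for illustration only."""

    @abstractmethod
    def read(self, size: int = -1) -> bytes:
        ...

    @abstractmethod
    def close(self) -> None:
        ...

    def __enter__(self):
        # The stream itself is the context manager, not the file.
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Delegate to close() so explicit close() calls keep working.
        # Returning None lets any exception from the block propagate.
        self.close()


class InMemoryInputStream(InputStream):
    """Illustrative implementation backed by an in-memory buffer."""

    def __init__(self, data: bytes):
        self._buffer = io.BytesIO(data)
        self.closed = False

    def read(self, size: int = -1) -> bytes:
        return self._buffer.read(size)

    def close(self) -> None:
        self._buffer.close()
        self.closed = True


stream = InMemoryInputStream(b"hello")
with stream as f:
    payload = f.read()
```

Because the context manager lives on the stream, each call that opens a stream gets its own `with` block, and the file object itself never has to track whether it is already open.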


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: issues-help@iceberg.apache.org