Posted to dev@iceberg.apache.org by GitBox <gi...@apache.org> on 2018/11/27 01:25:36 UTC

[GitHub] mccheah opened a new issue #12: File I/O Submodule for TableOperations

mccheah opened a new issue #12: File I/O Submodule for TableOperations
URL: https://github.com/apache/incubator-iceberg/issues/12
 
 
   In https://github.com/Netflix/iceberg/issues/107 it was discussed that `InputFile` and `OutputFile` instances should be pluggable, and that providing these instances should be handled by the `TableOperations` API. However, the Spark data source in particular only uses `HadoopInputFile#fromPath` for reading and `HadoopOutputFile#fromPath` for writing. Using `TableOperations#newInputFile` and `TableOperations#newOutputFile` instead would also be difficult, because calling these methods on the executors would require `TableOperations` instances to be `Serializable`.
   
   We propose that the `TableOperations` API provide a `FileIO` module that handles the narrow role of reading, creating / writing, and deleting files. The interface would look like this:
   
   ```
   interface FileIO {
     InputFile newInputFile(String path);
     OutputFile newOutputFile(String path);
     void deleteFile(String path);
   }
   ```
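
As a rough illustration of how narrow this interface is, the sketch below implements it against the local filesystem. `InputFile` and `OutputFile` here are simplified stand-ins declared only for this example (the real Iceberg interfaces expose more than this), and `LocalFileIO` is a hypothetical name. The class is marked `Serializable` on the assumption that a `FileIO` would be shipped to executors in place of `TableOperations`; that follows from the motivation above and is not something the proposal states as a requirement.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Simplified stand-ins for Iceberg's InputFile / OutputFile (assumption: the
// real interfaces have more methods than shown here).
interface InputFile {
  String location();
  InputStream newStream();
}

interface OutputFile {
  String location();
  OutputStream create();
}

// The interface proposed in this issue.
interface FileIO {
  InputFile newInputFile(String path);
  OutputFile newOutputFile(String path);
  void deleteFile(String path);
}

// Hypothetical local-filesystem implementation. It holds no state, so
// serializing it is trivial.
class LocalFileIO implements FileIO, Serializable {
  private static final long serialVersionUID = 1L;

  @Override
  public InputFile newInputFile(String path) {
    return new InputFile() {
      @Override public String location() { return path; }
      @Override public InputStream newStream() {
        try {
          return Files.newInputStream(Paths.get(path));
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      }
    };
  }

  @Override
  public OutputFile newOutputFile(String path) {
    return new OutputFile() {
      @Override public String location() { return path; }
      @Override public OutputStream create() {
        try {
          // CREATE_NEW makes the call fail if the file already exists.
          return Files.newOutputStream(
              Paths.get(path), StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      }
    };
  }

  @Override
  public void deleteFile(String path) {
    try {
      // Deleting a path that does not exist is treated as a no-op.
      Files.deleteIfExists(Paths.get(path));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Every method takes a full path, so an implementation like this needs no knowledge of the table layout. That is what would let a reader or writer on an executor work with a `FileIO` alone.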
   
   The following methods would then be added to `TableOperations`, and we would remove `TableOperations#newInputFile` and `TableOperations#newMetadataFile`.
   
   ```
   interface TableOperations {
     FileIO fileIo();
     String resolveNewMetadataPath(String metadataFilename);
   }
   ```
   
   `resolveNewMetadataPath` is needed because the new `FileIO` abstraction treats all locations as full paths, whereas the old `TableOperations#newMetadataFile` method assumes its argument is a file name, not a full path. Callers that used to call `TableOperations#newMetadataFile` should therefore first resolve the full path and then pass it to `FileIO#newOutputFile`. For convenience we could add a helper default method like so:
   
   ```
   interface TableOperations {
     FileIO fileIo();
     String resolveNewMetadataPath(String metadataFilename);
     default OutputFile newMetadataFile(String fileName) {
       return fileIo().newOutputFile(resolveNewMetadataPath(fileName));
     }
   }
   ```
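
To make the caller-side change concrete, here is a minimal sketch of both call styles: the explicit two-step form and the convenience helper. `ExampleTableOperations` and `PathOnlyFileIO` are hypothetical names, the `InputFile` / `OutputFile` types are reduced to a single `location()` method for illustration, and placing metadata under `<tableLocation>/metadata/` is an assumption made for this example, not something the proposal specifies.

```java
// Stand-in types reduced to what this example needs (assumption: the real
// Iceberg interfaces carry more methods).
interface InputFile { String location(); }
interface OutputFile { String location(); }

interface FileIO {
  InputFile newInputFile(String path);
  OutputFile newOutputFile(String path);
  void deleteFile(String path);
}

interface TableOperations {
  FileIO fileIo();
  String resolveNewMetadataPath(String metadataFilename);

  // Convenience helper: resolve the file name, then delegate to FileIO.
  default OutputFile newMetadataFile(String fileName) {
    return fileIo().newOutputFile(resolveNewMetadataPath(fileName));
  }
}

// Hypothetical FileIO that only records the path it was given.
class PathOnlyFileIO implements FileIO {
  @Override public InputFile newInputFile(String path) { return () -> path; }
  @Override public OutputFile newOutputFile(String path) { return () -> path; }
  @Override public void deleteFile(String path) { /* nothing to delete here */ }
}

// Hypothetical TableOperations that resolves metadata file names under
// "<tableLocation>/metadata/" (an assumed layout).
class ExampleTableOperations implements TableOperations {
  private final String tableLocation;
  private final FileIO io = new PathOnlyFileIO();

  ExampleTableOperations(String tableLocation) {
    this.tableLocation = tableLocation;
  }

  @Override public FileIO fileIo() { return io; }

  @Override public String resolveNewMetadataPath(String metadataFilename) {
    return tableLocation + "/metadata/" + metadataFilename;
  }
}

class MetadataPathExample {
  public static void main(String[] args) {
    TableOperations ops = new ExampleTableOperations("/warehouse/db/table");

    // Explicit form: resolve the full path, then hand it to FileIO.
    String fullPath = ops.resolveNewMetadataPath("v2.metadata.json");
    OutputFile explicit = ops.fileIo().newOutputFile(fullPath);

    // Helper form: same result in one call.
    OutputFile viaHelper = ops.newMetadataFile("v2.metadata.json");

    System.out.println(explicit.location());
    System.out.println(viaHelper.location());
  }
}
```

Both calls yield an `OutputFile` at the same location. The helper only saves callers from repeating the resolve step.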
