Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/07/28 12:34:36 UTC

[GitHub] [arrow] JayjeetAtGithub edited a comment on pull request #10431: ARROW-12921: [C++][Dataset] Add RadosParquetFileFormat to Dataset API

JayjeetAtGithub edited a comment on pull request #10431:
URL: https://github.com/apache/arrow/pull/10431#issuecomment-888255960


   @westonpace Thanks a lot for taking the time to review the PR. I have just updated the pull request branch with the latest changes from our development branch (in our fork). The main change since you last looked at the PR is that we renamed `RadosParquetFileFormat` to `SkyhookFileFormat`. It is meant to act as an abstraction for offloading any real file format supported by Arrow into Ceph. Currently, `SkyhookFileFormat` supports LZ4-compressed IPC (Feather) and Parquet.
   
   > The approach of extending ParquetFileFormat seems like it might lead to duplicate code since you'd need a RadosXyzFileFormat for every fragment. Instead would it be possible to have RadosFragment which takes in (via the constructor) a shared_ptr to a FileFormat? The FileFormat could then be serialized (which format to use and read options) and sent as part of the rados scan request.
   
   I think `SkyhookFileFormat` probably addresses this problem.
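   To illustrate the composition idea from the review comment, here is a minimal, self-contained C++ sketch. A single offloading format holds a `shared_ptr` to an inner format and records which inner format to use in the serialized scan request. That replaces one subclass per format. All class names (`Format`, `ParquetFormat`, `IpcFormat`, `OffloadFormat`) and the `"name:path"` request encoding are hypothetical simplifications. They are not Arrow's or Skyhook's actual API.

   ```cpp
   #include <cassert>
   #include <memory>
   #include <string>
   #include <utility>

   // Hypothetical stand-in for a file format interface (NOT arrow::dataset::FileFormat).
   class Format {
    public:
     virtual ~Format() = default;
     // Name used to tell the storage side which reader to use.
     virtual std::string type_name() const = 0;
   };

   class ParquetFormat : public Format {
    public:
     std::string type_name() const override { return "parquet"; }
   };

   class IpcFormat : public Format {
    public:
     std::string type_name() const override { return "ipc"; }
   };

   // Hypothetical offloading wrapper: one class that delegates to any inner
   // format, instead of a separate RadosXyzFileFormat per format.
   class OffloadFormat : public Format {
    public:
     explicit OffloadFormat(std::shared_ptr<Format> inner) : inner_(std::move(inner)) {}

     std::string type_name() const override { return "offload"; }

     // Encode the inner format's name plus the object path as a toy scan request.
     std::string SerializeScanRequest(const std::string& path) const {
       return inner_->type_name() + ":" + path;
     }

    private:
     std::shared_ptr<Format> inner_;
   };
   ```

   Supporting another format then only requires constructing `OffloadFormat` with a different inner format, for example `OffloadFormat(std::make_shared<IpcFormat>())`. No new offloading subclass is needed. A real implementation would also have to serialize the inner format's read options.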

