Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/11/12 00:56:00 UTC

[GitHub] [arrow] ivotron opened a new pull request #8647: ARROW-10549: [C++][Dataset] RADOS dataset

ivotron opened a new pull request #8647:
URL: https://github.com/apache/arrow/pull/8647


   This PR contains a basic implementation of the Dataset API on Ceph that uses the librados C++ API to defer evaluation of expressions to a RADOS storage backend. The storage-side code is included, as well as unit and integration tests.
   
   The Dataset implementation on RADOS adds new RadosDataset and RadosFragment classes. A scanning operation triggers the evaluation of expressions on the storage side. The PR includes a wrapper for the librados library, as well as a mock that allows unit tests to run without a Ceph instance. Integration tests have been modified to install Ceph and run without the mocks (running tests against a single-node Ceph "cluster").
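
The wrapper-plus-mock arrangement described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: the names `RadosInterface`, `MockRados`, `ScanObject`, and the `"arrow"`/`"scan"` class and method strings are all hypothetical, and the sketch deliberately avoids calling librados so that it compiles without Ceph installed.

```cpp
#include <cassert>
#include <cerrno>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical interface: the narrow slice of storage functionality the
// dataset code needs. A production implementation would forward to librados.
class RadosInterface {
 public:
  virtual ~RadosInterface() = default;
  // Invoke `method` of object class `cls` on object `oid`.
  // Returns 0 on success or a negative errno-style code.
  virtual int Exec(const std::string& oid, const std::string& cls,
                   const std::string& method, const std::string& in,
                   std::string* out) = 0;
};

// In-memory stand-in for unit tests; no Ceph cluster is involved.
class MockRados : public RadosInterface {
 public:
  using Handler = std::function<int(const std::string& oid,
                                    const std::string& in, std::string* out)>;

  // Register a fake storage-side method under (cls, method).
  void Register(const std::string& cls, const std::string& method, Handler h) {
    handlers_[std::make_pair(cls, method)] = std::move(h);
  }

  int Exec(const std::string& oid, const std::string& cls,
           const std::string& method, const std::string& in,
           std::string* out) override {
    auto it = handlers_.find(std::make_pair(cls, method));
    if (it == handlers_.end()) return -ENOENT;  // nothing registered
    return it->second(oid, in, out);
  }

 private:
  std::map<std::pair<std::string, std::string>, Handler> handlers_;
};

// Hypothetical client-side helper: ship a serialized filter expression to
// the storage side and receive the serialized result. It depends only on
// the interface, so tests can pass a MockRados instead of a real backend.
int ScanObject(RadosInterface& rados, const std::string& oid,
               const std::string& filter, std::string* result) {
  return rados.Exec(oid, "arrow", "scan", filter, result);
}
```

In a unit test, one would register a fake `"scan"` handler on a `MockRados` and pass it to `ScanObject`; an integration test would pass an implementation backed by a real cluster instead.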
   
   The storage-side code is implemented as a RADOS CLS (object storage class) using Ceph's [RADOS SDK](https://docs.ceph.com/en/octopus/architecture/#extending-ceph). The code lives in `cpp/src/arrow/adapters/arrow-rados-cls/` and is expected to be deployed on the storage nodes (Ceph's OSDs) before operating on tables through the RadosDataset implementation. This PR includes a CMake configuration for building this library if desired (`ARROW_CLS` CMake option).
   
   Follow-up work includes: dataset discovery, Python bindings, large fragment striping, IPC improvements on the backend, and a Python library for writing tables.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #8647: ARROW-10549: [C++][Dataset] RADOS dataset

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8647:
URL: https://github.com/apache/arrow/pull/8647#issuecomment-725759470


   https://issues.apache.org/jira/browse/ARROW-10549





[GitHub] [arrow] ivotron commented on pull request #8647: ARROW-10549: [C++][Dataset] RADOS dataset

Posted by GitBox <gi...@apache.org>.
ivotron commented on pull request #8647:
URL: https://github.com/apache/arrow/pull/8647#issuecomment-753504342


   we are working on a different version of this same functionality, and thus will close this for now. thanks!





[GitHub] [arrow] ivotron closed pull request #8647: ARROW-10549: [C++][Dataset] RADOS dataset

Posted by GitBox <gi...@apache.org>.
ivotron closed pull request #8647:
URL: https://github.com/apache/arrow/pull/8647


   

