Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/08/26 12:26:25 UTC

[GitHub] [arrow] lidavidm commented on pull request #10991: ARROW-13572: [C++][Datasets] Add ORC support to Datasets API

lidavidm commented on pull request #10991:
URL: https://github.com/apache/arrow/pull/10991#issuecomment-906362174


   > This also brings up the question for the Python/Cython side: how to do this when ORC is an optional feature and might not be built by the C++ library? Currently for the datasets cython code, we assume all formats (so also Parquet) are simply always available.
   
   It would be a lot of refactoring, but you could imagine having a _dataset.pyx/pxd (with the pxd containing the base classes/definitions), plus a _dataset_orc.pyx, _dataset_parquet.pyx, etc., with setup.py building only the modules that apply (e.g. --with-dataset + --with-orc implying _dataset_orc.pyx should be built). FileFormat._wrap and Fragment._wrap would get trickier, though, since they sit in _dataset.pyx and have to produce the format-specific wrapper classes:
   https://github.com/apache/arrow/blob/257d0aa936786c095b5560adb27bffaaccaed589/python/pyarrow/_dataset.pyx#L832-L848
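
One way the _wrap problem might be handled is a registry that each optional module fills in when it is imported, so the base module never has to name the format-specific classes. The following is only a rough pure-Python sketch of that idea, not pyarrow's actual code: `register_format`, `wrap_format`, `enable_orc` and the assumption that wrappers are selected by a type-name string are all hypothetical.

```python
from typing import Callable, Dict, Type

# Hypothetical registry: maps a format's type name to its wrapper class.
# In the split-module layout this would live in the base module.
_FORMAT_WRAPPERS: Dict[str, Type["FileFormat"]] = {}


class FileFormat:
    """Stand-in for the base wrapper class (illustrative only)."""

    def __init__(self, type_name: str) -> None:
        self.type_name = type_name


def register_format(type_name: str) -> Callable[[type], type]:
    """Class decorator that an optional module would apply on import."""

    def decorator(cls: type) -> type:
        _FORMAT_WRAPPERS[type_name] = cls
        return cls

    return decorator


def wrap_format(type_name: str) -> FileFormat:
    """Look up the wrapper class by type name instead of hard-coding it."""
    cls = _FORMAT_WRAPPERS.get(type_name)
    if cls is None:
        # The format exists in principle, but its module was not built/imported.
        raise NotImplementedError(
            f"no wrapper registered for format {type_name!r}"
        )
    return cls(type_name)


# Stands in for a Parquet module that was built and imported.
@register_format("parquet")
class ParquetFileFormat(FileFormat):
    pass


def enable_orc() -> Type[FileFormat]:
    """Simulates importing an optional ORC module that registers itself."""

    @register_format("orc")
    class OrcFileFormat(FileFormat):
        pass

    return OrcFileFormat
```

With this shape, wrapping "orc" raises NotImplementedError until the optional module has been loaded (simulated here by calling `enable_orc()`), and the base module needs no changes when a new format is added. Whether the same pattern carries over cleanly to cdef classes in Cython is an open question this sketch does not answer.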


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org