Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/18 09:04:01 UTC

[GitHub] [arrow] LouisClt opened a new issue, #14673: [C++] [Orc] Get the number of rows per stripe without reading it

LouisClt opened a new issue, #14673:
URL: https://github.com/apache/arrow/issues/14673

   Hello, I need to read only part of an ORC file (say, records 5000 to 6500).
   The goal is to do this efficiently, reading only the stripes that contain the data rather than the whole file.
   To do this, I need the number of rows in each stripe.
   Looking at the code, it seems this is known in the implementation but not exposed at the API level.
   See: https://github.com/apache/arrow/blob/b4a8320890c6658f948e025f522db5f125a1f8dc/cpp/src/arrow/adapters/orc/adapter.cc#L207-L212
   Is that correct?
   Another possible approach is to call the "seek()" method and then the "NextStripeReader" method, but if I am correct this only allows reading one stripe, and I would prefer to use the "ReadStripe(...)" method.
   If this is correct, I am willing to add a method to retrieve this. What should it be called? NumberOfRows(int64_t stripe)?
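
To make the use case concrete, here is a minimal sketch of the stripe selection that per-stripe row counts would allow. It uses only the standard library. `StripeRange` and `StripesForRows` are hypothetical names chosen for illustration, not part of the Arrow API, and the row counts are assumed to have been obtained from the file metadata by some means.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical result type: which stripes to read and how to trim the first one.
struct StripeRange {
  int64_t first_stripe = -1;  // index of the first stripe to read, -1 if none
  int64_t last_stripe = -1;   // index of the last stripe to read (inclusive)
  int64_t skip_in_first = 0;  // rows to drop at the start of first_stripe
};

// Finds the stripes overlapping the half-open row interval [begin_row, end_row).
// rows_per_stripe is assumed to hold the row count of each stripe, in file order.
StripeRange StripesForRows(const std::vector<int64_t>& rows_per_stripe,
                           int64_t begin_row, int64_t end_row) {
  assert(begin_row >= 0 && begin_row <= end_row);
  StripeRange out;
  if (begin_row == end_row) return out;  // empty request, nothing to read

  int64_t stripe_start = 0;  // global index of the first row of stripe i
  for (int64_t i = 0; i < static_cast<int64_t>(rows_per_stripe.size()); ++i) {
    const int64_t stripe_end = stripe_start + rows_per_stripe[i];
    // The stripe covers [stripe_start, stripe_end); keep it if it overlaps.
    if (stripe_start < end_row && begin_row < stripe_end) {
      if (out.first_stripe < 0) {
        out.first_stripe = i;
        out.skip_in_first = begin_row - stripe_start;
      }
      out.last_stripe = i;
    }
    stripe_start = stripe_end;
  }
  return out;
}
```

With four stripes of 2000 rows each, a request for rows 5000 to 6500 selects stripes 2 and 3 and skips the first 1000 rows of stripe 2. Each selected stripe could then be passed to "ReadStripe(...)" and the result trimmed to the requested rows.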


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] LouisClt commented on issue #14673: [C++] [Orc] Get the number of rows per stripe without reading it

Posted by GitBox <gi...@apache.org>.
LouisClt commented on issue #14673:
URL: https://github.com/apache/arrow/issues/14673#issuecomment-1347903738

   Closed by [ARROW-18421](https://github.com/apache/arrow/pull/14806)




[GitHub] [arrow] LouisClt closed issue #14673: [C++] [Orc] Get the number of rows per stripe without reading it

Posted by GitBox <gi...@apache.org>.
LouisClt closed issue #14673: [C++] [Orc] Get the number of rows per stripe without reading it
URL: https://github.com/apache/arrow/issues/14673

