Posted to github@arrow.apache.org by "suremarc (via GitHub)" <gi...@apache.org> on 2023/03/24 00:13:51 UTC

[GitHub] [arrow-rs] suremarc commented on issue #3922: Support reverse order for Parquet streams

suremarc commented on issue #3922:
URL: https://github.com/apache/arrow-rs/issues/3922#issuecomment-1482076248

   Thank you for the speedy reply. Unfortunately, it sounds like this feature is not a natural fit for Parquet.
   
   > That being said, it is possible to just decode the last n rows using [`RowSelection`](https://docs.rs/parquet/latest/parquet/arrow/arrow_reader/struct.RowSelection.html). This means if the DataFusion query optimiser could be taught to push this down, it should work without requiring any additional functionality in the parquet crate.
   
   This makes sense, but I think I should have been more specific — the queries I was testing also involved filtering, e.g. `SELECT * FROM table WHERE attribute = 'value' ORDER BY field DESC LIMIT n`. Unless I am misunderstanding, I do not think it is possible to select the last N rows subject to a predicate with a `RowSelection`. 
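   
   To make the difficulty concrete, here is a minimal sketch of what building such a selection would involve. It uses only the standard library and a hypothetical `Run` type as a stand-in for skip/select runs; it is not the parquet crate's API. The point is that the function needs the per-row predicate result (`matches`) as input, so the filter column would already have to be decoded and evaluated before the selection could be constructed.

```rust
/// One run of a row selection: skip or select `count` consecutive rows.
/// Hypothetical stand-in for illustration only, not a type from the parquet crate.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Run {
    pub skip: bool,
    pub count: usize,
}

/// Build skip/select runs that keep only the last `n` rows for which the
/// predicate held. `matches[i]` is the already-evaluated predicate result
/// for row `i`, which is exactly the information a plain selection lacks
/// up front.
pub fn last_n_matching(matches: &[bool], n: usize) -> Vec<Run> {
    // Walk backwards and mark the last `n` matching rows.
    let mut keep = vec![false; matches.len()];
    let mut remaining = n;
    for (i, &m) in matches.iter().enumerate().rev() {
        if remaining == 0 {
            break;
        }
        if m {
            keep[i] = true;
            remaining -= 1;
        }
    }

    // Run-length encode the keep mask into alternating skip/select runs.
    let mut runs: Vec<Run> = Vec::new();
    for &k in &keep {
        let skip = !k;
        if let Some(last) = runs.last_mut() {
            if last.skip == skip {
                last.count += 1;
                continue;
            }
        }
        runs.push(Run { skip, count: 1 });
    }
    runs
}
```

   For example, with predicate results `[true, false, true, true, false, true]` and `n = 2`, this yields skip 3, select 1, skip 1, select 1. Producing the `matches` input is the part that seems to require a separate pass over the filter column.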
   
   I am starting to think Parquet and DataFusion may not be ideal for my company's use case. We are interested in DataFusion's analytical capabilities, but our existing products support queries of the form described above (essentially filter + limit + sort ascending/descending on time only). Do you think it would still be worth opening an issue on DataFusion about this?

