Posted to jira@arrow.apache.org by "Weston Pace (Jira)" <ji...@apache.org> on 2021/09/29 23:47:00 UTC
[jira] [Comment Edited] (ARROW-10524) [C++][Dataset] Add FlightFragment

    [ https://issues.apache.org/jira/browse/ARROW-10524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422448#comment-17422448 ] 

Weston Pace edited comment on ARROW-10524 at 9/29/21, 11:46 PM:
----------------------------------------------------------------

I don't really like the flag on the fragment (see https://github.com/apache/arrow/pull/10913#discussion_r699822754).  Pushing down some computation is OK, and we have mechanisms for it.  For example, pushing down a filter is fine, and the mechanism for that is the fragment's guarantee.
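
To make the guarantee idea concrete, here is a toy sketch in plain Python.  It is not Arrow's API: the name simplify_filter and the dict-based, equality-only representation of filters and guarantees are assumptions made up for illustration.  The point is only that a guarantee lets the scanner either skip a fragment outright or drop the parts of the filter the fragment already satisfies.

```python
def simplify_filter(filter_eq, guarantee):
    """Simplify an equality-only filter against a fragment guarantee.

    Both arguments are dicts mapping column name -> required value
    (a hypothetical toy model, not Arrow's expression type).
    Returns None if no row in the fragment can match, otherwise the
    part of the filter that still has to be evaluated on the rows.
    """
    remaining = {}
    for column, value in filter_eq.items():
        if column in guarantee:
            if guarantee[column] != value:
                return None  # contradicts the guarantee: skip the fragment
            # Already implied by the guarantee: nothing left to check.
        else:
            remaining[column] = value
    return remaining


# A fragment guaranteed to hold only year == 2020:
guarantee = {"year": 2020}
skip = simplify_filter({"year": 2021}, guarantee)            # None
rest = simplify_filter({"year": 2020, "x": 5}, guarantee)    # {"x": 5}
```

Nothing here requires a flag on the fragment; the fragment only has to state what it guarantees.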

Pushing down projection is not generally a good idea.  For example, consider a query with an order by where the order key column is removed by the projection; if the projection runs in the fragment, the sort no longer has its key.  On the other hand, fragments do need to be able to project/cast from the file schema to the dataset schema, but this is a different problem statement.
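
A small plain-Python sketch of the order by case.  The rows, column names, and the helpers project and order_by are hypothetical, chosen only to show why the order of operations matters.

```python
rows = [
    {"id": 1, "ts": 30},
    {"id": 2, "ts": 10},
    {"id": 3, "ts": 20},
]


def project(rows, columns):
    """Keep only the named columns of each row."""
    return [{c: r[c] for c in columns} for r in rows]


def order_by(rows, key):
    """Sort rows by a single key column."""
    return sorted(rows, key=lambda r: r[key])


# Sort first, then project: the key column is still present when needed.
good = project(order_by(rows, "ts"), ["id"])

# Project first (as if pushed down into the fragment), then sort:
# the "ts" column is already gone, so the sort cannot run.
try:
    order_by(project(rows, ["id"]), "ts")
    pushed_down_ok = True
except KeyError:
    pushed_down_ok = False
```

In this sketch the first plan yields the ids ordered by ts, and the second fails because the fragment has already dropped the sort key.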

For more general computation we are venturing into the realm of a distributed query engine and not a fragment or file format.  As another example, consider an order by.  You can push down the sort, but you then have to do a corresponding merge.  That might make sense if all your leaves can handle sort, but if only some of your leaves can handle sort then I don't know if there is much merit in getting back some batches sorted and others unsorted.
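
A rough sketch of the merge step in plain Python, assuming each leaf returns a list of sort keys plus a flag saying whether it sorted them (the merge_leaves name and the (rows, is_sorted) shape are made up for illustration):

```python
import heapq


def merge_leaves(leaves):
    """Combine per-leaf results into one globally sorted list.

    leaves: list of (rows, is_sorted) pairs (hypothetical shape).
    Leaves that sorted their output can be fed straight into a k-way
    merge; leaves that could not sort have to be sorted here first.
    """
    streams = []
    for rows, is_sorted in leaves:
        streams.append(rows if is_sorted else sorted(rows))
    return list(heapq.merge(*streams))


result = merge_leaves([
    ([1, 4, 7], True),    # leaf that handled the sort
    ([5, 2, 9], False),   # leaf that could not
])
```

Even with some leaves sorting, the consumer still needs its own sort for the rest plus the merge, so a partially supported pushdown adds machinery without removing any.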


was (Author: westonpace):
I don't really like the flag on the fragment (see https://github.com/apache/arrow/pull/10913#discussion_r699822754).  Pushing down some computation is ok and we have mechanisms for it.  For example, pushing down a filter is fine and the mechanism is the guarantee.

Pushing down projection is not generally a good idea.  For example, consider a query with an order by where the order key column is removed by the projection.  On the other hand, fragments do need to be able to project/cast from the file schema to the dataset schema but this is a different story.

For more general computation we are venturing into the realm of a distributed query engine and not a fragment or file format.  As another example, consider an order by.  You can push down the filtering but you have to do a corresponding merge.  That might make sense if all your leaves can handle sort but if only some of your leaves can handle sort then I don't know if there is much merit in getting back some batches sorted and others unsorted.

> [C++][Dataset] Add FlightFragment
> ---------------------------------
>
>                 Key: ARROW-10524
>                 URL: https://issues.apache.org/jira/browse/ARROW-10524
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>    Affects Versions: 2.0.0
>            Reporter: Ben Kietzman
>            Assignee: Ben Kietzman
>            Priority: Major
>              Labels: dataset
>             Fix For: 6.0.0
>
>
> Allow wrapping a flight service as a dataset/fragment



--
This message was sent by Atlassian Jira
(v8.3.4#803005)