
Posted to dev@drill.apache.org by GitBox <gi...@apache.org> on 2022/01/10 02:31:21 UTC

[GitHub] [drill] paul-rogers edited a comment on issue #2421: ValueVectors replacement

paul-rogers edited a comment on issue #2421:
URL: https://github.com/apache/drill/issues/2421#issuecomment-1007611673

@jnturton, one could do something like what you described. However, making all of Drill work with Arrow would be a huge amount of work, and optimizations made for one format would be sub-optimal for the other (exchanges, for example). Furthermore, your use case would benefit from vectors only in the project and grouping operators.

So, I wonder if we might think about the problem operator-by-operator. If you have a compute-heavy phase, might that phase first transform data to vectors, apply the compute, then send the data along in row format? Every fragment does a network exchange, so data is read and written anyway. So, perhaps there is something that can be done to transform formats at fragment boundaries (he says, waving hands wildly...)
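
To make the hand-waving slightly more concrete, here is a minimal sketch of the idea in plain Python. It is not Drill or Arrow code: the function names, the `price`/`qty` columns, and the `total` projection are all hypothetical, and Python lists stand in for vectors. It only illustrates the shape of "rows in, vectors for the compute step, rows out" within one fragment.

```python
from typing import Dict, List

Row = Dict[str, float]
Columns = Dict[str, List[float]]


def rows_to_columns(rows: List[Row]) -> Columns:
    """Pivot a row batch into one list ("vector") per column.

    Assumes every row carries the same set of column names.
    """
    columns: Columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def columns_to_rows(columns: Columns) -> List[Row]:
    """Pivot column vectors back into a list of rows."""
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]


def compute_heavy_phase(columns: Columns) -> Columns:
    """Hypothetical projection: total = price * qty, one column at a time."""
    out = dict(columns)
    out["total"] = [p * q for p, q in zip(columns["price"], columns["qty"])]
    return out


def run_fragment(incoming_rows: List[Row]) -> List[Row]:
    """Convert to vectors at the fragment boundary, compute, convert back."""
    if not incoming_rows:
        return []
    columns = rows_to_columns(incoming_rows)   # row format in
    columns = compute_heavy_phase(columns)     # vectorized compute
    return columns_to_rows(columns)            # row format out
```

The conversions sit at the fragment boundary, where data is being read and written anyway, so the rest of the engine never needs to know which format the compute step used internally.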

You'll also get a speed-up only for queries without joins. If you have joins, then the joins are likely to take the vast majority of the runtime, leaving your projection and grouping in the noise. I'm not sure how vectorization can help joins; certainly in Drill today, vectors make the join code atrociously complex.

This is why DBs (and compiler optimizers) are hard: the answers change based on use case...

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org