Posted to dev@arrow.apache.org by Jorge Cardoso Leitão <jo...@gmail.com> on 2021/07/01 09:14:16 UTC

Re: [Discuss] Consider renaming "Arrow" in H2O benchmarks?

Hi,

I was not sure what to change to rename the benchmark, since its name
appears to be used in several places, so I opened an issue instead:
https://github.com/h2oai/db-benchmark/issues/229.

Best,
Jorge


On Fri, Jun 25, 2021 at 1:04 PM Wes McKinney <we...@gmail.com> wrote:

> I recommend sending a PR to the benchmark repo clarifying that, although
> the benchmark is labeled as executing the query with the Arrow R/C++
> library, the query is in fact handled primarily by dplyr and not by
> Arrow at all. The benchmark is very misleading in its current form.
>
> On Fri, Jun 25, 2021 at 11:55 AM Jorge Cardoso Leitão
> <jo...@gmail.com> wrote:
> >
> > Hi,
> >
> > H2O has a set of benchmarks comparing different query engines [1].
> >
> > There is currently an implementation named "Arrow", backed by the Arrow R
> > implementation [2].
> >
> > This is one of the least performant implementations evaluated. I suspect
> > this may hurt perception of the Arrow format, as people will (even if
> > unfairly) associate "Arrow" with "poor performance". In fact, polars and
> > cuDF, the top performers, also use Arrow as their backing in-memory
> > format.
> >
> > Would it make sense to avoid naming specific query engines "Arrow" (as
> > we already do with DataFusion, Gandiva, etc.), so that these
> > misunderstandings are avoided?
> >
> > Best,
> > Jorge
> >
> > [1] https://h2oai.github.io/db-benchmark/
> > [2] https://github.com/h2oai/db-benchmark/tree/master/arrow
>