Posted to dev@calcite.apache.org by Lekshmi <le...@gmail.com> on 2019/01/01 18:29:46 UTC

Re: Question Regarding The Benchmark of Calcite Compared To Conventional Database System(Related to CALCITE-2169)

Hi Julian, Stamatis, Seung-Hwan,

@Julian, thank you for suggesting JMH for the performance analysis. I prepared
a TPC-H benchmark test using JMH and, as you predicted, the time for each
phase dropped drastically. I was surprised and at the same time happy to see
the benchmark results.
@Stamatis, yes, you are correct. With its support for custom rules, Calcite
can outperform the built-in rule set of a conventional RDBMS. I can convince
my guide on these points.
@Seung-Hwan, thank you so much for your interest in collaborating with us.
I will also be very happy to contribute to the benchmark analysis for
Calcite.

The JMH benchmark test for TPC-H and the results of its comparison with
Postgres are attached to this email. I would appreciate your feedback on
them.
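
To illustrate the warm-up effect discussed in this thread, here is a minimal
sketch in plain Java. It is not the attached benchmark and it does not call
Calcite: fakePlanningStep is a hypothetical stand-in for a step such as
sql2rel, and the class and method names are made up for illustration. It only
shows the idea of discarding warm-up iterations before measuring and of
keeping any setup outside the timed region.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hand-rolled timing harness, for illustration only; JMH is far more rigorous. */
class WarmupSketch {

    /** Times a single call of the task and returns the elapsed nanoseconds. */
    static <T> long timeOnceNanos(Supplier<T> task) {
        long start = System.nanoTime();
        T result = task.get();
        long elapsed = System.nanoTime() - start;
        // Use the result so the call cannot be treated as trivially dead code.
        if (result == null) {
            throw new IllegalStateException("task returned null");
        }
        return elapsed;
    }

    /** Runs and discards warm-up calls, then returns the mean time of the measured calls. */
    static <T> double meanAfterWarmupNanos(Supplier<T> task, int warmupIterations,
                                           int measuredIterations) {
        if (measuredIterations <= 0) {
            throw new IllegalArgumentException("measuredIterations must be positive");
        }
        for (int i = 0; i < warmupIterations; i++) {
            timeOnceNanos(task); // result ignored: gives the JIT a chance to compile hot code
        }
        long total = 0;
        for (int i = 0; i < measuredIterations; i++) {
            total += timeOnceNanos(task);
        }
        return (double) total / measuredIterations;
    }

    /** Hypothetical stand-in for a planning step; NOT Calcite code. */
    static Integer fakePlanningStep() {
        List<Integer> nodes = new ArrayList<>();
        for (int i = 0; i < 50_000; i++) {
            nodes.add(i % 97);
        }
        int sum = 0;
        for (int n : nodes) {
            sum += n;
        }
        return sum;
    }

    public static void main(String[] args) {
        long cold = timeOnceNanos(WarmupSketch::fakePlanningStep);
        double warm = meanAfterWarmupNanos(WarmupSketch::fakePlanningStep, 200, 50);
        System.out.printf("first (cold) call:  %.3f ms%n", cold / 1e6);
        System.out.printf("mean after warm-up: %.3f ms%n", warm / 1e6);
    }
}
```

On a typical JVM the first call is usually slower than the warmed-up mean, but
the actual numbers depend on the machine and JVM. In JMH, the same separation
is expressed with annotations such as @Setup, @Warmup, @Measurement and
@Benchmark, which is why it is generally preferred over a hand-written loop
like this one.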

Thanking you

Lekshmi B.G
Email: lekshmibg09@gmail.com




On Mon, Dec 31, 2018 at 3:53 PM Lim, Seung-Hwan <li...@ornl.gov.invalid>
wrote:

> Hi Lekshmi,
>
> I am a member of Edmon Begoli’s team, which did preliminary work on
> comparing Calcite with conventional RDBMSs, especially PostgreSQL.
>
> The major challenge that we’ve identified is that many benchmarks (e.g.,
> TPC-H, TPC-DS) evaluate the performance of ‘Join’ operations. For both
> benchmarks, Calcite often generated less optimized plans than the RDBMS.
>
> I’ll be very happy to help you or collaborate with you in this regard.
>
> Thank you,
> Seung-Hwan
>
>
> > On Dec 31, 2018, at 8:33 AM, Stamatis Zampetakis <za...@gmail.com> wrote:
> >
> > Hi Lekshmi,
> >
> > Thanks for the interesting information. It is good to see more people
> > involved in benchmarking and optimizing Calcite.
> >
> > However, I am not sure I understand what you are trying to achieve by
> > performing an all-in-all comparison between Calcite and other databases
> > (in this particular case, Postgres).
> > Calcite provides everything you need to build a database, but it is not
> > itself a database. Could you share a bit more information on what you
> > expect to gain from this kind of experiment?
> >
> > On the other hand, it would be very interesting to compare individual
> > parts of Calcite (e.g., the optimizer) with the respective parts of
> > Postgres (or another database), although this will not be easy.
> > If, for instance, you want to compare the optimizers in terms of
> > performance, time may not be a good metric, since C code will almost
> > always be faster than Java code.
> > Another axis of comparison for the optimizer could be the quality of the
> > produced plans, but finding a good metric can also be challenging.
> > Measuring the quality of a plan could be based on the execution time of
> > the plan on the same engine (all in Calcite or all in Postgres, for
> > instance). In terms of research, I guess it would be very nice to
> > demonstrate that a Volcano optimizer (Calcite) with a custom rule-set can
> > beat the built-in optimizer of Postgres in terms of plan quality; plus, it
> > would be very useful for many end-users of Calcite to have a rule-set
> > that simulates the optimizer of Postgres (or another database).
> >
> > As a general comment, I think it would be easier to find good use cases
> > in favor of Calcite if you emphasize data integration scenarios,
> > cross-database queries, querying raw data (not in a database), and/or
> > systems without an optimizer.
> >
> > Best,
> > Stamatis
> >
> > On Mon, Dec 31, 2018 at 12:35 PM, Lekshmi <lekshmibg09@gmail.com> wrote:
> >
> >> Hi Julian,
> >>
> >> Thanks a lot for the prompt response and support. I will try running
> >> the test with JMH and will let you know the results.
> >>
> >> I wish you all a prosperous new year.
> >>
> >> Thanks and Regards
> >>
> >> Lekshmi B.G
> >> Email: lekshmibg09@gmail.com
> >>
> >>
> >>
> >>
> >> On Mon, Dec 31, 2018 at 10:38 AM Julian Feinauer <j.feinauer@pragmaticminds.de> wrote:
> >>
> >>> Hi Lekshmi,
> >>>
> >>> Your activity sounds very interesting.
> >>> One important thing to note is that performance testing in Java is
> >>> always tricky due to JIT compilation and the "warm-up" phase of the
> >>> JVM. Thus it is generally recommended to do these tests with JMH
> >>> (https://openjdk.java.net/projects/code-tools/jmh/).
> >>>
> >>> I would assume that the time for sql2rel drops drastically (perhaps by
> >>> one or two orders of magnitude) when run with JMH.
> >>>
> >>> Best
> >>> Julian
> >>>
> >>> On 30.12.18, 23:12, "Lekshmi" <le...@gmail.com> wrote:
> >>>
> >>>    Hello Folks,
> >>>
> >>>    For my research activities, I was trying to perform a benchmark
> >>>    comparison between Calcite and other database systems. As an initial
> >>>    step, I was trying to do it for *Calcite* and *PostgreSQL*, so I
> >>>    thought the TPC-H queries were the right thing to start with. I tried
> >>>    running the TpchTest
> >>>    (https://github.com/apache/calcite/blob/master/plus/src/test/java/org/apache/calcite/adapter/tpch/TpchTest.java)
> >>>    by adding the *CalciteTimingTracer* to the JUnit tests to determine
> >>>    the execution time. While doing so, I could see that the execution
> >>>    time in Calcite is significantly higher than in PostgreSQL. On
> >>>    further investigation, I could see that we generate the data required
> >>>    for these queries (which comes to around 150,000 for some tables), and
> >>>    I was under the impression that most of the time was spent on data
> >>>    generation and that the query execution itself could be faster. So I
> >>>    modified the relevant schema class
> >>>    (https://github.com/apache/calcite/blob/master/plus/src/main/java/org/apache/calcite/adapter/tpch/TpchSchema.java)
> >>>    to perform the data generation and query execution separately. Then I
> >>>    traced the time taken for just the query execution. Even then, there
> >>>    was a significant difference from PostgreSQL.
> >>>
> >>>    I also set *log4j.rootLogger* to *TRACE* to find the time spent in
> >>>    the sql2rel and optimization phases of the class Prepare
> >>>    <https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/prepare/Prepare.java>.
> >>>    To my surprise, I could see that Calcite takes 355 ms for sql2rel
> >>>    and 352 ms for optimization for the JUnit test *testQuery01*. On the
> >>>    other hand, the same query gave a planning time of 0.163 ms in
> >>>    Postgres.
> >>>
> >>>    I would like to know whether this is the right way to test the
> >>>    performance of TPC-H queries using Apache Calcite. Can anyone let me
> >>>    know if there are better ways to do it?
> >>>
> >>>    While searching through JIRA, I found a ticket,
> >>>    https://issues.apache.org/jira/browse/CALCITE-2169, which was created
> >>>    by Edmon Begoli for performing a comparative performance study of the
> >>>    Calcite framework. I think it is related to my current problem. I
> >>>    have no idea about the status of the ticket. It would be really great
> >>>    if someone could help me with some information on it.
> >>>
> >>>    Also, as a matter of personal preference, I would like to continue
> >>>    my research in Calcite due to its simplicity and extensibility. But
> >>>    if I fail to give a good case study in favour of Calcite, I am afraid
> >>>    that I could lose the opportunity to work with Calcite.
> >>>
> >>>    Thanks and Regards
> >>>
> >>>    Lekshmi B.G
> >>>    Email: lekshmibg09@gmail.com
> >>>
> >>>
> >>>
> >>
>
>