Posted to dev@calcite.apache.org by Lekshmi <le...@gmail.com> on 2018/12/30 22:11:55 UTC

Question Regarding The Benchmark of Calcite Compared To Conventional Database System(Related to CALCITE-2169)

Hello Folks,

For my research activities, I was trying to run a benchmark comparison
between Calcite and other database systems. As an initial step, I was
comparing *Calcite* and *PostgreSQL*, and TPC-H queries seemed the right
place to start. I ran TpchTest (
https://github.com/apache/calcite/blob/master/plus/src/test/java/org/apache/calcite/adapter/tpch/TpchTest.java)
with *CalciteTimingTracer* added to the JUnit tests to measure the
execution time. The execution time in Calcite was significantly higher than
in PostgreSQL. On further investigation, I saw that the test generates the
data required for these queries (about 150,000 rows for some tables), and I
was under the impression that most of the time was spent on data generation
and that query execution itself could be faster. So I modified the relevant
schema class (
https://github.com/apache/calcite/blob/master/plus/src/main/java/org/apache/calcite/adapter/tpch/TpchSchema.java)
to perform data generation and query execution separately, and then
measured the time taken for query execution alone. Even then, there was a
significant difference from PostgreSQL.
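
As a rough illustration of timing data generation and query execution as separate phases, here is a minimal sketch in plain Java. It does not use Calcite or TpchSchema: the class `PhaseTiming`, its methods, and the toy "query" are hypothetical stand-ins, and a single cold measurement like this still includes JVM start-up and JIT effects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical stand-in for a generated table and a query over it; not Calcite code. */
final class PhaseTiming {
  /** Phase 1: generates n pseudo-random values in [0, 1000) with a fixed seed. */
  static List<Integer> generateRows(int n) {
    Random random = new Random(42);
    List<Integer> rows = new ArrayList<>(n);
    for (int i = 0; i < n; i++) {
      rows.add(random.nextInt(1000));
    }
    return rows;
  }

  /** Phase 2: the "query", counting rows whose value is below the threshold. */
  static long countBelow(List<Integer> rows, int threshold) {
    long count = 0;
    for (int value : rows) {
      if (value < threshold) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    long t0 = System.nanoTime();
    List<Integer> rows = generateRows(150_000);  // data generation only
    long t1 = System.nanoTime();
    long result = countBelow(rows, 500);         // query execution only
    long t2 = System.nanoTime();
    System.out.printf("generation: %.3f ms%n", (t1 - t0) / 1e6);
    System.out.printf("query:      %.3f ms (result=%d)%n", (t2 - t1) / 1e6, result);
  }
}
```

Keeping the generated rows in memory and timing only the second phase is the same separation described above, just on a toy data set.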

I also set *log4j.rootLogger* to *TRACE* to find the time spent in the
sql2rel and optimization phases of the class Prepare
<https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/prepare/Prepare.java>.
To my surprise, Calcite took 355 ms for sql2rel and 352 ms for optimization
in the JUnit test *testQuery01*, whereas the same query had a planning time
of 0.163 ms in Postgres.

I would like to know whether this is the right way to test the performance
of TPC-H queries with Apache Calcite. Can anyone tell me if there is a
better way to do it?

While searching through JIRA, I found the ticket
https://issues.apache.org/jira/browse/CALCITE-2169, which was created by
Edmon Begoli for a comparative performance study of the Calcite
framework. I think it is related to my current problem, but I do not know
the status of the ticket. It would be really great if someone could give me
some information on it.

As a matter of personal preference, I would like to continue my research
with Calcite because of its simplicity and extensibility. But if I fail to
give a good case study in favour of Calcite, I am afraid that I could lose
the opportunity to work with it.

Thanks and Regards

Lekshmi B.G
Email: lekshmibg09@gmail.com

Re: Question Regarding The Benchmark of Calcite Compared To Conventional Database System(Related to CALCITE-2169)

Posted by Lekshmi <le...@gmail.com>.
Hi Julian, Stamatis, Seung-Hwan,

@Julian, thank you for suggesting JMH for performance analysis. I prepared
a TPC-H benchmark test using JMH and, as you expected, the time for each
phase dropped drastically. I was surprised and at the same time happy to
see the benchmark results.
@Stamatis, yes, you are correct. With its support for custom rules, Calcite
can beat the overall performance of the built-in rule set of any RDBMS. I
can convince my guide on these points.
@Seung-Hwan, thank you so much for your interest in collaborating with us.
I will also be very happy to contribute to the benchmark analysis for
Calcite.

The JMH benchmark test for TPC-H and the results of its comparison with
Postgres are attached to this email. I would appreciate your feedback on
them.

Thanking you

Lekshmi B.G
Email: lekshmibg09@gmail.com




(In reply to Lim, Seung-Hwan <li...@ornl.gov.invalid>, Mon, Dec 31, 2018 at 3:53 PM.)

Re: Question Regarding The Benchmark of Calcite Compared To Conventional Database System(Related to CALCITE-2169)

Posted by "Lim, Seung-Hwan" <li...@ornl.gov.INVALID>.
Hi Lekshmi,

I am one of the members of Edmon Begoli’s team who did preliminary work on comparing Calcite with conventional RDBMSs, especially PostgreSQL.

The major challenge we identified is that many benchmarks (e.g., TPC-H, TPC-DS) evaluate the performance of ‘Join’ operations. For both benchmarks, Calcite often generated less optimized plans than the RDBMS.

I’ll be very happy to help you or collaborate with you in this regard.

Thank you,
Seung-Hwan


(In reply to Stamatis Zampetakis <za...@gmail.com>, Dec 31, 2018, 8:33 AM.)


Re: Question Regarding The Benchmark of Calcite Compared To Conventional Database System(Related to CALCITE-2169)

Posted by Stamatis Zampetakis <za...@gmail.com>.
Hi Lekshmi,

Thanks for the interesting information. It is good to see more people
involved in benchmarking and optimizing Calcite.

However, I am not sure I understand what you are trying to achieve by
performing an overall comparison between Calcite and other databases (in
this particular case, Postgres).
Calcite provides everything you need to build a database, but it is not
itself a database. Could you share a bit more information on what you
expect to gain from these kinds of experiments?

On the other hand, it would be very interesting to compare individual parts
of Calcite (e.g., the optimizer) with the corresponding parts of Postgres
(or another database), although this will not be easy.
If, for instance, you want to compare the optimizers in terms of
performance, time may not be a good metric, since C code will almost always
be faster than Java code.
Another axis of comparison for the optimizers could be the quality of the
produced plans, but finding a good metric can also be challenging. The
quality of a plan could be measured by its execution time, provided the
plans are run on the same engine (all in Calcite or all in Postgres, for
instance). In terms of research, I guess it would be very nice to
demonstrate that a Volcano optimizer (Calcite) with a custom rule set can
beat the built-in optimizer of Postgres in terms of plan quality; it would
also be very useful for many end users of Calcite to have a rule set that
simulates the optimizer of Postgres (or another database).
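
To make the "same engine" idea concrete, here is a toy sketch in plain Java. It runs two alternative physical strategies for the same equi-join (a nested-loop join and a hash join) on the same in-memory data, checks that they agree, and times both. The class `JoinPlans` and its methods are hypothetical and are not Calcite or Postgres operators; a real comparison would execute actual optimizer output and use a proper benchmark harness.

```java
import java.util.HashMap;
import java.util.Map;

/** Two toy "plans" for the same equi-join over int keys; illustration only. */
final class JoinPlans {
  /** Plan A: nested-loop join; returns the number of matching (left, right) pairs. */
  static long nestedLoopJoin(int[] left, int[] right) {
    long matches = 0;
    for (int l : left) {
      for (int r : right) {
        if (l == r) {
          matches++;
        }
      }
    }
    return matches;
  }

  /** Plan B: hash join; counts keys on the right side, then probes with the left. */
  static long hashJoin(int[] left, int[] right) {
    Map<Integer, Integer> counts = new HashMap<>();
    for (int r : right) {
      counts.merge(r, 1, Integer::sum);
    }
    long matches = 0;
    for (int l : left) {
      matches += counts.getOrDefault(l, 0);
    }
    return matches;
  }

  /** Builds n join keys cycling through 0..modulus-1. */
  static int[] keys(int n, int modulus) {
    int[] keys = new int[n];
    for (int i = 0; i < n; i++) {
      keys[i] = i % modulus;
    }
    return keys;
  }

  public static void main(String[] args) {
    int[] left = keys(5_000, 1_000);
    int[] right = keys(5_000, 1_000);
    long t0 = System.nanoTime();
    long a = nestedLoopJoin(left, right);
    long t1 = System.nanoTime();
    long b = hashJoin(left, right);
    long t2 = System.nanoTime();
    System.out.printf("nested loop: %d matches in %.3f ms%n", a, (t1 - t0) / 1e6);
    System.out.printf("hash join:   %d matches in %.3f ms%n", b, (t2 - t1) / 1e6);
  }
}
```

Because both strategies run in the same JVM on the same data, the difference in time reflects the plan rather than the engine, which is the point of the comparison suggested above.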

As a general comment, I think it would be easier to find good use cases in
favor of Calcite if you emphasize data integration scenarios,
cross-database queries, querying raw data (not in a database), and/or
systems without an optimizer.

Best,
Stamatis

(In reply to Lekshmi <le...@gmail.com>, Mon, 31 Dec 2018 at 12:35 PM.)

Re: Question Regarding The Benchmark of Calcite Compared To Conventional Database System(Related to CALCITE-2169)

Posted by Lekshmi <le...@gmail.com>.
Hi Julian,

Thanks a lot for the prompt response and support. I will try running
the test with JMH and will let you know the results.

I wish you all a prosperous new year.

Thanks and Regards

Lekshmi B.G
Email: lekshmibg09@gmail.com




(In reply to Julian Feinauer <j.feinauer@pragmaticminds.de>, Mon, Dec 31, 2018 at 10:38 AM.)

Re: Question Regarding The Benchmark of Calcite Compared To Conventional Database System(Related to CALCITE-2169)

Posted by Julian Feinauer <j....@pragmaticminds.de>.
Hi Lekshmi,

Your activity sounds very interesting.
One important thing to note is that performance testing in Java is always tricky because of JIT compilation and the JVM's "warmup" phase. It is therefore generally recommended to run such tests with JMH (https://openjdk.java.net/projects/code-tools/jmh/).

I would expect the time for sql2rel to drop drastically (perhaps by one or two orders of magnitude) when run with JMH.
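
To show why warm-up matters, here is a small hand-rolled sketch in plain Java that discards a number of warm-up iterations before measuring. It is only an illustration of the idea, not a substitute for JMH, which handles forking, dead-code elimination and statistics far more carefully. The class `WarmupTimer` and its `workload` method are hypothetical stand-ins for a planning call.

```java
import java.util.Arrays;
import java.util.function.LongSupplier;

/** Hand-rolled timing loop for illustration only; prefer JMH for real measurements. */
final class WarmupTimer {
  /** Written once per run so the JIT cannot discard the task results. */
  private static volatile long blackhole;

  /** Runs the task `warmup` times untimed, then returns `measured` timings in nanoseconds. */
  static long[] time(LongSupplier task, int warmup, int measured) {
    long sink = 0;
    for (int i = 0; i < warmup; i++) {
      sink += task.getAsLong();  // untimed: gives the JIT a chance to compile hot paths
    }
    long[] nanos = new long[measured];
    for (int i = 0; i < measured; i++) {
      long start = System.nanoTime();
      sink += task.getAsLong();
      nanos[i] = System.nanoTime() - start;
    }
    blackhole = sink;
    return nanos;
  }

  /** Returns the middle element of a sorted copy (the upper median for even lengths). */
  static long median(long[] values) {
    long[] sorted = values.clone();
    Arrays.sort(sorted);
    return sorted[sorted.length / 2];
  }

  /** Hypothetical workload standing in for a planner invocation. */
  static long workload() {
    long hash = 0;
    for (int i = 0; i < 10_000; i++) {
      hash = 31 * hash + Integer.toString(i).hashCode();
    }
    return hash;
  }

  public static void main(String[] args) {
    long firstCall = time(WarmupTimer::workload, 0, 1)[0];
    long warmMedian = median(time(WarmupTimer::workload, 2_000, 200));
    System.out.println("first call (ns):  " + firstCall);
    System.out.println("warm median (ns): " + warmMedian);
  }
}
```

On a typical JVM the first call is usually much slower than the warmed-up median, which is the effect a single timed JUnit run would capture.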

Best
Julian

(In reply to Lekshmi <le...@gmail.com>, 30.12.18, 23:12.)