Posted to issues@calcite.apache.org by "Edmon Begoli (JIRA)" <ji...@apache.org> on 2018/02/06 01:59:00 UTC

[jira] [Created] (CALCITE-2169) Conduct a comparative performance study of the framework

Edmon Begoli created CALCITE-2169:
-------------------------------------

             Summary: Conduct a comparative performance study of the framework 
                 Key: CALCITE-2169
                 URL: https://issues.apache.org/jira/browse/CALCITE-2169
             Project: Calcite
          Issue Type: Task
          Components: core
         Environment: Use Calcite Benchmark, and run it on the Benchmark environment. 
            Reporter: Edmon Begoli
            Assignee: Edmon Begoli


Design and implement a study of the Calcite framework using the benchmark that is to be developed under CALCITE-2168 (Implement a General Purpose Benchmark for Calcite). The study should analyze the performance of the Calcite optimizer itself, compare the performance of queries with and without Calcite optimization, and compare both against standalone databases and other frameworks.

Some ideas and targets for the study:

* Planning and execution time for queries that span multiple systems (e.g. Postgres and Cassandra, Postgres and Pig, Pig and Cassandra).
* For TPC-DS, study the plans produced by Calcite vs. those of existing RDBMS optimizers (e.g. Postgres, MySQL). This would be interesting even as a feature used in conjunction with the lattice framework: an estimate of the time savings could help decide which queries to eventually build lattices for.
* Optimizer runtime for complex queries (we could also compare with the runtime of executing the optimized query directly)
  * Calcite optimized query
  * Unoptimized query with the optimizer of the backend disabled
  * Unoptimized query with the optimizer of the backend enabled
* Comparison with other federated query processing engines such as Spark SQL, PrestoDB, and maybe KSQL[1] and InfluxDB
  * The project described in [2] uses Calcite to optimize Spark queries

[1] https://github.com/confluentinc/ksql
[2] https://www.datascience.com/blog/grunion-data-science-tools-query-optimizer-apache-spark
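
As a rough illustration of the planning-time and execution-time measurements listed above, here is a minimal sketch of a timing harness. It is not the CALCITE-2168 benchmark and does not use Calcite at all: SQLite (from the Python standard library) stands in for a backend, `EXPLAIN QUERY PLAN` is used as an approximate proxy for planning cost, and the names `time_call` and `measure` are hypothetical.

```python
import sqlite3
import statistics
import time


def time_call(fn, repeats=5):
    """Return the median wall-clock seconds over `repeats` calls of fn()."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def measure(conn, sql, repeats=5):
    """Time planning and total (plan + execute) cost of one query.

    Assumption: asking SQLite for the plan without running the query is
    used here only as a stand-in for "optimizer runtime".
    """
    planning = time_call(
        lambda: conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall(), repeats
    )
    total = time_call(lambda: conn.execute(sql).fetchall(), repeats)
    return {"planning_s": planning, "total_s": total}


if __name__ == "__main__":
    # Hypothetical toy data set; a real study would load a benchmark schema.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v INTEGER)")
    conn.executemany("INSERT INTO t (v) VALUES (?)", [(i,) for i in range(1000)])
    print(measure(conn, "SELECT COUNT(*) FROM t WHERE v > 10"))
```

Reporting the median over several repetitions reduces the influence of warm-up and outliers. A real study would run the same harness once per variant (Calcite optimized, backend optimizer enabled, backend optimizer disabled) and compare the resulting numbers.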


