You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Stamatis Zampetakis (Jira)" <ji...@apache.org> on 2022/06/13 13:26:00 UTC

[jira] [Commented] (CALCITE-5190) TPC-H testing of the Calcite framework

    [ https://issues.apache.org/jira/browse/CALCITE-5190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17553578#comment-17553578 ] 

Stamatis Zampetakis commented on CALCITE-5190:
----------------------------------------------

Hey [~itxiangkui] , there are various benchmarks around TPC-H but not everything is open and usually there are various customizations in the middle. One presentation that comes to mind around this was done by Chris Baynes at Apache Con ([https://www.youtube.com/watch?v=4JAOkLKrcYE&t=1032s)] a few years ago.

Also keep in mind that for questions, it is better to use the dev@calcite mailing list than logging JIRA tickets. 

> TPC-H testing of the Calcite framework
> --------------------------------------
>
>                 Key: CALCITE-5190
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5190
>             Project: Calcite
>          Issue Type: Wish
>          Components: core, jdbc-adapter
>    Affects Versions: 1.30.0
>            Reporter: itxiangkui
>            Priority: Major
>
> What our project is doing:
> 1. Designed a time series database, made full use of the enumerable adapter, and made a horizontal comparison of benchmarks with other popular time databases
> 2. Design catalog->database->table, and use the adapter function of jdbc to connect multiple databases such as Mysql/Es in the backend to solve the join problem of heterogeneous data (for example: select * from catalog1.database .table1 ,catalog2,database.table2 .. where t1.c1='xx' )
> But when we tested it, we found that, as far as Jdbc-Adapter is concerned, Calcite will disassemble the logical SQL into very fragmented query plan SQL, and hand it over to the underlying database instance to execute the specific physical SQL. Most of the SQL has no filter conditions to push down and the physical SQL looks very clumsy.
> So we did a version of the TPC-H test (an industry standard in the database field), taking SQL-19 as an example:
> {code:sql}
> SELECT sum(l_extendedprice* (1 - l_discount)) AS revenue
> FROM lineitem,
>      part
> WHERE (p_partkey = l_partkey
>        AND p_brand = 'Brand#52'
>        AND p_container in ('SM CASE',
>                            'SM BOX',
>                            'SM PACK',
>                            'SM PKG')
>        AND l_quantity >= 4
>        AND l_quantity <= 4 + 10
>        AND p_size BETWEEN 1 AND 5
>        AND l_shipmode in ('AIR',
>                           'AIR REG')
>        AND l_shipinstruct = 'DELIVER IN PERSON')
>   OR (p_partkey = l_partkey
>       AND p_brand = 'Brand#11'
>       AND p_container in ('MED BAG',
>                           'MED BOX',
>                           'MED PKG',
>                           'MED PACK')
>       AND l_quantity >= 18
>       AND l_quantity <= 18 + 10
>       AND p_size BETWEEN 1 AND 10
>       AND l_shipmode in ('AIR',
>                          'AIR REG')
>       AND l_shipinstruct = 'DELIVER IN PERSON')
>   OR (p_partkey = l_partkey
>       AND p_brand = 'Brand#51'
>       AND p_container in ('LG CASE',
>                           'LG BOX',
>                           'LG PACK',
>                           'LG PKG')
>       AND l_quantity >= 29
>       AND l_quantity <= 29 + 10
>       AND p_size BETWEEN 1 AND 15
>       AND l_shipmode in ('AIR',
>                          'AIR REG')
>       AND l_shipinstruct = 'DELIVER IN PERSON');
> {code}
> The physical execution plan is:
> {code:sql}
> [SELECT *
> FROM `tpch`.`lineitem`
> LIMIT 3]
> [SELECT `L_PARTKEY`, `L_QUANTITY`, `L_EXTENDEDPRICE`, `L_DISCOUNT`, `L_SHIPINSTRUCT`, `L_SHIPMODE`
> FROM `tpch`.`lineitem`]
> [SELECT `P_PARTKEY`, `P_BRAND`, `P_SIZE`, `P_CONTAINER`
> FROM `tpch`.`part`]
> {code}
> this looks incredible...
>  
> My question is: Has Calcite ever tested something like TPC-H.
> In theory, for a specific database instance, the query speed after using calcite will not drop much performance compared to the original database, so many users are happy to use Calcite to solve the problem of data islands. …



--
This message was sent by Atlassian Jira
(v8.20.7#820007)