You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Seung-Hwan Lim (JIRA)" <ji...@apache.org> on 2018/04/17 15:36:00 UTC

[jira] [Comment Edited] (CALCITE-2168) Implement a General Purpose Benchmark for Calcite

    [ https://issues.apache.org/jira/browse/CALCITE-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16441031#comment-16441031 ] 

Seung-Hwan Lim edited comment on CALCITE-2168 at 4/17/18 3:35 PM:
------------------------------------------------------------------

While writing TPC-DS queries for Calcite with Postgres backends, I have found couple of issues.

1. date time interval compatibility: postgres' dialect is 

``` (cast('2000-08-20' as date) +  interval '30 days') ```. 

 

For Calcite with postgres backend , when I tried following:

(cast('2000-08-20' as date) +  interval '30' day )

I have UnsupportedOperation Exception: 

Caused by: java.lang.UnsupportedOperationException: class org.apache.calcite.sql.SqlSyntax$6: SPECIAL

 

2. nested aggregation with windows function.

in TPC-DS query 98, we have following troublesome phrase:

```sum(ss."ss_ext_sales_price")*100/sum(sum(ss."ss_ext_sales_price")) over

          (partition by i."i_class") as REVENUERATIO```

Which generates:

SUM("t"."ss_ext_sales_price") * 100 / CASE WHEN (COUNT(SUM("t"."ss_ext_sales_price")) OVER (PARTITION BY "t1"."i_class" RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)) > 0 THEN CAST($SUM0(SUM("t"."ss_ext_sales_price")) OVER (PARTITION BY "t1"."i_class" RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS DECIMAL(7, 2)) ELSE NULL END AS "REVENUERATIO"

 

It causes syntax error in CAST($SUM0(SUM())) part in postgresql.

 

 

I'm testing TPC-DS with the version of 1.16.

 

Thank you,


was (Author: lims1):
While writing TPC-DS queries for Calcite with Postgres backends, I have found couple of issues.

1. date time interval compatibility: postgres' dialect is 

``` (cast('2000-08-20' as date) +  interval '30 days') ```. 

 

For Calcite with postgres backend , I tried following:

(cast('2000-08-20' as date) +  interval '30' day )

I have UnsupportedOperation Exception: 

Caused by: java.lang.UnsupportedOperationException: class org.apache.calcite.sql.SqlSyntax$6: SPECIAL

 

2. nested aggregation with windows function.

in TPC-DS query 98, we have following troublesome phrase:

```sum(ss."ss_ext_sales_price")*100/sum(sum(ss."ss_ext_sales_price")) over

          (partition by i."i_class") as REVENUERATIO```

Which generates:

SUM("t"."ss_ext_sales_price") * 100 / CASE WHEN (COUNT(SUM("t"."ss_ext_sales_price")) OVER (PARTITION BY "t1"."i_class" RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)) > 0 THEN CAST($SUM0(SUM("t"."ss_ext_sales_price")) OVER (PARTITION BY "t1"."i_class" RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS DECIMAL(7, 2)) ELSE NULL END AS "REVENUERATIO"

 

It causes syntax error in CAST($SUM0(SUM())) part in postgresql.

 

 

I'm testing TPC-DS with the version of 1.16.

 

Thank you,

> Implement a General Purpose Benchmark for Calcite 
> --------------------------------------------------
>
>                 Key: CALCITE-2168
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2168
>             Project: Calcite
>          Issue Type: Wish
>          Components: core
>            Reporter: Edmon Begoli
>            Assignee: Edmon Begoli
>            Priority: Minor
>              Labels: performance
>   Original Estimate: 2,688h
>  Remaining Estimate: 2,688h
>
> Develop a benchmark that can be used for general purpose benchamrking of Calcite against other frameworks, and databases, and for study,research, and profiling of the framwork.
> Use popular benchmarks such as TCP-DS (or -H) or Star Schema Benchmark (SSB) and measure the performance of optimized vs. unoptimized Calcite queries, and the overhead of going through Calcite adapters vs. natively accessing the target DB
> Look into the existing approaches and do perhaps something similar:
> * https://www.slideshare.net/julianhyde/w-435phyde-3
> * https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.2/bk_hive-performance-tuning/content/ch_cost-based-optimizer.html
> * (How much of this is still relevant (Hive 0.14)? Can we use queries/benchmarks?)
> https://hortonworks.com/blog/hive-0-14-cost-based-optimizer-cbo-technical-overview/
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)