You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Robert Shahla1 <Ro...@ibm.com> on 2022/08/24 08:21:30 UTC

Result of TPCH-Query 12 differs when using Calcite

Hello,

I have been working using Calcite with PostgreSQL as a database, and I tried to run the TPCH benchmark using Calcite.
I have the TPCH data populated in PostgreSQL, and I have two ways for querying the data:

  1.  Using the JDBC connector of PostgreSQL and evaluate the query directly on PostgreSQL without using Calcite.
  2.  Using Calcite to communicate with PostgreSQL and evaluating the query using Calcite’s API (I use the JDBC adapter in Calcite).

When running Query 12 in the TPCH benchmark, and I get two different results. I tried with other queries such as Query-3 and Query-4, and I get the same result with both approaches.
I also tried a third approach which is using Spark SQL to query the PostgreSQL database, and the result for Query 12 was the same result as in approach (1), but different from approach (2).

I would be happy to know if you are aware of any work in progress / bugs that might lead to this behavior.

This is the main JAVA code that I’m using with Calcite:
Connection connection = DriverManager.getConnection("jdbc:calcite:");
CalciteConnection calciteConnection = connection.unwrap(CalciteConnection.class);
SchemaPlus rootSchema = calciteConnection.getRootSchema();

BasicDataSource dataSource = new BasicDataSource();
dataSource.setUrl(Utils.POSTGRES_DB_JDBC_URL);
dataSource.setUsername(Utils.POSTGRES_USER);
dataSource.setPassword(Utils.POSTGRES_PASSWORD);

JdbcSchema jdbcSchema = JdbcSchema.create(rootSchema, "tpch", dataSource,
        null, "tpch");

SchemaPlus schema = rootSchema.add("tpch",jdbcSchema);

Statement statement = calciteConnection.createStatement();
String tpchQuery12 = "select\n"
        + "  \"l\".\"l_shipmode\",\n"
        + "  sum(case\n"
        + "    when \"o\".\"o_orderpriority\" = '1-URGENT'\n"
        + "      or \"o\".\"o_orderpriority\" = '2-HIGH'\n"
        + "      then 1\n"
        + "    else 0\n"
        + "  end) as \"high_line_count\",\n"
        + "  sum(case\n"
        + "    when \"o\".\"o_orderpriority\" <> '1-URGENT'\n"
        + "      and \"o\".\"o_orderpriority\" <> '2-HIGH'\n"
        + "      then 1\n"
        + "    else 0\n"
        + "  end) as \"low_line_count\"\n"
        + "from\n"
        + "  \"tpch\".\"orders\" \"o\",\n"
        + "  \"tpch\".\"lineitem\" \"l\"\n"
        + "where\n"
        + "  \"o\".\"o_orderkey\" = \"l\".\"l_orderkey\"\n"
        + "  and \"l\".\"l_shipmode\" in ('TRUCK', 'REG AIR')\n"
        + "  and \"l\".\"l_commitdate\" < \"l\".\"l_receiptdate\"\n"
        + "  and \"l\".\"l_shipdate\" < \"l\".\"l_commitdate\"\n"
        + "--  and \"l\".\"l_receiptdate\" >= date '1994-01-01'\n"
        + "--  and \"l\".\"l_receiptdate\" < date '1994-01-01' + interval '1' year\n"
        + "group by\n"
        + "  \"l\".\"l_shipmode\"\n"
        + "order by\n"
        + "  \"l\".\"l_shipmode\"";
ResultSet rs = statement.executeQuery(tpchQuery12);


Thanks in advance,
Robert