You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@calcite.apache.org by Luis Fernando Kauer <lf...@yahoo.com.br.INVALID> on 2017/07/06 20:23:43 UTC

Explain Plan for aggregating a single column in CSV Adapter

Hi,
I'm trying to understand the CSV Adapter and how the rules are fired.The CsvProjectTableScanRule gets fired when I use CsvTranslatableTable.But I'm not understanding why I'm getting a plan that scans all fields when I use an aggregate function.For example:explain plan for select name from emps;CsvTableScan(table=[[SALES, EMPS]], fields=[[1]])

explain plan for select max(name) from emps;EnumerableAggregate(group=[{}], EXPR$0=[MAX($1)])CsvTableScan(table=[[SALES, EMPS]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
I noticed that the rule gets fired and at that point it shows just 1 field being used.But the last time CsvTableScan.deriveRowType() gets called it has all the fields set, and it's not the instance create by the rule, but the first instance created with all the fields.
Can anybody explain me if this is a bug or if this is supposed to happen with aggregate functions ?
Best regards,
Luis Fernando Kauer

Re: Explain Plan for aggregating a single column in CSV Adapter

Posted by Julian Hyde <jh...@apache.org>.

I already wrote some ideas in
https://issues.apache.org/jira/browse/CALCITE-1876. Let's discuss
there.

On Tue, Jul 11, 2017 at 1:24 PM, Luis Fernando Kauer
<lf...@yahoo.com.br.invalid> wrote:
> Hi,
>
> If I change CsvTranslatableTable so that it implements ProjectableFilterableTable instead of TranslatableTable and implement the scan method, Calcite's own rules apply and the plan gets right, scanning only the used field in the aggregate function.
>
> However, now I realized that
>
> "select count(*) from EMPS" generates the plan:
> EnumerableAggregate(group=[{}], EXPR$0=[COUNT()])
> CsvTableScan(table=[[SALES, EMPS]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
>
> "select * from EMPS" generates the plan:
> CsvTableScan(table=[[SALES, EMPS]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
>
> Notice that the count(*) generates a plan that scans all fields, requiring to convert them all without the need.
> Even when using ProjectableFilterableTable plan scans all fields, but the plan for "select count(name) from EMPS" scans just one field.
> What could be the best approach to handle the count(*) without having to scan all fields?
>
> Best regards,
>
> Luis Fernando
>
>
>
>
>
> Em Quinta-feira, 6 de Julho de 2017 18:05, Julian Hyde <jh...@apache.org> escreveu:
>
>
>
> Calcite should realize that Aggregate has an implied Project (because it only uses a few columns) and push that projection into the CsvTableScan, but it doesn’t.
>
> I think we need a new rule for Aggregate on a TableScan of a ProjectableFilterableTable. Can you create a JIRA case please?
>
> I created a test case. It currently fails:
>
> diff --git a/example/csv/src/test/java/org/apache/calcite/test/CsvTest.java b/example/csv/src/test/java/org/apache/calcite/test/CsvTest.java
> index 00c59ee..2402872 100644
> --- a/example/csv/src/test/java/org/apache/calcite/test/CsvTest.java
> +++ b/example/csv/src/test/java/org/apache/calcite/test/CsvTest.java
> @@ -241,6 +241,13 @@ public Void apply(ResultSet resultSet) {
>          .ok();
>    }
>
> +  @Test public void testAggregateImpliesProject() throws SQLException {
> +    final String sql = "select max(name) from EMPS";
> +    final String plan = "PLAN=EnumerableAggregate(group=[{}], EXPR$0=[MAX($0)])\n"
> +        + "  CsvTableScan(table=[[SALES, EMPS]], fields=[[1]])\n";
> +    sql("smart", "explain plan for " + sql).returns(plan).ok();
> +  }
> +
>    @Test public void testFilterableSelect() throws SQLException {
>      sql("filterable-model", "select name from EMPS").ok();
>    }
>
>
> Julian
>
>
>> On Jul 6, 2017, at 1:23 PM, Luis Fernando Kauer <lf...@yahoo.com.br.INVALID> wrote:
>>
>> Hi,
>> I'm trying to understand the CSV Adapter and how the rules are fired.The CsvProjectTableScanRule gets fired when I use CsvTranslatableTable.But I'm not understanding why I'm getting a plan that scans all fields when I use an aggregate function.For example:explain plan for select name from emps;CsvTableScan(table=[[SALES, EMPS]], fields=[[1]])
>>
>> explain plan for select max(name) from emps;EnumerableAggregate(group=[{}], EXPR$0=[MAX($1)])CsvTableScan(table=[[SALES, EMPS]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
>> I noticed that the rule gets fired and at that point it shows just 1 field being used.But the last time CsvTableScan.deriveRowType() gets called it has all the fields set, and it's not the instance create by the rule, but the first instance created with all the fields.
>> Can anybody explain me if this is a bug or if this is supposed to happen with aggregate functions ?
>> Best regards,
>> Luis Fernando Kauer

Re: Explain Plan for aggregating a single column in CSV Adapter

Posted by Luis Fernando Kauer <lf...@yahoo.com.br.INVALID>.

Hi,

If I change CsvTranslatableTable so that it implements ProjectableFilterableTable instead of TranslatableTable and implement the scan method, Calcite's own rules apply and the plan gets right, scanning only the used field in the aggregate function.

However, now I realized that 

"select count(*) from EMPS" generates the plan:
EnumerableAggregate(group=[{}], EXPR$0=[COUNT()]) 
CsvTableScan(table=[[SALES, EMPS]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])

"select * from EMPS" generates the plan:
CsvTableScan(table=[[SALES, EMPS]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])

Notice that the count(*) generates a plan that scans all fields, requiring to convert them all without the need.
Even when using ProjectableFilterableTable plan scans all fields, but the plan for "select count(name) from EMPS" scans just one field.
What could be the best approach to handle the count(*) without having to scan all fields?

Best regards,

Luis Fernando

Em Quinta-feira, 6 de Julho de 2017 18:05, Julian Hyde <jh...@apache.org> escreveu:

Calcite should realize that Aggregate has an implied Project (because it only uses a few columns) and push that projection into the CsvTableScan, but it doesn’t.

I think we need a new rule for Aggregate on a TableScan of a ProjectableFilterableTable. Can you create a JIRA case please?

I created a test case. It currently fails:

diff --git a/example/csv/src/test/java/org/apache/calcite/test/CsvTest.java b/example/csv/src/test/java/org/apache/calcite/test/CsvTest.java
index 00c59ee..2402872 100644
--- a/example/csv/src/test/java/org/apache/calcite/test/CsvTest.java
+++ b/example/csv/src/test/java/org/apache/calcite/test/CsvTest.java
@@ -241,6 +241,13 @@ public Void apply(ResultSet resultSet) {
         .ok();
   }

+  @Test public void testAggregateImpliesProject() throws SQLException {
+    final String sql = "select max(name) from EMPS";
+    final String plan = "PLAN=EnumerableAggregate(group=[{}], EXPR$0=[MAX($0)])\n"
+        + "  CsvTableScan(table=[[SALES, EMPS]], fields=[[1]])\n";
+    sql("smart", "explain plan for " + sql).returns(plan).ok();
+  }
+
   @Test public void testFilterableSelect() throws SQLException {
     sql("filterable-model", "select name from EMPS").ok();
   }

Julian

> On Jul 6, 2017, at 1:23 PM, Luis Fernando Kauer <lf...@yahoo.com.br.INVALID> wrote:
> 
> Hi,
> I'm trying to understand the CSV Adapter and how the rules are fired.The CsvProjectTableScanRule gets fired when I use CsvTranslatableTable.But I'm not understanding why I'm getting a plan that scans all fields when I use an aggregate function.For example:explain plan for select name from emps;CsvTableScan(table=[[SALES, EMPS]], fields=[[1]])
> 
> explain plan for select max(name) from emps;EnumerableAggregate(group=[{}], EXPR$0=[MAX($1)])CsvTableScan(table=[[SALES, EMPS]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
> I noticed that the rule gets fired and at that point it shows just 1 field being used.But the last time CsvTableScan.deriveRowType() gets called it has all the fields set, and it's not the instance create by the rule, but the first instance created with all the fields.
> Can anybody explain me if this is a bug or if this is supposed to happen with aggregate functions ?
> Best regards,
> Luis Fernando Kauer

Re: Explain Plan for aggregating a single column in CSV Adapter

Posted by Julian Hyde <jh...@apache.org>.

Calcite should realize that Aggregate has an implied Project (because it only uses a few columns) and push that projection into the CsvTableScan, but it doesn’t.

I think we need a new rule for Aggregate on a TableScan of a ProjectableFilterableTable. Can you create a JIRA case please?

I created a test case. It currently fails:

diff --git a/example/csv/src/test/java/org/apache/calcite/test/CsvTest.java b/example/csv/src/test/java/org/apache/calcite/test/CsvTest.java
index 00c59ee..2402872 100644
--- a/example/csv/src/test/java/org/apache/calcite/test/CsvTest.java
+++ b/example/csv/src/test/java/org/apache/calcite/test/CsvTest.java
@@ -241,6 +241,13 @@ public Void apply(ResultSet resultSet) {
         .ok();
   }
 
+  @Test public void testAggregateImpliesProject() throws SQLException {
+    final String sql = "select max(name) from EMPS";
+    final String plan = "PLAN=EnumerableAggregate(group=[{}], EXPR$0=[MAX($0)])\n"
+        + "  CsvTableScan(table=[[SALES, EMPS]], fields=[[1]])\n";
+    sql("smart", "explain plan for " + sql).returns(plan).ok();
+  }
+
   @Test public void testFilterableSelect() throws SQLException {
     sql("filterable-model", "select name from EMPS").ok();
   }


Julian

> On Jul 6, 2017, at 1:23 PM, Luis Fernando Kauer <lf...@yahoo.com.br.INVALID> wrote:
> 
> Hi,
> I'm trying to understand the CSV Adapter and how the rules are fired.The CsvProjectTableScanRule gets fired when I use CsvTranslatableTable.But I'm not understanding why I'm getting a plan that scans all fields when I use an aggregate function.For example:explain plan for select name from emps;CsvTableScan(table=[[SALES, EMPS]], fields=[[1]])
> 
> explain plan for select max(name) from emps;EnumerableAggregate(group=[{}], EXPR$0=[MAX($1)])CsvTableScan(table=[[SALES, EMPS]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
> I noticed that the rule gets fired and at that point it shows just 1 field being used.But the last time CsvTableScan.deriveRowType() gets called it has all the fields set, and it's not the instance create by the rule, but the first instance created with all the fields.
> Can anybody explain me if this is a bug or if this is supposed to happen with aggregate functions ?
> Best regards,
> Luis Fernando Kauer