You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@drill.apache.org by Julian Hyde <ju...@gmail.com> on 2013/05/18 23:06:22 UTC

Example aggregate queries

I'm starting to work on GROUP BY support, and I'm trying to figure out the syntax of the "collapsingaggregate" operator based on the spec. I tried to find some example logical plans but I couldn't. Could someone give a couple of examples?

Here's a plan that almost works:

{
  "head" : {
    "type" : "apache_drill_logical_plan",
    "version" : 1,
    "generator" : {
      "type" : "manual",
      "info" : "na"
    }
  },
  "storage" : [ {
    "type" : "queue",
    "name" : "queue"
  }, {
    "type" : "classpath",
    "name" : "donuts-json"
  } ],
  "query" : [ {
    "op" : "scan",
    "@id" : 1,
    "memo" : "initial_scan",
    "storageengine" : "donuts-json",
    "selection" : {
      "path" : "/employees.json",
      "type" : "JSON"
    },
    "ref" : "_MAP"
  },  {
    "op" : "collapsingaggregate",
    "@id" : 2,
    "input" : 1,
    "carryovers": [],
    "aggregations" : [ {
      "ref" : "output.c",
      "expr" : "count"
    } ]
  }, {
    "op" : "store",
    "@id" : 3,
    "memo" : "output sink",
    "input" : 2,
    "target" : {
      "number" : 0
    },
    "partition" : null,
    "storageEngine" : "queue"
  } ]
}

Some specific questions:

1. is it valid for "carryovers" to be omitted or empty (I get NPE if I omit it)?
2. what's the syntax for aggregation expression (I tried "COUNT()", "COUNT(*)", "COUNT")
3. the spec seems to imply that I could write something as rich as "5 + SUM(salary) / COUNT()" for aggregation expression. Is that true?
4. "within" is a segment. Does that mean a holder field, like the "toppings" field in donuts.json?
5. is it valid for aggregations to be omitted or empty?
6. in the plan spec [ https://docs.google.com/document/d/1QTL8warUYS2KjldQrGUse7zp8eA72VKtLOHwfXy6c7I/edit ] does "*" mean that a field is optional and dagger mean system-generated? It does't say explicitly.
7. if I want to implement "SELECT deptno, COUNT(*) FROM emp GROUP BY deptno", I presume I will generate a plan that consists of scan-segment-collapsingaggregate?

If the example queries answer those questions, no need to answer them explicitly.

Julian

Re: Example aggregate queries

Posted by Jacques Nadeau <ja...@apache.org>.

I'll start with the answers.  It seems like you've made a bunch of
progress past the rest already.

J


> 1. is it valid for "carryovers" to be omitted or empty (I get NPE if I omit it)?

It should be.  If it doesn't work, we need to fix it.

> 2. what's the syntax for aggregation expression (I tried "COUNT()", "COUNT(*)", "COUNT")

The spec doesn't do a great job of covering expressions.  In this
case, COUNT(expr) where expr is another expression (anything but
another tree containing an aggregation function).

> 3. the spec seems to imply that I could write something as rich as "5 + SUM(salary) / COUNT()" for aggregation expression. Is that true?

it should be.  The rule is basically any expression except ones where
an aggregation is inside an aggregation.


> 4. "within" is a segment. Does that mean a holder field, like the "toppings" field in donuts.json?

It resets aggregations at each segment boundary.

> 5. is it valid for aggregations to be omitted or empty?

It should be.  If it doesn't work, it should be a jira.

> 6. in the plan spec [ https://docs.google.com/document/d/1QTL8warUYS2KjldQrGUse7zp8eA72VKtLOHwfXy6c7I/edit ] does "*" mean that a
field is optional and dagger mean system-generated? It does't say explicitly.

Star means optional.  Dagger means required outside a sequence, not
allowed in a sequence.  I added notes to the doc.

> 7. if I want to implement "SELECT deptno, COUNT(*) FROM emp GROUP BY deptno", I presume I will generate a plan that consists of scan-segment-collapsingaggregate?

Yes.  Although the count(*) would need to be remapped to
count(constant) or count(_MAP) since I don't believe we currently
support asterisk at the logical plan level.


>
> If the example queries answer those questions, no need to answer them explicitly.
>
> Julian
>