Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/01/24 11:12:53 UTC

[GitHub] [pinot] atris opened a new issue #8060: Multiple Grouping Sets and Familia Support

atris opened a new issue #8060:
URL: https://github.com/apache/pinot/issues/8060


   This issue tracks the development of the feature that brings multiple grouping sets to Pinot.
   
   **What is a Grouping Set?**
   
   Consider the following query:
   
   ```
   SELECT
       brand,
       segment,
       SUM (quantity)
   FROM
       sales
   GROUP BY
       brand,
       segment;
   ```
   
   (brand, segment) represents a single grouping set.
   
   A query using multiple grouping sets would be represented as:
   
   ```
   SELECT
       c1,
       c2,
       aggregate_function(c3)
   FROM
       table_name
   GROUP BY
       GROUPING SETS (
           (c1, c2),
           (c1),
           (c2),
           ()
   );
   ```
   
   With c1 = brand, c2 = segment and SUM (quantity) as the aggregate on the sales table, an equivalent query using UNION ALL would be:
   
   ```
   SELECT
       brand,
       segment,
       SUM (quantity)
   FROM
       sales
   GROUP BY
       brand,
       segment
   
   UNION ALL
   
   SELECT
       brand,
       NULL,
       SUM (quantity)
   FROM
       sales
   GROUP BY
       brand
   
   UNION ALL
   
   SELECT
       NULL,
       segment,
       SUM (quantity)
   FROM
       sales
   GROUP BY
       segment
   
   UNION ALL
   
   SELECT
       NULL,
       NULL,
       SUM (quantity)
   FROM
       sales;
   ```
   
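   GROUPING SETS also allows the empty set (), which aggregates over all rows and produces a single grand-total row. It is equivalent to the last SELECT of the UNION ALL query above, i.e. the aggregate with no GROUP BY clause.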
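
The semantics above can be checked on a small in-memory example. The following is a minimal Python sketch, not Pinot code: the sample rows and the helper name `sum_by_grouping_sets` are made up for illustration. It deliberately scans the input once per grouping set, which is exactly what the UNION ALL rewrite does.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for the `sales` table.
SALES = [
    {"brand": "A", "segment": "x", "quantity": 10},
    {"brand": "A", "segment": "y", "quantity": 20},
    {"brand": "B", "segment": "x", "quantity": 30},
]


def sum_by_grouping_sets(rows, columns, grouping_sets, measure):
    """SUM(measure) per grouping set, one scan per set (like UNION ALL)."""
    result = []
    for grouping_set in grouping_sets:
        totals = defaultdict(int)
        for row in rows:
            # Columns outside the current grouping set become None (SQL NULL).
            key = tuple(row[c] if c in grouping_set else None for c in columns)
            totals[key] += row[measure]
        result.extend(key + (total,) for key, total in totals.items())
    return result


union_all_rows = sum_by_grouping_sets(
    SALES,
    columns=("brand", "segment"),
    grouping_sets=[("brand", "segment"), ("brand",), ("segment",), ()],
    measure="quantity",
)
for r in union_all_rows:
    print(r)
```

The empty grouping set contributes the single row `(None, None, 60)`, the grand total over all sample rows.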
   
   **CUBE and ROLLUP**
   
   `CUBE(c1, c2, c3)` generates:
   
   ```
   (c1, c2, c3)
   (c1, c2)
   (c2, c3)
   (c1, c3)
   (c1)
   (c2)
   (c3)
   ()
   ```
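
The CUBE expansion is simply every subset of the column list. As a rough illustration (the function name `cube` is hypothetical and says nothing about how Pinot or Calcite would implement it), the list above can be generated in Python like this:

```python
from itertools import combinations


def cube(columns):
    """Return every subset of `columns`, largest subsets first."""
    columns = tuple(columns)
    return [
        subset
        for size in range(len(columns), -1, -1)
        for subset in combinations(columns, size)
    ]


cube_sets = cube(["c1", "c2", "c3"])
for grouping_set in cube_sets:
    print(grouping_set)
```

For three columns this yields 8 grouping sets, ending with the empty set.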
   
   `ROLLUP(c1, c2, c3)` generates:
   
   ```
   (c1, c2, c3)
   (c1, c2)
   (c1)
   ()
   ```
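
The ROLLUP expansion is the list of prefixes of the column list, from the full list down to the empty set. A short Python sketch (again with a hypothetical helper name, `rollup`, purely for illustration):

```python
def rollup(columns):
    """Return the prefixes of `columns`, longest first, ending with ()."""
    columns = tuple(columns)
    return [columns[:n] for n in range(len(columns), -1, -1)]


rollup_sets = rollup(["c1", "c2", "c3"])
for grouping_set in rollup_sets:
    print(grouping_set)
```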
   
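   ROLLUP generates only the hierarchical prefixes of the column list (n + 1 grouping sets for n columns), whereas CUBE generates every subset of the columns (2^n grouping sets).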
   
   **Design**
   
   A design document will be published soon. The design theme is to reuse the swim lane concept introduced in the FILTER PR. An important design goal is to avoid rescans.
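
To make the "avoid rescans" goal concrete, here is a minimal Python sketch of aggregating several grouping sets in a single scan. It is only an illustration under assumptions: the design document is not published yet, so the function name `aggregate_single_scan`, the per-set accumulator called a "lane" here, and the sample rows are all hypothetical and not taken from Pinot or the FILTER PR.

```python
from collections import defaultdict


def aggregate_single_scan(rows, columns, grouping_sets, measure):
    """SUM(measure) for all grouping sets while reading `rows` only once."""
    # One accumulator ("lane", a name assumed for this sketch) per grouping set.
    lanes = [defaultdict(int) for _ in grouping_sets]
    for row in rows:  # the input is consumed exactly once
        for lane, grouping_set in zip(lanes, grouping_sets):
            # Columns outside the grouping set become None (SQL NULL).
            key = tuple(row[c] if c in grouping_set else None for c in columns)
            lane[key] += row[measure]
    return [key + (total,) for lane in lanes for key, total in lane.items()]


# A one-shot iterator: a second scan over it would see no rows.
sales = iter([
    {"brand": "A", "segment": "x", "quantity": 10},
    {"brand": "A", "segment": "y", "quantity": 20},
    {"brand": "B", "segment": "x", "quantity": 30},
])

single_scan_rows = aggregate_single_scan(
    sales,
    columns=("brand", "segment"),
    # The grouping sets that ROLLUP(brand, segment) expands to.
    grouping_sets=[("brand", "segment"), ("brand",), ()],
    measure="quantity",
)
for r in single_scan_rows:
    print(r)
```

Passing a one-shot iterator shows that every grouping set is filled from the same pass over the data. The trade-off is that all accumulators are held in memory at the same time.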
   
   **Implementation Plan**
   
   The implementation plan is to support ROLLUP first, then CUBE, and then generic GROUPING SETS.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] siddharthteotia commented on issue #8060: Multiple Grouping Sets and Familia Support

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on issue #8060:
URL: https://github.com/apache/pinot/issues/8060#issuecomment-1023791941


   Not sure whether this is similar or overlapping, but linking the issue here for reference: https://github.com/apache/pinot/issues/8040
   
   Looking forward to the design doc.




[GitHub] [pinot] walterddr commented on issue #8060: Multiple Grouping Sets and Familia Support

Posted by GitBox <gi...@apache.org>.
walterddr commented on issue #8060:
URL: https://github.com/apache/pinot/issues/8060#issuecomment-1020293534


   Hi @atris. This looks like a super powerful feature. Thanks for working on this.
   
   At a higher level, could you briefly describe the scope of this design? Specifically:
   1. Is it more about syntactic support (e.g. I think the Calcite parser natively supports the 3 concepts, see: https://calcite.apache.org/docs/reference.html#groupItems), or more about how to design the aggregation operators that carry out the compute?
   2. For the equivalent UNION ALL syntax, I don't think you can achieve that with one simple scatter-gather execution. Were you planning to support this by issuing multiple brokerRequests with different group-by keys?
   
   Thanks in advance.
   
   




[GitHub] [pinot] atris commented on issue #8060: Multiple Grouping Sets and Familia Support

Posted by GitBox <gi...@apache.org>.
atris commented on issue #8060:
URL: https://github.com/apache/pinot/issues/8060#issuecomment-1019987354


   I have started working on this and aim to publish a design document by the end of this week.

