Posted to commits@pinot.apache.org by "xiangfu0 (via GitHub)" <gi...@apache.org> on 2023/04/20 20:51:12 UTC

[GitHub] [pinot] xiangfu0 opened a new issue, #10658: [multistage] distinct/group by support on array(multi-value) column

xiangfu0 opened a new issue, #10658:
URL: https://github.com/apache/pinot/issues/10658

   Add array-based aggregation, group by, and distinct support to the multi-stage query engine.
   
   Currently, V1 distinct or group by on a multi-value column breaks each array into individual values and then performs the operation.
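
To make the difference concrete, here is a minimal sketch in plain Python (not Pinot code) that contrasts the V1-style behavior described above with grouping on the whole array. The rows, the `tags` column, and both helper functions are hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical rows: each value of the multi-value column `tags` is a list.
rows = [
    {"id": 1, "tags": ["a", "b"]},
    {"id": 2, "tags": ["b", "c"]},
    {"id": 3, "tags": ["a", "b"]},
]

def group_by_mv(rows, column):
    """V1-style: break each array into individual values, then count per value."""
    counts = Counter()
    for row in rows:
        for value in row[column]:
            counts[value] += 1
    return dict(counts)

def group_by_array(rows, column):
    """Array-style: treat the whole array as a single grouping key."""
    counts = Counter(tuple(row[column]) for row in rows)
    return dict(counts)

mv_counts = group_by_mv(rows, "tags")
array_counts = group_by_array(rows, "tags")
```

With these rows, the V1-style grouping yields one group per element (`a`, `b`, `c`), while the array-style grouping yields one group per distinct array.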


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] kishoreg commented on issue #10658: [multistage] MV column support in Multi Stage

Posted by "kishoreg (via GitHub)" <gi...@apache.org>.
kishoreg commented on issue #10658:
URL: https://github.com/apache/pinot/issues/10658#issuecomment-1539721298

   This is probably not needed if we implement the idea described in https://github.com/apache/pinot/issues/10745, right?




[GitHub] [pinot] vvivekiyer commented on issue #10658: [multistage] distinct/group by support on array(multi-value) column

Posted by "vvivekiyer (via GitHub)" <gi...@apache.org>.
vvivekiyer commented on issue #10658:
URL: https://github.com/apache/pinot/issues/10658#issuecomment-1517084387

   Thanks @siddharthteotia. Sure, I can pick it up. 




[GitHub] [pinot] siddharthteotia commented on issue #10658: [multistage] distinct/group by support on array(multi-value) column

Posted by "siddharthteotia (via GitHub)" <gi...@apache.org>.
siddharthteotia commented on issue #10658:
URL: https://github.com/apache/pinot/issues/10658#issuecomment-1517083285

   Discussed with @xiangfu0 .
   
   I'd like @vvivekiyer to pick this up since he has been contributing bug fixes and enhancements to multi-stage. He can start on this right away, dedicate time to it, and work with us here on design and discussion as needed.




[GitHub] [pinot] xiangfu0 commented on issue #10658: [multistage] MV column support in Multi Stage

Posted by "xiangfu0 (via GitHub)" <gi...@apache.org>.
xiangfu0 commented on issue #10658:
URL: https://github.com/apache/pinot/issues/10658#issuecomment-1654502529

   An MV column is modeled as an array in the multi-stage engine. To follow the V1 filter/group-by behavior, users need to use the function `arrayToMV` to bridge the data type conversion in the leaf stage.
   
   https://github.com/apache/pinot/pull/11117
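
As a rough illustration of why a bridge between the two representations matters for filters, the plain-Python sketch below contrasts an MV-style predicate with a whole-array comparison. It assumes that a V1 filter on an MV column matches a row when any element matches; the rows and helper names are hypothetical, and the sketch does not model `arrayToMV` itself or any Pinot API.

```python
# Hypothetical rows with a multi-value column `tags`.
rows = [
    {"id": 1, "tags": ["a", "b"]},
    {"id": 2, "tags": ["b", "c"]},
    {"id": 3, "tags": ["a"]},
]

def filter_mv(rows, column, value):
    """MV-style predicate (assumed): a row matches if any element equals the value."""
    return [row["id"] for row in rows if value in row[column]]

def filter_array(rows, column, array):
    """Array-style predicate: a row matches only if the whole array is equal."""
    return [row["id"] for row in rows if row[column] == array]

mv_match = filter_mv(rows, "tags", "a")
array_match = filter_array(rows, "tags", ["a"])
```

Under that assumption the same logical condition selects different rows depending on whether the column is treated as an MV column or as an array.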




[GitHub] [pinot] vvivekiyer commented on issue #10658: [multistage] MV column support in Multi Stage

Posted by "vvivekiyer (via GitHub)" <gi...@apache.org>.
vvivekiyer commented on issue #10658:
URL: https://github.com/apache/pinot/issues/10658#issuecomment-1532316120

   @xiangfu0 and @walterddr
   I have [created a doc](https://docs.google.com/document/d/12Pp7VznanUV6StLkW9syoalXdRLJ1b_h7OFBX0h0tJ0/edit?usp=sharing) outlining the future plan for MV columns on Multistage. Please take a look and provide feedback. 
   I will update the document with the implementation details once we get alignment on the overall approach. 
   
   cc: @somandal @siddharthteotia @jasperjiaguo 




[GitHub] [pinot] siddharthteotia commented on issue #10658: [multistage] MV column support in Multi Stage

Posted by "siddharthteotia (via GitHub)" <gi...@apache.org>.
siddharthteotia commented on issue #10658:
URL: https://github.com/apache/pinot/issues/10658#issuecomment-1535148368

   Hey @xiangfu0  - gentle ping to take a look. We can also meet to iterate faster.




[GitHub] [pinot] vvivekiyer commented on issue #10658: [multistage] distinct/group by support on array(multi-value) column

Posted by "vvivekiyer (via GitHub)" <gi...@apache.org>.
vvivekiyer commented on issue #10658:
URL: https://github.com/apache/pinot/issues/10658#issuecomment-1520518081

   I have a [draft implementation for SUM_MV here](https://github.com/apache/pinot/compare/master...vvivekiyer:pinot:multistage_mv). I can improve on it and add support for all other aggregations in a similar fashion. To support GROUP BY, I am thinking of writing a `RelOptRule` that changes the return type of the project in the leaf stage to a primitive type instead of an array.
   
   
   But before I go ahead with making all the changes, let’s discuss how we’d like to see the evolution of MV columns. I see two possible options (please let me know if there are more):
   
   1. Provide backward-compatible support for MV columns in the V2 engine (for all existing operations) and create a new ARRAY datatype. This will allow us to provide standard SQL operations (like Postgres) on ARRAY columns. We can optionally decide to phase out MV column functionality in favor of standard ARRAYs (because MV functionality can be achieved with a combination of ARRAYs/UNNEST and subqueries). As the end state, Pinot will support two datatypes:
      - MV (which behaves the same way as today, and additionally works with joins, subqueries, and other complex SQL).
      - ARRAY (which behaves similarly to Postgres).
   
   2. Do not introduce an ARRAY datatype. But equate/evolve our support of MV columns to achieve all the support that ARRAYs have.
   
   I’m inclined to go with (1).
   
   
   Also, we currently unnest MV columns when applying aggregation functions and group by to them. Do we want to retain this behavior when executing more complex SQL (joins, subqueries, etc.)?




[GitHub] [pinot] siddharthteotia commented on issue #10658: [multistage] MV column support in Multi Stage

Posted by "siddharthteotia (via GitHub)" <gi...@apache.org>.
siddharthteotia commented on issue #10658:
URL: https://github.com/apache/pinot/issues/10658#issuecomment-1540935231

   Yes, let's prioritize fixing the aggregation function work. @vvivekiyer and @jasperjiaguo will pick it up.




[GitHub] [pinot] siddharthteotia commented on issue #10658: [multistage] MV column support in Multi Stage

Posted by "siddharthteotia (via GitHub)" <gi...@apache.org>.
siddharthteotia commented on issue #10658:
URL: https://github.com/apache/pinot/issues/10658#issuecomment-1539274720

   Hey @xiangfu0 / @walterddr - can we please meet soon to discuss the path ahead? We want to unblock this and make progress.




[GitHub] [pinot] somandal commented on issue #10658: [multistage] distinct/group by support on array(multi-value) column

Posted by "somandal (via GitHub)" <gi...@apache.org>.
somandal commented on issue #10658:
URL: https://github.com/apache/pinot/issues/10658#issuecomment-1520941010

   Thanks @vvivekiyer, I'll take a look at this in more detail soon. Just wanted to bring up what we discussed offline:
   
   > Do not introduce an ARRAY datatype. But equate/evolve our support of MV columns to achieve all the support that ARRAYs have.
   
   I do have some concerns with this. MV today doesn't have a very clear guideline on whether it's really like an ARRAY or a SET or something else. For example (as we discussed offline), today when the `ForwardIndexHandler` disables the forward index for an MV column and wants to reconstruct it later on, the following array guarantees cannot be met:
   
   - Ordering: the elements in each MV row can be reordered (i.e. ordering is not preserved on re-enabling the forward index). This breaks array semantics.
   - Duplicates: when an MV row has duplicate entries, the duplicates are lost on forward index reconstruction because we don't store frequency information. This also breaks array semantics.
   
   Though we don't reorder MV rows (as far as I can tell) anywhere else in the code, these semantics aren't ingrained into Pinot.
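
The two losses listed above can be reproduced with a small plain-Python sketch. It assumes, purely for illustration, that the row is rebuilt from a mapping of each distinct value to the documents containing it; the actual reconstruction path in `ForwardIndexHandler` is not described in this thread, and both helper functions are hypothetical.

```python
def build_value_to_docs(mv_rows):
    """Map each distinct value to the set of doc ids that contain it."""
    index = {}
    for doc_id, values in enumerate(mv_rows):
        for value in values:
            index.setdefault(value, set()).add(doc_id)
    return index

def reconstruct(index, num_docs):
    """Rebuild rows from the mapping, visiting values in sorted order."""
    rows = [[] for _ in range(num_docs)]
    for value in sorted(index):
        for doc_id in index[value]:
            rows[doc_id].append(value)
    return rows

original = [["c", "a", "a"], ["b"]]
rebuilt = reconstruct(build_value_to_docs(original), len(original))
```

Because the mapping records only which documents contain a value, the rebuilt first row has neither the original element order nor the duplicate `a`.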

