You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2020/11/25 21:10:31 UTC

[GitHub] [incubator-pinot] kkrugler opened a new issue #6292: Support limiting sub-groups to top N

kkrugler opened a new issue #6292:
URL: https://github.com/apache/incubator-pinot/issues/6292


   In Elasticsearch we can do a group by gender, and then sub-group in each gender by state, and sum a metric, and get the top 10 results (for that sum of that metric) per state.
   
   There doesn't seem to be a way to currently do that in Pinot, though @kishoreg suggested it would be possible via a UDF (let's call it "subBuckets"), something like `select <top bucket column>, subBuckets(<aggregation operation>, <sub-bucket column>, <max_values>) from <table> group by <top bucket column>`.  For example:
   
   ``` sql
   select gender, subBuckets('sum(billed_expenses)', state, 10) from medicare_expenses group by gender
   ```
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] fx19880617 commented on issue #6292: Support limiting sub-groups to top N

Posted by GitBox <gi...@apache.org>.
fx19880617 commented on issue #6292:
URL: https://github.com/apache/incubator-pinot/issues/6292#issuecomment-734026302


   > @fx19880617 - good question re output format. With Elasticsearch it's a nested JSON response. I don't know what the standard is for Pinot, the above seems reasonable. You get additional details in the Elastic response for things like maximum error (due to how the top results from each shard get combined, and that could result in not actually getting the top N), etc.
   
   I see.
   
   The output of `subBuckets(sum(billed_expenses), state, 10)` can be a map(JSON blob) if we need more info or stats per group.
   
   Also not sure if this is the case, but maybe we can just define the order by clause in the function, and have syntactic sugar for TOP_N queries.
   ```
   subBuckets(<aggregation operation>, <sub-bucket column>, <order-by operation>, <max_values>)
   ``` 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] kishoreg commented on issue #6292: Support limiting sub-groups to top N

Posted by GitBox <gi...@apache.org>.
kishoreg commented on issue #6292:
URL: https://github.com/apache/incubator-pinot/issues/6292#issuecomment-733948498


   will be better to not have `'sum(billed_expenses)'` in quotes but at a high level this looks good


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] kkrugler commented on issue #6292: Support limiting sub-groups to top N

Posted by GitBox <gi...@apache.org>.
kkrugler commented on issue #6292:
URL: https://github.com/apache/incubator-pinot/issues/6292#issuecomment-733998554


   @fx19880617 - good question re output format. With Elasticsearch it's a nested JSON response. I don't know what the standard is for Pinot, the above seems reasonable. You get additional details in the Elastic response for things like maximum error (due to how the top results from each shard get combined, and that could result in not actually getting the top N), etc.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] kishoreg commented on issue #6292: Support limiting sub-groups to top N

Posted by GitBox <gi...@apache.org>.
kishoreg commented on issue #6292:
URL: https://github.com/apache/incubator-pinot/issues/6292#issuecomment-734987649


   its almost feels like its an nested query. in other words - it is a window udf 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] fx19880617 commented on issue #6292: Support limiting sub-groups to top N

Posted by GitBox <gi...@apache.org>.
fx19880617 commented on issue #6292:
URL: https://github.com/apache/incubator-pinot/issues/6292#issuecomment-733992332


   Can you also put the expected output?
   I suspect it could be:
   ```
   ["male", [[800, "NY"], ..., [200, "IL"]]],
   ["female", [[1000, "NY"], ..., [300, "IL"]]]
   ```
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] fx19880617 edited a comment on issue #6292: Support limiting sub-groups to top N

Posted by GitBox <gi...@apache.org>.
fx19880617 edited a comment on issue #6292:
URL: https://github.com/apache/incubator-pinot/issues/6292#issuecomment-734026302


   > @fx19880617 - good question re output format. With Elasticsearch it's a nested JSON response. I don't know what the standard is for Pinot, the above seems reasonable. You get additional details in the Elastic response for things like maximum error (due to how the top results from each shard get combined, and that could result in not actually getting the top N), etc.
   
   I see.
   
   The output of `subBuckets(sum(billed_expenses), state, 10)` can be a map(JSON blob) if we need more info or stats per group.
   
   Also not sure if this is the case, but maybe we can just define the order by clause in the function and have syntactic sugar for TOP_N queries.
   ```
   subBuckets(<aggregation operation>, <sub-bucket column>, <order-by operation>, <max_values>)
   ``` 
   
   One example could be that I want to show metrics for 10 same states ordered alphabetically in each gender.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] kkrugler commented on issue #6292: Support limiting sub-groups to top N

Posted by GitBox <gi...@apache.org>.
kkrugler commented on issue #6292:
URL: https://github.com/apache/incubator-pinot/issues/6292#issuecomment-733976765


   Thanks, updated.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org