Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/09/22 18:01:31 UTC

[GitHub] [pinot] hristo-stripe opened a new issue #7463: Optimize filters at segment processing time

hristo-stripe opened a new issue #7463:
URL: https://github.com/apache/pinot/issues/7463


   We've tried running `select distinctcount(field) from table`, which times out with the default timeout.
   
   However, running
   `select distinctcount(field) from table_REALTIME`
   and
   `select distinctcount(field) from table_OFFLINE`
   both complete in less than 15ms.
   
   After a discussion with @Jackie-Jiang, it appears this can be traced to the fact that
   querying the realtime and offline tables separately allows the broker to answer this query
   using metadata only, without having to scan any segments.
   
   However, when the hybrid table gets queried, the high-level query gets split into
   `select distinctcount(field) from table_REALTIME where time_field > $IMPLICIT_TIME`
   and
   `select distinctcount(field) from table_OFFLINE where time_field <= $IMPLICIT_TIME`
   
   This time filter causes every server to perform a full scan of every segment.
   The scan could be avoided by optimizing the filter against each individual segment
   and checking whether the query can then be answered from metadata alone,
   without scanning the segment.
   
   In most cases, the filter is true for every row of most segments, so it is not needed for those segments.
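
A minimal sketch of the per-segment check described above, assuming each segment exposes the minimum and maximum value of its time column in its metadata. The names `SegmentMetadata`, `FilterResult`, `classify_greater_than`, `classify_less_equal`, and `plan_segments` are hypothetical and are not Pinot's API (Pinot itself is written in Java); the code only illustrates how the implicit time-boundary filter can be resolved against segment metadata before deciding whether a scan is needed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class FilterResult(Enum):
    ALL_MATCH = "all"      # every row passes: the filter can be dropped for this segment
    NO_MATCH = "none"      # no row passes: the segment can be skipped entirely
    PARTIAL = "partial"    # some rows may pass: the filter must be evaluated by scanning


@dataclass(frozen=True)
class SegmentMetadata:
    # Hypothetical per-segment metadata: min/max value of the time column.
    name: str
    min_time: int
    max_time: int


def classify_greater_than(seg: SegmentMetadata, boundary: int) -> FilterResult:
    """Resolve `time_field > boundary` (REALTIME side) using only metadata."""
    if seg.min_time > boundary:
        return FilterResult.ALL_MATCH
    if seg.max_time <= boundary:
        return FilterResult.NO_MATCH
    return FilterResult.PARTIAL


def classify_less_equal(seg: SegmentMetadata, boundary: int) -> FilterResult:
    """Resolve `time_field <= boundary` (OFFLINE side) using only metadata."""
    if seg.max_time <= boundary:
        return FilterResult.ALL_MATCH
    if seg.min_time > boundary:
        return FilterResult.NO_MATCH
    return FilterResult.PARTIAL


def plan_segments(
    segments: List[SegmentMetadata],
    classify: Callable[[SegmentMetadata, int], FilterResult],
    boundary: int,
) -> Dict[FilterResult, List[str]]:
    """Group segment names by how the time filter resolves against each one."""
    plan: Dict[FilterResult, List[str]] = {result: [] for result in FilterResult}
    for seg in segments:
        plan[classify(seg, boundary)].append(seg.name)
    return plan


if __name__ == "__main__":
    offline = [
        SegmentMetadata("offline_0", 100, 199),
        SegmentMetadata("offline_1", 200, 299),
        SegmentMetadata("offline_2", 300, 350),
    ]
    # With a time boundary of 320, only offline_2 straddles it and needs a scan.
    for result, names in plan_segments(offline, classify_less_equal, 320).items():
        print(result.name, names)
```

Under these assumptions, ALL_MATCH segments could drop the time filter and take the metadata-only path, NO_MATCH segments could be pruned outright, and only PARTIAL segments (typically the few that straddle the time boundary) would still require a scan.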
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang closed issue #7463: Optimize filters at segment processing time

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang closed issue #7463:
URL: https://github.com/apache/pinot/issues/7463


   




[GitHub] [pinot] atris commented on issue #7463: Optimize filters at segment processing time

Posted by GitBox <gi...@apache.org>.
atris commented on issue #7463:
URL: https://github.com/apache/pinot/issues/7463#issuecomment-926026609


   @Jackie-Jiang as discussed, can you please assign this to me? 

