Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/06/20 14:52:42 UTC

[GitHub] [pinot] jadami10 opened a new issue, #8932: Automatic, configured, and transparent index selection in Pinot queries

jadami10 opened a new issue, #8932:
URL: https://github.com/apache/pinot/issues/8932

   This issue gets into the realm of functionality found in other databases. Because we set up so many Pinot tables and queries, we try to standardize how we select indices on all of our tables. The current logic: every dimension column gets an inverted index, the time column gets a range index, and a designated "key column" also gets a bloom filter.
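
   A minimal sketch of that standardized rule, for illustration only. The helper name and the column names are hypothetical, not taken from our actual tables; the three keys are the column lists that Pinot's `tableIndexConfig` accepts for these index types.

```python
def build_table_index_config(dimension_columns, time_column, key_column):
    """Apply the standardized rule described above.

    Hypothetical helper: every dimension column gets an inverted index,
    the time column gets a range index, and the key column gets a
    bloom filter.
    """
    return {
        "invertedIndexColumns": sorted(dimension_columns),
        "rangeIndexColumns": [time_column],
        "bloomFilterColumns": [key_column],
    }


# Hypothetical column names, chosen only for the example.
config = build_table_index_config(
    dimension_columns=["accountId", "country", "status"],
    time_column="eventTimeMs",
    key_column="accountId",
)
```

   Note that the rule is applied per table, without regard to how selective each column is for a given query. That is where the trouble described next comes from.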
   
   While we expected this to be a clear trade-off of cost (generating and storing the indices) for better latency, there are several cases where we see much worse (10x) performance due to this approach. Specifically, when a table has tens of millions of documents, filtering by the "key column" gets us down to thousands of documents. It turns out that using the inverted index on the remaining fields and filters actually makes queries return much more slowly than just looking up and aggregating those documents directly.
   
   Some ideas here:
   - have Pinot automatically recognize these cases and stop using the index
   - give users some control over which indices are used (we've had similar examples where Pinot uses a star-tree index even though we know there aren't that many records, or a range index even when we know it won't be useful for that query)
   - provide information on which indices were used in the query response. Without this, you have to constantly infer it from other query stats (such as rows scanned in filter vs. post filter)
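
   To illustrate the kind of inference the last point refers to, here is a rough sketch that reads the scan counters from a broker response. The function name and the sample numbers are made up. The field names (`totalDocs`, `numDocsScanned`, `numEntriesScannedInFilter`, `numEntriesScannedPostFilter`) are the stats Pinot returns with a query response. A zero in-filter count only suggests that the filter was resolved without scanning column values; it does not say which index was used, which is exactly the gap.

```python
def summarize_filter_work(stats):
    """Hypothetical helper: guess at index usage from scan counters."""
    in_filter = stats.get("numEntriesScannedInFilter", 0)
    post_filter = stats.get("numEntriesScannedPostFilter", 0)
    docs = stats.get("numDocsScanned", 0)
    total = stats.get("totalDocs", 0)
    return {
        # No values scanned while filtering: the filter was probably
        # answered by indices (or matched trivially).
        "filter_resolved_without_scan": in_filter == 0,
        # Fraction of the table that survived the filter.
        "selectivity": docs / total if total else None,
        # Average number of column values read per matching document.
        "post_filter_entries_per_doc": post_filter / docs if docs else None,
    }


# Made-up numbers in the spirit of the case above: tens of millions of
# documents narrowed down to thousands by the key column.
sample_stats = {
    "totalDocs": 10_000_000,
    "numDocsScanned": 4_000,
    "numEntriesScannedInFilter": 0,
    "numEntriesScannedPostFilter": 8_000,
}
summary = summarize_filter_work(sample_stats)
```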

