You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/03/31 22:06:11 UTC

[GitHub] [pinot] richardstartin opened a new issue #8453: Evaluate cheap filters before expensive filters

richardstartin opened a new issue #8453:
URL: https://github.com/apache/pinot/issues/8453


   The plan for the query below is extremely inefficient:
   
   ```sql
   select * from githubEvents where repo_name like 'FeedScale%' and type = 'PullRequestEvent' limit 100
   ```
   
   | Operator |	Operator_Id |	Parent_Id |
   |----------|---------------|------------|
   | BROKER_REDUCE(limit:100) |	0 |	-1 |
   | COMBINE_SELECT |	1 |	0 |
   | SELECT(selectList:actor_id, actor_login, actor_url, created_at, event_time, event_time_ts, id, public, pull_request_additions, pull_request_changed_files, pull_request_comments, pull_request_commits, pull_request_deletions, pull_request_id, pull_request_review_comments, pull_request_state, pull_request_title, pull_request_url, repo_id, repo_name, repo_url, type)	| 2	| 1 |
   | TRANSFORM_PASSTHROUGH(actor_id, actor_login, actor_url, created_at, event_time, event_time_ts, id, public, pull_request_additions, pull_request_changed_files, pull_request_comments, pull_request_commits, pull_request_deletions, pull_request_id, pull_request_review_comments, pull_request_state, pull_request_title, pull_request_url, repo_id, repo_name, repo_url, type) |	3 |	2 |
   | PROJECT(repo_url, pull_request_comments, pull_request_changed_files, actor_login, pull_request_deletions, created_at, pull_request_url, type, pull_request_id, event_time_ts, public, pull_request_review_comments, pull_request_title, pull_request_state, repo_id, actor_url, repo_name, actor_id, id, pull_request_additions, pull_request_commits, event_time)	| 4	| 3 |
   | FILTER_AND |	5 |	4 |
   | FILTER_INVERTED_INDEX(indexLookUp:inverted_index,operator:EQ,predicate:type = 'PullRequestEvent') |	6 |	5 |
   | FILTER_FULL_SCAN(operator:REGEXP_LIKE,predicate:regexp_like(repo_name,'^FeedScale.*$')) |	7 |	5 |
   
   Instead of iterating over the result of the `FILTER_INVERTED_INDEX` operator, the filter is done in parallel with a full scan and then is intersected with other filters. 
   
   This as a few implications:
   
   1. no amount of filtering will ever make this query produce a result without timing out unless the user has a text index
   2. not even the limit can prevent the query from scanning the entire table because there may be fewer matches than the limit
   3. Pinot users shouldn't put any expression in a where clause which isn't guaranteed to match an index
   
   Cheap filters should be pushed down before expensive filters. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on issue #8453: Evaluate cheap filters before expensive filters

Posted by GitBox <gi...@apache.org>.
richardstartin commented on issue #8453:
URL: https://github.com/apache/pinot/issues/8453#issuecomment-1085256163


   Yes, this is quite misleading, either the plan should reflect the linking of filters done in AndDocIdSet or the code should be refactored to make it easier to describe the execution.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on issue #8453: Evaluate cheap filters before expensive filters

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on issue #8453:
URL: https://github.com/apache/pinot/issues/8453#issuecomment-1085194696


   The operator name is a little bit misleading. For this query, we will first evaluate the `type='PullRequestEvent'` predicate using inverted index, and get a bitmap; then pass the bitmap to the scan-based operator to evaluate the `repo_name like 'FeedScale%'` predicate by scanning the docs in the bitmap.
   The issue for this query is that we should not fully evaluate the scan-based predicate, but scan on demand, and stop at 100 documents. If the query is aggregation (requires all documents to be evaluated), then it is okay. We should add a flag to the filter operator denoting whether all documents need to be evaluated (lazy evaluation).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin closed issue #8453: Evaluate cheap filters before expensive filters

Posted by GitBox <gi...@apache.org>.
richardstartin closed issue #8453:
URL: https://github.com/apache/pinot/issues/8453


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org