Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/06/15 04:19:46 UTC

[GitHub] [incubator-druid] justinborromeo commented on issue #7895: Confusing Scan Query example

URL: https://github.com/apache/incubator-druid/issues/7895#issuecomment-502333141
 
 
   Hi Dinesh,
   
   I think you misunderstand how the scan query is implemented.  As far as I know, filtering with the indexes happens before batching.  When the historical reads rows into a batch (see `ScanQueryLimitRowIterator.java` for reference), that batch contains only rows that have already passed the filter.  Batches of `batchSize` rows are then streamed back to the broker until `limit` rows have been returned; the only edge case is the last batch, which may contain fewer than `batchSize` rows so that the total does not exceed `limit`.
   
   This differs from your understanding in that you assume every row is scanned and then filtered.  Rows are not combined into batches before filtering; batching combines the results **after filtering** so they can be streamed back.
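   
   To make the order of operations concrete, here is a rough sketch in Python rather than Druid's Java. It is not the actual implementation: `build_index`, `scan`, the `country` column, and the dictionary-based index are all hypothetical stand-ins for illustration. The index is consulted first, and only the matching rows are grouped into batches.

```python
from collections import defaultdict
from typing import Any, Dict, Iterator, List

Row = Dict[str, Any]


def build_index(rows: List[Row], column: str) -> Dict[Any, List[int]]:
    """Map each value of `column` to the offsets of the rows holding it."""
    index: Dict[Any, List[int]] = defaultdict(list)
    for offset, row in enumerate(rows):
        index[row[column]].append(offset)
    return dict(index)


def scan(
    rows: List[Row],
    index: Dict[Any, List[int]],
    value: Any,
    batch_size: int,
    limit: int,
) -> Iterator[List[Row]]:
    """Yield batches of matching rows, never returning more than `limit` rows."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Step 1: filter through the index; non-matching rows are never touched.
    matching = index.get(value, [])[: max(limit, 0)]
    # Step 2: batch only the filtered rows; the last batch may be short.
    for start in range(0, len(matching), batch_size):
        yield [rows[i] for i in matching[start : start + batch_size]]


# Hypothetical data: 20 rows, half of which match the filter.
rows = [{"id": i, "country": "US" if i % 2 == 0 else "CA"} for i in range(20)]
index = build_index(rows, "country")
batches = list(scan(rows, index, "US", batch_size=3, limit=7))
print([len(b) for b in batches])
```

   With 10 matching rows, a batch size of 3, and a limit of 7, this yields batches of 3, 3, and 1 rows. Every returned row matches the filter, and only the final batch is smaller than the batch size.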
   
   Referring to the doc example, the `batchSize` property actually corresponds to a **maximum** batch size. Perhaps the property could be named better, but changing the API would be too painful for such a small benefit.
