Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2022/01/11 06:19:30 UTC

[GitHub] [druid] clintropolis commented on pull request #12139: Limit the subquery results by memory usage (estimated)

clintropolis commented on pull request #12139:
URL: https://github.com/apache/druid/pull/12139#issuecomment-1009633460


   This seems pretty useful, but it also looks rather expensive, since the size estimation is going to happen for every row. Could you measure the performance before and after this change? [This benchmark might be a good place to start](https://github.com/apache/druid/blob/master/benchmarks/src/test/java/org/apache/druid/benchmark/query/CachingClusteredClientBenchmark.java).
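
   If it helps, here is a minimal standalone JMH sketch of the kind of before/after measurement I mean. It is not the Druid benchmark itself, and the size-estimation logic inside is just a hypothetical stand-in for the PR's per-row estimator:

   ```java
   import java.util.concurrent.TimeUnit;
   import org.openjdk.jmh.annotations.*;

   @State(Scope.Benchmark)
   @BenchmarkMode(Mode.AverageTime)
   @OutputTimeUnit(TimeUnit.MICROSECONDS)
   public class RowSizeEstimationBenchmark
   {
     private Object[][] rows;

     @Setup
     public void setup()
     {
       // Synthetic rows: a string dimension plus two numeric columns.
       rows = new Object[10_000][];
       for (int i = 0; i < rows.length; i++) {
         rows[i] = new Object[]{"dim" + i, (long) i, (double) i};
       }
     }

     @Benchmark
     public long estimateEveryRow()
     {
       // The per-row, per-column walk whose overhead we want to quantify.
       long total = 0;
       for (Object[] row : rows) {
         for (Object col : row) {
           total += col instanceof String ? ((String) col).length() * 2L : Long.BYTES;
         }
       }
       return total;
     }
   }
   ```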
   
   Also, there appears to be no way to disable it. Maybe setting the limit to 0 should disable this computation entirely, rather than requiring the limit to be set to Long.MAX_VALUE?
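
   To sketch what I mean (identifier names like `maxSubqueryBytes` and `estimateRowSize` here are illustrative, not the PR's actual API), a limit of 0 could skip the estimation cost entirely:

   ```java
   // Hypothetical sketch: a limit of 0 disables the byte check (and its
   // per-row estimation cost) instead of forcing callers to pass Long.MAX_VALUE.
   static void accumulateWithByteLimit(Iterable<Object[]> rows, long maxSubqueryBytes)
   {
     final boolean limitEnabled = maxSubqueryBytes > 0;
     long totalBytes = 0;
     for (Object[] row : rows) {
       if (limitEnabled) {
         totalBytes += estimateRowSize(row); // pay the estimation cost only when enabled
         if (totalBytes > maxSubqueryBytes) {
           throw new IllegalStateException("Subquery results exceeded " + maxSubqueryBytes + " bytes");
         }
       }
       // ... materialize the row as before ...
     }
   }

   // Illustrative stand-in for whatever per-row estimator the PR adds.
   static long estimateRowSize(Object[] row)
   {
     long bytes = 0;
     for (Object col : row) {
       bytes += col instanceof String ? ((String) col).length() * 2L : Long.BYTES;
     }
     return bytes;
   }
   ```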
   
   Should it prove expensive, maybe the approach should be to sample only the first 'n' rows and apply their average estimated size to any remaining rows, instead of trying to estimate every row encountered. I imagine the small loss of accuracy would be worth how much cheaper it is to avoid looping over every column of every row; see the sketch below.
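
   Again just a hedged sketch of that sampling variant, not the PR's implementation (`sampleRows` and `estimateRowSize` are the same hypothetical names as above): walk the columns of the first n rows only, then charge the running average to every later row.

   ```java
   // Hedged sketch of the sampling approach.
   static void accumulateWithSampledLimit(Iterable<Object[]> rows, long maxSubqueryBytes)
   {
     final int sampleRows = 100; // assumed sample size
     long rowCount = 0;
     long sampledBytes = 0;
     long totalBytes = 0;
     for (Object[] row : rows) {
       rowCount++;
       if (rowCount <= sampleRows) {
         final long size = estimateRowSize(row); // exact per-column estimate, first n rows only
         sampledBytes += size;
         totalBytes += size;
       } else {
         totalBytes += sampledBytes / sampleRows; // cheap average for every later row
       }
       if (totalBytes > maxSubqueryBytes) {
         throw new IllegalStateException("Subquery results exceeded " + maxSubqueryBytes + " bytes");
       }
     }
   }
   ```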


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


