Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/03/08 06:52:11 UTC

[GitHub] [spark] baibaichen commented on pull request #29695: [SPARK-22390][SPARK-32833][SQL] [WIP]JDBC V2 Datasource aggregate push down

baibaichen commented on pull request #29695:
URL: https://github.com/apache/spark/pull/29695#issuecomment-792516852


   Thanks @huaxingao 
   
   We ran some tests on aggregate push down in a real production environment last month. Here are the results:
   
   1. Dataset: 550M records
   2. 4 ClickHouse nodes
   
   Metric | 1 User | 10 Users | 20 Users | 60 Users
   -- | -- | -- | -- | --
   QPS | 2.76 | 6.1 | 4.43 | 4.45
   90th percentile (sec) | **0.4** | 2.1 | 7 | 17
   Slowest (sec) | 0.45 | 3.3 | 12 | 27
   
   We did not benchmark the case without aggregate push down, because it is 10x slower than with push down.
   
   However, the current PR has some limitations:
   1. It doesn't support COUNT.
   2. It doesn't support AVG when there are multiple shards.
   3. It is unclear how to extend the implementation to support more aggregation cases, for example sum(if()).
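
To illustrate limitation 2: AVG cannot simply be pushed to each shard and averaged again, because shards may hold different numbers of rows. A common workaround, which is not necessarily what this PR would adopt, is to push down SUM and COUNT per shard and merge them afterwards. The Python sketch below only models that arithmetic. `PartialAgg`, `shard_partial`, `merge_avg`, and `naive_avg_of_avgs` are hypothetical names for illustration, not Spark or ClickHouse APIs.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class PartialAgg:
    """Partial aggregate returned by one shard (hypothetical structure)."""
    total: float  # SUM(x) computed on the shard
    count: int    # COUNT(x) computed on the shard


def shard_partial(values: Iterable[float]) -> PartialAgg:
    """Simulate what a single shard would return for SUM(x), COUNT(x)."""
    vals = list(values)
    return PartialAgg(total=sum(vals), count=len(vals))


def merge_avg(partials: Iterable[PartialAgg]) -> Optional[float]:
    """Merge per-shard SUM/COUNT pairs into a global AVG.

    Returns None when no rows were seen, mirroring SQL's AVG over an empty set.
    """
    total = 0.0
    count = 0
    for p in partials:
        total += p.total
        count += p.count
    return total / count if count else None


def naive_avg_of_avgs(partials: Iterable[PartialAgg]) -> Optional[float]:
    """Incorrect approach: average the per-shard averages, ignoring row counts."""
    avgs: List[float] = [p.total / p.count for p in partials if p.count]
    return sum(avgs) / len(avgs) if avgs else None


# Two shards with unequal row counts (made-up values for illustration).
shards = [[1.0, 2.0, 3.0], [10.0]]
partials = [shard_partial(s) for s in shards]

correct = merge_avg(partials)        # (6 + 10) / (3 + 1)
naive = naive_avg_of_avgs(partials)  # (2 + 10) / 2
```

With these made-up inputs the merged result is 4.0, while averaging the per-shard averages gives 6.0. The same merge step also covers COUNT (limitation 1), since a global count is the sum of the per-shard counts.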
   
   Thanks
   Chang 
   




