You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@kylin.apache.org by GitBox <gi...@apache.org> on 2020/12/04 08:17:56 UTC

[GitHub] [kylin] zzcclp edited a comment on pull request #1495: KYLIN-4829 Support to use thread-level SparkSession to execute query

zzcclp edited a comment on pull request #1495:
URL: https://github.com/apache/kylin/pull/1495#issuecomment-738628576


   ## The Results of Testing Manually
   
   ### Test Env
   
   - Hadoop 2.7.0 on docker.
   - Commit : [3b3786c5c](https://github.com/apache/kylin/commit/3b3786c5c9602838cd4abd0a6d40574550ec8622)
   - Sparder Env : 
      spark.executor.cores=1
      spark.executor.instances=4
      spark.executor.memory=2G
      spark.executor.memoryOverhead=1G
      spark.sql.shuffle.partitions=4
   
   
   ### Before this patch
   The shuffle partition number of all querys is 4, which equals to the total cores number.
   
   ![image](https://user-images.githubusercontent.com/9430290/101136306-19016880-3648-11eb-8ae0-2e02d42a41ac.png)
   
   ![image](https://user-images.githubusercontent.com/9430290/101136373-32a2b000-3648-11eb-83dd-83b52e2d9980.png)
   
   ![image](https://user-images.githubusercontent.com/9430290/101136443-4e0dbb00-3648-11eb-8d31-ac721797ee94.png)
   
   ![image](https://user-images.githubusercontent.com/9430290/101136476-5a921380-3648-11eb-90f7-ec20faeca57b.png)
   
   
   ### After this patch
   The shuffle partition number of each query is calculated according to the scanned bytes of each query:
   
   ![image](https://user-images.githubusercontent.com/9430290/101136174-e3f51600-3647-11eb-99c9-290831bb30af.png)
   The log messages shown as below:
   `2020-12-04 05:28:11,991 INFO  [Query a5e841ba-c430-383b-dfa8-5694cd6d282b-122] datasource.FilePruner:51 : Set partition to 2, total bytes 92610534`
   
   ![image](https://user-images.githubusercontent.com/9430290/101136210-f40cf580-3647-11eb-8eec-0b7bdef93c30.png)
   The log messages shown as below:
   `2020-12-04 05:28:12,112 INFO  [Query 534a7afb-4857-6e0c-67b8-bd6a8da155a8-130] datasource.FilePruner:51 : Set partition to 1, total bytes 42133710`
   
   ![image](https://user-images.githubusercontent.com/9430290/101136227-fa02d680-3647-11eb-9001-128c0e3bd490.png)
   The log messages shown as below:
   `2020-12-04 05:28:12,141 INFO  [Query 744d34d1-0d06-11c8-fdee-1f260388117f-131] datasource.FilePruner:51 : Set partition to 3, total bytes 158775868`
   
   ![image](https://user-images.githubusercontent.com/9430290/101136249-05ee9880-3648-11eb-9267-c1cb44319697.png)
   The log messages shown as below:
   `2020-12-04 08:16:43,746 INFO  [Query e117ceb8-53c1-959e-9cb0-75ee3901e271-126] pushdown.SparkSqlClient:68 : Auto set spark.sql.shuffle.partitions to 8, the total sources size is 415631445 b`
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org