Posted to dev@sdap.apache.org by GitBox <gi...@apache.org> on 2022/01/18 21:43:56 UTC

[GitHub] [incubator-sdap-nexus] skorper edited a comment on pull request #151: SDAP-362: Update tile format for gridded and L2 data

skorper edited a comment on pull request #151:
URL: https://github.com/apache/incubator-sdap-nexus/pull/151#issuecomment-1015860092


   @ngachung Re-ran the benchmarks on a larger L2 request. 
   
   ```
   {{big_data_url}}/match_spark?primary=ASCATB-L2-Coastal&startTime=2017-02-27T12:03:16Z&endTime=2017-02-28T12:03:16Z&tt=86400&rt=100000&b=100,0,150,50&platforms=1,2,3,4,5,6,7,8,9&depthMin=0&depthMax=5&matchOnce=true&secondary=icoads&resultSizeLimit=10000
   ```
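For readability, the same request can be assembled from its parameters. This is a minimal sketch: `BASE_URL` is a hypothetical stand-in for the `{{big_data_url}}` placeholder, `build_match_spark_url` is a hypothetical helper, and the parameter meanings in the comments (time tolerance in seconds, radius tolerance in metres, bounding box order) are assumptions about the SDAP matchup API, not something this thread confirms.

```python
from urllib.parse import urlencode

# Hypothetical base URL; the benchmark used the {{big_data_url}} placeholder.
BASE_URL = "http://localhost:8083"

# Parameters copied from the benchmarked request. Comment meanings are assumptions.
params = {
    "primary": "ASCATB-L2-Coastal",    # primary (L2 satellite) dataset
    "startTime": "2017-02-27T12:03:16Z",
    "endTime": "2017-02-28T12:03:16Z",
    "tt": 86400,                       # time tolerance, assumed seconds
    "rt": 100000,                      # radius tolerance, assumed metres
    "b": "100,0,150,50",               # bounding box, assumed west,south,east,north
    "platforms": ",".join(str(p) for p in range(1, 10)),
    "depthMin": 0,
    "depthMax": 5,
    "matchOnce": "true",
    "secondary": "icoads",             # secondary (in situ) source
    "resultSizeLimit": 10000,
}


def build_match_spark_url(base_url: str, query: dict) -> str:
    """Return the match_spark URL, leaving commas and colons unescaped."""
    return f"{base_url}/match_spark?{urlencode(query, safe=',:')}"


url = build_match_spark_url(BASE_URL, params)
print(url)
```

Keeping commas and colons unescaped via `safe` makes the generated query string identical to the one above; a server that URL-decodes its parameters should accept either form.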
   
   This request matched with 4284 tiles. 
   
   | Request | Seconds Before | Seconds After |
   |--|--|--|
   | L2 --> insitu | 25.33 | 22.43 |
   
   ## Breakdown
   
   ### Before: 
   
   | Step Name                                               | Average(s) | Total(s) | Number of executions |
   | ------------------------------------------------------- | ---------- | -------- | -------------------- |
   | Time to determine spatial-temporal extents | 0.368042 | 2.944334 | 8 |
   | Time to call edge for partition | 0.923261 | 7.386087 | 8 |
   | Time to convert match points | 0.053592 | 0.428739 | 8 |
   | Time to build matchup tree | 0.002608 | 0.020868 | 8 |
   | Time to load tile | 0.236766 | 11.364751 | 48 |
   | Time to build primary tree | 0.000156 | 0.007479 | 48 |
   | Time to convert primary points for tile | 0.057327 | 2.751713 | 48 |
   | Time to query primary tree for tile | 0.000717 | 0.034416 | 48 |
   
   
   ![before](https://user-images.githubusercontent.com/11022336/150022391-c20223e8-d76f-4f75-a111-ae95b0a89587.png)
   
   ### After:
   
   | Step Name                                               | Average(s) | Total(s) | Number of executions |
   | ------------------------------------------------------- | ---------- | -------- | -------------------- |
   | Time to determine spatial-temporal extents | 0.371486 | 2.971884 | 8 |
   | Time to call edge for partition | 0.965982 | 7.727854 | 8 |
   | Time to convert match points | 0.056917 | 0.455339 | 8 |
   | Time to build matchup tree | 0.002847 | 0.022776 | 8 |
   | Time to load tile | 0.017097 | 0.820645 | 48 |
   | Time to build primary tree | 0.000131 | 0.00629 | 48 |
   | Time to convert primary points for tile | 0.003607 | 0.17312 | 48 |
   | Time to query primary tree for tile | 0.000794 | 0.038127 | 48 |
   
   ![after](https://user-images.githubusercontent.com/11022336/150022404-2c1a6e7e-bdc4-4fa6-978f-6b11b8ddb356.png)
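The two tables can be compared directly. This minimal sketch only reuses the figures reported above; `compare` is a hypothetical helper, not part of SDAP. Note that each step's `Total(s)` is summed across all of its executions (it equals `Average(s)` times the execution count), whereas the end-to-end figures are wall-clock request times.

```python
# (total before, total after) in seconds, copied from the Before/After tables.
steps = {
    "Time to load tile": (11.364751, 0.820645),
    "Time to convert primary points for tile": (2.751713, 0.17312),
}
# Wall-clock request time (seconds) from the summary table.
end_to_end = (25.33, 22.43)


def compare(old: float, new: float) -> tuple[float, float]:
    """Return (speedup factor, seconds saved) for an old/new timing pair."""
    return old / new, old - new


for name, (old, new) in steps.items():
    factor, saved = compare(old, new)
    print(f"{name}: {factor:.1f}x faster, {saved:.2f} s of summed time saved")

factor, saved = compare(*end_to_end)
print(f"End to end: {factor:.2f}x faster, {saved:.2f} s of wall-clock time saved "
      f"({saved / end_to_end[0]:.1%})")
```

The per-step speedups are large, but the summed seconds they save do not translate one-for-one into wall-clock seconds.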
   
   What's curious is that the benchmark breakdown shows a huge improvement in `Time to load tile` (0.237 s to 0.017 s per call, 11.36 s to 0.82 s in total), which makes sense and was the desired behavior. However, the end-to-end request time barely changed (25.33 s to 22.43 s). As the tables above show, `Time to load tile` is executed 48 times, so its 11-second total is summed across those 48 executions, many of which run in parallel in PySpark. Because those executions overlap, most of the per-tile savings do not shorten the wall-clock time: the parallelism is working so well that it masks the speedup, and the end-to-end gain is much smaller than the step-level numbers suggest.
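The masking effect can be reproduced with a toy simulation in which summed per-call time far exceeds wall-clock time once calls overlap. This sketch assumes Python threads as a stand-in for PySpark executors, and `load_tile` is a hypothetical function that just sleeps; only the 48 executions come from the tables, while the 0.02 s per call and the 8 workers are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def load_tile(seconds: float) -> float:
    """Hypothetical stand-in for one tile load: sleep, then return this call's duration."""
    start = time.perf_counter()
    time.sleep(seconds)
    return time.perf_counter() - start


def measure(n_tasks: int, per_task: float, workers: int) -> tuple[float, float]:
    """Return (summed per-call time, wall-clock time) for n_tasks run on a thread pool."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        durations = list(pool.map(load_tile, [per_task] * n_tasks))
    wall = time.perf_counter() - start
    return sum(durations), wall


# 48 executions as in the tables; per-call time and worker count are hypothetical.
summed, wall = measure(n_tasks=48, per_task=0.02, workers=8)
print(f"summed per-call time: {summed:.2f} s, wall-clock: {wall:.2f} s")
```

With 8 workers the summed time comes out several times larger than the wall-clock time, which is why a step's `Total(s)` can drop by seconds while the request as a whole gets only modestly faster.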

