Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/07/02 16:36:49 UTC

[GitHub] [spark] baohe-zhang commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

baohe-zhang commented on pull request #28412:
URL: https://github.com/apache/spark/pull/28412#issuecomment-653110422


   Hi @HeartSaVioR @tgravescs, I measured the memory usage and disk usage for a 1.21 GB log file and for logs of the same application written with different compression codecs. The logs were generated by Spark 3 and parsed by the Spark 3 History Server (SHS). The application contains 400 jobs; each job contains one stage, and each stage contains 1000 tasks.
   | codec                                                                                        | uncompressed | lz4      | lzf      | snappy   | zstd     |
   | -------------------------------------------------------------------------------------------- | ------------ | -------- | -------- | -------- | -------- |
   | log file size                                                                                | 1.21 GB      | 108 MB   | 128 MB   | 136 MB   | 40 MB    |
   | actual memory usage (measured through Utils.SizeEstimator)                                   | 254.8 MB     | 252.1 MB | 260.5 MB | 256.4 MB | 279.2 MB |
   | estimated memory usage (log size / 2 for uncompressed log, log size \* 2 for compressed log) | 605 MB       | 216 MB   | 256 MB   | 272 MB   | 80 MB    |
   | disk usage (LevelDB file size)                                                               | 393 MB       | 398 MB   | 403 MB   | 395 MB   | 424 MB   |
   
   From the results, it seems we are overestimating the memory usage of uncompressed files and underestimating the memory usage of zstd-compressed files. I think file size / 4 for uncompressed logs and file size * 4 for zstd-compressed logs might be a better estimate.
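
To make the comparison concrete, here is a minimal Python sketch that applies both heuristics to the numbers in the table. It is only an illustration, not the PR's code: the function names are hypothetical, 1.21 GB is taken as 1210 MB (which matches the 605 MB figure in the table), and the lz4, lzf, and snappy codecs are assumed to keep the existing size * 2 rule, since the suggestion only covers uncompressed and zstd logs.

```python
# Measurements copied from the table, in MB (1.21 GB taken as 1210 MB).
# codec -> (log file size, measured in-memory size)
MEASUREMENTS = {
    "uncompressed": (1210.0, 254.8),
    "lz4": (108.0, 252.1),
    "lzf": (128.0, 260.5),
    "snappy": (136.0, 256.4),
    "zstd": (40.0, 279.2),
}


def current_estimate(codec, file_size_mb):
    """Heuristic from the table: size / 2 if uncompressed, size * 2 otherwise."""
    if codec == "uncompressed":
        return file_size_mb / 2
    return file_size_mb * 2


def proposed_estimate(codec, file_size_mb):
    """Suggested rule: size / 4 if uncompressed, size * 4 for zstd.

    Assumption: other compressed codecs keep the size * 2 rule.
    """
    if codec == "uncompressed":
        return file_size_mb / 4
    if codec == "zstd":
        return file_size_mb * 4
    return file_size_mb * 2


def compare():
    """Return (codec, current, proposed, measured) tuples, all in MB."""
    return [
        (codec, current_estimate(codec, size), proposed_estimate(codec, size), actual)
        for codec, (size, actual) in MEASUREMENTS.items()
    ]


if __name__ == "__main__":
    for codec, current, proposed, actual in compare():
        print(
            f"{codec:>12}: current={current:7.1f} MB  "
            f"proposed={proposed:7.1f} MB  measured={actual:7.1f} MB"
        )
```

With these inputs the suggested rule gives 302.5 MB for the uncompressed log (measured: 254.8 MB) and 160 MB for the zstd log (measured: 279.2 MB), so the zstd estimate moves closer to the measurement but remains below it.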

