Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/08/02 05:31:03 UTC

[GitHub] [spark] itsvikramagr commented on issue #24922: [SPARK-28120][SS] Rocksdb state storage implementation

URL: https://github.com/apache/spark/pull/24922#issuecomment-517556064
 
 
   > I agree that keeping state in memory is not scalable, and the results look promising. It might be better to add another kind of benchmark here, such as a stress test, to measure the performance of stateful operations and help end users decide whether to adopt this implementation by default or to use it selectively.
   > 
   > What I did for my patch was the following:
   > https://issues.apache.org/jira/browse/SPARK-21271
   > [#21733 (comment)](https://github.com/apache/spark/pull/21733#issuecomment-411207042)
   > 
   
   I have created the following [repo](https://github.com/itsvikramagr/spark-benchmark) along similar lines to what @HeartSaVioR has done for this patch.
   
   **Setup**
   - Used Qubole's distribution of Apache Spark 2.4.0 for my tests.
   - Master Instance Type = i3.xlarge
   - Driver Memory = 2g
   - num-executors = 1
   - max-executors = 1
   - spark.sql.shuffle.partitions = 8
   - Run time = 30 mins
   - Source = Rate Source
   - executor Memory = 7g
   - spark.executor.memoryOverhead = 3g
   - Processing Time = 30 sec
   
   - Executor Instance Type = i3.xlarge
   - cores per executor = 4
   - ratePerSec = 20k
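
For anyone reproducing the setup, here is a minimal sketch of how the settings above could be expressed as Spark properties. The mapping from the list entries to property names is my assumption (in particular, `max-executors` is assumed to mean `spark.dynamicAllocation.maxExecutors`), and `spark_submit_args` is a hypothetical helper, not part of the benchmark repo.

```python
# Settings taken from the Setup list. The property names are an assumed
# mapping; the comment itself only gives the informal names.
BASE_CONF = {
    "spark.driver.memory": "2g",
    "spark.executor.memory": "7g",
    "spark.executor.memoryOverhead": "3g",
    "spark.executor.instances": "1",              # num-executors = 1
    "spark.dynamicAllocation.maxExecutors": "1",  # assumed meaning of max-executors = 1
    "spark.sql.shuffle.partitions": "8",
}


def spark_submit_args(conf, executor_cores):
    """Render a config dict as a flat list of `--conf key=value` arguments.

    Hypothetical helper for illustration only.
    """
    merged = dict(conf)
    merged["spark.executor.cores"] = str(executor_cores)
    args = []
    for key in sorted(merged):
        args += ["--conf", f"{key}={merged[key]}"]
    return args


if __name__ == "__main__":
    # 4 cores per executor corresponds to the i3.xlarge run.
    print(" ".join(spark_submit_args(BASE_CONF, executor_cores=4)))
```

The printed arguments can be appended to a `spark-submit` invocation; the application jar and main class are deliberately left out because they are not stated here.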
   
   | State Storage Type | Mode | Total Trigger Execution Time | Records Processed | Total State Rows | Comments |
   | --- | --- | --- | --- | --- | --- |
   | HDFS | Append | ~7 minutes | 8.6 million | 2 million | Application failed before 30 minutes |
   | RocksDB | Append | ~30 minutes | 34.6 million | 7 million |  |
   
   
   - Executor Instance Type = c5d.2xlarge
   - cores per executor = 8
   - ratePerSec = 30k
   
   | State Storage Type | Mode | Total Trigger Execution Time | Records Processed | Total State Rows | Comments |
   | --- | --- | --- | --- | --- | --- |
   | HDFS | Append | 8 minutes | 12.6 million | 3.1 million | Application was stuck because of GC |
   | RocksDB | Complete | ~30 minutes | 47.34 million | 12.5 million |  |
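
As a rough sanity check on the tables, the sketch below compares the reported record counts with what the rate source would generate at the configured `ratePerSec`. It assumes the reported trigger execution time approximates the wall-clock time during which the source was emitting rows, which may not hold exactly (the times are approximate), so the ratios are only indicative. At 20k rows/sec a full 30-minute run generates 36 million rows, and at 30k rows/sec it generates 54 million.

```python
def expected_records(rate_per_sec, minutes):
    """Rows a rate source emits at a constant rate over the given duration."""
    return rate_per_sec * minutes * 60


# (label, ratePerSec, reported minutes, reported records processed)
# Values are copied from the two tables; minutes are approximate.
RUNS = [
    ("i3.xlarge / HDFS", 20_000, 7, 8.6e6),
    ("i3.xlarge / RocksDB", 20_000, 30, 34.6e6),
    ("c5d.2xlarge / HDFS", 30_000, 8, 12.6e6),
    ("c5d.2xlarge / RocksDB", 30_000, 30, 47.34e6),
]


def summarize(runs):
    """Return (label, generated rows, processed / generated) for each run."""
    rows = []
    for label, rate, minutes, processed in runs:
        generated = expected_records(rate, minutes)
        rows.append((label, generated, processed / generated))
    return rows


if __name__ == "__main__":
    for label, generated, ratio in summarize(RUNS):
        print(f"{label}: generated ~{generated / 1e6:.1f}M rows, "
              f"processed/generated = {ratio:.2f}")
```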
   
   Executor info when HDFS state storage is used 
   <img width="1244" alt="Screenshot 2019-08-02 at 10 58 21 AM" src="https://user-images.githubusercontent.com/5220941/62346639-79443f80-b514-11e9-82ff-c41bdd2d5a91.png">
   
   
   
