Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/08/19 03:53:12 UTC

[GitHub] [spark] zhengruifeng commented on a change in pull request #25445: [SPARK-28541][WEBUI] Document Storage page

URL: https://github.com/apache/spark/pull/25445#discussion_r315030988
 
 

 ##########
 File path: docs/web-ui.md
 ##########
 @@ -45,6 +45,53 @@ The Storage tab displays the persisted RDDs and DataFrames, if any, in the appli
 page shows the storage levels, sizes and partitions of all RDDs, and the details page shows the
 sizes and using executors for all partitions in an RDD or DataFrame.
 
+{% highlight scala %}
+scala> import org.apache.spark.storage.StorageLevel._
+import org.apache.spark.storage.StorageLevel._
+
+scala> val rdd = sc.range(0, 100, 1, 5).setName("rdd")
+rdd: org.apache.spark.rdd.RDD[Long] = rdd MapPartitionsRDD[1] at range at <console>:27
+
+scala> rdd.persist(MEMORY_ONLY_SER)
+res0: rdd.type = rdd MapPartitionsRDD[1] at range at <console>:27
+
+scala> rdd.count
+res1: Long = 100
+
+scala> val df = Seq((1, "andy"), (2, "bob"), (2, "andy")).toDF("count", "name")
+df: org.apache.spark.sql.DataFrame = [count: int, name: string]
+
+scala> df.persist(DISK_ONLY)
+res2: df.type = [count: int, name: string]
+
+scala> df.count
+res3: Long = 3
+{% endhighlight %}
+
+<p style="text-align: center;">
+  <img src="img/webui-storage-tab.png"
+       title="Storage tab"
+       alt="Storage tab"
+       width="100%" />
+  <!-- Images are downsized intentionally to improve quality on retina displays -->
+</p>
+
+After running the above example, we can find two RDDs listed in the Storage tab. Basic information such as
+storage level, number of partitions and memory overhead is provided. Note that newly persisted RDDs
+or DataFrames are not shown in the tab before they are materialized. To monitor a specific RDD or DataFrame,
+make sure an action operation has been triggered.
+
+<p style="text-align: center;">
+  <img src="img/webui-storage-detail.png"
+       title="Storage detail"
+       alt="Storage detail"
+       width="100%" />
+  <!-- Images are downsized intentionally to improve quality on retina displays -->
+</p>
+
+Clicking the RDD name 'rdd' displays the details of data persistence, such as the data distribution on the cluster.
 
 Review comment:
   Sorry, I do not get the point. Is there a rendering issue?
   In the example, I set the name of the first RDD to `rdd`.
   Thanks for reviewing!
