Posted to commits@seatunnel.apache.org by GitBox <gi...@apache.org> on 2022/09/23 03:34:56 UTC

[GitHub] [incubator-seatunnel] ashulin commented on a diff in pull request #2854: [Feature] [Plugin] add spark cache transform

ashulin commented on code in PR #2854:
URL: https://github.com/apache/incubator-seatunnel/pull/2854#discussion_r978242219


##########
docs/en/transform/cache.md:
##########
@@ -0,0 +1,57 @@
+# cache
+
+> cache transform plugin
+
+## Description
+
+Supports caching a dataset during data integration via this transform.
+
+:::tip
+
+This transform is **ONLY** supported by Spark.
+
+:::
+
+## Options
+
+| name          | type   | required | default value |
+| ------------- | ------ | -------- | ------------- |
+| storage_level | string | false    | -             |
+
+
+### storage_level [string]
+
+One of the most important capabilities in Spark is persisting (or caching) a dataset in memory across operations. When you persist an RDD, each node stores any partitions of it that it computes in memory and reuses them in other actions on that dataset (or datasets derived from it). This allows future actions to be much faster (often by more than 10x). Caching is a key tool for iterative algorithms and fast interactive use.
+
+
+The following storage levels are supported:
+
+- `NONE` — no caching
+- `DISK_ONLY` / `DISK_ONLY_2` — store partitions only on disk
+- `MEMORY_ONLY` / `MEMORY_ONLY_2` — store partitions as deserialized objects in JVM memory; partitions that do not fit are recomputed when needed
+- `MEMORY_ONLY_SER` / `MEMORY_ONLY_SER_2` — like `MEMORY_ONLY`, but serialized (more space-efficient, more CPU-intensive to read)
+- `MEMORY_AND_DISK` / `MEMORY_AND_DISK_2` — store partitions in memory, spilling those that do not fit to disk
+- `MEMORY_AND_DISK_SER` / `MEMORY_AND_DISK_SER_2` — like `MEMORY_ONLY_SER`, but partitions that do not fit in memory are spilled to disk instead of being recomputed
+- `OFF_HEAP` — like `MEMORY_ONLY_SER`, but stored in off-heap memory
+
+The `_2` variants replicate each partition on two cluster nodes.
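
The diff above defines the option but does not show a usage example; a minimal config sketch in the usual SeaTunnel style could look like the following (the `cache` plugin name and `storage_level` option are taken from the diff, while the surrounding `transform` block syntax is an assumption based on SeaTunnel's common configuration format):

```
transform {
  cache {
    # Assumed option name, from the diff above; value must be one of
    # the Spark storage levels listed in the documentation.
    storage_level = "MEMORY_AND_DISK"
  }
}
```

Whether `cache` nests under `transform` exactly like this depends on the SeaTunnel config grammar of the release in question.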

Review Comment:
   You can describe their differences, or add a reference link to the Spark documentation



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org