Posted to commits@hudi.apache.org by le...@apache.org on 2021/10/20 14:12:04 UTC

[hudi] branch asf-site updated: [DOCS] Update JuiceFS doc (#3780)

This is an automated email from the ASF dual-hosted git repository.

leesf pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new df97b34  [DOCS] Update JuiceFS doc (#3780)
df97b34 is described below

commit df97b348143c597a77055ad9d5327288125bc91c
Author: Changjian Gao <ga...@gmail.com>
AuthorDate: Wed Oct 20 22:11:41 2021 +0800

    [DOCS] Update JuiceFS doc (#3780)
---
 website/docs/cloud.md      | 18 ++++++------
 website/docs/jfs_hoodie.md | 69 +++++++++++++++++++++++++---------------------
 2 files changed, 46 insertions(+), 41 deletions(-)

diff --git a/website/docs/cloud.md b/website/docs/cloud.md
index 09ef8f6..818baa1 100644
--- a/website/docs/cloud.md
+++ b/website/docs/cloud.md
@@ -1,29 +1,29 @@
 ---
 title: Cloud Storage
-keywords: [hudi, aws, gcp, oss, azure, cloud]
+keywords: [hudi, aws, gcp, oss, azure, cloud, juicefs]
 summary: "In this page, we introduce how Hudi work with different Cloud providers."
 toc: true
-last_modified_at: 2019-06-16T21:59:57-04:00
+last_modified_at: 2021-10-12T10:50:00+08:00
 ---
- 
+
 ## Talking to Cloud Storage
 
 Immaterial of whether RDD/WriteClient APIs or Datasource is used, the following information helps configure access
 to cloud stores.
 
- * [AWS S3](/docs/s3_hoodie) <br/>
+* [AWS S3](/docs/s3_hoodie) <br/>
    Configurations required for S3 and Hudi co-operability.
- * [Google Cloud Storage](/docs/gcs_hoodie) <br/>
+* [Google Cloud Storage](/docs/gcs_hoodie) <br/>
    Configurations required for GCS and Hudi co-operability.
- * [Alibaba Cloud OSS](/docs/oss_hoodie) <br/>
+* [Alibaba Cloud OSS](/docs/oss_hoodie) <br/>
    Configurations required for OSS and Hudi co-operability.
- * [Microsoft Azure](/docs/azure_hoodie) <br/>
+* [Microsoft Azure](/docs/azure_hoodie) <br/>
    Configurations required for Azure and Hudi co-operability.
 * [Tencent Cloud Object Storage](/docs/cos_hoodie) <br/>
    Configurations required for COS and Hudi co-operability.
 * [IBM Cloud Object Storage](/docs/ibm_cos_hoodie) <br/>
-   Configurations required for IBM Cloud Object Storage and Hudi co-operability.   
+   Configurations required for IBM Cloud Object Storage and Hudi co-operability.
 * [Baidu Cloud Object Storage](bos_hoodie) <br/>
    Configurations required for BOS and Hudi co-operability.
 * [JuiceFS](jfs_hoodie) <br/>
-   Configurations required for JuiceFS and Hudi co-operability.
\ No newline at end of file
+   Configurations required for JuiceFS and Hudi co-operability.
diff --git a/website/docs/jfs_hoodie.md b/website/docs/jfs_hoodie.md
index 5ea3af3..a77cb29 100644
--- a/website/docs/jfs_hoodie.md
+++ b/website/docs/jfs_hoodie.md
@@ -1,59 +1,65 @@
 ---
-title: JuiceFS 
-keywords: [ hudi, hive, jfs, spark, flink]
-summary: On this page, we go over how to configure Hudi with JuiceFS.
-last_modified_at: 2021-09-30T17:24:24-10:00
+title: JuiceFS
+keywords: [ hudi, hive, juicefs, jfs, spark, flink ]
+summary: On this page, we go over how to configure Hudi with the JuiceFS file system.
+last_modified_at: 2021-10-12T10:50:00+08:00
 ---
-On this page, we explain how to use Hudi with JuiceFS.
 
-## JuiceFS Preparing
+On this page, we explain how to use Hudi with JuiceFS.
 
-JuiceFS is a high-performance distributed file system. Any data stored into JuiceFS, the data itself will be persisted in object storage (e.g. Amazon S3), and the metadata corresponding to the data can be persisted in various database engines such as Redis, MySQL, and TiKV according to the needs of the scene.
+## JuiceFS configs
+
+[JuiceFS](https://github.com/juicedata/juicefs) is a high-performance distributed file system. Data stored in JuiceFS is persisted in object storage (e.g. Amazon S3), and the corresponding metadata can be persisted in various database engines such as Redis, MySQL, and TiKV, depending on the use case.
 
 There are three configurations required for Hudi-JuiceFS compatibility:
 
-- Creating JuiceFS
-- Adding JuiceFS configuration for Hudi
-- Adding required jar to `classpath`
+1. Creating JuiceFS file system
+2. Adding JuiceFS configuration for Hudi
+3. Adding required JAR to `classpath`
+
+### Creating JuiceFS file system
 
-### Creating JuiceFS
+JuiceFS supports multiple [metadata engines](https://github.com/juicedata/juicefs/blob/main/docs/en/databases_for_metadata.md) such as Redis, MySQL, SQLite, and TiKV. It also supports almost all [object storage](https://github.com/juicedata/juicefs/blob/main/docs/en/how_to_setup_object_storage.md#supported-object-storage) services as data storage, e.g. Amazon S3, Google Cloud Storage, and Azure Blob Storage.
 
-JuiceFS supports multiple engines such as Redis, MySQL, SQLite, and TiKV.
+The following example uses Redis as the "Metadata Engine" and Amazon S3 as the "Data Storage" in a Linux environment.
 
-This example uses Redis as Meta Engine and AWS S3 as Data Storage in Linux env.
+#### Download JuiceFS client
 
-- Download
 ```shell
-JFS_LATEST_TAG=$(curl -s https://api.github.com/repos/juicedata/juicefs/releases/latest | grep 'tag_name' | cut -d '"' -f 4 | tr -d 'v')
-wget "https://github.com/juicedata/juicefs/releases/download/v${JFS_LATEST_TAG}/juicefs-${JFS_LATEST_TAG}-linux-amd64.tar.gz"
+$ JFS_LATEST_TAG=$(curl -s https://api.github.com/repos/juicedata/juicefs/releases/latest | grep 'tag_name' | cut -d '"' -f 4 | tr -d 'v')
+$ wget "https://github.com/juicedata/juicefs/releases/download/v${JFS_LATEST_TAG}/juicefs-${JFS_LATEST_TAG}-linux-amd64.tar.gz"
 ```
 
-- Install
+#### Install JuiceFS client
+
 ```shell
-mkdir juice && tar -zxvf "juicefs-${JFS_LATEST_TAG}-linux-amd64.tar.gz" -C juice
-sudo install juice/juicefs /usr/local/bin
+$ mkdir juice && tar -zxvf "juicefs-${JFS_LATEST_TAG}-linux-amd64.tar.gz" -C juice
+$ sudo install juice/juicefs /usr/local/bin
 ```
 
-- Format a filesystem
+#### Format a JuiceFS file system
+
 ```shell
-juicefs format \
+$ juicefs format \
     --storage s3 \
-    --bucket https://<your-bucket-name> \
+    --bucket https://<bucket>.s3.<region>.amazonaws.com \
     --access-key <your-access-key-id> \
     --secret-key <your-access-key-secret> \
     redis://:<password>@<redis-host>:6379/1 \
-    myjfs 
+    myjfs
 ```
 
-### JuiceFS configuration
+For more information, please refer to ["JuiceFS Quick Start Guide"](https://github.com/juicedata/juicefs/blob/main/docs/en/quick_start_guide.md).
+
+### Adding JuiceFS configuration for Hudi
 
-Add the required configurations in your core-site.xml from where Hudi can fetch them.
+Add the required configurations in your `core-site.xml` from where Hudi can fetch them.
 
 ```xml
 <property>
     <name>fs.defaultFS</name>
-    <value>jfs://myfs</value>
-    <description>Optional, you can also specify full path "jfs://myfs/path-to-dir" with location to use JuiceFS</description>
+    <value>jfs://myjfs</value>
+    <description>Optional, you can also specify full path "jfs://myjfs/path-to-dir" with location to use JuiceFS</description>
 </property>
 <property>
     <name>fs.jfs.impl</name>
@@ -81,11 +87,10 @@ Add the required configurations in your core-site.xml from where Hudi can fetch
 </property>
 ```
 
-### JuiceFS Hadoop SDK
-You can download the JuiceFS java Hadoop SDK jar from [here](https://github.com/juicedata/juicefs/releases/download/v0.17.0/juicefs-hadoop-0.17.0-linux-amd64.jar), and place it to the `classpath`. 
-You can also visit [JuiceFS Releases](https://github.com/juicedata/juicefs/releases)) to get the latest version or compile by your self.
+You can visit [here](https://github.com/juicedata/juicefs/blob/main/docs/en/hadoop_java_sdk.md#client-configurations) for more configuration information.
 
-For example:
-- $SPARK_HOME/jars
+### Adding JuiceFS Hadoop Java SDK
 
+You can download the latest JuiceFS Hadoop Java SDK from [here](http://github.com/juicedata/juicefs/releases/latest) (the file is named like `juicefs-hadoop-X.Y.Z-linux-amd64.jar`) and place it on the `classpath`. You can also [compile](https://github.com/juicedata/juicefs/blob/main/docs/en/hadoop_java_sdk.md#client-compilation) it yourself.
 
+For example, if you use Hudi in Spark, please put the JAR in `$SPARK_HOME/jars`.