Posted to commits@hudi.apache.org by xu...@apache.org on 2022/05/28 15:39:17 UTC

[hudi] branch asf-site updated: [HUDI-3551] Documentation on using Hudi with OCI Object Storage (#4953)

This is an automated email from the ASF dual-hosted git repository.

xushiyan pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new de1f571443 [HUDI-3551] Documentation on using Hudi with OCI Object Storage (#4953)
de1f571443 is described below

commit de1f571443f6624bf0ae5f0ce32a52a0f7829935
Author: Carter Shanklin <ca...@users.noreply.github.com>
AuthorDate: Sat May 28 08:39:12 2022 -0700

    [HUDI-3551] Documentation on using Hudi with OCI Object Storage (#4953)
---
 website/docs/cloud.md      |  2 ++
 website/docs/oci_hoodie.md | 80 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 82 insertions(+)

diff --git a/website/docs/cloud.md b/website/docs/cloud.md
index 818baa1e2e..eed2d97846 100644
--- a/website/docs/cloud.md
+++ b/website/docs/cloud.md
@@ -27,3 +27,5 @@ to cloud stores.
    Configurations required for BOS and Hudi co-operability.
 * [JuiceFS](jfs_hoodie) <br/>
    Configurations required for JuiceFS and Hudi co-operability.
+* [Oracle Cloud Infrastructure](oci_hoodie) <br/>
+   Configurations required for OCI and Hudi co-operability.
diff --git a/website/docs/oci_hoodie.md b/website/docs/oci_hoodie.md
new file mode 100644
index 0000000000..872b265640
--- /dev/null
+++ b/website/docs/oci_hoodie.md
@@ -0,0 +1,80 @@
+---
+title: Oracle Cloud Infrastructure
+keywords: [ hudi, hive, oracle cloud, storage, spark ]
+summary: In this page, we go over how to configure Hudi with Oracle Cloud Infrastructure Object Storage.
+last_modified_at: 2022-03-03T16:57:05-08:00
+---
+The [Oracle Object Storage](https://docs.oracle.com/en-us/iaas/Content/Object/Concepts/objectstorageoverview.htm) system provides strongly-consistent operations on all buckets in all regions. OCI Object Storage provides an [HDFS Connector](https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/hdfsconnector.htm) that your application will need in order to access data.
+
+## OCI Configs
+
+To use Hudi on OCI Object Storage you must:
+
+- Configure the HDFS Connector using an API key
+- Include the HDFS Connector and dependencies in your application
+- Construct an OCI HDFS URI
+
+### Configuring the HDFS Connector
+
+The OCI HDFS Connector requires configurations from an API key to authenticate and select the correct region. Start by [generating an API key](https://docs.oracle.com/en-us/iaas/Content/API/Concepts/apisigningkey.htm).
+
+If you are using Hadoop, include these in your core-site.xml:
+
+```xml
+  <property>
+    <name>fs.oci.client.auth.tenantId</name>
+    <value>ocid1.tenancy.oc1..[tenant]</value>
+    <description>The OCID of your OCI tenancy</description>
+  </property>
+
+  <property>
+    <name>fs.oci.client.auth.userId</name>
+    <value>ocid1.user.oc1..[user]</value>
+    <description>The OCID of your OCI user</description>
+  </property>
+
+  <property>
+    <name>fs.oci.client.auth.fingerprint</name>
+    <value>XX::XX</value>
+    <description>Your 32-digit hexadecimal public key fingerprint</description>
+  </property>
+
+  <property>
+    <name>fs.oci.client.auth.pemfilepath</name>
+    <value>/path/to/file</value>
+    <description>Local path to your private key file</description>
+  </property>
+
+  <property>
+    <name>fs.oci.client.auth.hostname</name>
+    <value>https://objectstorage.[region].oraclecloud.com</value>
+    <description>HTTPS endpoint of your regional object store</description>
+  </property>
+```
+
+If you are using Spark outside of Hadoop, set these configurations in your Spark Session:
+
+| Key                                         | Description                                      |
+| ------------------------------------------- | ------------------------------------------------ |
+| spark.hadoop.fs.oci.client.auth.tenantId    | The OCID of your OCI tenancy                     |
+| spark.hadoop.fs.oci.client.auth.userId      | The OCID of your OCI user                        |
+| spark.hadoop.fs.oci.client.auth.fingerprint | Your 32-digit hexadecimal public key fingerprint |
+| spark.hadoop.fs.oci.client.auth.pemfilepath | Local path to your private key file              |
+| spark.hadoop.fs.oci.client.hostname         | HTTPS endpoint of your regional object store     |
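+
+For example, these configurations can be passed with `spark-submit`; the OCIDs, fingerprint, paths, region, and job file below are placeholders:
+
+```shell
+spark-submit \
+  --conf spark.hadoop.fs.oci.client.auth.tenantId=ocid1.tenancy.oc1..[tenant] \
+  --conf spark.hadoop.fs.oci.client.auth.userId=ocid1.user.oc1..[user] \
+  --conf spark.hadoop.fs.oci.client.auth.fingerprint=XX::XX \
+  --conf spark.hadoop.fs.oci.client.auth.pemfilepath=/path/to/file \
+  --conf spark.hadoop.fs.oci.client.hostname=https://objectstorage.[region].oraclecloud.com \
+  your-application.jar
+```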
+
+If you are running Spark in OCI Data Flow, you do not need to configure these settings; object storage access is configured for you.
+
+### Libraries
+
+These libraries need to be added to your application. The versions below are for reference; the libraries are continuously updated, so check Maven Central for later releases.
+
+- com.oracle.oci.sdk:oci-java-sdk-core:2.18.0
+- com.oracle.oci.sdk:oci-hdfs-connector:3.3.0.5
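+
+With Maven, for example, these coordinates (using the reference versions above) translate to:
+
+```xml
+<dependency>
+  <groupId>com.oracle.oci.sdk</groupId>
+  <artifactId>oci-java-sdk-core</artifactId>
+  <version>2.18.0</version>
+</dependency>
+<dependency>
+  <groupId>com.oracle.oci.sdk</groupId>
+  <artifactId>oci-hdfs-connector</artifactId>
+  <version>3.3.0.5</version>
+</dependency>
+```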
+
+### Construct an OCI HDFS URI
+
+OCI HDFS URIs have the form:
+
+`oci://<bucket>@<namespace>/<path>`
+
+The HDFS connector allows you to treat these locations similarly to `HDFS` locations on Hadoop. Your tenancy has a unique Object Storage namespace. If you are not sure what your namespace is, you can find it by installing the [OCI CLI](https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/cliinstall.htm) and running `oci os ns get`.
\ No newline at end of file
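+
+As a sketch, writing a Hudi table to such a location from Spark could look like the following; the bucket name `my-bucket`, namespace `mynamespace`, and table path are hypothetical:
+
+```scala
+// Assumes a SparkSession with the Hudi bundle and the OCI HDFS
+// connector on the classpath, and the fs.oci.* settings above.
+df.write.format("hudi").
+  option("hoodie.table.name", "trips").
+  option("hoodie.datasource.write.recordkey.field", "uuid").
+  mode("overwrite").
+  save("oci://my-bucket@mynamespace/tables/trips")
+```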