Posted to commits@hudi.apache.org by vi...@apache.org on 2019/03/13 22:41:17 UTC

[incubator-hudi-site] 14/19: More documentation cleanup

This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-hudi-site.git

commit d6b880af14907626f63b527e6461475204f04fda
Author: Vinoth Chandar <vi...@uber.com>
AuthorDate: Thu Mar 7 11:01:35 2019 -0800

    More documentation cleanup
    
     - Fixing search keywords
     - Added link to team page
     - Fixed configuration page, specifically s3/gcs instructions
     - Suggested using gists for sharing code on ML
---
 docs/_data/topnav.yml          |   3 +-
 docs/_includes/head.html       |   2 +-
 docs/admin_guide.md            |   2 +-
 docs/community.md              |   5 +-
 docs/comparison.md             |   5 +-
 docs/concepts.md               |   2 +-
 docs/configurations.md         | 110 +++++++++++++++++++++--------------------
 docs/contributing.md           |   2 +-
 docs/gcs_filesystem.md         |   4 +-
 docs/implementation.md         |   2 +-
 docs/incremental_processing.md |  13 ++---
 docs/index.md                  |   2 +-
 docs/migration_guide.md        |   2 +-
 docs/pages/news/news.html      |   2 +-
 docs/powered_by.md             |   5 +-
 docs/privacy.md                |   2 +-
 docs/quickstart.md             |   2 +-
 docs/s3_filesystem.md          |   8 ++-
 docs/sql_queries.md            |   2 +-
 docs/use_cases.md              |   3 +-
 20 files changed, 87 insertions(+), 91 deletions(-)

diff --git a/docs/_data/topnav.yml b/docs/_data/topnav.yml
index 0042feb..3b9eca8 100644
--- a/docs/_data/topnav.yml
+++ b/docs/_data/topnav.yml
@@ -25,4 +25,5 @@ topnav_dropdowns:
             external_url: https://issues.apache.org/jira/projects/HUDI/summary
           - title: Blog
             external_url: https://cwiki.apache.org/confluence/pages/viewrecentblogposts.action?key=HUDI
-      
+          - title: Team
+            external_url: https://projects.apache.org/project.html?incubator-hudi
diff --git a/docs/_includes/head.html b/docs/_includes/head.html
index 9a6f74e..705c1c9 100644
--- a/docs/_includes/head.html
+++ b/docs/_includes/head.html
@@ -2,7 +2,7 @@
 <meta http-equiv="X-UA-Compatible" content="IE=edge">
 <meta name="viewport" content="width=device-width, initial-scale=1">
 <meta name="description" content="{% if page.summary %}{{ page.summary | strip_html | strip_newlines | truncate: 160 }}{% endif %}">
-<meta name="keywords" content="{{page.tags}}{% if page.tags %}, {% endif %} {{page.keywords}}">
+<meta name="keywords" content="{{page.keywords}}">
 <title>{{ page.title }} | {{ site.site_title }}</title>
 <link rel="stylesheet" href="{{ "css/syntax.css" }}">
 
diff --git a/docs/admin_guide.md b/docs/admin_guide.md
index 3d37d22..7757d04 100644
--- a/docs/admin_guide.md
+++ b/docs/admin_guide.md
@@ -1,6 +1,6 @@
 ---
 title: Admin Guide
-keywords: admin
+keywords: hudi, administration, operation, devops
 sidebar: mydoc_sidebar
 permalink: admin_guide.html
 toc: false
diff --git a/docs/community.md b/docs/community.md
index c16dc92..5708f3d 100644
--- a/docs/community.md
+++ b/docs/community.md
@@ -1,6 +1,6 @@
 ---
 title: Community
-keywords: usecases
+keywords: hudi, use cases, big data, apache
 sidebar: mydoc_sidebar
 toc: false
 permalink: community.html
@@ -12,7 +12,7 @@ There are several ways to get in touch with the Hudi community.
 
 | When? | Channel to use |
 |-------|--------|
-| For any general questions, user support, development discussions | Dev Mailing list ([Subscribe](mailto:dev-subscribe@hudi.apache.org), [Unsubscribe](mailto:dev-unsubscribe@hudi.apache.org), [Archives](https://lists.apache.org/list.html?dev@hudi.apache.org)). Empty email works for subscribe/unsubscribe |
+| For any general questions, user support, development discussions | Dev Mailing list ([Subscribe](mailto:dev-subscribe@hudi.apache.org), [Unsubscribe](mailto:dev-unsubscribe@hudi.apache.org), [Archives](https://lists.apache.org/list.html?dev@hudi.apache.org)). Empty email works for subscribe/unsubscribe. Please use [gists](https://gist.github.com) to share code/stacktraces over email. |
 | For reporting bugs or issues or discover known issues | Please use [ASF Hudi JIRA](https://issues.apache.org/jira/projects/HUDI/summary) |
 | For quick pings & 1-1 chats | Join our [slack group](https://join.slack.com/t/apache-hudi/signup) |
 | For proposing large features, changes | Start a Hudi Improvement Process (HIP). Instructions coming soon.|
@@ -30,6 +30,7 @@ Here are few ways, you can get involved.
  - Ask (and/or) answer questions on our support channels listed above.
  - Review code or HIPs
  - Help improve documentation
+ - Author blogs on our wiki
  - Testing; Improving out-of-box experience by reporting bugs
  - Share new ideas/directions to pursue or propose a new HIP
  - Contributing code to the project
diff --git a/docs/comparison.md b/docs/comparison.md
index 36a2ec5..0862c26 100644
--- a/docs/comparison.md
+++ b/docs/comparison.md
@@ -1,6 +1,6 @@
 ---
 title: Comparison
-keywords: usecases
+keywords: apache, hudi, kafka, kudu, hive, hbase, stream processing
 sidebar: mydoc_sidebar
 permalink: comparison.html
 toc: false
@@ -56,6 +56,3 @@ More advanced use cases revolve around the concepts of [incremental processing](
 uses Hudi even inside the `processing` engine to speed up typical batch pipelines. For e.g: Hudi can be used as a state store inside a processing DAG (similar
 to how [rocksDB](https://ci.apache.org/projects/flink/flink-docs-release-1.2/ops/state_backends.html#the-rocksdbstatebackend) is used by Flink). This is an item on the roadmap
 and will eventually happen as a [Beam Runner](https://github.com/uber/hoodie/issues/8)
-
-
-
diff --git a/docs/concepts.md b/docs/concepts.md
index 845228a..7532631 100644
--- a/docs/concepts.md
+++ b/docs/concepts.md
@@ -1,6 +1,6 @@
 ---
 title: Concepts
-keywords: concepts
+keywords: hudi, design, storage, views, timeline
 sidebar: mydoc_sidebar
 permalink: concepts.html
 toc: false
diff --git a/docs/configurations.md b/docs/configurations.md
index e6602e6..ce45a3d 100644
--- a/docs/configurations.md
+++ b/docs/configurations.md
@@ -1,13 +1,67 @@
 ---
 title: Configurations
-keywords: configurations
+keywords: garbage collection, hudi, jvm, configs, tuning
 sidebar: mydoc_sidebar
 permalink: configurations.html
 toc: false
 summary: "Here we list all possible configurations and what they mean"
 ---
 
-### Configuration
+
+#### Talking to Cloud Storage
+
+ * [AWS S3](s3_hoodie.html) <br/>
+    <span style="color:grey">Configurations required for S3 and Hoodie co-operability.</span>
+ * [Google Cloud Storage](gcs_hoodie.html) <br/>
+    <span style="color:grey">Configurations required for GCS and Hoodie co-operability.</span>
+
+#### Spark Datasource Configs
+
+* [Hoodie Datasource](#datasource) <br/>
+<span style="color:grey">Configs for the Spark datasource; a usage sketch follows the option lists below.</span>
+    - [write options](#writeoptions) (write.format.option(...)) <br/>
+    <span style="color:grey"> Options useful for writing datasets </span>
+        - [OPERATION_OPT_KEY](#OPERATION_OPT_KEY) (Default: upsert) <br/>
+        <span style="color:grey">whether to do upsert, insert or bulkinsert for the write operation</span>
+        - [STORAGE_TYPE_OPT_KEY](#STORAGE_TYPE_OPT_KEY) (Default: COPY_ON_WRITE) <br/>
+        <span style="color:grey">The storage type for the underlying data, for this write. This can't change between writes.</span>
+        - [TABLE_NAME_OPT_KEY](#TABLE_NAME_OPT_KEY) (Default: None (mandatory)) <br/>
+        <span style="color:grey">Hive table name, to register the dataset into.</span>
+        - [PRECOMBINE_FIELD_OPT_KEY](#PRECOMBINE_FIELD_OPT_KEY) (Default: ts) <br/>
+        <span style="color:grey">Field used in preCombining before actual write. When two records have the same key value,
+        we will pick the one with the largest value for the precombine field, determined by Object.compareTo(..)</span>
+        - [PAYLOAD_CLASS_OPT_KEY](#PAYLOAD_CLASS_OPT_KEY) (Default: com.uber.hoodie.OverwriteWithLatestAvroPayload) <br/>
+        <span style="color:grey">Payload class used. Override this if you would like to roll your own merge logic when upserting/inserting.
+        This will render any value set for `PRECOMBINE_FIELD_OPT_VAL` ineffective</span>
+        - [RECORDKEY_FIELD_OPT_KEY](#RECORDKEY_FIELD_OPT_KEY) (Default: uuid) <br/>
+        <span style="color:grey">Record key field. Value to be used as the `recordKey` component of `HoodieKey`. Actual value
+        will be obtained by invoking .toString() on the field value. Nested fields can be specified using
+        the dot notation eg: `a.b.c`</span>
+        - [PARTITIONPATH_FIELD_OPT_KEY](#PARTITIONPATH_FIELD_OPT_KEY) (Default: partitionpath) <br/>
+        <span style="color:grey">Partition path field. Value to be used as the `partitionPath` component of `HoodieKey`.
+        Actual value obtained by invoking .toString()</span>
+        - [KEYGENERATOR_CLASS_OPT_KEY](#KEYGENERATOR_CLASS_OPT_KEY) (Default: com.uber.hoodie.SimpleKeyGenerator) <br/>
+        <span style="color:grey">Key generator class, used to extract the key out of the incoming `Row` object</span>
+        - [COMMIT_METADATA_KEYPREFIX_OPT_KEY](#COMMIT_METADATA_KEYPREFIX_OPT_KEY) (Default: `_`) <br/>
+        <span style="color:grey">Option keys beginning with this prefix are automatically added to the commit/deltacommit metadata.
+        This is useful for storing checkpointing information in a way that is consistent with the hoodie timeline</span>
+
+    - [read options](#readoptions) (read.format.option(...)) <br/>
+    <span style="color:grey">Options useful for reading datasets</span>
+        - [VIEW_TYPE_OPT_KEY](#VIEW_TYPE_OPT_KEY) (Default: read_optimized) <br/>
+        <span style="color:grey">Whether data needs to be read, in incremental mode (new data since an instantTime)
+        (or) Read Optimized mode (obtain latest view, based on columnar data)
+        (or) Real time mode (obtain latest view, based on row & columnar data)</span>
+        - [BEGIN_INSTANTTIME_OPT_KEY](#BEGIN_INSTANTTIME_OPT_KEY) (Default: None (Mandatory in incremental mode)) <br/>
+        <span style="color:grey">Instant time to start incrementally pulling data from. The instanttime here need not
+        necessarily correspond to an instant on the timeline. New data written with an
+         `instant_time > BEGIN_INSTANTTIME` are fetched out. For e.g: '20170901080000' will get
+         all new data written after Sep 1, 2017 08:00AM.</span>
+        - [END_INSTANTTIME_OPT_KEY](#END_INSTANTTIME_OPT_KEY) (Default: latest instant (i.e fetches all new data since begin instant time)) <br/>
+        <span style="color:grey"> Instant time to limit incrementally fetched data to. New data written with an
+        `instant_time <= END_INSTANTTIME` are fetched out.</span>
+
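+A minimal usage sketch of these datasource options follows. It is a sketch, not a verbatim reference: paths,
+field names and the instant time are placeholders, `inputDF`/`spark` are an existing DataFrame and SparkSession,
+and the option/value constants are assumed to live in `com.uber.hoodie.DataSourceWriteOptions` /
+`com.uber.hoodie.DataSourceReadOptions` of the hoodie-spark module.
+
+```scala
+import com.uber.hoodie.{DataSourceReadOptions, DataSourceWriteOptions}
+import org.apache.spark.sql.SaveMode
+
+// Upsert a DataFrame into a Hudi dataset
+inputDF.write
+  .format("com.uber.hoodie")
+  .option(DataSourceWriteOptions.OPERATION_OPT_KEY, "upsert")
+  .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "uuid")
+  .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "ts")
+  .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "partitionpath")
+  .option(DataSourceWriteOptions.TABLE_NAME_OPT_KEY, "my_hoodie_table")
+  .mode(SaveMode.Append)
+  .save("/path/to/hoodie/dataset")
+
+// Incrementally pull only the records written after a given instant time
+val incrementalDF = spark.read
+  .format("com.uber.hoodie")
+  .option(DataSourceReadOptions.VIEW_TYPE_OPT_KEY, DataSourceReadOptions.VIEW_TYPE_INCREMENTAL_OPT_VAL)
+  .option(DataSourceReadOptions.BEGIN_INSTANTTIME_OPT_KEY, "20170901080000")
+  .load("/path/to/hoodie/dataset")
+```
+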
+#### Write Client Configuration
 
 * [HoodieWriteConfig](#HoodieWriteConfig) <br/>
<span style="color:grey">Top Level Config which is passed in when HoodieWriteClient is created.</span>
@@ -105,58 +159,8 @@ summary: "Here we list all possible configurations and what they mean"
         - [withMaxMemorySizePerCompactionInBytes](#withMaxMemorySizePerCompactionInBytes) (maxMemorySizePerCompactionInBytes = 1GB) <br/>
         <span style="color:grey">HoodieCompactedLogScanner reads logblocks, converts records to HoodieRecords and then merges these log blocks and records. At any point, the number of entries in a log block can be less than or equal to the number of entries in the corresponding parquet file. This can lead to OOM in the Scanner. Hence, a spillable map helps alleviate the memory pressure. Use this config to set the max allowable inMemory footprint of the spillable map.</span>
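+
+A rough sketch of composing such a write client config programmatically is shown below. The builder method
+names are assumed from the `com.uber.hoodie.config.HoodieWriteConfig` builder and should be verified against
+the javadocs; the path and table name are placeholders.
+
+```scala
+import com.uber.hoodie.config.HoodieWriteConfig
+
+// Build a top-level write config for a dataset
+val writeConfig = HoodieWriteConfig.newBuilder()
+  .withPath("/path/to/hoodie/dataset")   // base path of the dataset
+  .forTable("my_hoodie_table")
+  .withParallelism(200, 200)             // insert, upsert shuffle parallelism
+  .build()
+```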
 
-    - [S3Configs](s3_hoodie.html) (Hoodie S3 Configs) <br/>
-    <span style="color:grey">Configurations required for S3 and Hoodie co-operability.</span>
-
-    - [GCSConfigs](gcs_hoodie.html) (Hoodie GCS Configs) <br/>
-    <span style="color:grey">Configurations required for GCS and Hoodie co-operability.</span>
-
-* [Hoodie Datasource](#datasource) <br/>
-<span style="color:grey">Configs for datasource</span>
-    - [write options](#writeoptions) (write.format.option(...)) <br/>
-    <span style="color:grey"> Options useful for writing datasets </span>
-        - [OPERATION_OPT_KEY](#OPERATION_OPT_KEY) (Default: upsert) <br/>
-        <span style="color:grey">whether to do upsert, insert or bulkinsert for the write operation</span>
-        - [STORAGE_TYPE_OPT_KEY](#STORAGE_TYPE_OPT_KEY) (Default: COPY_ON_WRITE) <br/>
-        <span style="color:grey">The storage type for the underlying data, for this write. This can't change between writes.</span>
-        - [TABLE_NAME_OPT_KEY](#TABLE_NAME_OPT_KEY) (Default: None (mandatory)) <br/>
-        <span style="color:grey">Hive table name, to register the dataset into.</span>
-        - [PRECOMBINE_FIELD_OPT_KEY](#PRECOMBINE_FIELD_OPT_KEY) (Default: ts) <br/>
-        <span style="color:grey">Field used in preCombining before actual write. When two records have the same key value,
-        we will pick the one with the largest value for the precombine field, determined by Object.compareTo(..)</span>
-        - [PAYLOAD_CLASS_OPT_KEY](#PAYLOAD_CLASS_OPT_KEY) (Default: com.uber.hoodie.OverwriteWithLatestAvroPayload) <br/>
-        <span style="color:grey">Payload class used. Override this, if you like to roll your own merge logic, when upserting/inserting.
-        This will render any value set for `PRECOMBINE_FIELD_OPT_VAL` in-effective</span>
-        - [RECORDKEY_FIELD_OPT_KEY](#RECORDKEY_FIELD_OPT_KEY) (Default: uuid) <br/>
-        <span style="color:grey">Record key field. Value to be used as the `recordKey` component of `HoodieKey`. Actual value
-        will be obtained by invoking .toString() on the field value. Nested fields can be specified using
-        the dot notation eg: `a.b.c`</span>
-        - [PARTITIONPATH_FIELD_OPT_KEY](#PARTITIONPATH_FIELD_OPT_KEY) (Default: partitionpath) <br/>
-        <span style="color:grey">Partition path field. Value to be used at the `partitionPath` component of `HoodieKey`.
-        Actual value ontained by invoking .toString()</span>
-        - [KEYGENERATOR_CLASS_OPT_KEY](#KEYGENERATOR_CLASS_OPT_KEY) (Default: com.uber.hoodie.SimpleKeyGenerator) <br/>
-        <span style="color:grey">Key generator class, that implements will extract the key out of incoming `Row` object</span>
-        - [COMMIT_METADATA_KEYPREFIX_OPT_KEY](#COMMIT_METADATA_KEYPREFIX_OPT_KEY) (Default: `_`) <br/>
-        <span style="color:grey">Option keys beginning with this prefix, are automatically added to the commit/deltacommit metadata.
-        This is useful to store checkpointing information, in a consistent way with the hoodie timeline</span>
-
-    - [read options](#readoptions) (read.format.option(...)) <br/>
-    <span style="color:grey">Options useful for reading datasets</span>
-        - [VIEW_TYPE_OPT_KEY](#VIEW_TYPE_OPT_KEY) (Default:  = read_optimized) <br/>
-        <span style="color:grey">Whether data needs to be read, in incremental mode (new data since an instantTime)
-        (or) Read Optimized mode (obtain latest view, based on columnar data)
-        (or) Real time mode (obtain latest view, based on row & columnar data)</span>
-        - [BEGIN_INSTANTTIME_OPT_KEY](#BEGIN_INSTANTTIME_OPT_KEY) (Default: None (Mandatory in incremental mode)) <br/>
-        <span style="color:grey">Instant time to start incrementally pulling data from. The instanttime here need not
-        necessarily correspond to an instant on the timeline. New data written with an
-         `instant_time > BEGIN_INSTANTTIME` are fetched out. For e.g: '20170901080000' will get
-         all new data written after Sep 1, 2017 08:00AM.</span>
-        - [END_INSTANTTIME_OPT_KEY](#END_INSTANTTIME_OPT_KEY) (Default: latest instant (i.e fetches all new data since begin instant time)) <br/>
-        <span style="color:grey"> Instant time to limit incrementally fetched data to. New data written with an
-        `instant_time <= END_INSTANTTIME` are fetched out.</span>
-
 
-### Tuning
+#### Tuning
 
 Writing data via Hudi happens as a Spark job and thus general rules of Spark debugging apply here too. Below is a list of things to keep in mind if you are looking to improve performance or reliability.
 
diff --git a/docs/contributing.md b/docs/contributing.md
index a93ba54..028ab00 100644
--- a/docs/contributing.md
+++ b/docs/contributing.md
@@ -1,6 +1,6 @@
 ---
 title: Developer Setup
-keywords: developer setup
+keywords: hudi, ide, developer, setup
 sidebar: mydoc_sidebar
 toc: false
 permalink: contributing.html
diff --git a/docs/gcs_filesystem.md b/docs/gcs_filesystem.md
index c94cb66..8cc82da 100644
--- a/docs/gcs_filesystem.md
+++ b/docs/gcs_filesystem.md
@@ -1,12 +1,12 @@
 ---
 title: GCS Filesystem (experimental)
-keywords: sql hive gcs spark presto
+keywords: hudi, hive, google cloud, storage, spark, presto
 sidebar: mydoc_sidebar
 permalink: gcs_hoodie.html
 toc: false
 summary: In this page, we go over how to configure hudi with Google Cloud Storage.
 ---
-Hudi works with HDFS by default and GCS **regional** buckets provide an HDFS API with strong consistency.
+For Hudi storage on GCS, **regional** buckets provide an HDFS API with strong consistency.
 
 ## GCS Configs
 
diff --git a/docs/implementation.md b/docs/implementation.md
index e87a541..cbf394e 100644
--- a/docs/implementation.md
+++ b/docs/implementation.md
@@ -1,6 +1,6 @@
 ---
 title: Implementation
-keywords: implementation
+keywords: hudi, index, storage, compaction, cleaning, implementation
 sidebar: mydoc_sidebar
 toc: false
 permalink: implementation.html
diff --git a/docs/incremental_processing.md b/docs/incremental_processing.md
index c2afad6..7c97cc9 100644
--- a/docs/incremental_processing.md
+++ b/docs/incremental_processing.md
@@ -1,6 +1,6 @@
 ---
 title: Incremental Processing
-keywords: incremental processing
+keywords: hudi, incremental, batch, stream, processing, Hive, ETL, Spark SQL
 sidebar: mydoc_sidebar
 permalink: incremental_processing.html
 toc: false
@@ -86,10 +86,10 @@ Usage: <main class> [options]
   * --target-table
       name of the target table in Hive
     --transformer-class
-      subclass of com.uber.hoodie.utilities.transform.Transformer. UDF to 
-      transform raw source dataset to a target dataset (conforming to target 
-      schema) before writing. Default : Not set. E:g - 
-      com.uber.hoodie.utilities.transform.SqlQueryBasedTransformer (which 
+      subclass of com.uber.hoodie.utilities.transform.Transformer. UDF to
+      transform raw source dataset to a target dataset (conforming to target
+      schema) before writing. Default : Not set. E:g -
+      com.uber.hoodie.utilities.transform.SqlQueryBasedTransformer (which
       allows a SQL query template to be passed as a transformation function)
 
 ```
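+
+As an illustration of the `--transformer-class` hook, here is a rough Scala sketch of a custom transformer.
+The exact `Transformer` interface signature and the `TypedProperties` package are assumptions here; check
+`com.uber.hoodie.utilities.transform.Transformer` in hoodie-utilities for the authoritative contract.
+
+```scala
+import com.uber.hoodie.common.util.TypedProperties
+import com.uber.hoodie.utilities.transform.Transformer
+import org.apache.spark.api.java.JavaSparkContext
+import org.apache.spark.sql.functions.current_timestamp
+import org.apache.spark.sql.{Dataset, Row, SparkSession}
+
+// Drops rows without an event timestamp and stamps a processing-time
+// column before the target dataset is written.
+class CleansingTransformer extends Transformer {
+  override def apply(jsc: JavaSparkContext, spark: SparkSession,
+                     source: Dataset[Row], props: TypedProperties): Dataset[Row] = {
+    source.filter("event_ts IS NOT NULL")
+          .withColumn("processed_at", current_timestamp())
+  }
+}
+```
+
+The fully qualified class name is then passed via `--transformer-class` on the delta streamer command line.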
@@ -233,6 +233,3 @@ Setting the fromCommitTime=0 and maxCommits=-1 will pull in the entire source da
 then the utility can determine if the target dataset has no commits or is behind more than 24 hour (this is configurable),
 it will automatically use the backfill configuration, since applying the last 24 hours incrementally could take more time than doing a backfill. The current limitation of the tool
 is the lack of support for self-joining the same table in mixed mode (normal and incremental modes).
-
-
-
diff --git a/docs/index.md b/docs/index.md
index ad87933..22e1174 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -1,6 +1,6 @@
 ---
 title: What is Hudi?
-keywords: homepage
+keywords: big data, stream processing, cloud, hdfs, storage, upserts, change capture
 tags: [getting_started]
 sidebar: mydoc_sidebar
 permalink: index.html
diff --git a/docs/migration_guide.md b/docs/migration_guide.md
index 13c27ac..f785251 100644
--- a/docs/migration_guide.md
+++ b/docs/migration_guide.md
@@ -1,6 +1,6 @@
 ---
 title: Migration Guide
-keywords: migration guide
+keywords: hudi, migration, use case
 sidebar: mydoc_sidebar
 permalink: migration_guide.html
 toc: false
diff --git a/docs/pages/news/news.html b/docs/pages/news/news.html
index cb3d5d0..e1428da 100644
--- a/docs/pages/news/news.html
+++ b/docs/pages/news/news.html
@@ -1,7 +1,7 @@
 ---
 title: News
 sidebar: home_sidebar
-keywords: news, blog, updates, release notes, announcements
+keywords: apache, hudi, news, blog, updates, release notes, announcements
 permalink: news.html
 toc: false
 folder: news
diff --git a/docs/powered_by.md b/docs/powered_by.md
index e79d05e..3d63509 100644
--- a/docs/powered_by.md
+++ b/docs/powered_by.md
@@ -1,6 +1,6 @@
 ---
 title: Talks & Powered By
-keywords: talks
+keywords: hudi, talks, presentation
 sidebar: mydoc_sidebar
 permalink: powered_by.html
 toc: false
@@ -32,9 +32,8 @@ It also powers several incremental Hive ETL pipelines and being currently integr
 5. ["Hudi: Large-Scale, Near Real-Time Pipelines at Uber"](https://databricks
.com/session/hudi-near-real-time-spark-pipelines-at-petabyte-scale) - By Vinoth Chandar & Nishith Agarwal
    October 2018, Spark+AI Summit Europe, London, UK
-   
+
 ## Articles
 
 1. ["The Case for incremental processing on Hadoop"](https://www.oreilly.com/ideas/ubers-case-for-incremental-processing-on-hadoop) - O'reilly Ideas article by Vinoth Chandar
 2. ["Hoodie: Uber Engineering's Incremental Processing Framework on Hadoop"](https://eng.uber.com/hoodie/) - Engineering Blog By Prasanna Rajaperumal
-
diff --git a/docs/privacy.md b/docs/privacy.md
index 32fcb91..c7e8de2 100644
--- a/docs/privacy.md
+++ b/docs/privacy.md
@@ -1,6 +1,6 @@
 ---
 title: Privacy Policy
-keywords: privacy
+keywords: hudi, privacy
 sidebar: mydoc_sidebar
 permalink: privacy.html
 ---
diff --git a/docs/quickstart.md b/docs/quickstart.md
index 1e6fa49..5a9193a 100644
--- a/docs/quickstart.md
+++ b/docs/quickstart.md
@@ -1,6 +1,6 @@
 ---
 title: Quickstart
-keywords: quickstart
+keywords: hudi, quickstart
 tags: [quickstart]
 sidebar: mydoc_sidebar
 toc: false
diff --git a/docs/s3_filesystem.md b/docs/s3_filesystem.md
index 6138821..9bbe068 100644
--- a/docs/s3_filesystem.md
+++ b/docs/s3_filesystem.md
@@ -1,18 +1,18 @@
 ---
 title: S3 Filesystem (experimental)
-keywords: sql hive s3 spark presto
+keywords: hudi, hive, aws, s3, spark, presto
 sidebar: mydoc_sidebar
 permalink: s3_hoodie.html
 toc: false
 summary: In this page, we go over how to configure Hudi with S3 filesystem.
 ---
-Hudi works with HDFS by default. There is an experimental work going on Hoodie-S3 compatibility.
+In this page, we explain how to get your Hudi Spark job to store data on AWS S3.
 
 ## AWS configs
 
 There are two configurations required for Hoodie-S3 compatibility:
 
-- Adding AWS Credentials for Hudi 
+- Adding AWS Credentials for Hudi
 - Adding required Jars to classpath
 
 ### AWS Credentials
@@ -75,5 +75,3 @@ AWS hadoop libraries to add to our classpath
 
  - com.amazonaws:aws-java-sdk:1.10.34
  - org.apache.hadoop:hadoop-aws:2.7.3
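+
+One illustrative way to supply the credentials (assuming the `s3a` filesystem scheme and that the jars above
+are on the classpath; the key and secret are placeholders) is programmatically, through the Spark job's Hadoop
+configuration:
+
+```scala
+// Wire S3 credentials into the job before reading/writing Hudi datasets on s3a:// paths
+val hadoopConf = spark.sparkContext.hadoopConfiguration
+hadoopConf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
+hadoopConf.set("fs.s3a.access.key", "<your-access-key>")
+hadoopConf.set("fs.s3a.secret.key", "<your-secret-key>")
+
+// Hudi dataset base paths can then use s3a:// URIs, e.g. save("s3a://my-bucket/hoodie/dataset")
+```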
-
-
diff --git a/docs/sql_queries.md b/docs/sql_queries.md
index 44848eb..4fa795f 100644
--- a/docs/sql_queries.md
+++ b/docs/sql_queries.md
@@ -1,6 +1,6 @@
 ---
 title: SQL Queries
-keywords: sql hive spark presto
+keywords: hudi, hive, spark, sql, presto
 sidebar: mydoc_sidebar
 permalink: sql_queries.html
 toc: false
diff --git a/docs/use_cases.md b/docs/use_cases.md
index ed352f1..0040bc1 100644
--- a/docs/use_cases.md
+++ b/docs/use_cases.md
@@ -1,6 +1,6 @@
 ---
 title: Use Cases
-keywords: usecases
+keywords: hudi, data ingestion, etl, real time, use cases
 sidebar: mydoc_sidebar
 permalink: use_cases.html
 toc: false
@@ -74,4 +74,3 @@ A popular choice for this queue is Kafka and this model often results in __redun
 
 Once again Hudi can efficiently solve this problem, by having the Spark Pipeline upsert output from
 each run into a Hudi dataset, which can then be incrementally tailed (just like a Kafka topic) for new data & written into the serving store.
-