Posted to commits@hudi.apache.org by si...@apache.org on 2022/10/07 21:03:58 UTC

[hudi] branch asf-site updated: [HUDI-4976] added m1 changes to the site (#6860)

This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new b662955f68 [HUDI-4976] added m1 changes to the site (#6860)
b662955f68 is described below

commit b662955f6885608961b0e8a83e1b841506087b88
Author: Jon Vexler <jo...@onehouse.ai>
AuthorDate: Fri Oct 7 17:03:53 2022 -0400

    [HUDI-4976] added m1 changes to the site (#6860)
---
 website/docs/docker_demo.md                        | 63 +++++++++++++++++-
 .../versioned_docs/version-0.12.0/docker_demo.md   | 76 +++++++++++++++++++---
 2 files changed, 128 insertions(+), 11 deletions(-)
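
For readers skimming the diff: the change wraps the cluster-setup instructions in the standard Docusaurus Tabs component so the default and Mac AArch64 commands sit side by side. A minimal sketch of that pattern follows, using the same imports and tab values as the diff below; the placeholder body text is illustrative only.

```mdx
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

<Tabs
  defaultValue="default"
  values={[
    { label: 'Default', value: 'default' },
    { label: 'Mac AArch64', value: 'm1' },
  ]}
>
  <TabItem value="default">
    Instructions shown to most readers by default.
  </TabItem>
  <TabItem value="m1">
    Instructions shown when the Mac AArch64 tab is selected.
  </TabItem>
</Tabs>
```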

diff --git a/website/docs/docker_demo.md b/website/docs/docker_demo.md
index 698aec5439..681b1be51a 100644
--- a/website/docs/docker_demo.md
+++ b/website/docs/docker_demo.md
@@ -4,6 +4,8 @@ keywords: [ hudi, docker, demo]
 toc: true
 last_modified_at: 2019-12-30T15:59:57-04:00
 ---
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
 
 ## A Demo using Docker containers
 
@@ -58,6 +60,15 @@ The next step is to run the Docker compose script and setup configs for bringing
 
 This should pull the Docker images from Docker hub and setup the Docker cluster.
 
+<Tabs
+defaultValue="default"
+values={[
+{ label: 'Default', value: 'default', },
+{ label: 'Mac AArch64', value: 'm1', },
+]}
+>
+<TabItem value="default">
+
 ```java
 cd docker
 ./setup_demo.sh
@@ -118,6 +129,50 @@ Copying spark default config and setting up configs
 $ docker ps
 ```
 
+</TabItem>
+<TabItem value="m1">
+Please note that Presto and Trino do not currently work for the Docker demo on Mac AArch64.
+
+```java
+cd docker
+./setup_demo.sh --mac-aarch64
+.......
+......
+[+] Running 12/12
+⠿ adhoc-1 Pulled                                          2.9s
+⠿ spark-worker-1 Pulled                                   3.0s
+⠿ kafka Pulled                                            2.9s
+⠿ datanode1 Pulled                                        2.9s
+⠿ hivemetastore Pulled                                    2.9s
+⠿ hiveserver Pulled                                       3.0s
+⠿ hive-metastore-postgresql Pulled                        2.8s
+⠿ namenode Pulled                                         2.9s
+⠿ sparkmaster Pulled                                      2.9s
+⠿ zookeeper Pulled                                        2.8s
+⠿ adhoc-2 Pulled                                          2.9s
+⠿ historyserver Pulled                                    2.9s
+[+] Running 12/12
+⠿ Container zookeeper                  Started           41.0s
+⠿ Container kafkabroker                Started           41.7s
+⠿ Container hive-metastore-postgresql  Running            0.0s
+⠿ Container namenode                   Running            0.0s
+⠿ Container hivemetastore              Running            0.0s
+⠿ Container historyserver              Started           41.0s
+⠿ Container datanode1                  Started           49.9s
+⠿ Container hiveserver                 Running            0.0s
+⠿ Container sparkmaster                Started           41.9s
+⠿ Container spark-worker-1             Started           50.2s
+⠿ Container adhoc-2                    Started           38.5s
+⠿ Container adhoc-1                    Started           38.5s
+Copying spark default config and setting up configs
+Copying spark default config and setting up configs
+$ docker ps
+```
+</TabItem>
+
+</Tabs>
+
 At this point, the Docker cluster will be up and running. The demo cluster brings up the following services
 
    * HDFS Services (NameNode, DataNode)
@@ -140,7 +195,9 @@ The batches are windowed intentionally so that the second batch contains updates
 
 ### Step 1 : Publish the first batch to Kafka
 
-Upload the first batch to Kafka topic 'stock ticks' `cat docker/demo/data/batch_1.json | kcat -b kafkabroker -t stock_ticks -P`
+Upload the first batch to Kafka topic `stock_ticks`:
+
+`cat docker/demo/data/batch_1.json | kcat -b kafkabroker -t stock_ticks -P`
 
 To check if the new topic shows up, use
 ```java
@@ -1137,7 +1194,7 @@ Compaction successfully completed for 20180924070031
 
 # Now refresh and check again. You will see that there is a new compaction requested
 
-hoodie:stock_ticks->refresh
+hoodie:stock_ticks_mor->refresh
 18/09/24 07:01:16 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hive/warehouse/stock_ticks_mor
 18/09/24 07:01:16 INFO table.HoodieTableConfig: Loading table properties from /user/hive/warehouse/stock_ticks_mor/.hoodie/hoodie.properties
 18/09/24 07:01:16 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1) from /user/hive/warehouse/stock_ticks_mor
@@ -1163,7 +1220,7 @@ hoodie:stock_ticks_mor->refresh
 18/09/24 07:03:00 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1) from /user/hive/warehouse/stock_ticks_mor
 Metadata for table stock_ticks_mor loaded
 
-hoodie:stock_ticks->compactions show all
+hoodie:stock_ticks_mor->compactions show all
 18/09/24 07:03:15 INFO timeline.HoodieActiveTimeline: Loaded instants [[20180924064636__clean__COMPLETED], [20180924064636__deltacommit__COMPLETED], [20180924065057__clean__COMPLETED], [20180924065057__deltacommit__COMPLETED], [20180924070031__commit__COMPLETED]]
 ___________________________________________________________________
 | Compaction Instant Time| State    | Total FileIds to be Compacted|
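
A side note on the prompt fix above (`hoodie:stock_ticks->` corrected to `hoodie:stock_ticks_mor->`): the hudi-cli prompt echoes the table it is connected to, so the corrected transcript now matches a session against the MERGE_ON_READ table. A minimal sketch of such a session, assuming hudi-cli is run from the adhoc container as elsewhere in the demo:

```sh
# Connect hudi-cli to the MERGE_ON_READ table used in the demo,
# then re-run the commands shown in the corrected transcript.
connect --path /user/hive/warehouse/stock_ticks_mor
refresh
compactions show all
```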
diff --git a/website/versioned_docs/version-0.12.0/docker_demo.md b/website/versioned_docs/version-0.12.0/docker_demo.md
index 7f56129a1c..13e0a50834 100644
--- a/website/versioned_docs/version-0.12.0/docker_demo.md
+++ b/website/versioned_docs/version-0.12.0/docker_demo.md
@@ -4,6 +4,8 @@ keywords: [ hudi, docker, demo]
 toc: true
 last_modified_at: 2019-12-30T15:59:57-04:00
 ---
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
 
 ## A Demo using docker containers
 
@@ -49,8 +51,18 @@ mvn clean package -Pintegration-tests -DskipTests
 
 ### Bringing up Demo Cluster
 
-The next step is to run the docker compose script and setup configs for bringing up the cluster.
-This should pull the docker images from docker hub and setup docker cluster.
+The next step is to run the Docker Compose script and set up configs for bringing up the cluster. These files are in the [Hudi repository](https://github.com/apache/hudi), which you should already have locally on your machine from the previous steps.
+
+This should pull the Docker images from Docker Hub and set up the Docker cluster.
+
+<Tabs
+defaultValue="default"
+values={[
+{ label: 'Default', value: 'default', },
+{ label: 'Mac AArch64', value: 'm1', },
+]}
+>
+<TabItem value="default">
 
 ```java
 cd docker
@@ -112,7 +124,51 @@ Copying spark default config and setting up configs
 $ docker ps
 ```
 
-At this point, the docker cluster will be up and running. The demo cluster brings up the following services
+</TabItem>
+<TabItem value="m1">
+Please note that Presto and Trino do not currently work for the Docker demo on Mac AArch64.
+
+```java
+cd docker
+./setup_demo.sh --mac-aarch64
+.......
+......
+[+] Running 12/12
+⠿ adhoc-1 Pulled                                          2.9s
+⠿ spark-worker-1 Pulled                                   3.0s
+⠿ kafka Pulled                                            2.9s
+⠿ datanode1 Pulled                                        2.9s
+⠿ hivemetastore Pulled                                    2.9s
+⠿ hiveserver Pulled                                       3.0s
+⠿ hive-metastore-postgresql Pulled                        2.8s
+⠿ namenode Pulled                                         2.9s
+⠿ sparkmaster Pulled                                      2.9s
+⠿ zookeeper Pulled                                        2.8s
+⠿ adhoc-2 Pulled                                          2.9s
+⠿ historyserver Pulled                                    2.9s
+[+] Running 12/12
+⠿ Container zookeeper                  Started           41.0s
+⠿ Container kafkabroker                Started           41.7s
+⠿ Container hive-metastore-postgresql  Running            0.0s
+⠿ Container namenode                   Running            0.0s
+⠿ Container hivemetastore              Running            0.0s
+⠿ Container historyserver              Started           41.0s
+⠿ Container datanode1                  Started           49.9s
+⠿ Container hiveserver                 Running            0.0s
+⠿ Container sparkmaster                Started           41.9s
+⠿ Container spark-worker-1             Started           50.2s
+⠿ Container adhoc-2                    Started           38.5s
+⠿ Container adhoc-1                    Started           38.5s
+Copying spark default config and setting up configs
+Copying spark default config and setting up configs
+$ docker ps
+```
+</TabItem>
+
+</Tabs>
+
+At this point, the Docker cluster will be up and running. The demo cluster brings up the following services
 
    * HDFS Services (NameNode, DataNode)
    * Spark Master and Worker
@@ -134,7 +190,9 @@ The batches are windowed intentionally so that the second batch contains updates
 
 ### Step 1 : Publish the first batch to Kafka
 
-Upload the first batch to Kafka topic 'stock ticks' `cat docker/demo/data/batch_1.json | kcat -b kafkabroker -t stock_ticks -P`
+Upload the first batch to Kafka topic `stock_ticks`:
+
+`cat docker/demo/data/batch_1.json | kcat -b kafkabroker -t stock_ticks -P`
 
 To check if the new topic shows up, use
 ```java
@@ -241,7 +299,8 @@ docker exec -it adhoc-2 /bin/bash
   --partitioned-by dt \
   --base-path /user/hive/warehouse/stock_ticks_cow \
   --database default \
-  --table stock_ticks_cow
+  --table stock_ticks_cow \
+  --partition-value-extractor org.apache.hudi.hive.SlashEncodedDayPartitionValueExtractor
 .....
 2020-01-25 19:51:28,953 INFO  [main] hive.HiveSyncTool (HiveSyncTool.java:syncHoodieTable(129)) - Sync complete for stock_ticks_cow
 .....
@@ -254,7 +313,8 @@ docker exec -it adhoc-2 /bin/bash
   --partitioned-by dt \
   --base-path /user/hive/warehouse/stock_ticks_mor \
   --database default \
-  --table stock_ticks_mor
+  --table stock_ticks_mor \
+  --partition-value-extractor org.apache.hudi.hive.SlashEncodedDayPartitionValueExtractor
 ...
 2020-01-25 19:51:51,066 INFO  [main] hive.HiveSyncTool (HiveSyncTool.java:syncHoodieTable(129)) - Sync complete for stock_ticks_mor_ro
 ...
@@ -1129,7 +1189,7 @@ Compaction successfully completed for 20180924070031
 
 # Now refresh and check again. You will see that there is a new compaction requested
 
-hoodie:stock_ticks->refresh
+hoodie:stock_ticks_mor->refresh
 18/09/24 07:01:16 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hive/warehouse/stock_ticks_mor
 18/09/24 07:01:16 INFO table.HoodieTableConfig: Loading table properties from /user/hive/warehouse/stock_ticks_mor/.hoodie/hoodie.properties
 18/09/24 07:01:16 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1) from /user/hive/warehouse/stock_ticks_mor
@@ -1155,7 +1215,7 @@ hoodie:stock_ticks_mor->refresh
 18/09/24 07:03:00 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1) from /user/hive/warehouse/stock_ticks_mor
 Metadata for table stock_ticks_mor loaded
 
-hoodie:stock_ticks->compactions show all
+hoodie:stock_ticks_mor->compactions show all
 18/09/24 07:03:15 INFO timeline.HoodieActiveTimeline: Loaded instants [[20180924064636__clean__COMPLETED], [20180924064636__deltacommit__COMPLETED], [20180924065057__clean__COMPLETED], [20180924065057__deltacommit__COMPLETED], [20180924070031__commit__COMPLETED]]
 ___________________________________________________________________
 | Compaction Instant Time| State    | Total FileIds to be Compacted|
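
Putting the new instructions together, the Mac AArch64 path through the demo looks like the following sketch. The commands are taken from the diff above; the sequencing, and the assumption that you start from the root of a local Hudi clone with kcat installed, are illustrative.

```sh
# From the root of a local Hudi clone, bring up the demo cluster with
# AArch64-compatible images (Presto and Trino do not currently work here).
cd docker
./setup_demo.sh --mac-aarch64
cd ..

# Verify that all containers are up.
docker ps

# Step 1: publish the first batch to the stock_ticks Kafka topic.
cat docker/demo/data/batch_1.json | kcat -b kafkabroker -t stock_ticks -P
```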