Posted to commits@hudi.apache.org by si...@apache.org on 2022/10/07 21:03:58 UTC
[hudi] branch asf-site updated: [HUDI-4976] added m1 changes to the site (#6860)
This is an automated email from the ASF dual-hosted git repository.
sivabalan pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new b662955f68 [HUDI-4976] added m1 changes to the site (#6860)
b662955f68 is described below
commit b662955f6885608961b0e8a83e1b841506087b88
Author: Jon Vexler <jo...@onehouse.ai>
AuthorDate: Fri Oct 7 17:03:53 2022 -0400
[HUDI-4976] added m1 changes to the site (#6860)
---
website/docs/docker_demo.md | 63 +++++++++++++++++-
.../versioned_docs/version-0.12.0/docker_demo.md | 76 +++++++++++++++++++---
2 files changed, 128 insertions(+), 11 deletions(-)
diff --git a/website/docs/docker_demo.md b/website/docs/docker_demo.md
index 698aec5439..681b1be51a 100644
--- a/website/docs/docker_demo.md
+++ b/website/docs/docker_demo.md
@@ -4,6 +4,8 @@ keywords: [ hudi, docker, demo]
toc: true
last_modified_at: 2019-12-30T15:59:57-04:00
---
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
## A Demo using Docker containers
@@ -58,6 +60,15 @@ The next step is to run the Docker compose script and setup configs for bringing
This should pull the Docker images from Docker hub and setup the Docker cluster.
+<Tabs
+defaultValue="default"
+values={[
+{ label: 'Default', value: 'default', },
+{ label: 'Mac AArch64', value: 'm1', },
+]}
+>
+<TabItem value="default">
+
```java
cd docker
./setup_demo.sh
@@ -118,6 +129,50 @@ Copying spark default config and setting up configs
$ docker ps
```
+</TabItem>
+<TabItem value="m1">
+Please note that Presto and Trino do not currently work for the Docker demo on Mac AArch64.
+
+```java
+cd docker
+./setup_demo.sh --mac-aarch64
+.......
+......
+[+] Running 12/12
+⠿ adhoc-1 Pulled 2.9s
+⠿ spark-worker-1 Pulled 3.0s
+⠿ kafka Pulled 2.9s
+⠿ datanode1 Pulled 2.9s
+⠿ hivemetastore Pulled 2.9s
+⠿ hiveserver Pulled 3.0s
+⠿ hive-metastore-postgresql Pulled 2.8s
+⠿ namenode Pulled 2.9s
+⠿ sparkmaster Pulled 2.9s
+⠿ zookeeper Pulled 2.8s
+⠿ adhoc-2 Pulled 2.9s
+⠿ historyserver Pulled 2.9s
+[+] Running 12/12
+⠿ Container zookeeper Started 41.0s
+⠿ Container kafkabroker Started 41.7s
+⠿ Container hive-metastore-postgresql Running 0.0s
+⠿ Container namenode Running 0.0s
+⠿ Container hivemetastore Running 0.0s
+⠿ Container historyserver Started 41.0s
+⠿ Container datanode1 Started 49.9s
+⠿ Container hiveserver Running 0.0s
+⠿ Container sparkmaster Started 41.9s
+⠿ Container spark-worker-1 Started 50.2s
+⠿ Container adhoc-2 Started 38.5s
+⠿ Container adhoc-1 Started 38.5s
+Copying spark default config and setting up configs
+Copying spark default config and setting up configs
+$ docker ps
+```
+</TabItem>
+
+</Tabs
+>
+
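As a quick check once either variant of the setup script finishes (this check is not part of the demo output above), you can confirm that the containers listed in the compose output are actually running:

```java
# Optional check, not from the demo script: list the demo containers and their status.
docker ps --format "table {{.Names}}\t{{.Status}}"
```

Every container from the compose output should report an "Up" status before you continue.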
At this point, the Docker cluster will be up and running. The demo cluster brings up the following services
* HDFS Services (NameNode, DataNode)
@@ -140,7 +195,9 @@ The batches are windowed intentionally so that the second batch contains updates
### Step 1 : Publish the first batch to Kafka
-Upload the first batch to Kafka topic 'stock ticks' `cat docker/demo/data/batch_1.json | kcat -b kafkabroker -t stock_ticks -P`
+Upload the first batch to the Kafka topic 'stock_ticks':
+
+`cat docker/demo/data/batch_1.json | kcat -b kafkabroker -t stock_ticks -P`
To check if the new topic shows up, use
```java
@@ -1137,7 +1194,7 @@ Compaction successfully completed for 20180924070031
# Now refresh and check again. You will see that there is a new compaction requested
-hoodie:stock_ticks->refresh
+hoodie:stock_ticks_mor->refresh
18/09/24 07:01:16 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hive/warehouse/stock_ticks_mor
18/09/24 07:01:16 INFO table.HoodieTableConfig: Loading table properties from /user/hive/warehouse/stock_ticks_mor/.hoodie/hoodie.properties
18/09/24 07:01:16 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1) from /user/hive/warehouse/stock_ticks_mor
@@ -1163,7 +1220,7 @@ hoodie:stock_ticks_mor->refresh
18/09/24 07:03:00 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1) from /user/hive/warehouse/stock_ticks_mor
Metadata for table stock_ticks_mor loaded
-hoodie:stock_ticks->compactions show all
+hoodie:stock_ticks_mor->compactions show all
18/09/24 07:03:15 INFO timeline.HoodieActiveTimeline: Loaded instants [[20180924064636__clean__COMPLETED], [20180924064636__deltacommit__COMPLETED], [20180924065057__clean__COMPLETED], [20180924065057__deltacommit__COMPLETED], [20180924070031__commit__COMPLETED]]
___________________________________________________________________
| Compaction Instant Time| State | Total FileIds to be Compacted|
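On the CLI prompt fix above: `refresh` and `compactions show all` are issued from the Hudi CLI after connecting to the MOR table, which is why the prompt reads `hoodie:stock_ticks_mor->`. A minimal sketch of that session, assuming the warehouse path used throughout this demo:

```java
# Hudi CLI sketch (path assumed from the demo; the prompt changes after connect)
connect --path /user/hive/warehouse/stock_ticks_mor
refresh
compactions show all
```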
diff --git a/website/versioned_docs/version-0.12.0/docker_demo.md b/website/versioned_docs/version-0.12.0/docker_demo.md
index 7f56129a1c..13e0a50834 100644
--- a/website/versioned_docs/version-0.12.0/docker_demo.md
+++ b/website/versioned_docs/version-0.12.0/docker_demo.md
@@ -4,6 +4,8 @@ keywords: [ hudi, docker, demo]
toc: true
last_modified_at: 2019-12-30T15:59:57-04:00
---
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
## A Demo using docker containers
@@ -49,8 +51,18 @@ mvn clean package -Pintegration-tests -DskipTests
### Bringing up Demo Cluster
-The next step is to run the docker compose script and setup configs for bringing up the cluster.
-This should pull the docker images from docker hub and setup docker cluster.
+The next step is to run the Docker compose script and set up configs for bringing up the cluster. These files are in the [Hudi repository](https://github.com/apache/hudi), which you should already have locally on your machine from the previous steps.
+
+This should pull the Docker images from Docker Hub and set up the Docker cluster.
+
+<Tabs
+defaultValue="default"
+values={[
+{ label: 'Default', value: 'default', },
+{ label: 'Mac AArch64', value: 'm1', },
+]}
+>
+<TabItem value="default">
```java
cd docker
@@ -112,7 +124,51 @@ Copying spark default config and setting up configs
$ docker ps
```
-At this point, the docker cluster will be up and running. The demo cluster brings up the following services
+</TabItem>
+<TabItem value="m1">
+Please note that Presto and Trino do not currently work for the Docker demo on Mac AArch64.
+
+```java
+cd docker
+./setup_demo.sh --mac-aarch64
+.......
+......
+[+] Running 12/12
+⠿ adhoc-1 Pulled 2.9s
+⠿ spark-worker-1 Pulled 3.0s
+⠿ kafka Pulled 2.9s
+⠿ datanode1 Pulled 2.9s
+⠿ hivemetastore Pulled 2.9s
+⠿ hiveserver Pulled 3.0s
+⠿ hive-metastore-postgresql Pulled 2.8s
+⠿ namenode Pulled 2.9s
+⠿ sparkmaster Pulled 2.9s
+⠿ zookeeper Pulled 2.8s
+⠿ adhoc-2 Pulled 2.9s
+⠿ historyserver Pulled 2.9s
+[+] Running 12/12
+⠿ Container zookeeper Started 41.0s
+⠿ Container kafkabroker Started 41.7s
+⠿ Container hive-metastore-postgresql Running 0.0s
+⠿ Container namenode Running 0.0s
+⠿ Container hivemetastore Running 0.0s
+⠿ Container historyserver Started 41.0s
+⠿ Container datanode1 Started 49.9s
+⠿ Container hiveserver Running 0.0s
+⠿ Container sparkmaster Started 41.9s
+⠿ Container spark-worker-1 Started 50.2s
+⠿ Container adhoc-2 Started 38.5s
+⠿ Container adhoc-1 Started 38.5s
+Copying spark default config and setting up configs
+Copying spark default config and setting up configs
+$ docker ps
+```
+</TabItem>
+
+</Tabs
+>
+
+At this point, the Docker cluster will be up and running. The demo cluster brings up the following services
* HDFS Services (NameNode, DataNode)
* Spark Master and Worker
@@ -134,7 +190,9 @@ The batches are windowed intentionally so that the second batch contains updates
### Step 1 : Publish the first batch to Kafka
-Upload the first batch to Kafka topic 'stock ticks' `cat docker/demo/data/batch_1.json | kcat -b kafkabroker -t stock_ticks -P`
+Upload the first batch to the Kafka topic 'stock_ticks':
+
+`cat docker/demo/data/batch_1.json | kcat -b kafkabroker -t stock_ticks -P`
To check if the new topic shows up, use
```java
@@ -241,7 +299,8 @@ docker exec -it adhoc-2 /bin/bash
--partitioned-by dt \
--base-path /user/hive/warehouse/stock_ticks_cow \
--database default \
- --table stock_ticks_cow
+ --table stock_ticks_cow \
+ --partition-value-extractor org.apache.hudi.hive.SlashEncodedDayPartitionValueExtractor
.....
2020-01-25 19:51:28,953 INFO [main] hive.HiveSyncTool (HiveSyncTool.java:syncHoodieTable(129)) - Sync complete for stock_ticks_cow
.....
@@ -254,7 +313,8 @@ docker exec -it adhoc-2 /bin/bash
--partitioned-by dt \
--base-path /user/hive/warehouse/stock_ticks_mor \
--database default \
- --table stock_ticks_mor
+ --table stock_ticks_mor \
+ --partition-value-extractor org.apache.hudi.hive.SlashEncodedDayPartitionValueExtractor
...
2020-01-25 19:51:51,066 INFO [main] hive.HiveSyncTool (HiveSyncTool.java:syncHoodieTable(129)) - Sync complete for stock_ticks_mor_ro
...
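The `--partition-value-extractor` flag added in the two sync commands above makes the day-partition parsing explicit. For context, a sketch of the full sync invocation being amended, with the script path and JDBC settings assumed from the demo environment rather than shown in this diff:

```java
# Sketch of a full HiveSyncTool run from adhoc-2 (paths/credentials assumed, verify locally)
/var/hoodie/ws/hudi-sync/hudi-hive-sync/run_sync_tool.sh \
  --jdbc-url jdbc:hive2://hiveserver:10000 \
  --user hive \
  --pass hive \
  --partitioned-by dt \
  --base-path /user/hive/warehouse/stock_ticks_cow \
  --database default \
  --table stock_ticks_cow \
  --partition-value-extractor org.apache.hudi.hive.SlashEncodedDayPartitionValueExtractor
```

The MOR table sync is identical apart from the base path and table name.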
@@ -1129,7 +1189,7 @@ Compaction successfully completed for 20180924070031
# Now refresh and check again. You will see that there is a new compaction requested
-hoodie:stock_ticks->refresh
+hoodie:stock_ticks_mor->refresh
18/09/24 07:01:16 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hive/warehouse/stock_ticks_mor
18/09/24 07:01:16 INFO table.HoodieTableConfig: Loading table properties from /user/hive/warehouse/stock_ticks_mor/.hoodie/hoodie.properties
18/09/24 07:01:16 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1) from /user/hive/warehouse/stock_ticks_mor
@@ -1155,7 +1215,7 @@ hoodie:stock_ticks_mor->refresh
18/09/24 07:03:00 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1) from /user/hive/warehouse/stock_ticks_mor
Metadata for table stock_ticks_mor loaded
-hoodie:stock_ticks->compactions show all
+hoodie:stock_ticks_mor->compactions show all
18/09/24 07:03:15 INFO timeline.HoodieActiveTimeline: Loaded instants [[20180924064636__clean__COMPLETED], [20180924064636__deltacommit__COMPLETED], [20180924065057__clean__COMPLETED], [20180924065057__deltacommit__COMPLETED], [20180924070031__commit__COMPLETED]]
___________________________________________________________________
| Compaction Instant Time| State | Total FileIds to be Compacted|
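For readers tracing the corrected prompts, the compaction these log lines report on is scheduled and executed from the same CLI session. A hedged sketch of that flow, with flag values assumed from the demo environment and not taken from this diff:

```java
# Sketch only; verify flags against your Hudi CLI version
compaction schedule --hoodieConfigs hoodie.compact.inline.max.delta.commits=1
refresh
compactions show all
compaction run --compactionInstant <requested-instant> --parallelism 2 --sparkMemory 1G \
  --schemaFilePath /var/demo/config/schema.avsc --retry 1
```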