Posted to commits@hudi.apache.org by "chenyunliang (Jira)" <ji...@apache.org> on 2022/06/09 01:55:00 UTC
[jira] [Created] (HUDI-4211) The hudi docker demo failed to execute the delta-streamer and ingest to stock_ticks_cow table in HDFS
chenyunliang created HUDI-4211:
----------------------------------
Summary: The hudi docker demo failed to execute the delta-streamer and ingest to stock_ticks_cow table in HDFS
Key: HUDI-4211
URL: https://issues.apache.org/jira/browse/HUDI-4211
Project: Apache Hudi
Issue Type: Bug
Components: cli, meta-sync, spark
Environment: [root@hudi hive_base]# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
docker.io/graphiteapp/graphite-statsd latest 5742c9c6f1db 2 weeks ago 850 MB
docker.io/apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkadhoc_2.4.4 latest 07880b8f5978 3 months ago 2.01 GB
docker.io/apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkworker_2.4.4 latest d5344418db27 3 months ago 1.59 GB
docker.io/apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkmaster_2.4.4 latest 6903d097f47b 3 months ago 1.59 GB
docker.io/apachehudi/hudi-hadoop_2.8.4-hive_2.3.3 latest 678d033ee64c 3 months ago 1.29 GB
docker.io/apachehudi/hudi-hadoop_2.8.4-history latest 995dc55f7fbc 3 months ago 964 MB
docker.io/apachehudi/hudi-hadoop_2.8.4-datanode latest 156ea075fb0e 3 months ago 964 MB
docker.io/apachehudi/hudi-hadoop_2.8.4-namenode latest 550cfdc43cc8 3 months ago 964 MB
docker.io/apachehudi/hudi-hadoop_2.8.4-prestobase_0.271 latest 7d1a076fa27b 3 months ago 2.69 GB
docker.io/graphiteapp/graphite-statsd <none> d49e5c8fe07a 3 months ago 847 MB
docker.io/apachehudi/hudi-hadoop_2.8.4-trinoworker_368 latest d4020d02727a 4 months ago 2.93 GB
docker.io/apachehudi/hudi-hadoop_2.8.4-trinocoordinator_368 latest 9ed7e8f84f5b 4 months ago 2.93 GB
docker.io/bde2020/hive-metastore-postgresql 2.3.0 7ab9e8f93813 2 years ago 275 MB
docker.io/apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkmaster_2.3.1 latest 70dc18c432a0 3 years ago 1.64 GB
docker.io/bitnami/kafka 2.0.0 6ff9736c1996 3 years ago 423 MB
docker.io/bitnami/zookeeper 3.4.12-r68 50b53cf5fcad 3 years ago 414 MB
Reporter: chenyunliang
Fix For: 0.11.0
When I execute the following code in container adhoc-2:
{code:java}
spark-submit \
--class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer $HUDI_UTILITIES_BUNDLE \
--table-type COPY_ON_WRITE \
--source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
--source-ordering-field ts \
--target-base-path /user/hive/warehouse/stock_ticks_cow \
--target-table stock_ticks_cow --props /var/demo/config/kafka-source.properties \
--schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider {code}
The error is as follows:
{code:java}
root@adhoc-2:/opt# spark-submit \
> --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer $HUDI_UTILITIES_BUNDLE \
> --table-type COPY_ON_WRITE \
> --source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
> --source-ordering-field ts \
> --target-base-path /user/hive/warehouse/stock_ticks_cow \
> --target-table stock_ticks_cow --props /var/demo/config/kafka-source.properties \
> --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider
Exception in thread "main" org.apache.spark.SparkException: Cannot load main class from JAR file:/opt/%C2%A0
at org.apache.spark.deploy.SparkSubmitArguments.error(SparkSubmitArguments.scala:657)
at org.apache.spark.deploy.SparkSubmitArguments.loadEnvironmentArguments(SparkSubmitArguments.scala:221)
at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:116)
at org.apache.spark.deploy.SparkSubmit$$anon$2$$anon$1.<init>(SparkSubmit.scala:907)
at org.apache.spark.deploy.SparkSubmit$$anon$2.parseArguments(SparkSubmit.scala:907)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:81)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) {code}
When I checked the environment variable $HUDI_UTILITIES_BUNDLE, I got this:
{code:java}
root@adhoc-2:/opt# echo $HUDI_UTILITIES_BUNDLE
/var/hoodie/ws/docker/hoodie/hadoop/hive_base/target/hoodie-utilities.jar {code}
But I can't find the jar file:
{code:java}
root@adhoc-2:/opt# ls -ltr /var/hoodie/ws/docker/hoodie/hadoop/hive_base/target/hoodie-utilities.jar
ls: cannot access '/var/hoodie/ws/docker/hoodie/hadoop/hive_base/target/hoodie-utilities.jar': No such file or directory {code}
When I tried to find it:
{code:java}
root@adhoc-2:/opt# find /var/hoodie/ws -name "hudi-utilities-bundle*.0.jar" | xargs ls -ltr
-rw-r--r-- 1 root root 60631874 Jun 8 07:41 /var/hoodie/ws/hudi-examples/hudi-examples-spark/target/lib/hudi-utilities-bundle_2.11-0.11.0.jar
-rw-r--r-- 1 root root 60631874 Jun 8 07:41 /var/hoodie/ws/hudi-cli/target/lib/hudi-utilities-bundle_2.11-0.11.0.jar
-rw-r--r-- 1 root root 60631874 Jun 8 07:41 /var/hoodie/ws/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.11.0.jar {code}
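For reference, the variable can be pointed at one of the bundle jars found above before resubmitting; a minimal sketch (the version suffix 2.11-0.11.0 is taken from the find output above and may differ in other images):
{code:bash}
# Point HUDI_UTILITIES_BUNDLE at a bundle jar that actually exists in the container
export HUDI_UTILITIES_BUNDLE=/var/hoodie/ws/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.11.0.jar
# Verify the jar is readable before running spark-submit again
ls -l "$HUDI_UTILITIES_BUNDLE"
{code}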
So I modified the environment variable $HUDI_UTILITIES_BUNDLE and resubmitted the command, and it worked:
{code:java}
root@adhoc-2:/opt# spark-submit \
> --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer $HUDI_UTILITIES_BUNDLE \
> --table-type COPY_ON_WRITE \
> --source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
> --source-ordering-field ts \
> --target-base-path /user/hive/warehouse/stock_ticks_cow \
> --target-table stock_ticks_cow --props /var/demo/config/kafka-source.properties \
> --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider
22/06/09 01:43:34 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/06/09 01:43:35 WARN SchedulerConfGenerator: Job Scheduling Configs will not be in effect as spark.scheduler.mode is not set to FAIR at instantiation time. Continuing without scheduling configs
22/06/09 01:43:36 WARN DFSPropertiesConfiguration: Cannot find HUDI_CONF_DIR, please set it as the dir of hudi-defaults.conf
22/06/09 01:43:36 WARN DFSPropertiesConfiguration: Properties file file:/etc/hudi/conf/hudi-defaults.conf not found. Ignoring to load props file
22/06/09 01:43:36 WARN SparkContext: Using an existing SparkContext; some configuration may not take effect.
22/06/09 01:43:37 WARN KafkaUtils: overriding enable.auto.commit to false for executor
22/06/09 01:43:37 WARN KafkaUtils: overriding auto.offset.reset to none for executor
22/06/09 01:43:37 ERROR KafkaUtils: group.id is null, you should probably set it
22/06/09 01:43:37 WARN KafkaUtils: overriding executor group.id to spark-executor-null
22/06/09 01:43:37 WARN KafkaUtils: overriding receive.buffer.bytes to 65536 see KAFKA-3135
22/06/09 01:43:38 WARN HoodieBackedTableMetadata: Metadata table was not found at path /user/hive/warehouse/stock_ticks_cow/.hoodie/metadata
00:05 WARN: Timeline-server-based markers are not supported for HDFS: base path /user/hive/warehouse/stock_ticks_cow. Falling back to direct markers.
00:06 WARN: Timeline-server-based markers are not supported for HDFS: base path /user/hive/warehouse/stock_ticks_cow. Falling back to direct markers.
00:08 WARN: Timeline-server-based markers are not supported for HDFS: base path /user/hive/warehouse/stock_ticks_cow. Falling back to direct markers. {code}
I could see that the data had been written to HDFS:
{code:java}
root@adhoc-2:/opt# hdfs dfs -ls /user/hive/warehouse/stock_ticks_cow/*/*/*/*
Found 1 items
drwxr-xr-x - root supergroup 0 2022-06-09 01:43 /user/hive/warehouse/stock_ticks_cow/.hoodie/metadata/.hoodie/.aux/.bootstrap
-rw-r--r-- 1 root supergroup 8056 2022-06-09 01:43 /user/hive/warehouse/stock_ticks_cow/.hoodie/metadata/.hoodie/00000000000000.deltacommit
-rw-r--r-- 1 root supergroup 3035 2022-06-09 01:43 /user/hive/warehouse/stock_ticks_cow/.hoodie/metadata/.hoodie/00000000000000.deltacommit.inflight
-rw-r--r-- 1 root supergroup 0 2022-06-09 01:43 /user/hive/warehouse/stock_ticks_cow/.hoodie/metadata/.hoodie/00000000000000.deltacommit.requested
-rw-r--r-- 1 root supergroup 8139 2022-06-09 01:43 /user/hive/warehouse/stock_ticks_cow/.hoodie/metadata/.hoodie/20220609014338711.deltacommit
-rw-r--r-- 1 root supergroup 3035 2022-06-09 01:43 /user/hive/warehouse/stock_ticks_cow/.hoodie/metadata/.hoodie/20220609014338711.deltacommit.inflight
-rw-r--r-- 1 root supergroup 0 2022-06-09 01:43 /user/hive/warehouse/stock_ticks_cow/.hoodie/metadata/.hoodie/20220609014338711.deltacommit.requested
-rw-r--r-- 1 root supergroup 599 2022-06-09 01:43 /user/hive/warehouse/stock_ticks_cow/.hoodie/metadata/.hoodie/hoodie.properties
-rw-r--r-- 1 root supergroup 124 2022-06-09 01:43 /user/hive/warehouse/stock_ticks_cow/.hoodie/metadata/files/.files-0000_00000000000000.log.1_0-0-0
-rw-r--r-- 1 root supergroup 21951 2022-06-09 01:43 /user/hive/warehouse/stock_ticks_cow/.hoodie/metadata/files/.files-0000_00000000000000.log.1_0-10-10
-rw-r--r-- 1 root supergroup 93 2022-06-09 01:43 /user/hive/warehouse/stock_ticks_cow/.hoodie/metadata/files/.hoodie_partition_metadata
-rw-r--r-- 1 root supergroup 96 2022-06-09 01:43 /user/hive/warehouse/stock_ticks_cow/2018/08/31/.hoodie_partition_metadata
-rw-r--r-- 1 root supergroup 436884 2022-06-09 01:43 /user/hive/warehouse/stock_ticks_cow/2018/08/31/7610b058-8df2-484a-ba70-881feef7195e-0_0-36-35_20220609014338711.parquet {code}
So my question is: do I need to modify $HUDI_UTILITIES_BUNDLE?
--
This message was sent by Atlassian Jira
(v8.20.7#820007)