Posted to commits@hudi.apache.org by "chenyunliang (Jira)" <ji...@apache.org> on 2022/06/09 01:55:00 UTC

[jira] [Created] (HUDI-4211) The hudi docker demo failed to execute the delta-streamer and ingest to stock_ticks_cow table in HDFS

chenyunliang created HUDI-4211:
----------------------------------

             Summary: The hudi docker demo failed to execute the delta-streamer and ingest to stock_ticks_cow table in HDFS
                 Key: HUDI-4211
                 URL: https://issues.apache.org/jira/browse/HUDI-4211
             Project: Apache Hudi
          Issue Type: Bug
          Components: cli, meta-sync, spark
         Environment: [root@hudi hive_base]# docker images
REPOSITORY                                                            TAG                 IMAGE ID       CREATED             SIZE
docker.io/graphiteapp/graphite-statsd                                 latest              5742c9c6f1db       2 weeks ago         850 MB
docker.io/apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkadhoc_2.4.4    latest              07880b8f5978       3 months ago        2.01 GB
docker.io/apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkworker_2.4.4   latest              d5344418db27       3 months ago        1.59 GB
docker.io/apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkmaster_2.4.4   latest              6903d097f47b       3 months ago        1.59 GB
docker.io/apachehudi/hudi-hadoop_2.8.4-hive_2.3.3                     latest              678d033ee64c       3 months ago        1.29 GB
docker.io/apachehudi/hudi-hadoop_2.8.4-history                        latest              995dc55f7fbc       3 months ago        964 MB
docker.io/apachehudi/hudi-hadoop_2.8.4-datanode                       latest              156ea075fb0e       3 months ago        964 MB
docker.io/apachehudi/hudi-hadoop_2.8.4-namenode                       latest              550cfdc43cc8       3 months ago        964 MB
docker.io/apachehudi/hudi-hadoop_2.8.4-prestobase_0.271               latest              7d1a076fa27b       3 months ago        2.69 GB
docker.io/graphiteapp/graphite-statsd                                 <none>              d49e5c8fe07a       3 months ago        847 MB
docker.io/apachehudi/hudi-hadoop_2.8.4-trinoworker_368                latest              d4020d02727a       4 months ago        2.93 GB
docker.io/apachehudi/hudi-hadoop_2.8.4-trinocoordinator_368           latest              9ed7e8f84f5b       4 months ago        2.93 GB
docker.io/bde2020/hive-metastore-postgresql                           2.3.0               7ab9e8f93813       2 years ago         275 MB
docker.io/apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkmaster_2.3.1   latest              70dc18c432a0       3 years ago         1.64 GB
docker.io/bitnami/kafka                                               2.0.0               6ff9736c1996       3 years ago         423 MB
docker.io/bitnami/zookeeper                                           3.4.12-r68          50b53cf5fcad       3 years ago         414 MB
            Reporter: chenyunliang
             Fix For: 0.11.0


When I executed the following command in the adhoc-2 container:
{code:java}
spark-submit \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer $HUDI_UTILITIES_BUNDLE \
  --table-type COPY_ON_WRITE \
  --source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
  --source-ordering-field ts  \
  --target-base-path /user/hive/warehouse/stock_ticks_cow \
  --target-table stock_ticks_cow --props /var/demo/config/kafka-source.properties \
  --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider {code}
The following error occurred:
{code:java}
root@adhoc-2:/opt# spark-submit \
>   --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer $HUDI_UTILITIES_BUNDLE \
>   --table-type COPY_ON_WRITE \
>   --source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
>   --source-ordering-field ts  \
>   --target-base-path /user/hive/warehouse/stock_ticks_cow \
>   --target-table stock_ticks_cow --props /var/demo/config/kafka-source.properties \
>   --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider
Exception in thread "main" org.apache.spark.SparkException: Cannot load main class from JAR file:/opt/%C2%A0
    at org.apache.spark.deploy.SparkSubmitArguments.error(SparkSubmitArguments.scala:657)
    at org.apache.spark.deploy.SparkSubmitArguments.loadEnvironmentArguments(SparkSubmitArguments.scala:221)
    at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:116)
    at org.apache.spark.deploy.SparkSubmit$$anon$2$$anon$1.<init>(SparkSubmit.scala:907)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.parseArguments(SparkSubmit.scala:907)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:81)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) {code}
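In the failing message, %C2%A0 is the URL-encoded form of a UTF-8 non-breaking space (bytes C2 A0, U+00A0), so spark-submit apparently received an invisible NBSP where the jar path should be, which can happen when a command is copy-pasted from a web page. A hedged sketch for making such bytes visible (the has_nbsp helper is mine, not part of the demo):

```shell
# has_nbsp: succeed if the argument contains a UTF-8 non-breaking space (bytes c2 a0).
# od -An -tx1 dumps raw bytes as hex, so invisible characters become visible;
# tr joins od's output lines so the two-byte sequence is matched even across lines.
has_nbsp() {
  printf '%s' "$1" | od -An -tx1 | tr -d '\n' | grep -q 'c2 a0'
}

if has_nbsp "$HUDI_UTILITIES_BUNDLE"; then
  echo "non-breaking space found in HUDI_UTILITIES_BUNDLE" >&2
fi
```

If the variable (or the pasted command line) contains an NBSP, retyping the whitespace by hand removes it.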
When I checked the environment variable $HUDI_UTILITIES_BUNDLE, I got this:
{code:java}
root@adhoc-2:/opt# echo $HUDI_UTILITIES_BUNDLE
/var/hoodie/ws/docker/hoodie/hadoop/hive_base/target/hoodie-utilities.jar {code}
But that jar file doesn't exist:
{code:java}
root@adhoc-2:/opt# ls -ltr /var/hoodie/ws/docker/hoodie/hadoop/hive_base/target/hoodie-utilities.jar
ls: cannot access '/var/hoodie/ws/docker/hoodie/hadoop/hive_base/target/hoodie-utilities.jar': No such file or directory {code}
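A pre-flight existence check would surface a missing jar before spark-submit mis-parses its argument list (a sketch; the check_bundle name is mine, not from the demo):

```shell
# check_bundle: succeed only if the given path is an existing regular file.
check_bundle() {
  if [ -f "$1" ]; then
    echo "bundle found: $1"
  else
    echo "bundle missing: $1" >&2
    return 1
  fi
}

# Intended use inside adhoc-2 (shown as a comment since spark-submit is
# only available in the container):
#   check_bundle "$HUDI_UTILITIES_BUNDLE" && spark-submit \
#     --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
#     "$HUDI_UTILITIES_BUNDLE" ...
```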
When I searched for the jar, I found:
{code:java}
root@adhoc-2:/opt# find /var/hoodie/ws -name "hudi-utilities-bundle*.0.jar" | xargs ls -ltr
-rw-r--r-- 1 root root 60631874 Jun  8 07:41 /var/hoodie/ws/hudi-examples/hudi-examples-spark/target/lib/hudi-utilities-bundle_2.11-0.11.0.jar
-rw-r--r-- 1 root root 60631874 Jun  8 07:41 /var/hoodie/ws/hudi-cli/target/lib/hudi-utilities-bundle_2.11-0.11.0.jar
-rw-r--r-- 1 root root 60631874 Jun  8 07:41 /var/hoodie/ws/packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_2.11-0.11.0.jar {code}
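Given the paths above, one workaround is to repoint the variable at a jar that does exist. A sketch, assuming a bash shell inside adhoc-2 (the find_bundle helper is hypothetical; the workspace path comes from the find output above):

```shell
# find_bundle: print the first utilities-bundle jar found under the given root.
find_bundle() {
  find "$1" -name 'hudi-utilities-bundle_*.jar' 2>/dev/null | head -n 1
}

# In the adhoc-2 container this would be roughly:
#   export HUDI_UTILITIES_BUNDLE="$(find_bundle /var/hoodie/ws/packaging)"
#   ls -l "$HUDI_UTILITIES_BUNDLE"   # sanity check before resubmitting
```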
So I modified the environment variable $HUDI_UTILITIES_BUNDLE and resubmitted the command, and it worked:
{code:java}
root@adhoc-2:/opt# spark-submit \
>   --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer $HUDI_UTILITIES_BUNDLE \
>   --table-type COPY_ON_WRITE \
>   --source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
>   --source-ordering-field ts  \
>   --target-base-path /user/hive/warehouse/stock_ticks_cow \
>   --target-table stock_ticks_cow --props /var/demo/config/kafka-source.properties \
>   --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider
22/06/09 01:43:34 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/06/09 01:43:35 WARN SchedulerConfGenerator: Job Scheduling Configs will not be in effect as spark.scheduler.mode is not set to FAIR at instantiation time. Continuing without scheduling configs
22/06/09 01:43:36 WARN DFSPropertiesConfiguration: Cannot find HUDI_CONF_DIR, please set it as the dir of hudi-defaults.conf
22/06/09 01:43:36 WARN DFSPropertiesConfiguration: Properties file file:/etc/hudi/conf/hudi-defaults.conf not found. Ignoring to load props file
22/06/09 01:43:36 WARN SparkContext: Using an existing SparkContext; some configuration may not take effect.
22/06/09 01:43:37 WARN KafkaUtils: overriding enable.auto.commit to false for executor
22/06/09 01:43:37 WARN KafkaUtils: overriding auto.offset.reset to none for executor
22/06/09 01:43:37 ERROR KafkaUtils: group.id is null, you should probably set it
22/06/09 01:43:37 WARN KafkaUtils: overriding executor group.id to spark-executor-null
22/06/09 01:43:37 WARN KafkaUtils: overriding receive.buffer.bytes to 65536 see KAFKA-3135
22/06/09 01:43:38 WARN HoodieBackedTableMetadata: Metadata table was not found at path /user/hive/warehouse/stock_ticks_cow/.hoodie/metadata
00:05  WARN: Timeline-server-based markers are not supported for HDFS: base path /user/hive/warehouse/stock_ticks_cow.  Falling back to direct markers.
00:06  WARN: Timeline-server-based markers are not supported for HDFS: base path /user/hive/warehouse/stock_ticks_cow.  Falling back to direct markers.
00:08  WARN: Timeline-server-based markers are not supported for HDFS: base path /user/hive/warehouse/stock_ticks_cow.  Falling back to direct markers. {code}
I could see that the data had been written to HDFS:
{code:java}
root@adhoc-2:/opt# hdfs dfs -ls /user/hive/warehouse/stock_ticks_cow/*/*/*/*
Found 1 items
drwxr-xr-x   - root supergroup          0 2022-06-09 01:43 /user/hive/warehouse/stock_ticks_cow/.hoodie/metadata/.hoodie/.aux/.bootstrap
-rw-r--r--   1 root supergroup       8056 2022-06-09 01:43 /user/hive/warehouse/stock_ticks_cow/.hoodie/metadata/.hoodie/00000000000000.deltacommit
-rw-r--r--   1 root supergroup       3035 2022-06-09 01:43 /user/hive/warehouse/stock_ticks_cow/.hoodie/metadata/.hoodie/00000000000000.deltacommit.inflight
-rw-r--r--   1 root supergroup          0 2022-06-09 01:43 /user/hive/warehouse/stock_ticks_cow/.hoodie/metadata/.hoodie/00000000000000.deltacommit.requested
-rw-r--r--   1 root supergroup       8139 2022-06-09 01:43 /user/hive/warehouse/stock_ticks_cow/.hoodie/metadata/.hoodie/20220609014338711.deltacommit
-rw-r--r--   1 root supergroup       3035 2022-06-09 01:43 /user/hive/warehouse/stock_ticks_cow/.hoodie/metadata/.hoodie/20220609014338711.deltacommit.inflight
-rw-r--r--   1 root supergroup          0 2022-06-09 01:43 /user/hive/warehouse/stock_ticks_cow/.hoodie/metadata/.hoodie/20220609014338711.deltacommit.requested
-rw-r--r--   1 root supergroup        599 2022-06-09 01:43 /user/hive/warehouse/stock_ticks_cow/.hoodie/metadata/.hoodie/hoodie.properties
-rw-r--r--   1 root supergroup        124 2022-06-09 01:43 /user/hive/warehouse/stock_ticks_cow/.hoodie/metadata/files/.files-0000_00000000000000.log.1_0-0-0
-rw-r--r--   1 root supergroup      21951 2022-06-09 01:43 /user/hive/warehouse/stock_ticks_cow/.hoodie/metadata/files/.files-0000_00000000000000.log.1_0-10-10
-rw-r--r--   1 root supergroup         93 2022-06-09 01:43 /user/hive/warehouse/stock_ticks_cow/.hoodie/metadata/files/.hoodie_partition_metadata
-rw-r--r--   1 root supergroup         96 2022-06-09 01:43 /user/hive/warehouse/stock_ticks_cow/2018/08/31/.hoodie_partition_metadata
-rw-r--r--   1 root supergroup     436884 2022-06-09 01:43 /user/hive/warehouse/stock_ticks_cow/2018/08/31/7610b058-8df2-484a-ba70-881feef7195e-0_0-36-35_20220609014338711.parquet {code}
So my question is: do I need to modify $HUDI_UTILITIES_BUNDLE?



--
This message was sent by Atlassian Jira
(v8.20.7#820007)