Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/06/01 10:01:40 UTC

[GitHub] [hudi] GnsCy opened a new issue, #5729: [SUPPORT] Environment issues when running Demo for v0.11

GnsCy opened a new issue, #5729:
URL: https://github.com/apache/hudi/issues/5729

   Running the demo setup as described [here](https://hudi.apache.org/docs/docker_demo) for v0.11 results in missing-jar errors when running the `spark-submit` and `hive-sync` commands.
   
   Steps to reproduce the behavior:
   
   1. Clone the repo and switch to the 0.11 release tag
   2. Set up the Docker environment
   3. Publish events to Kafka
   4. Run the spark-submit job to ingest data
   
   **Expected behavior**
   
   The demo environment is set up correctly and all the demo scenarios can be run end to end.
   
   **Environment Description**
   
   * Hudi version :0.11
   
   * Spark version : 2.4.4
   
   * Hive version : 2.3.3
   
   * Hadoop version : 2.8.4
   
   * Storage (HDFS/S3/GCS..) :
   
   * Running on Docker? (yes/no) : yes
   
   
   **Stacktrace**
   
   ```
   spark-submit \
   >   --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer $HUDI_UTILITIES_BUNDLE \
   >   --table-type COPY_ON_WRITE \
   >   --source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
   >   --source-ordering-field ts  \
   >   --target-base-path /user/hive/warehouse/stock_ticks_cow \
   >   --target-table stock_ticks_cow --props /var/demo/config/kafka-source.properties \
   >   --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider
   22/05/31 06:54:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
   22/05/31 06:54:24 WARN DependencyUtils: Local jar /var/hoodie/ws/docker/hoodie/hadoop/hive_base/target/hoodie-utilities.jar does not exist, skipping.
   22/05/31 06:54:24 WARN SparkSubmit$$anon$2: Failed to load org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.
   java.lang.ClassNotFoundException: org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer
   	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
   	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
   	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
   	at java.lang.Class.forName0(Native Method)
   	at java.lang.Class.forName(Class.java:348)
   	at org.apache.spark.util.Utils$.classForName(Utils.scala:238)
   	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:806)
   	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
   	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
   	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
   	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
   	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
   	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
   ```
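
The `DependencyUtils` warning in the log above says the jar that `$HUDI_UTILITIES_BUNDLE` points to does not exist, which would explain the `ClassNotFoundException` that follows. A minimal sketch of a pre-flight check, assuming it is run in the same shell as `spark-submit`; `check_jar` is a hypothetical helper name, not part of Hudi or the demo scripts:

```shell
# Hypothetical helper (not part of Hudi): report whether a jar path exists
# before handing it to spark-submit.
check_jar() {
  jar_path="$1"
  # An empty argument usually means the environment variable was never set.
  if [ -z "$jar_path" ]; then
    echo "UNSET: no jar path given"
    return 2
  fi
  # A regular file at the path means spark-submit can at least find the jar.
  if [ -f "$jar_path" ]; then
    echo "OK: $jar_path"
    return 0
  fi
  echo "MISSING: $jar_path"
  return 1
}

# Usage in the container shell, before running spark-submit:
#   check_jar "$HUDI_UTILITIES_BUNDLE"
```

A `MISSING` result here would suggest the bundle was never built or is not where the container expects it. The check only confirms the file is present, not that it contains the required classes.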
   
   ```
   hive-sync ->
   Exception in thread "main" org.apache.hudi.exception.HoodieException: Got runtime exception when hive syncing stock_ticks_cow
   	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:141)
   	at org.apache.hudi.hive.HiveSyncTool.main(HiveSyncTool.java:433)
   Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed in executing SQL CREATE EXTERNAL TABLE IF NOT EXISTS `default`.`stock_ticks_cow`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `volume` bigint, `ts` string, `symbol` string, `year` int, `month` string, `high` double, `low` double, `key` string, `date` string, `close` double, `open` double, `day` string) PARTITIONED BY (`dt` String)   ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' WITH SERDEPROPERTIES ('hoodie.query.as.ro.table'='false','path'='/user/hive/warehouse/stock_ticks_cow') STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION '/user/hive/warehouse/stock_ticks_cow' TBLPROPERTIES('spark.sql.sources.schema.partCol.0'='dt','spark.sql.sources.schema.numParts'='1','spark.sql.sources.schema.numPartCols'='1','spark.sql.sources.provider'='hudi','spark.sql.sources.schema.part.0'='{"type":"struct","fields":[{"name":"_hoodie_commit_time","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_commit_seqno","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_record_key","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_partition_path","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_file_name","type":"string","nullable":true,"metadata":{}},{"name":"volume","type":"long","nullable":false,"metadata":{}},{"name":"ts","type":"string","nullable":false,"metadata":{}},{"name":"symbol","type":"string","nullable":false,"metadata":{}},{"name":"year","type":"integer","nullable":false,"metadata":{}},{"name":"month","type":"string","nullable":false,"metadata":{}},{"name":"high","type":"double","nullable":false,"metadata":{}},{"name":"low","type":"double","nullable":false,"metadata":{}},{"name":"key","type":"string","nullable":false,"metadata":{}},{"name":"date","type":"string","nullable":false,"metadata":{}},{"name":"close","type":"double","nullable":false,"metadata":{}},{"name":"open","type":"double","nullable":false,"metadata":{}},{"name":"day","type":"string","nullable":false,"metadata":{}},{"name":"dt","type":"string","nullable":false,"metadata":{}}]}')
   	at org.apache.hudi.hive.ddl.JDBCExecutor.runSQL(JDBCExecutor.java:67)
   	at org.apache.hudi.hive.ddl.QueryBasedDDLExecutor.createTable(QueryBasedDDLExecutor.java:84)
   	at org.apache.hudi.hive.HoodieHiveClient.createTable(HoodieHiveClient.java:168)
   	at org.apache.hudi.hive.HiveSyncTool.syncSchema(HiveSyncTool.java:276)
   	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:217)
   	at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:150)
   	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:138)
   	... 1 more
   Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException Cannot find class 'org.apache.hudi.hadoop.HoodieParquetInputFormat'
   	at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:267)
   	at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:253)
   	at org.apache.hive.jdbc.HiveStatement.runAsyncOnServer(HiveStatement.java:313)
   	at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:253)
   	at org.apache.hudi.hive.ddl.JDBCExecutor.runSQL(JDBCExecutor.java:65)
   	... 7 more
   ```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] xushiyan commented on issue #5729: [SUPPORT] Environment issues when running Demo for v0.11

Posted by GitBox <gi...@apache.org>.
xushiyan commented on issue #5729:
URL: https://github.com/apache/hudi/issues/5729#issuecomment-1146407055

   @GnsCy right; this is more likely caused by out-of-date instructions or configs that may have changed in the newer release.




[GitHub] [hudi] xiedeyantu commented on issue #5729: [SUPPORT] Environment issues when running Demo for v0.11

Posted by "xiedeyantu (via GitHub)" <gi...@apache.org>.
xiedeyantu commented on issue #5729:
URL: https://github.com/apache/hudi/issues/5729#issuecomment-1594983080

   I also hit this problem using current master.




[GitHub] [hudi] nsivabalan closed issue #5729: [SUPPORT] Environment issues when running Demo for v0.11

Posted by GitBox <gi...@apache.org>.
nsivabalan closed issue #5729: [SUPPORT] Environment issues when running Demo for v0.11
URL: https://github.com/apache/hudi/issues/5729




[GitHub] [hudi] xushiyan commented on issue #5729: [SUPPORT] Environment issues when running Demo for v0.11

Posted by GitBox <gi...@apache.org>.
xushiyan commented on issue #5729:
URL: https://github.com/apache/hudi/issues/5729#issuecomment-1145745363

   I suspect this is a discrepancy in your environment setup. We run end-to-end integration tests against the docker demo for every commit, and we certainly test deltastreamer there: org.apache.hudi.integ.ITTestHoodieDemo#ingestFirstBatchAndHiveSync




[GitHub] [hudi] GnsCy commented on issue #5729: [SUPPORT] Environment issues when running Demo for v0.11

Posted by GitBox <gi...@apache.org>.
GnsCy commented on issue #5729:
URL: https://github.com/apache/hudi/issues/5729#issuecomment-1145872951

   @xushiyan wouldn't running the demo on docker eliminate any environment setup discrepancies?
   I am running the setup on a clean Ubuntu install.
   
   P.S. I managed to run the same setup successfully for v0.10.1.

