Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/05/03 11:37:52 UTC

[GitHub] [hudi] Akshay2Agarwal opened a new issue #2913: [SUPPORT] Hudi + Hive Metastore Sync

Akshay2Agarwal opened a new issue #2913:
URL: https://github.com/apache/hudi/issues/2913


   Is HiveServer2 strictly required for running Hive sync? I tried setting `hoodie.datasource.hive_sync.jdbcurl` to the MySQL JDBC URL of the metastore database, and syncing the Hudi table failed with a SQL syntax error.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Start the Spark shell:
      ```
      spark-shell \
        --packages org.apache.hudi:hudi-spark-bundle_2.11:0.8.0,org.apache.spark:spark-avro_2.11:2.4.4 \
        --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
      ```
   2. Create a test DataFrame:
      ```
      val df = Seq(
        (1, 213213213, "2014/01/01"),
        (2, 343432434, "2014/11/30"),
        (3, 343242323, "2016/12/29"),
        (4, 344234242, "2016/05/09")
      ).toDF("typeId","eventTime","partition")
      ```
   3. Write it with Hive sync enabled:
      ```
      // Imports from the Hudi quickstart; they provide the option constants used below.
      import org.apache.hudi.QuickstartUtils._
      import org.apache.hudi.DataSourceWriteOptions._
      import org.apache.hudi.config.HoodieWriteConfig._
      import org.apache.spark.sql.SaveMode._

      df.write.format("hudi").
        options(getQuickstartWriteConfigs).
        option(PRECOMBINE_FIELD_OPT_KEY, "eventTime").
        option(RECORDKEY_FIELD_OPT_KEY, "typeId").
        option(PARTITIONPATH_FIELD_OPT_KEY, "partition").
        option(HIVE_PARTITION_FIELDS_OPT_KEY, "partition").
        option(HIVE_STYLE_PARTITIONING_OPT_KEY, false).
        option(HIVE_SYNC_ENABLED_OPT_KEY, true).
        option(HIVE_TABLE_OPT_KEY, "hive_test_data").
        option(HIVE_USER_OPT_KEY, "hive").
        option(HIVE_PASS_OPT_KEY, "XXXXXXX").
        option(HIVE_URL_OPT_KEY, "jdbc:mysql://XXXXXXXX.compute.internal:3306").
        option(TABLE_NAME, "hudi_events_test").
        mode(Overwrite).
        save("s3a://XXXXXXXX/test-lake-data/hudi_events_test/")
      ```
   
   **Expected behavior**
   
   I expected the Hudi table to be synced to the Hive metastore.
   
   **Environment Description**
   
   * Hudi version : 0.8.0
   
   * Spark version : 2.4.4
   
   * Hive version : 2
   
   * Hadoop version : 2
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : no
   
   
   **Stacktrace**
   
   ```
   Caused by: java.sql.SQLSyntaxErrorException: (conn=47) You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'EXTERNAL TABLE  IF NOT EXISTS `default`.`hive_test_data`( `_hoodie_commit_time` ' at line 1
     at org.mariadb.jdbc.internal.util.exceptions.ExceptionMapper.get(ExceptionMapper.java:243)
     at org.mariadb.jdbc.internal.util.exceptions.ExceptionMapper.getException(ExceptionMapper.java:164)
     at org.mariadb.jdbc.MariaDbStatement.executeExceptionEpilogue(MariaDbStatement.java:258)
     at org.mariadb.jdbc.MariaDbStatement.executeInternal(MariaDbStatement.java:349)
     at org.mariadb.jdbc.MariaDbStatement.execute(MariaDbStatement.java:484)
     at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:367)
     ... 110 more
   Caused by: java.sql.SQLException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'EXTERNAL TABLE  IF NOT EXISTS `default`.`hive_test_data`( `_hoodie_commit_time` ' at line 1
   Query is: CREATE EXTERNAL TABLE  IF NOT EXISTS `default`.`hive_test_data`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `typeId` int, `eventTime` int) PARTITIONED BY (`partition` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 's3a://XXXXXXXXX/test-lake-data/hudi_events_test'
   java thread: main
     at org.mariadb.jdbc.internal.util.LogQueryTool.exceptionWithQuery(LogQueryTool.java:134)
     at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.executeQuery(AbstractQueryProtocol.java:184)
     at org.mariadb.jdbc.MariaDbStatement.executeInternal(MariaDbStatement.java:343)
     ... 112 more
   ```
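
Editorial note on the trace: the failing statement is Hive DDL (`CREATE EXTERNAL TABLE ... ROW FORMAT SERDE ...`), and the `org.mariadb.jdbc` frames show it being sent directly to the MySQL server named in `HIVE_URL_OPT_KEY`, which cannot parse it. The JDBC sync path expects a HiveServer2 endpoint, whose URLs use the `jdbc:hive2://` scheme. The sketch below is a hypothetical pre-flight check, not part of Hudi; the object name, function name, and host names are placeholders invented for illustration.

```scala
// Hypothetical helper (not a Hudi API): checks whether a Hive sync URL
// looks like a HiveServer2 JDBC endpoint before a write is attempted.
object HiveSyncUrlCheck {
  // HiveServer2 JDBC URLs start with "jdbc:hive2://".
  def looksLikeHiveServer2Url(url: String): Boolean =
    url.trim.toLowerCase.startsWith("jdbc:hive2://")

  def main(args: Array[String]): Unit = {
    val candidates = Seq(
      "jdbc:hive2://hiveserver-host:10000", // placeholder host, HiveServer2 endpoint
      "jdbc:mysql://metastore-db-host:3306" // placeholder host, metastore backing database
    )
    candidates.foreach { url =>
      val verdict =
        if (looksLikeHiveServer2Url(url)) "usable for JDBC-based Hive sync"
        else "not a HiveServer2 URL; Hive DDL sent here will be rejected"
      println(s"$url -> $verdict")
    }
  }
}
```

A check like this only inspects the URL scheme; it does not verify that a HiveServer2 instance is actually reachable at that address.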
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] Akshay2Agarwal commented on issue #2913: [SUPPORT] Hudi + Hive Metastore Sync

Posted by GitBox <gi...@apache.org>.
Akshay2Agarwal commented on issue #2913:
URL: https://github.com/apache/hudi/issues/2913#issuecomment-831740942


   I read further about Hudi's integration with query engines. From what I understand, is this because there is no separate Hudi connector for Presto or other engines, and Hudi only has a Hive plugin?





[GitHub] [hudi] Akshay2Agarwal commented on issue #2913: [SUPPORT] Hudi + Hive Metastore Sync

Posted by GitBox <gi...@apache.org>.
Akshay2Agarwal commented on issue #2913:
URL: https://github.com/apache/hudi/issues/2913#issuecomment-831215963


   The reason for asking is that we plan to use Presto for querying, so we would not need HiveServer2 if only the metastore is used. Let us know if we missed something while reading about Hudi's dependencies and structure.





[GitHub] [hudi] Akshay2Agarwal commented on issue #2913: [SUPPORT] Hudi + Hive Metastore Sync

Posted by GitBox <gi...@apache.org>.
Akshay2Agarwal commented on issue #2913:
URL: https://github.com/apache/hudi/issues/2913#issuecomment-836704935


   Closing the ticket. After reading through the code, I realized that Hudi is integrated with Hive 2 and is queryable in Presto as an external table, not as a managed table.





[GitHub] [hudi] Akshay2Agarwal edited a comment on issue #2913: [SUPPORT] Hudi + Hive Metastore Sync

Posted by GitBox <gi...@apache.org>.
Akshay2Agarwal edited a comment on issue #2913:
URL: https://github.com/apache/hudi/issues/2913#issuecomment-839386637


   Got this: https://github.com/apache/hudi/issues/1679





[GitHub] [hudi] Akshay2Agarwal commented on issue #2913: [SUPPORT] Hudi + Hive Metastore Sync

Posted by GitBox <gi...@apache.org>.
Akshay2Agarwal commented on issue #2913:
URL: https://github.com/apache/hudi/issues/2913#issuecomment-839386637


   Read this: https://github.com/apache/hudi/issues/1679





[GitHub] [hudi] borasy commented on issue #2913: [SUPPORT] Hudi + Hive Metastore Sync

Posted by GitBox <gi...@apache.org>.
borasy commented on issue #2913:
URL: https://github.com/apache/hudi/issues/2913#issuecomment-924731093


   Hi @Akshay2Agarwal, so in the end it's not possible to use the Hive metastore directly, and we have to use Hive 2 JDBC?
   
   -----
   hudi 0.9
   spark 3.0.3
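
Editorial note: the thread never answers this question. One possibility is the `hoodie.datasource.hive_sync.use_jdbc` option in Hudi's Spark datasource, which defaults to `true`. Setting it to `false` makes Hive sync bypass the HiveServer2 JDBC connection. The sketch below only assembles the option map, using plain string keys so it compiles without Hudi on the classpath. The object name is hypothetical, the table and partition values are copied from the reproduction above, and the approach is an assumption that has not been verified against the versions mentioned in this thread.

```scala
// A minimal sketch (assumption, untested against Hudi 0.8.0/0.9):
// Hive sync options that turn off the HiveServer2 JDBC path.
object MetastoreSyncOptions {
  val hiveSyncOptions: Map[String, String] = Map(
    "hoodie.datasource.hive_sync.enable"           -> "true",
    // Do not open a JDBC connection to HiveServer2 for the sync.
    "hoodie.datasource.hive_sync.use_jdbc"         -> "false",
    "hoodie.datasource.hive_sync.table"            -> "hive_test_data",
    "hoodie.datasource.hive_sync.partition_fields" -> "partition"
  )

  def main(args: Array[String]): Unit =
    hiveSyncOptions.toSeq.sortBy(_._1).foreach { case (k, v) => println(s"$k=$v") }
}
```

In a Spark shell these would be passed with `df.write.format("hudi").options(MetastoreSyncOptions.hiveSyncOptions)` alongside the other write options. With JDBC disabled, the metastore location has to come from the Hive configuration visible to Spark, for example `hive.metastore.uris` in `hive-site.xml` or `--conf spark.hadoop.hive.metastore.uris=thrift://metastore-host:9083` (placeholder host), and the Hive client libraries need to be on the classpath.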





[GitHub] [hudi] Akshay2Agarwal closed issue #2913: [SUPPORT] Hudi + Hive Metastore Sync

Posted by GitBox <gi...@apache.org>.
Akshay2Agarwal closed issue #2913:
URL: https://github.com/apache/hudi/issues/2913


   

