Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/08/04 07:19:44 UTC

[GitHub] [hudi] mingujotemp opened a new issue #1909: [SUPPORT] "Failed to get update last commit time synced to 20200804071144"

mingujotemp opened a new issue #1909:
URL: https://github.com/apache/hudi/issues/1909


   **Describe the problem you faced**
   
   Hudi 0.5.0 (running on EMR)
   
   I encounter `org.apache.hudi.hive.HoodieHiveSyncException: Failed to get update last commit time synced to 20200804071144` when I try to write a non-partitioned table to Glue (S3).
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Create a PySpark DataFrame
   2. Write the new DataFrame with the following options:
   ```
   hudi_options = {
     'hoodie.table.name': tableName,
     'hoodie.datasource.write.recordkey.field': 'id',
     'hoodie.index.type': 'BLOOM',
     'hoodie.datasource.write.partitionpath.field': '',
     'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.NonpartitionedKeyGenerator',
     'hoodie.datasource.write.table.name': tableName,
     'hoodie.datasource.write.operation': 'upsert',
     'hoodie.datasource.write.precombine.field': 'updated_at',
     'hoodie.upsert.shuffle.parallelism': 2, 
     'hoodie.insert.shuffle.parallelism': 2,
     'hoodie.bulkinsert.shuffle.parallelism': 10,
     'hoodie.datasource.hive_sync.database': databaseName,
     'hoodie.datasource.hive_sync.table': tableName,
     'hoodie.datasource.hive_sync.enable': 'true',
     'hoodie.datasource.hive_sync.assume_date_partitioning': 'false',
     'hoodie.datasource.hive_sync.partition_extractor_class': 'org.apache.hudi.hive.NonPartitionedExtractor',
     'hoodie.datasource.hive_sync.partition_fields': '',
   }
   df.write.format("org.apache.hudi"). \
     options(**hudi_options). \
     mode("overwrite"). \
     save(basePath)
   ```
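   
   For completeness, subsequent upserts to the same table would normally use append mode rather than overwrite (a sketch reusing the same `hudi_options`; `updates_df` is a hypothetical DataFrame of changed rows):
   ```
   # Hypothetical follow-up write: append mode lets Hudi apply the upserts
   # to the existing table instead of replacing it.
   updates_df.write.format("org.apache.hudi"). \
     options(**hudi_options). \
     mode("append"). \
     save(basePath)
   ```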
   
   **Expected behavior**
   
   The write succeeds and the table is created in the Glue catalog with the last commit time synced.
   
   **Environment Description**
   
   * Hudi version : 0.5.0
   
   * Spark version : 2.4.4
   
   * Hive version : 3.1.2 (Using Glue)
   
   * Hadoop version : 3.2.1-amzn-0
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : no
   
   
   **Additional context**
   
   Using the following jars, installed on EMR 6.0.0:
   `/usr/lib/hudi/hudi-spark-bundle.jar`
   `/usr/lib/spark/external/lib/spark-avro.jar`
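   
   For reference, a sketch of how these bundles can be attached when building the session outside the default `pyspark` shell (the `spark.jars` value simply points at the paths above; the Kryo serializer setting follows the general Hudi recommendation and is an assumption about this particular setup):
   ```
   from pyspark.sql import SparkSession
   
   # Attach the Hudi and spark-avro bundles shipped with EMR and use Kryo,
   # as Hudi generally recommends for Spark writes.
   spark = (
       SparkSession.builder
       .appName("hudi-write")
       .config("spark.jars", "/usr/lib/hudi/hudi-spark-bundle.jar,"
                             "/usr/lib/spark/external/lib/spark-avro.jar")
       .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
       .getOrCreate()
   )
   ```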
   
   **Stacktrace**
   
   ```
   20/08/04 07:11:50 WARN HiveConf: HiveConf of name hive.server2.thrift.url does not exist
   Traceback (most recent call last):
     File "<stdin>", line 5, in <module>
     File "/usr/lib/spark/python/pyspark/sql/readwriter.py", line 738, in save
       self._jwrite.save(path)
     File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
     File "/usr/lib/spark/python/pyspark/sql/utils.py", line 63, in deco
       return f(*a, **kw)
     File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
   py4j.protocol.Py4JJavaError: An error occurred while calling o273.save.
   : org.apache.hudi.hive.HoodieHiveSyncException: Failed to get update last commit time synced to 20200804071144
   	at org.apache.hudi.hive.HoodieHiveClient.updateLastCommitTimeSynced(HoodieHiveClient.java:667)
   	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:109)
   	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:67)
   	at org.apache.hudi.HoodieSparkSqlWriter$.syncHive(HoodieSparkSqlWriter.scala:236)
   	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:169)
   	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:91)
   	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
   	at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:131)
   	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:156)
   	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
   	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
   	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
   	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
   	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:83)
   	at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:676)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:84)
   	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:165)
   	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:74)
   	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
   	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:290)
   	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
   	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.lang.reflect.Method.invoke(Method.java:498)
   	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
   	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
   	at py4j.Gateway.invoke(Gateway.java:282)
   	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
   	at py4j.commands.CallCommand.execute(CallCommand.java:79)
   	at py4j.GatewayConnection.run(GatewayConnection.java:238)
   	at java.lang.Thread.run(Thread.java:748)
   Caused by: java.lang.IllegalArgumentException: Can not create a Path from an empty string
   	at org.apache.hadoop.fs.Path.checkPathArg(Path.java:172)
   	at org.apache.hadoop.fs.Path.<init>(Path.java:184)
   	at org.apache.hadoop.hive.metastore.Warehouse.getDatabasePath(Warehouse.java:172)
   	at org.apache.hadoop.hive.metastore.Warehouse.getTablePath(Warehouse.java:184)
   	at org.apache.hadoop.hive.metastore.Warehouse.getFileStatusesForUnpartitionedTable(Warehouse.java:520)
   	at org.apache.hadoop.hive.metastore.MetaStoreUtils.updateUnpartitionedTableStatsFast(MetaStoreUtils.java:180)
   	at com.amazonaws.glue.shims.AwsGlueSparkHiveShims.updateTableStatsFast(AwsGlueSparkHiveShims.java:75)
   	at com.amazonaws.glue.catalog.metastore.GlueMetastoreClientDelegate.alterTable(GlueMetastoreClientDelegate.java:538)
   	at com.amazonaws.glue.catalog.metastore.AWSCatalogMetastoreClient.alter_table(AWSCatalogMetastoreClient.java:374)
   	at com.amazonaws.glue.catalog.metastore.AWSCatalogMetastoreClient.alter_table(AWSCatalogMetastoreClient.java:359)
   	at org.apache.hudi.hive.HoodieHiveClient.updateLastCommitTimeSynced(HoodieHiveClient.java:665)
   	... 35 more
   ```
   
   





[GitHub] [hudi] bvaradar commented on issue #1909: [SUPPORT] "Failed to get update last commit time synced to 20200804071144"

bvaradar commented on issue #1909:
URL: https://github.com/apache/hudi/issues/1909#issuecomment-668649154


   ```
   20/08/04 07:11:50 WARN HiveConf: HiveConf of name hive.server2.thrift.url does not exist
   Traceback (most recent call last):
     File "<stdin>", line 5, in <module>
     File "/usr/lib/spark/python/pyspark/sql/readwriter.py", line 738, in save
       self._jwrite.save(path)
     File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
     File "/usr/lib/spark/python/pyspark/sql/utils.py", line 63, in deco
       return f(*a, **kw)
     File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
   py4j.protocol.Py4JJavaError: An error occurred while calling o273.save.
   ```
   
   It looks like hive-conf is not set correctly.
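   
   If the Hive config cannot be fixed on the cluster, one workaround sketch is to point Hudi's hive sync at HiveServer2 explicitly through the write options (the host, username, and password below are placeholders, not values from this issue):
   ```
   # Hypothetical additions to the hudi_options dict shown in the issue.
   hudi_options.update({
       # Explicit HiveServer2 JDBC endpoint used by HiveSyncTool.
       'hoodie.datasource.hive_sync.jdbcurl': 'jdbc:hive2://<emr-master-dns>:10000',
       'hoodie.datasource.hive_sync.username': 'hive',
       'hoodie.datasource.hive_sync.password': 'hive',
   })
   ```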





[GitHub] [hudi] bvaradar commented on issue #1909: [SUPPORT] "Failed to get update last commit time synced to 20200804071144"

bvaradar commented on issue #1909:
URL: https://github.com/apache/hudi/issues/1909#issuecomment-675451013


   Closing this ticket. Please reopen if this issue is specific to Hudi.





[GitHub] [hudi] mingujotemp commented on issue #1909: [SUPPORT] "Failed to get update last commit time synced to 20200804071144"

mingujotemp commented on issue #1909:
URL: https://github.com/apache/hudi/issues/1909#issuecomment-668934461


   @bvaradar Could you elaborate more? Which part of the Hive config are you describing? Is it hive-site.xml on EMR or the Hive configuration for Hudi?





[GitHub] [hudi] bvaradar closed issue #1909: [SUPPORT] "Failed to get update last commit time synced to 20200804071144"

bvaradar closed issue #1909:
URL: https://github.com/apache/hudi/issues/1909


   





[GitHub] [hudi] ismailsimsek commented on issue #1909: [SUPPORT] "Failed to get update last commit time synced to 20200804071144"

ismailsimsek commented on issue #1909:
URL: https://github.com/apache/hudi/issues/1909#issuecomment-819535304


   Probably related to https://github.com/apache/hudi/issues/2797#issuecomment-819532968.
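   
   The `Caused by: java.lang.IllegalArgumentException: Can not create a Path from an empty string` in the stacktrace above comes out of `Warehouse.getDatabasePath`, which suggests the Glue database has no location set. A diagnostic sketch for checking and fixing that with boto3 (the S3 path is a placeholder; `databaseName` is the same value used in the write options):
   ```
   import boto3
   
   glue = boto3.client("glue")
   
   # Inspect the database definition that Hudi's hive sync alters.
   db = glue.get_database(Name=databaseName)["Database"]
   print("LocationUri:", db.get("LocationUri"))
   
   # A missing LocationUri means Glue's Hive shim builds the table path
   # from an empty string, which matches the exception above.
   if not db.get("LocationUri"):
       glue.update_database(
           Name=databaseName,
           DatabaseInput={"Name": databaseName,
                          "LocationUri": "s3://<your-bucket>/warehouse/"},
       )
   ```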





[GitHub] [hudi] bvaradar commented on issue #1909: [SUPPORT] "Failed to get update last commit time synced to 20200804071144"

bvaradar commented on issue #1909:
URL: https://github.com/apache/hudi/issues/1909#issuecomment-669675787


   It appears that hive-site.xml may not be set up correctly. The Hive metastore client is not able to find hive.server2.thrift.url in the config.
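   
   A quick way to confirm what the job actually resolved from hive-site.xml is to print the Hive-related properties from the running session (a diagnostic sketch; assumes an active SparkSession named `spark`):
   ```
   # Print the Hive properties visible to the Spark job; None/empty values
   # indicate hive-site.xml was not picked up.
   hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
   for key in ("hive.metastore.uris",
               "hive.server2.thrift.url",
               "hive.metastore.client.factory.class"):
       print(key, "=", hadoop_conf.get(key))
   ```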

