Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/05/19 13:44:59 UTC

[GitHub] [hudi] gtwuser opened a new issue, #5636: [SUPPORT]Need info on the versions of hudi dependent jars which can be used with Glue 3.0

gtwuser opened a new issue, #5636:
URL: https://github.com/apache/hudi/issues/5636

   **Describe the problem you faced**
   We need a higher version of the Spark libraries to support casting `array<string>` to `array<null>`, but because we don't know which combination of hudi-spark-bundle and spark-avro jars would work, we are stuck on Glue 2.0 and Spark 2.4.
   The jars currently used for creating Hudi tables in the Glue catalog are as follows:
   Setup/Env config:
   
   AWS Glue 2.0,
   Python 3,
   Spark 2
   External jars used for connecting AWS Glue and Hudi:
   1. httpclient-4.5.9.jar
   2. hudi-spark-bundle_2.11-0.8.0.jar
   3. spark-avro_2.11-2.4.4.jar
   
   
   
   We have a use case where we need to update the schema of received records: a few columns arrive with an empty array as their value and need to be retyped to `array<null>`.
   
   
   Reference link for the issue:
   https://stackoverflow.com/questions/72294587/how-to-automate-casting-of-empty-arraystring-elements-to-arraystruct-eleme
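
   For illustration only, here is a minimal PySpark sketch of the kind of retyping described above (the helper, the toy DataFrame, and the `target_type` layout are assumptions; the real element type comes from the target table schema):
   ```python
   from pyspark.sql import SparkSession, functions as F
   from pyspark.sql.types import ArrayType, StringType, StructField, StructType

   spark = SparkSession.builder.getOrCreate()

   # Toy input: the nested column arrived as an empty array, so Spark inferred array<string>.
   df = spark.createDataFrame([("moid-1", [])], "Moid string, permissionresources array<string>")

   # Illustrative target element type; the real layout comes from the target table schema.
   target_type = ArrayType(StructType([
       StructField("ClassId", StringType()),
       StructField("Moid", StringType()),
       StructField("ObjectType", StringType()),
   ]))

   def retype_empty_array_col(df, col_name, target):
       """If a column was inferred as array<string> because every record held an empty
       array, replace it with a typed empty array so the written schema matches."""
       col_type = df.schema[col_name].dataType
       if isinstance(col_type, ArrayType) and isinstance(col_type.elementType, StringType):
           # from_json over a literal "[]" yields an empty array of the desired element type
           return df.withColumn(col_name, F.from_json(F.lit("[]"), target))
       return df

   retype_empty_array_col(df, "permissionresources", target_type).printSchema()
   ```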
   
   Ultimately we want to know which versions of `hudi-spark-bundle.jar` and `spark-avro.jar` to use so that we can switch to Glue 3.0, which internally runs Spark 3.1.
   
   
   
   




[GitHub] [hudi] gtwuser commented on issue #5636: [SUPPORT]Need info on the versions of hudi dependent jars which can be used with Glue 3.0

Posted by GitBox <gi...@apache.org>.
gtwuser commented on issue #5636:
URL: https://github.com/apache/hudi/issues/5636#issuecomment-1207195680

   Thanks @xushiyan @Gatsby-Lee, the combination of jars below works with the latest released Hudi build (0.11.1). I also built it from source, and that works too:
   From source:
   ```bash
   1. original-hudi-utilities-bundle_2.12-0.13.0-SNAPSHOT.jar
   2. hudi-spark3.1-bundle_2.12-0.13.0-SNAPSHOT.jar
   3. calcite-core-1.30.0.jar
   ```
   Direct jars:
   ```bash
   1. hudi-spark3.1-bundle_2.12-0.11.1.jar
   2. calcite-core-1.30.0.jar
   3. hudi-utilities_2.12-0.11.1.jar
   ```
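
   For reference, a hedged sketch of how jars like these are typically attached to a Glue 3.0 job via the `--extra-jars` default argument (the job name, role, and S3 paths below are placeholders):
   ```python
   import boto3

   glue = boto3.client("glue")

   # Placeholder S3 locations; point these at wherever the bundles are staged.
   extra_jars = ",".join([
       "s3://my-jars-bucket/hudi-spark3.1-bundle_2.12-0.11.1.jar",
       "s3://my-jars-bucket/hudi-utilities_2.12-0.11.1.jar",
       "s3://my-jars-bucket/calcite-core-1.30.0.jar",
   ])

   glue.update_job(
       JobName="hudi-ingest-job",  # hypothetical job name
       JobUpdate={
           "Role": "MyGlueJobRole",  # existing IAM role used by the job
           "Command": {
               "Name": "glueetl",
               "ScriptLocation": "s3://my-jars-bucket/scripts/hudi_job.py",
               "PythonVersion": "3",
           },
           "GlueVersion": "3.0",
           # Comma-separated jar paths are picked up by the job at start-up.
           "DefaultArguments": {"--extra-jars": extra_jars},
       },
   )
   ```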
   
   Closing this issue based on the resolution given above.
   




[GitHub] [hudi] kapjoshi-cisco commented on issue #5636: [SUPPORT]Need info on the versions of hudi dependent jars which can be used with Glue 3.0

Posted by GitBox <gi...@apache.org>.
kapjoshi-cisco commented on issue #5636:
URL: https://github.com/apache/hudi/issues/5636#issuecomment-1135518894

   ~> Thank you this works~
   
   Sorry, there was an error-handling section which made it look like the job passed; it's still failing with the error below.
   
   Jars used:
   1. s3://hudi-jars/jars/hudi-spark3-bundle_2.12-0.11.0.jar, # using 0.11.0 version 
   2. s3://hudi-jars/jars/spark-avro_2.12-3.1.2.jar,
   3. s3://hudi-jars/jars/calcite-core-1.16.0.jar
   
   ```bash
   java.lang.NoClassDefFoundError: org/apache/parquet/schema/LogicalTypeAnnotation	
   at org.apache.hudi.common.table.TableSchemaResolver.convertParquetSchemaToAvro(TableSchemaResolver.java:340)	
   at org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchemaFromDataFile(TableSchemaResolver.java:148)	at org.apache.hudi.common.table.TableSchemaResolver.hasOperationField(TableSchemaResolver.java:565)	
   at org.apache.hudi.common.table.TableSchemaResolver.<init>(TableSchemaResolver.java:82)	
   at org.apache.hudi.HoodieBaseRelation.x$1$lzycompute(HoodieBaseRelation.scala:126)	
   at org.apache.hudi.HoodieBaseRelation.x$1(HoodieBaseRelation.scala:125)	
   at org.apache.hudi.HoodieBaseRelation.tableAvroSchema$lzycompute(HoodieBaseRelation.scala:125)	
   at org.apache.hudi.HoodieBaseRelation.tableAvroSchema(HoodieBaseRelation.scala:125)	
   at org.apache.hudi.HoodieBaseRelation.<init>(HoodieBaseRelation.scala:147)	at org.apache.hudi.BaseFileOnlyRelation.<init>(BaseFileOnlyRelation.scala:53)	
   at org.apache.hudi.DefaultSource.resolveBaseFileOnlyRelation(DefaultSource.scala:217)
   	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:113)	
   at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:66)	
   at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:354)	
   at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:326)	
   at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:308)	
   at scala.Option.getOrElse(Option.scala:189)	
   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:308)	
   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:240)	
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)	
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)	
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Delegating
   "error found while loading data from path s3://hudi-bucket/nytaxis/*:" An error occurred while calling o126.load. : java.lang.NoClassDefFoundError: org/apache/parquet/schema/LogicalTypeAnnotation
   at org.apache.hudi.common.table.TableSchemaResolver.convertParquetSchemaToAvro(TableSchemaResolver.java:340)
    at org.apache.hudi.common.table.TableSchemaResolver.getTableAvroSchemaFromDataFile(TableSchemaResolver.java:148) at org.apache.hudi.common.table.TableSchemaResolver.hasOperationField(TableSchemaResolver.java:565)
    at org.apache.hudi.common.table.TableSchemaResolver.<init>(TableSchemaResolver.java:82) 
   at org.apache.hudi.HoodieBaseRelation.x$1$lzycompute(HoodieBaseRelation.scala:126)
    at org.apache.hudi.HoodieBaseRelation.x$1(HoodieBaseRelation.scala:125) 
   at org.apache.hudi.HoodieBaseRelation.tableAvroSchema$lzycompute(HoodieBaseRelation.scala:125) 
   at org.apache.hudi.HoodieBaseRelation.tableAvroSchema(HoodieBaseRelation.scala:125) 
   at org.apache.hudi.HoodieBaseRelation.<init>(HoodieBaseRelation.scala:147) 
   at org.apache.hudi.BaseFileOnlyRelation.<init>(BaseFileOnlyRelation.scala:53) 
   at org.apache.hudi.DefaultSource.resolveBaseFileOnlyRelation(DefaultSource.scala:217) 
   at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:113) 
   at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:66) 
   at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:354) 
   at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:326) 
   at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:308) 
   at scala.Option.getOrElse(Option.scala:189) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:308) 
   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:240) 
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Delegating
   ```
   
   
   
   
   




[GitHub] [hudi] kapjoshi-cisco commented on issue #5636: [SUPPORT]Need info on the versions of hudi dependent jars which can be used with Glue 3.0

Posted by GitBox <gi...@apache.org>.
kapjoshi-cisco commented on issue #5636:
URL: https://github.com/apache/hudi/issues/5636#issuecomment-1205808116

   @xushiyan I was able to run and insert data into Hudi using the configuration stated above with 0.11.1, but only on the condition of converting `struct` and `array` types to `string`. If I don't change the type to `string`, I get the error below; please let me know what I'm missing here. I tried the steps from issue #5484 with no luck.
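   (For illustration only, that stringification workaround might look roughly like the sketch below in PySpark; the helper name is made up, and it simply serializes nested columns to JSON strings before the write.)
   ```python
   from pyspark.sql import functions as F
   from pyspark.sql.types import ArrayType, MapType, StructType

   def stringify_nested(df):
       """Serialize array/map/struct columns to JSON strings so only plain string
       columns reach the Hudi write and the Hive sync."""
       for field in df.schema.fields:
           if isinstance(field.dataType, (ArrayType, MapType, StructType)):
               df = df.withColumn(field.name, F.to_json(F.col(field.name)))
       return df
   ```
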
   *config*:
   ```python
   commonConfig = {
                   'className': 'org.apache.hudi', 'hoodie.datasource.hive_sync.use_jdbc': 'false',
                   'hoodie.datasource.write.precombine.field': 'ModTime',
                   'hoodie.datasource.write.recordkey.field': 'Moid',
                   # 'hoodie.datasource.hive_sync.mode': 'hms', # tried with and without sync.mode
                   # 'hoodie.meta.sync.client.tool.class': 'org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool',
                   'hoodie.table.name': 'intersight',
                   'hoodie.consistency.check.enabled': 'true',
                   'hoodie.datasource.hive_sync.database': args['database_name'],
                   'hoodie.datasource.write.reconcile.schema': 'true',
                   'hoodie.datasource.hive_sync.table': 'intersight' + prefix.replace("/", "_").lower(),
                   'hoodie.datasource.hive_sync.enable': 'true', 'path': 's3://' + args['curated_bucket'] + '/merged/app' + prefix
               }
   ```
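
   For context, a minimal sketch of how a dict like `commonConfig` is typically handed to the Hudi datasource write (assuming `df` is the DataFrame being merged; the actual job may also go through a Glue DynamicFrame sink):
   ```python
   # Hedged sketch: pass the options above to a plain Spark datasource write.
   (df.write
      .format("org.apache.hudi")
      .options(**commonConfig)
      .mode("append")   # matches the .mode('append') call seen in the traceback below
      .save())          # 'path' is already included in commonConfig
   ```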
   *Error*:
   ```bash
   2022-08-04 21:45:23,099 ERROR [main] glue.ProcessLauncher (Logging.scala:logError(73)): Error from Python:Traceback (most recent call last):
     File "/tmp/second-delete-upsert.py", line 236, in <module>
       startMerging(df_prefix_map_list)
     File "/tmp/second-delete-upsert.py", line 171, in startMerging
       .mode('append') \
     File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 1107, in save
       self._jwrite.save()
     File "/opt/amazon/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1305, in __call__
       answer, self.gateway_client, self.target_id, self.name)
     File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 111, in deco
       return f(*a, **kw)
     File "/opt/amazon/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
       format(target_id, ".", name), value)
   py4j.protocol.Py4JJavaError: An error occurred while calling o145.save.
   : org.apache.hudi.exception.HoodieException: Could not sync using the meta sync class org.apache.hudi.hive.HiveSyncTool
   	at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:58)
   	at org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$metaSync$2(HoodieSparkSqlWriter.scala:648)
   	at org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$metaSync$2$adapted(HoodieSparkSqlWriter.scala:647)
   	at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
   	at org.apache.hudi.HoodieSparkSqlWriter$.metaSync(HoodieSparkSqlWriter.scala:647)
   	at org.apache.hudi.HoodieSparkSqlWriter$.bulkInsertAsRow(HoodieSparkSqlWriter.scala:592)
   	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:178)
   	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:184)
   	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
   	at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:185)
   	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:223)
   	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
   	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:220)
   	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:181)
   	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:134)
   	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:133)
   	at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
   	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
   	at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
   	at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:110)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)
   	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
   	at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:135)
   	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:134)
   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
   	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
   	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
   	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
   	at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
   	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:301)
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.lang.reflect.Method.invoke(Method.java:498)
   	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
   	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
   	at py4j.Gateway.invoke(Gateway.java:282)
   	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
   	at py4j.commands.CallCommand.execute(CallCommand.java:79)
   	at py4j.GatewayConnection.run(GatewayConnection.java:238)
   	at java.lang.Thread.run(Thread.java:750)
   Caused by: org.apache.hudi.exception.HoodieException: Got runtime exception when hive syncing intersight_asset_deviceregistrations
   	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:143)
   	at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:56)
   	... 45 more
   Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Could not convert field Type from STRING to ARRAY<STRUCT<ClassId:string,Moid:string,ObjectType:string,link:string>> for field permissionresources
   	at org.apache.hudi.hive.util.HiveSchemaUtil.getSchemaDifference(HiveSchemaUtil.java:109)
   	at org.apache.hudi.hive.HiveSyncTool.syncSchema(HiveSyncTool.java:284)
   	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:218)
   	at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:152)
   	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:140)
   	... 46 more
   ```
   




[GitHub] [hudi] tjtoll commented on issue #5636: [SUPPORT]Need info on the versions of hudi dependent jars which can be used with Glue 3.0

Posted by GitBox <gi...@apache.org>.
tjtoll commented on issue #5636:
URL: https://github.com/apache/hudi/issues/5636#issuecomment-1135737992

   Since you went to 0.11 you might need this specific jar instead: hudi-spark3.1-bundle_2.12-0.11.0.jar
   
   Glue 3.0 runs Spark 3.1, and I'm guessing the jar you have is built for Spark 3.2.
   
   




[GitHub] [hudi] Gatsby-Lee commented on issue #5636: [SUPPORT]Need info on the versions of hudi dependent jars which can be used with Glue 3.0

Posted by GitBox <gi...@apache.org>.
Gatsby-Lee commented on issue #5636:
URL: https://github.com/apache/hudi/issues/5636#issuecomment-1140327319

   > Specifically wanted to work on 0.11.0 to get this fix https://issues.apache.org/jira/browse/HUDI-1079, @Gatsby-Lee @xushiyan @umehrot2 please let me know if there is a possibility to run a glue job with 0.11.0 hudi version.
   
   I am also a Glue user, but I haven't figured out how to handle the current issue with Glue 2/3 + Hudi 0.11.
   
   Maybe you can create an AWS ticket.
   




[GitHub] [hudi] kapjoshi-cisco commented on issue #5636: [SUPPORT]Need info on the versions of hudi dependent jars which can be used with Glue 3.0

Posted by GitBox <gi...@apache.org>.
kapjoshi-cisco commented on issue #5636:
URL: https://github.com/apache/hudi/issues/5636#issuecomment-1135480856

   Thank you, this works.




[GitHub] [hudi] tchunchu commented on issue #5636: [SUPPORT]Need info on the versions of hudi dependent jars which can be used with Glue 3.0

Posted by GitBox <gi...@apache.org>.
tchunchu commented on issue #5636:
URL: https://github.com/apache/hudi/issues/5636#issuecomment-1137734544

   @tjtoll  possible duplicate of https://github.com/apache/hudi/issues/5484




[GitHub] [hudi] tjtoll commented on issue #5636: [SUPPORT]Need info on the versions of hudi dependent jars which can be used with Glue 3.0

Posted by GitBox <gi...@apache.org>.
tjtoll commented on issue #5636:
URL: https://github.com/apache/hudi/issues/5636#issuecomment-1133482798

   For Glue 3.0 we use:
   hudi-spark3-bundle_2.12-0.9.0.jar
   spark-avro_2.12-3.1.2.jar
   calcite-core-1.16.0.jar
   
   Switch out the hudi-spark3-bundle_2.12 for 0.10 or 0.11 if you want those versions instead.
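
   A hedged sketch of the SparkSession settings that usually accompany those jars in a Glue 3.0 + Hudi job (both conf values are taken from the Hudi quick-start docs, not from this thread):
   ```python
   from pyspark.sql import SparkSession

   # Kryo serialization is required by Hudi; disabling convertMetastoreParquet is the
   # documented setting for reading Hudi tables registered in the (Glue) Hive catalog.
   spark = (SparkSession.builder
            .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
            .config("spark.sql.hive.convertMetastoreParquet", "false")
            .getOrCreate())
   ```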
   




[GitHub] [hudi] tjtoll commented on issue #5636: [SUPPORT]Need info on the versions of hudi dependent jars which can be used with Glue 3.0

Posted by GitBox <gi...@apache.org>.
tjtoll commented on issue #5636:
URL: https://github.com/apache/hudi/issues/5636#issuecomment-1137491859

   I'm not sure what to try next; we do not convert to DynamicFrames before we write.
   
   




[GitHub] [hudi] kapjoshi-cisco commented on issue #5636: [SUPPORT]Need info on the versions of hudi dependent jars which can be used with Glue 3.0

Posted by GitBox <gi...@apache.org>.
kapjoshi-cisco commented on issue #5636:
URL: https://github.com/apache/hudi/issues/5636#issuecomment-1140324033

   > @tjtoll + anyone who use AWS Glue
   > 
   > If you have to use Glue3, then go with Hudi 0.9 that you can get from AWS Marketplace. you don't need to download anything manually since you can get JARs by setting Glue Connection.
   > 
   > If you have to use higher than Hudi 0.9, then go with Glue2 + Hudi 0.10.1. Here are JARs you need. When you use custom JARs to use Hudi, you have to remove Glue Connection for Hudi.
   > 
   > * https://repo1.maven.org/maven2/org/apache/spark/spark-avro_2.11/2.4.4/spark-avro_2.11-2.4.4.jar
   > * https://repo1.maven.org/maven2/org/apache/hudi/hudi-spark-bundle_2.11/0.10.1/
   > 
   > **WHY Hudi 0.10.1?** Hudi 0.9 has some bugs I noticed.
   > 
   > * Metadata Table doesn't work properly. It leaks data
   > * DELETE_OPERATION fails when syncing metadata to Glue Catalog
   
   Yeah, this combination works. Just to confirm: is there no possibility to use 0.11.0 with AWS Glue as of today?




[GitHub] [hudi] kapjoshi-cisco commented on issue #5636: [SUPPORT]Need info on the versions of hudi dependent jars which can be used with Glue 3.0

Posted by GitBox <gi...@apache.org>.
kapjoshi-cisco commented on issue #5636:
URL: https://github.com/apache/hudi/issues/5636#issuecomment-1140326922

   We specifically wanted to work with 0.11.0 to get this fix: https://issues.apache.org/jira/browse/HUDI-1079. @Gatsby-Lee @xushiyan @umehrot2, please let me know if there is a possibility to run a Glue job with Hudi 0.11.0.




[GitHub] [hudi] kapjoshi-cisco commented on issue #5636: [SUPPORT]Need info on the versions of hudi dependent jars which can be used with Glue 3.0

Posted by GitBox <gi...@apache.org>.
kapjoshi-cisco commented on issue #5636:
URL: https://github.com/apache/hudi/issues/5636#issuecomment-1137170757

   Tried with the combination of jars below; it's failing now with a new error. Please let me know if I'm still missing something here.
   
   1. hudi-spark3.1-bundle_2.12-0.11.0.jar
   2. spark-avro_2.12-3.1.2.jar
   3. calcite-core-1.16.0.jar
   
   Error while writing to dynamic frame:
   ```bash
   2022-05-25 12:13:39,886 WARN [Thread-12] metadata.Hive (Hive.java:registerAllFunctionsOnce(237)): Failed to register all functions.
   java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
   	at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1709)
   	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:87)
   	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:137)
   	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:108)
   	at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClientFactory.createMetaStoreClient(SessionHiveMetaStoreClientFactory.java:50)
   	at org.apache.hadoop.hive.ql.metadata.HiveUtils.createMetaStoreClient(HiveUtils.java:507)
   	at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3746)
   	at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3726)
   	at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3988)
   	at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:251)
   	at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:234)
   	at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:402)
   	at org.apache.hadoop.hive.ql.metadata.Hive.create(Hive.java:335)
   	at org.apache.hadoop.hive.ql.metadata.Hive.getInternal(Hive.java:315)
   	at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:291)
   	at org.apache.hudi.hive.ddl.HMSDDLExecutor.<init>(HMSDDLExecutor.java:69)
   	at org.apache.hudi.hive.HoodieHiveClient.<init>(HoodieHiveClient.java:73)
   	at org.apache.hudi.hive.HiveSyncTool.initClient(HiveSyncTool.java:95)
   	at org.apache.hudi.hive.HiveSyncTool.<init>(HiveSyncTool.java:89)
   	at org.apache.hudi.hive.HiveSyncTool.<init>(HiveSyncTool.java:80)
   	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
   	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
   	at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:89)
   	at org.apache.hudi.sync.common.util.SyncUtilHelpers.instantiateMetaSyncTool(SyncUtilHelpers.java:78)
   	at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:59)
   	at org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$metaSync$2(HoodieSparkSqlWriter.scala:622)
   	at org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$metaSync$2$adapted(HoodieSparkSqlWriter.scala:621)
   	at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
   	at org.apache.hudi.HoodieSparkSqlWriter$.metaSync(HoodieSparkSqlWriter.scala:621)
   	at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:680)
   	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:313)
   	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:163)
   	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
   	at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:185)
   	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:223)
   	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
   	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:220)
   	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:181)
   	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:134)
   	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:133)
   	at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
   	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
   	at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
   	at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:110)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)
   	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
   	at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:135)
   	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:134)
   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
   	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
   	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
   	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
   	at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
   	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:301)
   	at com.amazonaws.services.glue.marketplace.connector.SparkCustomDataSink.writeDynamicFrame(CustomDataSink.scala:45)
   	at com.amazonaws.services.glue.DataSink.pyWriteDynamicFrame(DataSink.scala:64)
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.lang.reflect.Method.invoke(Method.java:498)
   	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
   	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
   	at py4j.Gateway.invoke(Gateway.java:282)
   	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
   	at py4j.commands.CallCommand.execute(CallCommand.java:79)
   	at py4j.GatewayConnection.run(GatewayConnection.java:238)
   	at java.lang.Thread.run(Thread.java:750)
   Caused by: java.lang.reflect.InvocationTargetException
   	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
   	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
   	at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1707)
   	... 73 more
   Caused by: MetaException(message:Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused (Connection refused)
   	at org.apache.thrift.transport.TSocket.open(TSocket.java:226)
   	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:480)
   	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:247)
   	at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:70)
   	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
   	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
   	at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1707)
   	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:87)
   	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:137)
   	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:108)
   	at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClientFactory.createMetaStoreClient(SessionHiveMetaStoreClientFactory.java:50)
   	at org.apache.hadoop.hive.ql.metadata.HiveUtils.createMetaStoreClient(HiveUtils.java:507)
   	at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3746)
   	at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3726)
   	at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3988)
   	at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:251)
   	at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:234)
   	at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:402)
   	at org.apache.hadoop.hive.ql.metadata.Hive.create(Hive.java:335)
   	at org.apache.hadoop.hive.ql.metadata.Hive.getInternal(Hive.java:315)
   	at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:291)
   	at org.apache.hudi.hive.ddl.HMSDDLExecutor.<init>(HMSDDLExecutor.java:69)
   	at org.apache.hudi.hive.HoodieHiveClient.<init>(HoodieHiveClient.java:73)
   	at org.apache.hudi.hive.HiveSyncTool.initClient(HiveSyncTool.java:95)
   	at org.apache.hudi.hive.HiveSyncTool.<init>(HiveSyncTool.java:89)
   	at org.apache.hudi.hive.HiveSyncTool.<init>(HiveSyncTool.java:80)
   	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
   	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
   	at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:89)
   	at org.apache.hudi.sync.common.util.SyncUtilHelpers.instantiateMetaSyncTool(SyncUtilHelpers.java:78)
   	at org.apache.hudi.sync.common.util.SyncUtilHelpers.runHoodieMetaSync(SyncUtilHelpers.java:59)
   	at org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$metaSync$2(HoodieSparkSqlWriter.scala:622)
   	at org.apache.hudi.HoodieSparkSqlWriter$.$anonfun$metaSync$2$adapted(HoodieSparkSqlWriter.scala:621)
   	at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
   	at org.apache.hudi.HoodieSparkSqlWriter$.metaSync(HoodieSparkSqlWriter.scala:621)
   	at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:680)
   	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:313)
   	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:163)
   	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
   	at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:185)
   	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:223)
   	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
   	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:220)
   	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:181)
   	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:134)
   	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:133)
   	at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
   	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
   	at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
   	at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:110)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)
   	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
   	at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:135)
   	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:134)
   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
   	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
   	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
   	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
   	at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
   	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:301)
   	at com.amazonaws.services.glue.marketplace.connector.SparkCustomDataSink.writeDynamicFrame(CustomDataSink.scala:45)
   	at com.amazonaws.services.glue.DataSink.pyWriteDynamicFrame(DataSink.scala:64)
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.lang.reflect.Method.invoke(Method.java:498)
   	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
   	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
   	at py4j.Gateway.invoke(Gateway.java:282)
   	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
   	at py4j.commands.CallCommand.execute(CallCommand.java:79)
   	at py4j.GatewayConnection.run(GatewayConnection.java:238)
   	at java.lang.Thread.run(Thread.java:750)
   Caused by: java.net.ConnectException: Connection refused (Connection refused)
   	at java.net.PlainSocketImpl.socketConnect(Native Method)
   	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
   	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
   	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
   	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
   	at java.net.Socket.connect(Socket.java:607)
   	at org.apache.thrift.transport.TSocket.open(TSocket.java:221)
   	... 81 more
   )
   	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:529)
   	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:247)
   	at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:70)
   	... 78 more
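
   The `Connection refused` above comes from `HiveSyncTool` opening a Thrift connection to a Hive metastore that is not reachable from the Glue job. As a hedged, minimal sketch only (not the confirmed fix for this particular job), a write that syncs through the AWS Glue Data Catalog instead of a standalone Thrift metastore can look roughly like this; it assumes the job was started with `--enable-glue-datacatalog` and has a Hudi spark bundle on the classpath, and every bucket, database, table, and field name below is a placeholder:
   
   ```python
   from pyspark.sql import SparkSession
   
   # Hypothetical sketch: write a small DataFrame as a Hudi table and sync it to
   # the AWS Glue Data Catalog rather than a standalone Thrift Hive metastore.
   # Assumes --enable-glue-datacatalog is set on the job; all names are placeholders.
   spark = SparkSession.builder.getOrCreate()
   
   df = spark.createDataFrame(
       [(1, "a", "2022-05-19 00:00:00")],
       ["id", "payload", "updated_at"],
   )
   
   hudi_options = {
       "hoodie.table.name": "my_table",                           # placeholder
       "hoodie.datasource.write.recordkey.field": "id",           # placeholder
       "hoodie.datasource.write.precombine.field": "updated_at",  # placeholder
       "hoodie.datasource.write.operation": "upsert",
       # Catalog sync: HMS mode with no JDBC/Thrift connection of its own, so the
       # sync goes through the metastore client the job provides (the Glue Data
       # Catalog when --enable-glue-datacatalog is set).
       "hoodie.datasource.hive_sync.enable": "true",
       "hoodie.datasource.hive_sync.mode": "hms",
       "hoodie.datasource.hive_sync.use_jdbc": "false",
       "hoodie.datasource.hive_sync.database": "my_database",     # placeholder
       "hoodie.datasource.hive_sync.table": "my_table",           # placeholder
   }
   
   (
       df.write.format("hudi")
       .options(**hudi_options)
       .mode("append")
       .save("s3://my-bucket/hudi/my_table/")                     # placeholder path
   )
   ```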
   
   




[GitHub] [hudi] Gatsby-Lee commented on issue #5636: [SUPPORT]Need info on the versions of hudi dependent jars which can be used with Glue 3.0

Posted by GitBox <gi...@apache.org>.
Gatsby-Lee commented on issue #5636:
URL: https://github.com/apache/hudi/issues/5636#issuecomment-1137922706

   @tjtoll + anyone who uses AWS Glue
   
   If you have to use Glue 3, then go with Hudi 0.9, which you can get from the AWS Marketplace.
   You don't need to download anything manually, since you get the JARs by setting up the Glue Connection.
   
   If you need something higher than Hudi 0.9, then go with Glue 2 + Hudi 0.10.1.
   Here are the JARs you need (a sketch of attaching them to a Glue 2.0 job follows the list below). When you use custom JARs for Hudi, you have to remove the Glue Connection for Hudi.
   
   * https://repo1.maven.org/maven2/org/apache/spark/spark-avro_2.11/2.4.4/spark-avro_2.11-2.4.4.jar
   * https://repo1.maven.org/maven2/org/apache/hudi/hudi-spark-bundle_2.11/0.10.1/
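   
   As a hedged illustration (not a configuration taken from this thread), the two JARs above could be attached to a Glue 2.0 job through the `--extra-jars` special job parameter; all bucket, role, and script paths below are placeholders.
   
   ```python
   # Hypothetical sketch: attach the Hudi 0.10.1 + spark-avro JARs to a Glue 2.0
   # job via the --extra-jars special parameter. Job name, role, and S3 paths
   # are placeholders; adjust them to your environment.
   import boto3
   
   glue = boto3.client("glue")
   glue.create_job(
       Name="hudi-glue2-example",                        # placeholder
       Role="arn:aws:iam::123456789012:role/GlueRole",   # placeholder
       GlueVersion="2.0",
       Command={
           "Name": "glueetl",
           "ScriptLocation": "s3://my-bucket/scripts/job.py",  # placeholder
           "PythonVersion": "3",
       },
       DefaultArguments={
           "--extra-jars": (
               "s3://my-bucket/jars/hudi-spark-bundle_2.11-0.10.1.jar,"
               "s3://my-bucket/jars/spark-avro_2.11-2.4.4.jar"
           ),
           "--enable-glue-datacatalog": "true",
       },
   )
   ```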
   




[GitHub] [hudi] Gatsby-Lee commented on issue #5636: [SUPPORT]Need info on the versions of hudi dependent jars which can be used with Glue 3.0

Posted by GitBox <gi...@apache.org>.
Gatsby-Lee commented on issue #5636:
URL: https://github.com/apache/hudi/issues/5636#issuecomment-1140324275

   > > @tjtoll + anyone who use AWS Glue
   > > If you have to use Glue3, then go with Hudi 0.9 that you can get from AWS Marketplace. you don't need to download anything manually since you can get JARs by setting Glue Connection.
   > > If you have to use higher than Hudi 0.9, then go with Glue2 + Hudi 0.10.1. Here are JARs you need. When you use custom JARs to use Hudi, you have to remove Glue Connection for Hudi.
   > > 
   > > * https://repo1.maven.org/maven2/org/apache/spark/spark-avro_2.11/2.4.4/spark-avro_2.11-2.4.4.jar
   > > * https://repo1.maven.org/maven2/org/apache/hudi/hudi-spark-bundle_2.11/0.10.1/
   > > 
   > > **WHY Hudi 0.10.1?** Hudi 0.9 has some bugs I noticed.
   > > 
   > > * Metadata Table doesn't work properly. It leaks data
   > > * DELETE_OPERATION fails when syncing metadata to Glue Catalog
   > 
   > Yeah, this combination works. Just to confirm: is there no possibility of using 0.11.0 with AWS Glue as of today?
   > 
   As far as I know, no.




[GitHub] [hudi] codope commented on issue #5636: [SUPPORT]Need info on the versions of hudi dependent jars which can be used with Glue 3.0

Posted by GitBox <gi...@apache.org>.
codope commented on issue #5636:
URL: https://github.com/apache/hudi/issues/5636#issuecomment-1200967575

   @xushiyan I believe you have worked with Glue sync recently (Hudi v0.11.x). Can you please clarify the versions that you used?




[GitHub] [hudi] gtwuser closed issue #5636: [SUPPORT]Need info on the versions of hudi dependent jars which can be used with Glue 3.0

Posted by GitBox <gi...@apache.org>.
gtwuser closed issue #5636: [SUPPORT]Need info on the versions of hudi dependent jars which can be used with Glue 3.0 
URL: https://github.com/apache/hudi/issues/5636




[GitHub] [hudi] xushiyan commented on issue #5636: [SUPPORT]Need info on the versions of hudi dependent jars which can be used with Glue 3.0

Posted by GitBox <gi...@apache.org>.
xushiyan commented on issue #5636:
URL: https://github.com/apache/hudi/issues/5636#issuecomment-1207336880

   The utilities bundle contains Spark 3.1 implicitly. We recommend switching to the utilities-slim bundle when you also put the spark bundle there (a sketch of that combination follows below). Check out the release notes https://hudi.apache.org/releases/release-0.11.0/#slim-utilities-bundle
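   
   A hedged sketch of what that combination might look like as a Glue job's `--extra-jars` value (the S3 paths are placeholders, not taken from this thread):
   
   ```python
   # Hypothetical sketch: ship the matching spark bundle plus the utilities-slim
   # bundle (Hudi 0.11.x on Glue 3.0 / Spark 3.1) instead of the full utilities
   # bundle. The S3 locations are placeholders.
   extra_jars = ",".join([
       "s3://my-bucket/jars/hudi-spark3.1-bundle_2.12-0.11.1.jar",
       "s3://my-bucket/jars/hudi-utilities-slim-bundle_2.12-0.11.1.jar",
   ])
   # Pass this string as the job's --extra-jars default argument, as in the
   # earlier create_job sketch.
   ```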




[GitHub] [hudi] xushiyan commented on issue #5636: [SUPPORT]Need info on the versions of hudi dependent jars which can be used with Glue 3.0

Posted by GitBox <gi...@apache.org>.
xushiyan commented on issue #5636:
URL: https://github.com/apache/hudi/issues/5636#issuecomment-1204622574

   Please use the matching spark bundle. For example, if Glue runs with Spark 3.1, use `hudi-spark3.1-bundle_2.12-0.11.1.jar`; if Glue runs with Spark 2.4, use `hudi-spark2.4-bundle_2.12-0.11.1.jar`. So it's just about matching the Spark version. For Hudi versions >= 0.11 there is no need to add spark-avro.jar; please pay attention to the release notes https://hudi.apache.org/releases/release-0.11.0#bundle-usage-updates (a runtime check for the Spark version is sketched below).
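   
   A minimal sketch of how a job can confirm at runtime which Spark version it is on, and therefore which bundle to ship (the bundle name it prints assumes the 0.11.1 naming shown above):
   
   ```python
   # Hypothetical sketch: derive the expected Hudi spark bundle name from the
   # Spark version the Glue job actually runs on. On Glue 3.0 this prints a
   # hudi-spark3.1 bundle; on Glue 2.0 (Spark 2.4) a hudi-spark2.4 bundle.
   from pyspark.sql import SparkSession
   
   spark = SparkSession.builder.getOrCreate()
   major_minor = ".".join(spark.version.split(".")[:2])  # e.g. "3.1"
   bundle = f"hudi-spark{major_minor}-bundle_2.12-0.11.1.jar"
   print(f"Spark {spark.version} -> use {bundle}; no separate spark-avro JAR is needed for Hudi >= 0.11")
   ```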
   




[GitHub] [hudi] Gatsby-Lee commented on issue #5636: [SUPPORT]Need info on the versions of hudi dependent jars which can be used with Glue 3.0

Posted by GitBox <gi...@apache.org>.
Gatsby-Lee commented on issue #5636:
URL: https://github.com/apache/hudi/issues/5636#issuecomment-1201496247

   @codope Hello, you can try these (Glue 3):
   * https://mvnrepository.com/artifact/org.apache.hudi/hudi-utilities-bundle_2.12/0.11.1
   * https://repo1.maven.org/maven2/org/apache/calcite/calcite-core/1.30.0/calcite-core-1.30.0.jar
   

