Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/07/14 19:37:39 UTC

[GitHub] [hudi] michetti commented on issue #1789: [SUPPORT] What jars are needed to run on AWS Glue 1.0 ?

michetti commented on issue #1789:
URL: https://github.com/apache/hudi/issues/1789#issuecomment-658373503


   Hey @GrigorievNick, I saw the issue was closed, but if I understood correctly, the link you posted is about AWS Athena and how it works with Hudi tables registered in the AWS Glue catalog, while this issue is about getting Hudi to work on AWS Glue Jobs (the AWS serverless Spark service). Did I miss something?
   
   I was having the same error as @WilliamWhispell. From what I could find, it seems to be caused by a version mismatch between the org.eclipse.jetty jars required by Hudi and the ones shipped in the AWS Glue Jobs runtime.
   
   For example, the Timeline Service depends on Javalin 2.8.0, which in turn requires Jetty 9.4.15.v20190215:
   - https://github.com/apache/hudi/blob/release-0.5.3/hudi-timeline-service/pom.xml#L111
   - https://github.com/tipsy/javalin/blob/javalin-2.8.0/pom.xml#L43
   
   Spark 2.4.3 (the version used by the Glue Jobs 1.0 runtime), on the other hand, depends on Jetty 9.3.24.v20180605:
   - https://github.com/apache/spark/blob/v2.4.3/pom.xml#L137
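
   To check which Jetty a job actually picks up at runtime, a small diagnostic like the one below may help. It is a sketch, not part of Hudi or Glue: the object name `JettyVersionCheck` is hypothetical, and using `org.eclipse.jetty.server.Server` as the probe class is an assumption (any class from the jetty-server jar would do). The manifest version is only reported if the jar declares an `Implementation-Version`.
   ```scala
   object JettyVersionCheck {
     // Resolve a class by name without running its static initializers (initialize = false).
     private def load(className: String): Option[Class[_]] = {
       val loader = Option(Thread.currentThread().getContextClassLoader).getOrElse(getClass.getClassLoader)
       try Some(Class.forName(className, false, loader))
       catch { case _: ClassNotFoundException | _: LinkageError => None }
     }

     /** Jar (or directory) the class was loaded from; None if absent or loaded by the JDK bootstrap loader. */
     def locate(className: String): Option[String] =
       load(className)
         .flatMap(c => Option(c.getProtectionDomain.getCodeSource))
         .flatMap(cs => Option(cs.getLocation))
         .map(_.toString)

     /** Implementation-Version from the jar manifest, if the jar declares one. */
     def version(className: String): Option[String] =
       load(className).flatMap(c => Option(c.getPackage)).flatMap(p => Option(p.getImplementationVersion))

     def main(args: Array[String]): Unit = {
       // Assumed probe class: one representative class from jetty-server.
       val cls = "org.eclipse.jetty.server.Server"
       println(s"$cls loaded from: ${locate(cls).getOrElse("<not found>")}")
       println(s"$cls version: ${version(cls).getOrElse("<not declared>")}")
     }
   }
   ```
   On a Glue 1.0 job without the relocation, one would expect the location to point at Spark's Jetty 9.3 jar rather than the 9.4 version Javalin was built against.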
   
   I got it working by shading (relocating) _org.eclipse.jetty._ in hudi-spark-bundle, adding the following [here](https://github.com/apache/hudi/blob/release-0.5.3/packaging/hudi-spark-bundle/pom.xml#L99):
   ```xml
   <relocation>
     <pattern>org.eclipse.jetty.</pattern>
     <shadedPattern>org.apache.hudi.org.eclipse.jetty.</shadedPattern>
   </relocation>
   ```
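
   Why this helps: relocation renames every Jetty class packed into the bundle (and rewrites the bundle's references to them), so Javalin's Jetty 9.4 no longer collides with Spark's Jetty 9.3 on the Glue classpath. The sketch below is a simplified model of that renaming for class names only, using the two values from the `<relocation>` entry above; it is not how the maven-shade-plugin is implemented, and the object name `RelocationModel` is hypothetical.
   ```scala
   object RelocationModel {
     // Values copied from the <relocation> entry above.
     val Pattern = "org.eclipse.jetty."
     val ShadedPattern = "org.apache.hudi.org.eclipse.jetty."

     /** Simplified model: rewrite a fully qualified class name if it starts with the pattern. */
     def relocate(className: String): String =
       if (className.startsWith(Pattern)) ShadedPattern + className.substring(Pattern.length)
       else className

     def main(args: Array[String]): Unit =
       Seq(
         "org.eclipse.jetty.server.Server", // Jetty class inside the bundle: renamed
         "org.apache.spark.SparkContext"    // everything else: left untouched
       ).foreach(n => println(s"$n -> ${relocate(n)}"))
   }
   ```
   In this model, the trailing dot in the pattern means only the `org.eclipse.jetty` package and its subpackages match, not a sibling such as `org.eclipse.jettyx`.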
   
   @WilliamWhispell, I'm not sure there is a better way, but to run Hudi 0.5.3 on AWS Glue Jobs 1.0 I needed the following jars:
   - httpclient-4.5.12.jar (to work around [this](https://forums.aws.amazon.com/thread.jspa?messageID=930176) other error)
   - spark-avro_2.11-2.4.3.jar
   - hudi-spark-bundle_2.11-0.5.3.jar (built yourself, with the relocation above)
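
   A quick way to confirm that all three jars actually made it onto the job's classpath is a preflight check like the hypothetical one below. The object name `PreflightCheck` and the choice of one representative class per jar are assumptions, not something from this thread; verify the class names against your own build before relying on them.
   ```scala
   object PreflightCheck {
     // Assumed mapping: one class each jar is expected to provide.
     val required: Map[String, String] = Map(
       "httpclient-4.5.12.jar"            -> "org.apache.http.impl.client.HttpClientBuilder",
       "spark-avro_2.11-2.4.3.jar"        -> "org.apache.spark.sql.avro.AvroFileFormat",
       "hudi-spark-bundle_2.11-0.5.3.jar" -> "org.apache.hudi.DefaultSource"
     )

     /** True if the class resolves; static initializers are not run (initialize = false). */
     def isLoadable(className: String): Boolean = {
       val loader = Option(Thread.currentThread().getContextClassLoader).getOrElse(getClass.getClassLoader)
       try { Class.forName(className, false, loader); true }
       catch { case _: ClassNotFoundException | _: LinkageError => false }
     }

     /** Jars whose representative class could not be resolved, sorted for stable output. */
     def missing(jars: Map[String, String]): List[String] =
       jars.collect { case (jar, cls) if !isLoadable(cls) => jar }.toList.sorted

     def main(args: Array[String]): Unit = {
       val absent = missing(required)
       if (absent.isEmpty) println("All required jars appear to be on the classpath.")
       else println(s"Missing: ${absent.mkString(", ")}")
     }
   }
   ```
   Catching `LinkageError` as well as `ClassNotFoundException` matters here: a class can be present while one of its dependencies is not, which surfaces as `NoClassDefFoundError`.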
   
   Also remember that you need to configure Spark as described in the Hudi documentation:
   ```scala
   import com.amazonaws.services.glue.GlueContext
   import org.apache.spark.{SparkConf, SparkContext}
   import org.apache.spark.sql.SparkSession

   // Spark settings required by Hudi, per the Hudi documentation
   val sparkConf: SparkConf = new SparkConf()
     .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
     .set("spark.sql.hive.convertMetastoreParquet", "false")

   val sparkContext: SparkContext = new SparkContext(sparkConf)
   val glueContext: GlueContext = new GlueContext(sparkContext)
   val spark: SparkSession = glueContext.getSparkSession
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org