Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/08/05 23:18:24 UTC

[GitHub] [hudi] rufferjr opened a new issue #1923: [SUPPORT] Hive Sync fails to add decimal partition

rufferjr opened a new issue #1923:
URL: https://github.com/apache/hudi/issues/1923


   **Describe the problem you faced**
   
   (Possibly related issue: https://github.com/apache/hudi/issues/1790) When creating a new table with a decimal-type partition column, Hudi fails while syncing with the Hive metastore. The container log stack trace for the failure can be found in [this gist](https://gist.github.com/rufferjr/e6d2955eb3eb1edb321e10c9a91e021c). This was tested with a hudi-spark-bundle built from commit 539621bd.
   
   **To Reproduce**
   
   I was able to reproduce this error by running the following code:
   
   ```
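   import org.apache.spark.sql.{Row, SaveMode}
   import org.apache.spark.sql.functions.expr
   import org.apache.spark.sql.types._

   // Assumes a spark-shell session (`spark` and `sc` in scope) and that
   // `hiveServer2URI` holds the HiveServer2 host name.
   val hudiOptions =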
       Map[String, String](
         "hoodie.table.name" -> "test_table",
         "hoodie.datasource.write.table.name" -> "test_table",
         "hoodie.consistency.check.enabled" -> "true",
         "hoodie.compact.inline.max.delta.commits" -> "12",
         "hoodie.compact.inline" -> "true",
         "hoodie.clean.automatic" -> "true",
         "hoodie.cleaner.commits.retained" -> "1",
         "hoodie.datasource.write.table.type" -> "MERGE_ON_READ",
         "hoodie.datasource.write.recordkey.field" -> "pk",
         "hoodie.datasource.write.keygenerator.class" -> "org.apache.hudi.keygen.ComplexKeyGenerator",
         "hoodie.datasource.write.partitionpath.field" -> "_partition_col",
         "hoodie.datasource.write.precombine.field" -> "sort_k",
         "hoodie.bulkinsert.shuffle.parallelism" -> "1800",
         "hoodie.parquet.max.file.size" -> String.valueOf(500 * 1024 * 1024), //500mb
         "hoodie.datasource.hive_sync.enable" -> "true",
         "hoodie.datasource.hive_sync.database" -> "test_vault",
         "hoodie.datasource.hive_sync.table" -> "test_table",
         "hoodie.datasource.hive_sync.partition_fields" -> "_partition_col",
         "hoodie.datasource.hive_sync.partition_extractor_class" -> "org.apache.hudi.hive.MultiPartKeysValueExtractor",
         "hoodie.datasource.hive_sync.jdbcurl" -> s"jdbc:hive2://${hiveServer2URI}:10000"
       )
   
   spark.sql("CREATE DATABASE IF NOT EXISTS test_vault")
   spark.sql("USE test_vault")
   spark.sql("DROP TABLE IF EXISTS test_table_ro")
   spark.sql("DROP TABLE IF EXISTS test_table_rt")
   
   val data = Seq(Row("A", Decimal(70646037), 2))
   val schema = List(StructField("pk", StringType, true), StructField("partition_col", DecimalType(3, 0), true), StructField("sort_k", IntegerType, true))
   var df = spark.createDataFrame(sc.parallelize(data), StructType(schema))
   
   val basePath = "%s/%s/%s".format("s3://hudi-test-bucket/data", "test_vault", "test_table")

   df.withColumn("_partition_col", expr("MOD(FLOOR(partition_col / 20000), 100)"))
     .write
     .format("org.apache.hudi")
     .option("hoodie.datasource.write.operation", "upsert")
     .options(hudiOptions)
     .mode(SaveMode.Overwrite)
     .save(basePath)
   ```
   
   **Expected behavior**
   
   I would expect Hudi to be able to handle DecimalTypes in partition columns, rather than the failure we see above.
   
   **Environment Description**
   
   * Hudi version : 0.6.0-SNAPSHOT
   
   * Spark version : 2.11
   
   * Hive version : 2.3.6
   
   * Hadoop version : 2.8.5
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : no
   
   **Stacktrace**
   
   Stacktrace [gist](https://gist.github.com/rufferjr/e6d2955eb3eb1edb321e10c9a91e021c).
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] bvaradar commented on issue #1923: [SUPPORT] Hive Sync fails to add decimal partition

Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #1923:
URL: https://github.com/apache/hudi/issues/1923#issuecomment-669713417


   @rufferjr: Can you give some examples of valid Hive partitions with decimal types? I would like to see how the partition path is encoded for decimal types. In Hudi, it is simple to plug in a new partition value extractor: you only need to implement the interface org.apache.hudi.hive.PartitionValueExtractor and add the jar to the classpath. Once we have some examples, we can see how to implement and add direct support. Filed: https://issues.apache.org/jira/browse/HUDI-1154
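   
   For reference, a custom extractor along those lines might look roughly like the sketch below. It is only an illustration, not a confirmed fix for this issue: `PartitionValueExtractorLike` is a hypothetical stand-in trait so the snippet compiles without Hudi on the classpath, and it assumes the real interface exposes a single method that takes the relative partition path and returns the list of partition values. `DecimalPartitionValueExtractor` is likewise a made-up name.

   ```scala
   import java.math.{BigDecimal => JBigDecimal}
   import java.util.{List => JList}
   import scala.collection.JavaConverters._

   // Hypothetical stand-in so the sketch compiles on its own.
   // In a real build, implement org.apache.hudi.hive.PartitionValueExtractor instead.
   trait PartitionValueExtractorLike extends Serializable {
     def extractPartitionValuesInPath(partitionPath: String): JList[String]
   }

   // Hypothetical extractor: accepts "column=value" or bare "value" path segments
   // and renders each value as a plain (non-scientific) decimal string.
   class DecimalPartitionValueExtractor extends PartitionValueExtractorLike {
     override def extractPartitionValuesInPath(partitionPath: String): JList[String] =
       partitionPath
         .split("/")
         .filter(_.nonEmpty)
         .map { segment =>
           // indexOf returns -1 when '=' is absent, so the whole segment is kept.
           val raw = segment.substring(segment.indexOf('=') + 1)
           // Throws NumberFormatException if the value is not numeric.
           new JBigDecimal(raw).toPlainString
         }
         .toList
         .asJava
   }
   ```

   With the class packaged in a jar on the classpath, its fully qualified name would go into `hoodie.datasource.hive_sync.partition_extractor_class` in place of `org.apache.hudi.hive.MultiPartKeysValueExtractor`.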





[GitHub] [hudi] bvaradar closed issue #1923: [SUPPORT] Hive Sync fails to add decimal partition

Posted by GitBox <gi...@apache.org>.
bvaradar closed issue #1923:
URL: https://github.com/apache/hudi/issues/1923


   





[GitHub] [hudi] bvaradar commented on issue #1923: [SUPPORT] Hive Sync fails to add decimal partition

Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #1923:
URL: https://github.com/apache/hudi/issues/1923#issuecomment-675465931


   Thanks, @rufferjr. We will target this for the next release. Closing this ticket, as we have a JIRA to track it.





[GitHub] [hudi] rufferjr commented on issue #1923: [SUPPORT] Hive Sync fails to add decimal partition

Posted by GitBox <gi...@apache.org>.
rufferjr commented on issue #1923:
URL: https://github.com/apache/hudi/issues/1923#issuecomment-670636479


   @bvaradar would you like the S3 partition path? If so, the following examples may be of use:
   
   s3://data-beta/vault/cod_combinations/partition_val=1003
   s3://data-beta/vault/cod_combinations/partition_val=1008
   ... etc.
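   
   To make the encoding explicit: these are Hive-style `column=value` directories, and the decimal value appears as a plain integer string. A small sketch of how such a path could be decoded follows; `parsePartition` is a hypothetical helper written for illustration and is not part of Hudi or Hive.

   ```scala
   import java.net.URI
   import scala.util.Try

   // Splits the last path segment of a Hive-style partition URI into (column, value).
   // Returns None when the segment is not "column=value" or the value is not numeric.
   def parsePartition(uri: String): Option[(String, BigDecimal)] = {
     val lastSegment = new URI(uri).getPath.split("/").last
     lastSegment.split("=", 2) match {
       case Array(column, value) => Try(BigDecimal(value)).toOption.map(column -> _)
       case _                    => None
     }
   }

   val samples = Seq(
     "s3://data-beta/vault/cod_combinations/partition_val=1003",
     "s3://data-beta/vault/cod_combinations/partition_val=1008"
   )

   // Print each decoded column together with the parsed value and its scale.
   samples.flatMap(parsePartition).foreach { case (column, value) =>
     println(s"$column -> $value (scale ${value.scale})")
   }
   ```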

