Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/04/22 09:29:11 UTC

[GitHub] [incubator-hudi] badion opened a new issue #1550: Hudi 0.5.2 inability save complex type with nullable = true [SUPPORT]

badion opened a new issue #1550:
URL: https://github.com/apache/incubator-hudi/issues/1550


   Currently we are working with Hudi 0.5.0 and AWS Glue, and everything works fine for .parquet and COW mode, with complex types in the data and different nullable options.
   
   After switching to Hudi 0.5.2, we started facing issues related to:
   
   https://github.com/apache/incubator-hudi/pull/1406
   
   The Spark application fails while writing a DataFrame into a Hudi table when using complex types like:
   
   ```
   {
      "city":[
         {
            "name":"some_name",
            "index":"some_index"
         }
      ]
   }
   
   ```
   And having **nullable fields = true** for them. Up to the moment of saving, everything is fine, and we can see the complete DataFrame:
   ```
   +----------------------------+
   |city                        |
   |[[some_name, some_index]]   |
   +----------------------------+
   ```
   ```
   root
    |-- city: array (nullable = true)
    |    |-- element: struct (containsNull = true)
    |    |    |-- name: string (nullable = true)
    |    |    |-- index: string (nullable = true)
   
   ```
   **Note that all simple types save into the Hudi table fine, as do complex types with nullable = false.**
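Since the failures seem limited to complex types declared nullable, a quick pre-write check can list which columns fall into that category. The sketch below is a hypothetical helper (`find_nullable_complex_fields` is not part of Hudi or Spark) that walks the JSON form of a Spark schema, i.e. the dict returned by PySpark's `StructType.jsonValue()`, and reports every struct, array, or map whose `nullable`, `containsNull`, or `valueContainsNull` flag is true. It only diagnoses; it does not work around the bug.

```python
from typing import Any, Dict, List

# Spark serializes complex data types as JSON objects with one of these "type" values;
# primitive types such as StringType are serialized as plain strings like "string".
COMPLEX_TYPES = {"struct", "array", "map"}


def find_nullable_complex_fields(schema: Dict[str, Any]) -> List[str]:
    """Return dotted paths of complex-typed columns (or elements) that allow nulls.

    `schema` is the dict produced by PySpark's StructType.jsonValue().
    Hypothetical helper for diagnosis only; it does not change any data.
    """
    found: List[str] = []

    def visit(data_type: Any, path: str, nullable: bool) -> None:
        if not (isinstance(data_type, dict) and data_type.get("type") in COMPLEX_TYPES):
            return  # primitive (or user-defined) type: nothing to report
        if nullable:
            found.append(path)
        kind = data_type["type"]
        if kind == "struct":
            for field in data_type["fields"]:
                visit(field["type"], f"{path}.{field['name']}", field["nullable"])
        elif kind == "array":
            visit(data_type["elementType"], f"{path}.element", data_type["containsNull"])
        else:  # map: only the value side can be null in Spark
            visit(data_type["valueType"], f"{path}.value", data_type["valueContainsNull"])

    for field in schema["fields"]:
        visit(field["type"], field["name"], field["nullable"])
    return found


# The schema from this issue, written out in Spark's JSON schema format.
issue_schema = {
    "type": "struct",
    "fields": [{
        "name": "city",
        "nullable": True,
        "metadata": {},
        "type": {
            "type": "array",
            "containsNull": True,
            "elementType": {
                "type": "struct",
                "fields": [
                    {"name": "name", "type": "string", "nullable": True, "metadata": {}},
                    {"name": "index", "type": "string", "nullable": True, "metadata": {}},
                ],
            },
        },
    }],
}

print(find_nullable_complex_fields(issue_schema))  # ['city', 'city.element']
```

Against a live DataFrame the call would be `find_nullable_complex_fields(nested_hierarchy_df.schema.jsonValue())`, which for the schema in this issue should report the same two paths.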
   
   
   Steps to reproduce the behavior:
   ```
   from pathlib import Path
   
   from pyspark.sql import SparkSession
   from pyspark.sql.functions import lit
   from pyspark.sql.types import ArrayType, StringType, StructField, StructType
   
   
   def write_table(df, options, mode, output_dir):
       # Defined before it is called at module level below.
       df.write.format("org.apache.hudi").options(**options).mode(mode).save(output_dir)
   
   
   spark = SparkSession.builder \
       .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
       .config("spark.jars.packages",
               "org.apache.hudi:hudi-spark-bundle_2.11:0.5.2-incubating,org.apache.spark:spark-avro_2.11:2.4.4") \
       .appName('nested_type_hudi') \
       .enableHiveSupport() \
       .getOrCreate()
   
   PROJECT_PATH = str(Path(__file__).parent)
   
   # A single record holding an array of structs.
   input_data = """{"city":[{"name":"some_name","index":"some_index"}]}"""
   
   # Every level (the array, its struct elements and their string fields) is nullable.
   schema = StructType([
       StructField('city', ArrayType(StructType([
           StructField('name', StringType(), True),
           StructField('index', StringType(), True),
       ]), True), True),
   ])
   
   options = {
       'hoodie.table.name': "nested_hierarchy_example",
       'hoodie.datasource.write.precombine.field': "object_ts",
       'hoodie.datasource.write.recordkey.field': "recordkey",
   }
   
   nested_hierarchy_df = spark.read.schema(schema).json(spark.sparkContext.parallelize([input_data])) \
       .withColumn('object_ts', lit(123)) \
       .withColumn('recordkey', lit('abc'))
   
   write_table(nested_hierarchy_df, options, 'append', f'file://{PROJECT_PATH}/test_data/nested_output')
   ```
   
   
   **Expected behavior**
   The Hudi table should be saved successfully in Parquet format with complex-type fields that have **nullable = true**. Hudi 0.5.0 works fine with all kinds of complex types and nullable fields.
   
   
   Environment (local and AWS Glue 1.0):
   
   * Language: Python 3.7.5
   * Hudi version : 0.5.2
   * Spark version : 2.4.4 (tried locally) / 2.4.3 (tried on AWS Glue)
   * Hive version : Not applicable
   * Hadoop version : 2.8.5
   * Storage (HDFS/S3/GCS..) : S3
   * Running on Docker? (yes/no) : no
   
   
   
   **Stacktrace**
   
   ```
   java.io.IOException: Could not create payload for class: org.apache.hudi.common.model.OverwriteWithLatestAvroPayload
   	at org.apache.hudi.DataSourceUtils.createPayload(DataSourceUtils.java:125)
   	at org.apache.hudi.DataSourceUtils.createHoodieRecord(DataSourceUtils.java:178)
   	at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$1.apply(HoodieSparkSqlWriter.scala:102)
   	at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$1.apply(HoodieSparkSqlWriter.scala:99)
   ```
   ...
   ```
   Caused by: org.apache.avro.UnresolvedUnionException: Not in union [{"type":"record","name":"city","namespace":"hoodie.nested_hierarchy_example.nested_hierarchy_example_record","fields":[{"name":"name","type":["string","null"]},{"name":"index","type":["string","null"]}]},"null"]: {"name": "some_name", "index": "some_index"}
   	at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:740)
   	at org.apache.avro.generic.GenericDatumWriter.resolveUnion(GenericDatumWriter.java:205)
   	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:123)
   ```
   ...
   ```
   Caused by: org.apache.hudi.exception.HoodieException: Unable to instantiate class 
   	at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:80)
   	at org.apache.hudi.DataSourceUtils.createPayload(DataSourceUtils.java:122)
   	... 28 more
   Caused by: java.lang.reflect.InvocationTargetException
   	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
   
   ```
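The root cause in the stack trace is `UnresolvedUnionException`: the datum `{"name": "some_name", "index": "some_index"}` has exactly the fields of the union's `city` record branch, yet it is rejected. As far as we understand Avro 1.8's `GenericData.resolveUnion`, a record datum is matched to a union branch by its schema's full name (namespace plus name), not by its fields, so a record created against a schema whose name or namespace differs from the writer schema ends in this exception. The stdlib-only model below illustrates that rule under those assumptions. It is not Avro's code, every class and function name in it is hypothetical, and the second namespace is invented because the trace does not show the datum's actual schema name.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple, Union


class UnresolvedUnionError(Exception):
    """Stand-in for org.apache.avro.UnresolvedUnionException."""


@dataclass(frozen=True)
class RecordSchema:
    name: str
    namespace: str
    field_names: Tuple[str, ...]

    @property
    def full_name(self) -> str:
        return f"{self.namespace}.{self.name}" if self.namespace else self.name


@dataclass
class Record:
    schema: RecordSchema
    values: Dict[str, Any]


# A union branch is either a named record schema or a primitive type name such as "null".
Branch = Union[RecordSchema, str]


def branch_name(branch: Branch) -> str:
    return branch.full_name if isinstance(branch, RecordSchema) else branch


def datum_name(datum: Any) -> str:
    # Simplified: only records and null are modelled.
    if datum is None:
        return "null"
    if isinstance(datum, Record):
        # A record is identified by its schema's full name, not by its fields.
        return datum.schema.full_name
    raise TypeError(f"unsupported datum: {datum!r}")


def resolve_union(union: List[Branch], datum: Any) -> int:
    """Return the index of the branch whose name matches the datum's schema name."""
    wanted = datum_name(datum)
    for index, branch in enumerate(union):
        if branch_name(branch) == wanted:
            return index
    names = [branch_name(b) for b in union]
    raise UnresolvedUnionError(f"Not in union {names}: {wanted}")


# Branch names copied from the stack trace above.
city = RecordSchema("city", "hoodie.nested_hierarchy_example.nested_hierarchy_example_record", ("name", "index"))
union = [city, "null"]

values = {"name": "some_name", "index": "some_index"}
print(resolve_union(union, Record(city, values)))  # 0
print(resolve_union(union, None))                  # 1

# Same fields, but built against a schema with a different (hypothetical) namespace.
other_city = RecordSchema("city", "hypothetical.other_namespace", ("name", "index"))
try:
    resolve_union(union, Record(other_city, values))
except UnresolvedUnionError as err:
    print("rejected:", err)
```

In this model, matching field lists are not enough; only the full name decides which branch is chosen.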
   Is this already a known issue for Hudi versions after 0.5.0?
   Is there a workaround that would allow us to upgrade to 0.5.2?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-hudi] badion edited a comment on issue #1550: Hudi 0.5.2 inability save complex type with nullable = true [SUPPORT]

Posted by GitBox <gi...@apache.org>.
badion edited a comment on issue #1550:
URL: https://github.com/apache/incubator-hudi/issues/1550#issuecomment-618411983


   @vinothchandar It seems the issue is gone after building the .jar file from commit (merge) _ce0a4c64d07d6eea926d1bfb92b69ae387b88f50_, which apparently came after the _Hudi 0.5.2 release_. We also tried the Hudi jar from Maven Central, and it does not seem to include the Avro fix yet.
   
   I think we will wait for the next **release**, which will include those changes.
   





[GitHub] [incubator-hudi] bvaradar commented on issue #1550: Hudi 0.5.2 inability save complex type with nullable = true [SUPPORT]

Posted by GitBox <gi...@apache.org>.
bvaradar commented on issue #1550:
URL: https://github.com/apache/incubator-hudi/issues/1550#issuecomment-620254532


   Closing this issue as it will be resolved in the next release.





[GitHub] [incubator-hudi] vinothchandar commented on issue #1550: Hudi 0.5.2 inability save complex type with nullable = true [SUPPORT]

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on issue #1550:
URL: https://github.com/apache/incubator-hudi/issues/1550#issuecomment-625044161


   You can follow the discussion here, btw: https://lists.apache.org/thread.html/r1fb5ad5547f55f40b20306dac90a711c9c0e29f6855f63b6b2118987%40%3Cdev.hudi.apache.org%3E





[GitHub] [incubator-hudi] badion commented on issue #1550: Hudi 0.5.2 inability save complex type with nullable = true [SUPPORT]

Posted by GitBox <gi...@apache.org>.
badion commented on issue #1550:
URL: https://github.com/apache/incubator-hudi/issues/1550#issuecomment-617669268


   As a note, Hudi 0.5.2 was packaged from master 1 day ago.
   





[GitHub] [incubator-hudi] vinothchandar commented on issue #1550: Hudi 0.5.2 inability save complex type with nullable = true [SUPPORT]

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on issue #1550:
URL: https://github.com/apache/incubator-hudi/issues/1550#issuecomment-618028676


   @badion This does seem directly related to the complex types issue fixed recently. In 0.5.1/0.5.2 we moved from databricks-avro to spark-avro, and this seems like a miss.
   
   Are you interested in a custom patch for this on top of 0.5.2? I'm not sure I follow the last sentence; please clarify. Happy to get this moving along for you.
   
   cc @umehrot2 @zhedoubushishi as well to chime in 
   
   







[GitHub] [incubator-hudi] vinothchandar commented on issue #1550: Hudi 0.5.2 inability save complex type with nullable = true [SUPPORT]

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on issue #1550:
URL: https://github.com/apache/incubator-hudi/issues/1550#issuecomment-624099540


   @rolandjohann Thanks for the feedback. We are trying to bundle a few more such fixes and release 0.6.0 later this month. Backporting some fixes alone onto 0.5.2 and doing a 0.5.3 may make sense, though. Let me bring this up with the community and see what everyone thinks.





[GitHub] [incubator-hudi] rolandjohann commented on issue #1550: Hudi 0.5.2 inability save complex type with nullable = true [SUPPORT]

Posted by GitBox <gi...@apache.org>.
rolandjohann commented on issue #1550:
URL: https://github.com/apache/incubator-hudi/issues/1550#issuecomment-624024759


   First, thanks for the great lib, which massively reduces the complexity of our ETL pipelines!
   
   Is the next release coming in the near future? I'm asking because the latest release contains this showstopper bug that causes the library to simply not work. I'm currently evaluating Hudi as an alternative to Delta Lake and ran into this issue pretty fast. Would it be possible to release a hotfix so that new users can start working with the library by following the getting started section and go on to implement more complex data models?





[GitHub] [incubator-hudi] umehrot2 commented on issue #1550: Hudi 0.5.2 inability save complex type with nullable = true [SUPPORT]

Posted by GitBox <gi...@apache.org>.
umehrot2 commented on issue #1550:
URL: https://github.com/apache/incubator-hudi/issues/1550#issuecomment-618769492


   @badion Yeah, the fix for this did not make it into 0.5.2. You can either build a custom Hudi with this patch applied on top of 0.5.2 or wait until the next release.

