Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/10/30 00:52:05 UTC

[GitHub] [hudi] pranotishanbhag edited a comment on issue #3841: Schema evolution improvement in 0.9.0 breaks existing applications

pranotishanbhag edited a comment on issue #3841:
URL: https://github.com/apache/hudi/issues/3841#issuecomment-955083608


   Hi,
   
   I am facing the same issue with Hudi 0.9.0. My schema is below:
   ```
   root
    |-- _hoodie_commit_time: string (nullable = true)
    |-- _hoodie_commit_seqno: string (nullable = true)
    |-- _hoodie_record_key: string (nullable = true)
    |-- _hoodie_partition_path: string (nullable = true)
    |-- _hoodie_file_name: string (nullable = true)
    |-- is_deleted: boolean (nullable = true)
    |-- dedupe_key: long (nullable = true)
    |-- ums_last_updated_date: long (nullable = true)
    |-- source_created_date: long (nullable = true)
    |-- item_pairs: array (nullable = true)
    |  |-- element: struct (containsNull = true)
    |  |  |-- invalid_reasons: array (nullable = true)
    |  |  |  |-- element: string (containsNull = true)
    |  |  |-- additional_attributes: string (nullable = true)
    |  |  |-- mapping_state: string (nullable = true)
    |  |  |-- to_item_version: long (nullable = true)
    |  |  |-- to_item_attributes: string (nullable = true)
    |  |  |-- to_region_id: string (nullable = true)
    |  |  |-- to_marketplace_id: string (nullable = true)
    |  |  |-- to_item_id: string (nullable = true)
    |  |  |-- to_website_id: string (nullable = true)
    |  |  |-- to_catalog_id: string (nullable = true)
    |  |  |-- from_item_version: long (nullable = true)
    |  |  |-- from_item_attributes: string (nullable = true)
    |  |  |-- from_region_id: string (nullable = true)
    |  |  |-- from_marketplace_id: string (nullable = true)
    |  |  |-- from_item_id: string (nullable = true)
    |  |  |-- from_website_id: string (nullable = true)
    |  |  |-- from_catalog_id: string (nullable = true)
    |-- mapping_source: string (nullable = true)
    |-- state: string (nullable = true)
    |-- mapping_type: string (nullable = true)
    |-- id: string (nullable = true)
    |-- mapping_class: string (nullable = true)
   ```
   
   My schema does contain an array of structs (`item_pairs`), but I do not see any repeated column names in it.
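
   To double-check the "no repeated columns" observation mechanically rather than by eye, a small helper can walk the nested schema and report every field name that occurs more than once at any level. This is only a dependency-free sketch: it uses a hypothetical, simplified schema model (`DType`, `Prim`, `ArrayT`, `StructT`, `RepeatedNames`) instead of Spark's `StructType`, and it mirrors just a trimmed subset of the schema above.

   ```scala
   // Hypothetical, simplified schema model (not Spark's StructType API)
   sealed trait DType
   case object Prim extends DType                                    // any primitive (string, long, boolean, ...)
   final case class ArrayT(element: DType) extends DType             // array<element>
   final case class StructT(fields: List[(String, DType)]) extends DType

   object RepeatedNames {
     // Every field name at every nesting level, descending through arrays and structs
     def fieldNames(t: DType): List[String] = t match {
       case Prim            => Nil
       case ArrayT(element) => fieldNames(element)
       case StructT(fields) => fields.flatMap { case (name, dt) => name :: fieldNames(dt) }
     }

     // Names that occur more than once anywhere in the tree (exact, case-sensitive match)
     def repeated(t: DType): Set[String] =
       fieldNames(t).groupBy(identity).collect { case (name, hits) if hits.size > 1 => name }.toSet

     // Trimmed subset of the posted schema: item_pairs is an array of structs,
     // and each struct holds the invalid_reasons array of strings
     val itemPair: StructT = StructT(List(
       "invalid_reasons" -> ArrayT(Prim),
       "to_item_id"      -> Prim,
       "from_item_id"    -> Prim
     ))

     val postedSubset: StructT = StructT(List(
       "is_deleted" -> Prim,
       "item_pairs" -> ArrayT(itemPair),
       "id"         -> Prim
     ))
   }
   ```

   On this subset `RepeatedNames.repeated(RepeatedNames.postedSubset)` returns an empty set; adding an `id` field inside `itemPair` would make it return `Set("id")`.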
   
   I have also set these Spark configuration options:
   ```scala
       import org.apache.spark.SparkConf

       val sparkConf = new SparkConf()
         .setAppName(appName) // appName is defined elsewhere in the application
         .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
         .set("spark.sql.hive.convertMetastoreParquet", "false")
         .set("spark.hadoop.parquet.avro.add-list-element-records", "false") // null array handling
         .set("spark.hadoop.parquet.avro.write-old-list-structure", "false") // schema evolution
         .set("parquet.avro.add-list-element-records", "false") // null array handling
         .set("parquet.avro.write-old-list-structure", "false") // schema evolution
   ```
   
   I also set `hoodie.avro.schema.validate=true`, but no schema issue is reported with this option enabled.
   
   I am using the Copy-on-Write (COW) table type with Hudi 0.9.0 and Spark 2.4.
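
   For reference, the COW table type and `hoodie.avro.schema.validate` setting described above would typically be passed together as datasource write options. The sketch below is only an illustration: the object name `HudiWriteSketch`, the table name, and the record key, precombine and partition path fields are hypothetical placeholders, not the actual values from my job.

   ```scala
   object HudiWriteSketch {
     // Table name and key fields are hypothetical placeholders; replace with the real ones
     val hudiWriteOptions: Map[String, String] = Map(
       "hoodie.table.name"                           -> "item_mappings",         // hypothetical
       "hoodie.datasource.write.table.type"          -> "COPY_ON_WRITE",         // COW table
       "hoodie.datasource.write.recordkey.field"     -> "id",                    // hypothetical
       "hoodie.datasource.write.precombine.field"    -> "ums_last_updated_date", // hypothetical
       "hoodie.datasource.write.partitionpath.field" -> "mapping_type",          // hypothetical
       "hoodie.avro.schema.validate"                 -> "true"                   // check write schema compatibility with the table schema
     )

     // With Spark and Hudi on the classpath, the options would be applied roughly as:
     //   df.write.format("hudi").options(hudiWriteOptions).mode(SaveMode.Append).save(basePath)
     // (df and basePath are assumed to exist in the calling job)
   }
   ```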
   
   Could you please help with this? My launch is blocked by this issue.
   
   Thanks,
   Pranoti

