Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/08/25 16:03:43 UTC

[GitHub] [hudi] tandonraghav opened a new issue #2919: [SUPPORT] Schema Evolution Failing Spark+Hudi- Adding New fields

tandonraghav opened a new issue #2919:
URL: https://github.com/apache/hudi/issues/2919


   I am facing an issue with schema evolution. When I add a new field to the Spark DataFrame, the write throws an exception if there are earlier log files/records that do not have that field.
   
   I can see the *type* union is reversed for *test*, and there is no default value (in the Hoodie log files). Is it because of the SchemaConvertors?
   
   **Environment Description**
   
   * Hudi version : 0.8.0
   
   * Spark version : 2.4
   
   * Hive version : 
   
   * Hadoop version :
   
   * Storage (HDFS/S3/GCS..) :  HDFS
   
   * Running on Docker? (yes/no) : no
   
   Code 
   
   1. Create the DF
   
   ````
    // Map each Kafka row to an Avro GenericRecord using the current schema.
    Dataset<GenericRecord> genericRecordDfFromKafka = db.map((MapFunction<Row, GenericRecord>) row -> {
        String docId = ((GenericRowWithSchema) row.getAs("json_value")).getAs("doc_id");
        String dbName = ((GenericRowWithSchema) row.getAs("json_value")).getAs("collection");
        String before = ((GenericRowWithSchema) row.getAs("json_value")).getAs("o");
        String after = ((GenericRowWithSchema) row.getAs("json_value")).getAs("o2");
        String op = ((GenericRowWithSchema) row.getAs("json_value")).getAs("op");
        Long ts = ((GenericRowWithSchema) row.getAs("json_value")).getStruct(5).getStruct(0).getAs("t");
        return convertOplogToAvro(dbName, before, after, ts, docId, op, schemaStr);
    }, Encoders.bean(GenericRecord.class));

    Dataset<Row> ds = AvroConversionUtils.createDataFrame(genericRecordDfFromKafka.rdd(), schemaStr, sparkSession);

    // Keep everything except deletes, then upsert into Hudi.
    Dataset<Row> insertedDs = ds.select("*").where(ds.col("op").notEqual("d"));
    persistDFInHudi(insertedDs, db_name, tablePath, hiveUrl);

    private void persistDFInHudi(Dataset<Row> ds, String dbName, String tablePath, String hiveUrl) {
        ds.write()
                .format("org.apache.hudi")
                .options(QuickstartUtils.getQuickstartWriteConfigs())
                .option(DataSourceWriteOptions.OPERATION_OPT_KEY(), DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL())
                .option(DataSourceWriteOptions.TABLE_TYPE_OPT_KEY(), DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL())
                .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY(), "ts_ms")
                .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY(), "id")
                .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY(), "db_name")
                .option(HoodieWriteConfig.TABLE_NAME, dbName)
                // Inline compaction: every 100 delta commits or 2000 seconds, whichever comes first.
                .option(HoodieCompactionConfig.INLINE_COMPACT_PROP, true)
                .option(HoodieCompactionConfig.INLINE_COMPACT_NUM_DELTA_COMMITS_PROP, 100)
                .option(HoodieCompactionConfig.INLINE_COMPACT_TIME_DELTA_SECONDS_PROP, 2000)
                .option(HoodieCompactionConfig.INLINE_COMPACT_TRIGGER_STRATEGY_PROP,
                        String.valueOf(CompactionTriggerStrategy.NUM_OR_TIME))
                // Hive sync settings.
                .option(DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY(), "true")
                .option(DataSourceWriteOptions.HIVE_TABLE_OPT_KEY(), dbName)
                .option(DataSourceWriteOptions.HIVE_PARTITION_FIELDS_OPT_KEY(), "db_name")
                .option(DataSourceWriteOptions.HIVE_ASSUME_DATE_PARTITION_OPT_KEY(), "false")
                .option(DataSourceWriteOptions.HIVE_DATABASE_OPT_KEY(), "some_db")
                .option(DataSourceWriteOptions.HIVE_URL_OPT_KEY(), hiveUrl)
                .option(DataSourceWriteOptions.HIVE_STYLE_PARTITIONING_OPT_KEY(), true)
                .option(DataSourceWriteOptions.HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY(),
                        MultiPartKeysValueExtractor.class.getName())
                // Alternatives tried for a non-partitioned table (left commented out):
                // .option(DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY(), NonpartitionedKeyGenerator.class.getName())
                // .option(DataSourceWriteOptions.HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY(), NonPartitionedExtractor.class.getName())
                .option(HoodieIndexConfig.INDEX_TYPE_PROP, HoodieIndex.IndexType.SIMPLE.toString())
                .mode(SaveMode.Append)
                .save(tablePath);
    }

    private GenericRecord convertOplogToAvro(String dbName, String before, String after, Long ts, String id, String op,
                                             String schemaStr) throws IOException, RestClientException {
        Schema schema = new Schema.Parser().parse(schemaStr);
        Document dataTypeDoc = new Document();
        if (before != null) {
            Document document = Document.parse(before);
            dataTypeDoc = Utils.convertToAvroDoc(document);
        }
        dataTypeDoc.put("ts_ms", ts);
        dataTypeDoc.put("db_name", dbName);
        dataTypeDoc.put("id", id);
        dataTypeDoc.put("op", op);

        Document finalDataTypeDoc = dataTypeDoc;
        Stream.of(DefaultFields.values())
                .forEach(field -> finalDataTypeDoc.put(field.getName(), field.getValueFrom(finalDataTypeDoc)));

        // Convert the BSON document to relaxed JSON, then to Avro via MercifulJsonConverter.
        return new MercifulJsonConverter().convert(dataTypeDoc.toJson(JsonWriterSettings.builder()
                .outputMode(JsonMode.RELAXED)
                .dateTimeConverter(new JsonDateConvertor())
                .objectIdConverter(new ObjectIdConverter())
                .build()), schema);
    }
    
    ````
   
    2. Persist this ds with the original schema first, and run it a few times so that some uncompacted log files exist.
    3. Persist this ds again with the new schema; it then fails with:
    **Caused by: org.apache.avro.AvroTypeException: Found hoodie.products.products_record, expecting hoodie.products.products_record, missing required field test2**
    
    Our schema is dynamic, but I am not removing any field, only adding one at the end, and it still fails.
   
   Original Schema 
   
   ````
   {
     "type": "record",
     "name": "foo",
     "namespace": "products",
     "fields": [
       {
         "name": "id",
         "type": [
           "null",
           "string"
         ],
         "default": null
       },
       {
         "name": "product_id",
         "type": [
           "null",
           "string"
         ],
         "default": null
       },
       {
         "name": "db_name",
         "type": [
           "null",
           "string"
         ],
         "default": null
       },
       {
         "name": "catalog_id",
         "type": [
           "null",
           "string"
         ],
         "default": null
       },
       {
         "name": "feed_id",
         "type": [
           "null",
           "string"
         ],
         "default": null
       },
       {
         "name": "ts_ms",
         "type": [
           "null",
           "double"
         ],
         "default": null
       },
       {
         "name": "op",
         "type": [
           "null",
           "string"
         ],
         "default": null
       },
       {
         "name": "test",
         "type": [
           "null",
           "string"
         ],
         "default": null
       }
     ]
   }
   ````
   
   Changed Schema
   
   ````
   {
     "type": "record",
     "name": "foo",
     "namespace": "products",
     "fields": [
       {
         "name": "id",
         "type": [
           "null",
           "string"
         ],
         "default": null
       },
       {
         "name": "product_id",
         "type": [
           "null",
           "string"
         ],
         "default": null
       },
       {
         "name": "db_name",
         "type": [
           "null",
           "string"
         ],
         "default": null
       },
       {
         "name": "catalog_id",
         "type": [
           "null",
           "string"
         ],
         "default": null
       },
       {
         "name": "feed_id",
         "type": [
           "null",
           "string"
         ],
         "default": null
       },
       {
         "name": "ts_ms",
         "type": [
           "null",
           "double"
         ],
         "default": null
       },
       {
         "name": "op",
         "type": [
           "null",
           "string"
         ],
         "default": null
       },
       {
         "name": "test",
         "type": [
           "null",
           "string"
         ],
         "default": null
       },
       {
         "name": "test2",
         "type": [
           "null",
           "string"
         ],
         "default": null
       }
     ]
   }
   ````
   
    Added a column **test2** at the end with a default value.
    
   
   
    Schema shown by Hoodie in the logs:
    
    ````
    {
     "type": "record",
     "name": "Max_IND_record",
     "namespace": "hoodie.Max_IND",
     "fields": [
       {
         "name": "_hoodie_commit_time",
         "type": [
           "null",
           "string"
         ],
         "doc": "",
         "default": null
       },
       {
         "name": "_hoodie_commit_seqno",
         "type": [
           "null",
           "string"
         ],
         "doc": "",
         "default": null
       },
       {
         "name": "_hoodie_record_key",
         "type": [
           "null",
           "string"
         ],
         "doc": "",
         "default": null
       },
       {
         "name": "_hoodie_partition_path",
         "type": [
           "null",
           "string"
         ],
         "doc": "",
         "default": null
       },
       {
         "name": "_hoodie_file_name",
         "type": [
           "null",
           "string"
         ],
         "doc": "",
         "default": null
       },
       {
         "name": "id",
         "type": [
           "string",
           "null"
         ]
       },
       {
         "name": "product_id",
         "type": [
           "string",
           "null"
         ]
       },
       {
         "name": "db_name",
         "type": [
           "string",
           "null"
         ]
       },
       {
         "name": "catalog_id",
         "type": [
           "string",
           "null"
         ]
       },
       {
         "name": "feed_id",
         "type": [
           "string",
           "null"
         ]
       },
       {
         "name": "ts_ms",
         "type": [
           "double",
           "null"
         ]
       },
       {
         "name": "op",
         "type": [
           "string",
           "null"
         ]
       },
       {
         "name": "test",
         "type": [
           "string",
           "null"
         ]
       }
     ]
   }
   ````
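    A note on why this fails, based on the schemas above: in Avro, a union field's default value must match the *first* branch of the union. The log-file schema lists every union as `["string", "null"]` and drops the defaults, so `null` is no longer a usable default, and Avro's schema resolution then treats a field that exists only in the reader schema, such as `test2`, as required when decoding older records. The following is a minimal sketch using plain Avro with no Hudi involved; the class name and the abbreviated two-field schemas exist only for this demo:
    
    ````
    import java.io.ByteArrayOutputStream;
    
    import org.apache.avro.AvroTypeException;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.Decoder;
    import org.apache.avro.io.DecoderFactory;
    import org.apache.avro.io.Encoder;
    import org.apache.avro.io.EncoderFactory;
    
    public class SchemaResolutionDemo {
    
        public static void main(String[] args) throws Exception {
            // Writer schema: the "old" schema, without test2.
            Schema writer = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"foo\",\"fields\":["
              + "{\"name\":\"id\",\"type\":[\"null\",\"string\"],\"default\":null}]}");
    
            // Reader A: test2 added as ["null","string"] with default null.
            // Old records resolve fine; test2 is filled with null.
            Schema readerOk = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"foo\",\"fields\":["
              + "{\"name\":\"id\",\"type\":[\"null\",\"string\"],\"default\":null},"
              + "{\"name\":\"test2\",\"type\":[\"null\",\"string\"],\"default\":null}]}");
    
            // Reader B: union reversed and default gone, mirroring the log-file
            // schema above. null cannot be the default of a ["string","null"] union.
            Schema readerBad = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"foo\",\"fields\":["
              + "{\"name\":\"id\",\"type\":[\"null\",\"string\"],\"default\":null},"
              + "{\"name\":\"test2\",\"type\":[\"string\",\"null\"]}]}");
    
            // Encode one record with the writer (old) schema.
            GenericRecord rec = new GenericData.Record(writer);
            rec.put("id", "1");
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            Encoder enc = EncoderFactory.get().binaryEncoder(out, null);
            new GenericDatumWriter<GenericRecord>(writer).write(rec, enc);
            enc.flush();
    
            decode(out.toByteArray(), writer, readerOk); // prints {"id": "1", "test2": null}
            try {
                decode(out.toByteArray(), writer, readerBad);
            } catch (AvroTypeException e) {
                // "Found foo, expecting foo, missing required field test2"
                System.out.println(e.getMessage());
            }
        }
    
        private static void decode(byte[] bytes, Schema writer, Schema reader) throws Exception {
            Decoder dec = DecoderFactory.get().binaryDecoder(bytes, null);
            System.out.println(new GenericDatumReader<GenericRecord>(writer, reader).read(null, dec));
        }
    }
    ````
    
    Decoding with `readerOk` succeeds and fills `test2` with `null`; decoding with `readerBad` fails with the same "missing required field test2" error reported above.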
   
   
   
   
   
   





[GitHub] [hudi] nsivabalan commented on issue #2919: [SUPPORT] Schema Evolution Failing Spark+Hudi- Adding New fields

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #2919:
URL: https://github.com/apache/hudi/issues/2919#issuecomment-905658828


   Please do try out the latest master and let us know if you run into issues. We will close this out in a week's time if we don't hear from you. Thanks!





[GitHub] [hudi] aditiwari01 commented on issue #2919: [SUPPORT] Schema Evolution Failing Spark+Hudi- Adding New fields

Posted by GitBox <gi...@apache.org>.
aditiwari01 commented on issue #2919:
URL: https://github.com/apache/hudi/issues/2919#issuecomment-841635115


   I believe I've read somewhere that a new version is released every 6 to 8 weeks, so maybe sometime in July. Not sure though.
   
   @nsivabalan @vinothchandar can comment on this better.





[GitHub] [hudi] tandonraghav commented on issue #2919: [SUPPORT] Schema Evolution Failing Spark+Hudi- Adding New fields

Posted by GitBox <gi...@apache.org>.
tandonraghav commented on issue #2919:
URL: https://github.com/apache/hudi/issues/2919#issuecomment-840335514


   @bvaradar Can you please help here? I have been stuck for the past few days.
   I can see PRs in progress. Will they resolve the issue? If yes, what will be the release date?





[GitHub] [hudi] n3nash commented on issue #2919: [SUPPORT] Schema Evolution Failing Spark+Hudi- Adding New fields

Posted by GitBox <gi...@apache.org>.
n3nash commented on issue #2919:
URL: https://github.com/apache/hudi/issues/2919#issuecomment-853958300


   @tandonraghav The Hive-Sync tool will register the schema in the Glue catalog. This process is not new.
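   
   For reference, a hedged sketch of running that sync standalone: Hudi ships a `run_sync_tool.sh` script with the Hive sync module, and the flags below follow the Hudi "Syncing to Hive" documentation. The JDBC URL, base path, and database/table names are placeholders, not values from this thread:
   
   ````
   # All values are placeholders; adjust for your cluster.
   cd hudi-sync/hudi-hive-sync
   ./run_sync_tool.sh \
     --jdbc-url jdbc:hive2://hiveserver:10000 \
     --user hive --pass hive \
     --base-path /path/to/hudi/table \
     --database some_db --table products \
     --partitioned-by db_name \
     --partition-value-extractor org.apache.hudi.hive.MultiPartKeysValueExtractor
   ````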





[GitHub] [hudi] aditiwari01 commented on issue #2919: [SUPPORT] Schema Evolution Failing Spark+Hudi- Adding New fields

Posted by GitBox <gi...@apache.org>.
aditiwari01 commented on issue #2919:
URL: https://github.com/apache/hudi/issues/2919#issuecomment-841614452


   Hey @tandonraghav,
   
   Can you quickly retry schema evolution with the latest Hudi (0.9.0-SNAPSHOT)?
   I've faced a similar issue in the past and raised a fix for it. The PR is merged, but it has not been released yet.
   
   You can refer to this issue for more details: https://github.com/apache/hudi/issues/2675





[GitHub] [hudi] vinothchandar commented on issue #2919: [SUPPORT] Schema Evolution Failing Spark+Hudi- Adding New fields

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on issue #2919:
URL: https://github.com/apache/hudi/issues/2919#issuecomment-875686911


   @codope Worth looking at this ticket, and everything with the same label, to check whether those cases are covered by the schema evolution tests as well. FYI.
   





[GitHub] [hudi] tandonraghav commented on issue #2919: [SUPPORT] Schema Evolution Failing Spark+Hudi- Adding New fields

Posted by GitBox <gi...@apache.org>.
tandonraghav commented on issue #2919:
URL: https://github.com/apache/hudi/issues/2919#issuecomment-842020015


   @n3nash So, from now on, Glue (Hive) will store the schema as well?
   And is this Hive sync always required from the ingestion job?
   
   Can I use the Glue APIs to update it asynchronously? If yes, which schema should be updated?
   
   Will the Hive-Sync tool take care of the schema update?





[GitHub] [hudi] n3nash edited a comment on issue #2919: [SUPPORT] Schema Evolution Failing Spark+Hudi- Adding New fields

Posted by GitBox <gi...@apache.org>.
n3nash edited a comment on issue #2919:
URL: https://github.com/apache/hudi/issues/2919#issuecomment-842009795


   @tandonraghav The tentative date for the next release is sometime in June. You can backport the PR @aditiwari01 is pointing to onto a 0.8.0 build and use that in your prod environment. If you run into any issues backporting this fix, let me know; happy to help review that.
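   
   A rough sketch of such a backport, with placeholder branch and commit values (pick the actual fix commit from the merged PR):
   
   ````
   # Placeholders throughout; substitute the real fix commit SHA.
   git checkout release-0.8.0 -b 0.8.0-schema-evolution-backport
   git cherry-pick <fix-commit-sha>
   mvn clean package -DskipTests
   ````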





[GitHub] [hudi] vinothchandar closed issue #2919: [SUPPORT] Schema Evolution Failing Spark+Hudi- Adding New fields

Posted by GitBox <gi...@apache.org>.
vinothchandar closed issue #2919:
URL: https://github.com/apache/hudi/issues/2919


   





[GitHub] [hudi] tandonraghav commented on issue #2919: [SUPPORT] Schema Evolution Failing Spark+Hudi- Adding New fields

Posted by GitBox <gi...@apache.org>.
tandonraghav commented on issue #2919:
URL: https://github.com/apache/hudi/issues/2919#issuecomment-835063730


   @vinothchandar @nsivabalan Can you guys please help here? I am stuck and can't move things forward if schema evolution is not working.
   











[GitHub] [hudi] vinothchandar commented on issue #2919: [SUPPORT] Schema Evolution Failing Spark+Hudi- Adding New fields

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on issue #2919:
URL: https://github.com/apache/hudi/issues/2919#issuecomment-861498213


   Probably we don't need it persisted in Glue like this.
   @pengzhiwei2018 any thoughts? cc @umehrot2 as well.





[GitHub] [hudi] aditiwari01 edited a comment on issue #2919: [SUPPORT] Schema Evolution Failing Spark+Hudi- Adding New fields

Posted by GitBox <gi...@apache.org>.
aditiwari01 edited a comment on issue #2919:
URL: https://github.com/apache/hudi/issues/2919#issuecomment-841614452


   Hey @tandonraghav,
   
   Can you quickly retry schema evolution with the latest Hudi (0.9.0-SNAPSHOT)?
   I've faced a similar issue in the past and raised a fix for it. The PR is merged, but it has not been released yet.
   
   You can refer to this issue for more details: https://github.com/apache/hudi/issues/2675
   Jira: https://issues.apache.org/jira/browse/HUDI-1716





[GitHub] [hudi] tandonraghav commented on issue #2919: [SUPPORT] Schema Evolution Failing Spark+Hudi- Adding New fields

Posted by GitBox <gi...@apache.org>.
tandonraghav commented on issue #2919:
URL: https://github.com/apache/hudi/issues/2919#issuecomment-841631655


   @aditiwari01 Yes, I did try with 0.9.0-SNAPSHOT and it works fine. But it now stores the schema in Glue.
   It also stores the schema inside the Avro log files; I do understand why this is required.
   
   Do you know a tentative timeline for when it will be released? We can't use a SNAPSHOT in prod.





[GitHub] [hudi] tandonraghav commented on issue #2919: [SUPPORT] Schema Evolution Failing Spark+Hudi- Adding New fields

Posted by GitBox <gi...@apache.org>.
tandonraghav commented on issue #2919:
URL: https://github.com/apache/hudi/issues/2919#issuecomment-860846241


   @n3nash @vinothchandar I am seeing that the entire schema gets persisted in the Glue **TBLPROPERTIES**. This was not the behaviour previously. Do we need the schema there as well, or can we have a config to switch it off?
   
   Hudi version - 0.9.0-SNAPSHOT
   
   ````
   hive> show create table max_ro;
   OK
   CREATE EXTERNAL TABLE `max_ro`(
     `_hoodie_commit_time` string, 
     `_hoodie_commit_seqno` string, 
     `_hoodie_record_key` string, 
     `_hoodie_partition_path` string, 
     `_hoodie_file_name` string, 
     `string_pincode_113` string, 
     `double_pincode_113` double, 
     `string_availability_157` string, 
     `string_availability2_169` string, 
     `string_availability3_150` string, 
     `string_availability4_158` string, 
     `string_availability5_187` string, 
     `string_availability6_150` string, 
     `string_availability7_778` string, 
     `string_availability8_192` string, 
     `string_availability9_700` string, 
     `string_availability10_131` string, 
     `string_availability11_186` string, 
     `string_availability12_878` string, 
     `string_availability13_466` string, 
     `id` string, 
     `product_id` string, 
     `catalog_id` string, 
     `feed_id` string, 
     `ts_ms` double, 
     `op` string)
   PARTITIONED BY ( 
     `db_name` string)
   ROW FORMAT SERDE 
     'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
   WITH SERDEPROPERTIES ( 
     'path'='file:/tmp/test/hudi-user-data/max') 
   STORED AS INPUTFORMAT 
     'org.apache.hudi.hadoop.HoodieParquetInputFormat' 
   OUTPUTFORMAT 
     'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
   LOCATION
     'file:/tmp/test/hudi-user-data/max'
   TBLPROPERTIES (
     'last_commit_time_sync'='20210614221425', 
     'last_modified_by'='raghav', 
     'last_modified_time'='1623689081', 
     'spark.sql.sources.provider'='hudi', 
     'spark.sql.sources.schema.numPartCols'='1', 
     'spark.sql.sources.schema.numParts'='1', 
     'spark.sql.sources.schema.part.0'='{"type":"struct","fields":[{"name":"_hoodie_commit_time","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_commit_seqno","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_record_key","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_partition_path","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_file_name","type":"string","nullable":true,"metadata":{}},{"name":"string_pincode_113","type":"string","nullable":true,"metadata":{}},{"name":"double_pincode_113","type":"double","nullable":true,"metadata":{}},{"name":"string_availability_157","type":"string","nullable":true,"metadata":{}},{"name":"string_availability2_169","type":"string","nullable":true,"metadata":{}},{"name":"string_availability3_150","type":"string","nullable":true,"metadata":{}},{"name":"string_availability4_158","type":"string","nullable":true,"metadata":{}},{"name":"string_availability5_187","type":"string","nullable":true
 ,"metadata":{}},{"name":"string_availability6_150","type":"string","nullable":true,"metadata":{}},{"name":"string_availability7_778","type":"string","nullable":true,"metadata":{}},{"name":"string_availability8_192","type":"string","nullable":true,"metadata":{}},{"name":"string_availability9_700","type":"string","nullable":true,"metadata":{}},{"name":"string_availability10_131","type":"string","nullable":true,"metadata":{}},{"name":"string_availability11_186","type":"string","nullable":true,"metadata":{}},{"name":"string_availability12_878","type":"string","nullable":true,"metadata":{}},{"name":"string_availability13_466","type":"string","nullable":true,"metadata":{}},{"name":"id","type":"string","nullable":true,"metadata":{}},{"name":"product_id","type":"string","nullable":true,"metadata":{}},{"name":"catalog_id","type":"string","nullable":true,"metadata":{}},{"name":"feed_id","type":"string","nullable":true,"metadata":{}},{"name":"ts_ms","type":"double","nullable":true,"metadata":{
 }},{"name":"op","type":"string","nullable":true,"metadata":{}},{"name":"db_name","type":"string","nullable":true,"metadata":{}}]}', 
     'spark.sql.sources.schema.partCol.0'='db_name', 
     'transient_lastDdlTime'='1623689081')
   ````





[GitHub] [hudi] nsivabalan commented on issue #2919: [SUPPORT] Schema Evolution Failing Spark+Hudi- Adding New fields

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #2919:
URL: https://github.com/apache/hudi/issues/2919#issuecomment-898181212


   @tandonraghav : we had a [fix](https://github.com/apache/hudi/pull/3137) some time back for schema evolution, and the latest master should work. Can you try it out with the latest master and let us know how it goes?
   








[GitHub] [hudi] nsivabalan closed issue #2919: [SUPPORT] Schema Evolution Failing Spark+Hudi- Adding New fields

Posted by GitBox <gi...@apache.org>.
nsivabalan closed issue #2919:
URL: https://github.com/apache/hudi/issues/2919


   





[GitHub] [hudi] vinothchandar commented on issue #2919: [SUPPORT] Schema Evolution Failing Spark+Hudi- Adding New fields

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on issue #2919:
URL: https://github.com/apache/hudi/issues/2919#issuecomment-926263798


   cc @codope Could you take a pass and see whether this still needs triaging with 0.9.0? Reopen as needed.





[GitHub] [hudi] n3nash commented on issue #2919: [SUPPORT] Schema Evolution Failing Spark+Hudi- Adding New fields

Posted by GitBox <gi...@apache.org>.
n3nash commented on issue #2919:
URL: https://github.com/apache/hudi/issues/2919#issuecomment-842009795


   @tandonraghav The tentative date for the next release is sometime in June. You can backport the PR @aditiwari01 is pointing to on 0.8.0 build and use that in your prod environment. 





[GitHub] [hudi] vinothchandar commented on issue #2919: [SUPPORT] Schema Evolution Failing Spark+Hudi- Adding New fields

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on issue #2919:
URL: https://github.com/apache/hudi/issues/2919#issuecomment-926263558


   @nsivabalan why did we reopen this?




