Posted to commits@hudi.apache.org by "shaurya-nwse (via GitHub)" <gi...@apache.org> on 2023/04/18 09:10:33 UTC

[GitHub] [hudi] shaurya-nwse opened a new issue, #8486: [SUPPORT] AvroRecordConverter throws NoSuchMethodError(Avro defaultValue) on schema change

shaurya-nwse opened a new issue, #8486:
URL: https://github.com/apache/hudi/issues/8486

   **_Tips before filing an issue_**
   
   - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
   Yes
   - Join the mailing list to engage in conversations and get faster support at dev-subscribe@hudi.apache.org.
   
   - If you have triaged this as a bug, then file an [issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.
   
   **Describe the problem you faced**
   
   We've been running Hudi in production successfully for many of our tables. Our general use case is to create a Hive table from a Kafka topic and we use the Hudi deltastreamer for this. 
   Recently, one of our topics had an additional field added to it (with a default value), and deltastreamer started throwing exceptions. The stacktrace points to a probable version mismatch between Avro and Jackson in the `parquet-avro` package. 
   
   
   **To Reproduce**
   We ran into this when saving the table on HDFS, but I could reproduce it locally as well. 
   
   Steps to reproduce the behavior:
   
   1. Create a deltastreamer configuration to consume an Avro topic from Kafka and write to HDFS
   ```properties
   hoodie.datasource.write.recordkey.field=work_experience_id
   hoodie.datasource.write.partitionpath.field=dt
   hoodie.datasource.write.precombine.field=dt
   hoodie.datasource.write.operation=upsert
   hoodie.table.name=wes
   
   hoodie.index.type=GLOBAL_BLOOM
   hoodie.deltastreamer.source.kafka.topic=wes
   hoodie.deltastreamer.schemaprovider.registry.url=http://localhost:8081/subjects/wes-value/versions/latest
   hoodie.metadata.enable=false
   bootstrap.servers=localhost:9092
   auto.offset.reset=earliest
   schema.registry.url=http://0.0.0.0:8081
   group.id=wes-only
   
   hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.SimpleKeyGenerator
   hoodie.deltastreamer.source.kafka.value.deserializer=io.confluent.kafka.serializers.KafkaAvroDeserializer
   
   hoodie.datasource.write.reconcile.schema=true
   ```
   2. The Avro schema on the schema registry prior to the change is as follows:
   ```avsc
   {
     "namespace": "com.ds.model.profile",
     "type": "record",
     "name": "WES",
     "fields": [
         {"name": "work_experience_id", "type": "long"},
         {"name": "profile_id", "type": "long", "default": -1},
         {"name": "primary_job", "type": "boolean"},
         {"name": "jobtitle", "type": "string"},
         {"name": "discipline_id", "type": ["null", "int"], "default": null},
         {"name": "industry_id", "type": ["null", "long"], "default": null},
         {"name": "dt", "type":  ["null","string"], "default":  null}
     ]
   }
   ```
   3. The initial run works fine and the Hudi table is created on HDFS/local FS
   4. The schema is modified to add a new field with a default value
   ```avsc
   {
     "namespace": "com.ds.model.profile",
     "type": "record",
     "name": "WES",
     "fields": [
         {"name": "work_experience_id", "type": "long"},
         {"name": "profile_id", "type": "long", "default": -1},
         {"name": "primary_job", "type": "boolean"},
         {"name": "jobtitle", "type": "string"},
         {"name": "discipline_id", "type": ["null", "int"], "default": null},
         {"name": "industry_id", "type": ["null", "long"], "default": null},
         // this field below is added
         {"name": "industry_ids", "type": {"type": "array", "items": ["null", "long"] }, "default": []},
         //
         {"name": "dt", "type":  ["null","string"], "default":  null}
     ]
   }
   ```
   5. The deltastreamer job fails with: `Caused by: java.lang.NoSuchMethodError: 'org.codehaus.jackson.JsonNode org.apache.avro.Schema$Field.defaultValue()'`
   
   The timeline shows rollback markers as the attempt to commit fails.
   
   **Expected behavior**
   
   The schema change should be backward compatible because we added a field with a default value. The deltastreamer should write data to the Hudi table conforming to the new schema.
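
The backward-compatibility expectation can be sanity-checked with a small sketch. It covers only one rule (every field added to the new schema carries a `default`) and is not Avro's full schema-resolution logic; the helper name `added_fields_without_default` is hypothetical, and the schemas are the two from the reproduction steps rewritten as Python dicts.

```python
def added_fields_without_default(old_schema, new_schema):
    """Return names of fields that exist only in new_schema and lack a "default"."""
    old_names = {field["name"] for field in old_schema["fields"]}
    return [
        field["name"]
        for field in new_schema["fields"]
        if field["name"] not in old_names and "default" not in field
    ]


# Schema before the change (JSON null is written as None).
old_schema = {
    "namespace": "com.ds.model.profile",
    "type": "record",
    "name": "WES",
    "fields": [
        {"name": "work_experience_id", "type": "long"},
        {"name": "profile_id", "type": "long", "default": -1},
        {"name": "primary_job", "type": "boolean"},
        {"name": "jobtitle", "type": "string"},
        {"name": "discipline_id", "type": ["null", "int"], "default": None},
        {"name": "industry_id", "type": ["null", "long"], "default": None},
        {"name": "dt", "type": ["null", "string"], "default": None},
    ],
}

# The field added in the new schema version, inserted before "dt".
new_field = {
    "name": "industry_ids",
    "type": {"type": "array", "items": ["null", "long"]},
    "default": [],
}
new_schema = dict(
    old_schema,
    fields=old_schema["fields"][:6] + [new_field] + old_schema["fields"][6:],
)

# An empty list means every added field has a default.
print(added_fields_without_default(old_schema, new_schema))  # prints []
```

An empty result here is consistent with the change being backward compatible at the schema level, which suggests the failure comes from the runtime classpath rather than from the schema itself.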
   
   **Environment Description**
   
   * Hudi version : `0.11.1`
   
   * Spark version : `3.2.1`
   
   * Hive version : `3.1`
   
   * Hadoop version : `3.1.0.0-78`
   
   * Storage (HDFS/S3/GCS..) : HDFS/LocalFS
   
   * Running on Docker? (yes/no) : no
   
   
   **Additional context**
   
   The spark-submit for running the deltastreamer (to reproduce locally):
   ```bash
   #!/bin/bash
   spark-submit \
       --name "WesHudi" \
       --jars /Users/shaurya.rawat/Documents/jars/hudi-utilities-bundle_2.12-0.11.1.jar \
       --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
       --conf spark.hadoop.parquet.avro.write-old-list-structure=false \
       --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer /Users/shaurya.rawat/Documents/jars/hudi-utilities-bundle_2.12-0.11.1.jar  \
       --table-type COPY_ON_WRITE \
       --source-ordering-field dt \
       --source-class org.apache.hudi.utilities.sources.AvroKafkaSource \
       --schemaprovider-class org.apache.hudi.utilities.schema.SchemaRegistryProvider \
       --props file:///Users/shaurya.rawat/Documents/hudi-deltastreamer/config/wes-only.properties \
       --target-base-path file:///Users/shaurya.rawat/Documents/hudi-deltastreamer/data/wes \
       --target-table wes \
       --op UPSERT
   ```
   
   **Stacktrace**
   
   ```
   Caused by: org.apache.hudi.exception.HoodieException: operation has failed
   	at org.apache.hudi.common.util.queue.BoundedInMemoryQueue.throwExceptionIfFailed(BoundedInMemoryQueue.java:248)
   	at org.apache.hudi.common.util.queue.BoundedInMemoryQueue.readNextRecord(BoundedInMemoryQueue.java:226)
   	at org.apache.hudi.common.util.queue.BoundedInMemoryQueue.access$100(BoundedInMemoryQueue.java:52)
   	at org.apache.hudi.common.util.queue.BoundedInMemoryQueue$QueueIterator.hasNext(BoundedInMemoryQueue.java:278)
   	at org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:36)
   	at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:135)
   	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
   	... 3 more
   Caused by: java.lang.NoSuchMethodError: 'org.codehaus.jackson.JsonNode org.apache.avro.Schema$Field.defaultValue()'
   	at org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:168)
   	at org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:95)
   	at org.apache.parquet.avro.AvroRecordMaterializer.<init>(AvroRecordMaterializer.java:33)
   	at org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:138)
   	at org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:185)
   	at org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:156)
   	at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
   	at org.apache.hudi.common.util.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:48)
   	at org.apache.hudi.common.util.queue.IteratorBasedQueueProducer.produce(IteratorBasedQueueProducer.java:45)
   	at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$0(BoundedInMemoryExecutor.java:106)
   	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
   	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
   	... 4 more
   ```
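
A `NoSuchMethodError` like the one above means the calling class was compiled against a `Schema$Field.defaultValue()` method returning `org.codehaus.jackson.JsonNode`, but the `Schema$Field` class loaded at runtime does not have it. One way to narrow this down is to list which jars on the classpath bundle that Avro class. The sketch below is a rough diagnostic under assumptions: the helper name `jars_containing` is hypothetical, the jar paths are whatever you pass on the command line, and a bundle that relocates (shades) Avro under another package will not be matched.

```python
import sys
import zipfile


def jars_containing(entry_name, jar_paths):
    """Return the jar paths whose archive listing includes entry_name."""
    hits = []
    for path in jar_paths:
        # A jar is a zip archive, so the stdlib zipfile module can read its listing.
        with zipfile.ZipFile(path) as jar:
            if entry_name in jar.namelist():
                hits.append(path)
    return hits


if __name__ == "__main__":
    # Class named in the NoSuchMethodError, written as a path inside the jar.
    avro_field_class = "org/apache/avro/Schema$Field.class"
    for jar_path in jars_containing(avro_field_class, sys.argv[1:]):
        print(jar_path)
```

Running it with the Hudi utilities bundle and the jars in the Spark `jars` directory as arguments shows every jar that ships its own copy of the class; more than one hit is a hint that two Avro versions may be competing on the classpath.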
   
   Any help is appreciated 🙂 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hudi] ad1happy2go commented on issue #8486: [SUPPORT] AvroRecordConverter throws NoSuchMethodError(Avro defaultValue) on schema change

Posted by "ad1happy2go (via GitHub)" <gi...@apache.org>.

ad1happy2go commented on issue #8486:
URL: https://github.com/apache/hudi/issues/8486#issuecomment-1514601993

   Did you try this?
   --packages org.apache.hudi:hudi-spark3.2-bundle_2.12:0.13.0 
   
   



[GitHub] [hudi] ad1happy2go commented on issue #8486: [SUPPORT] AvroRecordConverter throws NoSuchMethodError(Avro defaultValue) on schema change

Posted by "ad1happy2go (via GitHub)" <gi...@apache.org>.

ad1happy2go commented on issue #8486:
URL: https://github.com/apache/hudi/issues/8486#issuecomment-1513237248

   This was a known issue that was fixed here: https://github.com/apache/hudi/pull/4488
   
   Can you please check with this patch?

