Posted to commits@hudi.apache.org by "shaurya-nwse (via GitHub)" <gi...@apache.org> on 2023/04/18 09:10:33 UTC

[GitHub] [hudi] shaurya-nwse opened a new issue, #8486: [SUPPORT] AvroRecordConverter throws NoSuchMethodError(Avro defaultValue) on schema change

shaurya-nwse opened a new issue, #8486:
URL: https://github.com/apache/hudi/issues/8486

   **_Tips before filing an issue_**
   
   - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
   Yes
   - Join the mailing list to engage in conversations and get faster support at dev-subscribe@hudi.apache.org.
   
   - If you have triaged this as a bug, then file an [issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.
   
   **Describe the problem you faced**
   
   We've been running Hudi in production successfully for many of our tables. Our general use case is to create a Hive table from a Kafka topic, and we use the Hudi deltastreamer for this. 
   Recently, one of our topics had an additional field added to it (with a default value), and deltastreamer started throwing exceptions. The stacktrace points to a probable version mismatch between Avro and Jackson as used in the `parquet-avro` package. 
   
   
   **To Reproduce**
   We ran into this when saving the table on HDFS, but I could reproduce it locally as well. 
   
   Steps to reproduce the behavior:
   
   1. Create a deltastreamer configuration to consume an Avro topic from Kafka and write to HDFS
   ```properties
   hoodie.datasource.write.recordkey.field=work_experience_id
   hoodie.datasource.write.partitionpath.field=dt
   hoodie.datasource.write.precombine.field=dt
   hoodie.datasource.write.operation=upsert
   hoodie.table.name=wes
   
   hoodie.index.type=GLOBAL_BLOOM
   hoodie.deltastreamer.source.kafka.topic=wes
   hoodie.deltastreamer.schemaprovider.registry.url=http://localhost:8081/subjects/wes-value/versions/latest
   hoodie.metadata.enable=false
   bootstrap.servers=localhost:9092
   auto.offset.reset=earliest
   schema.registry.url=http://0.0.0.0:8081
   group.id=wes-only
   
   hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.SimpleKeyGenerator
   hoodie.deltastreamer.source.kafka.value.deserializer=io.confluent.kafka.serializers.KafkaAvroDeserializer
   
   hoodie.datasource.write.reconcile.schema=true
   ```
   2. The avro schema on the schema registry prior to changes is as follows:
   ```avsc
   {
     "namespace": "com.ds.model.profile",
     "type": "record",
     "name": "WES",
     "fields": [
         {"name": "work_experience_id", "type": "long"},
         {"name": "profile_id", "type": "long", "default": -1},
         {"name": "primary_job", "type": "boolean"},
         {"name": "jobtitle", "type": "string"},
         {"name": "discipline_id", "type": ["null", "int"], "default": null},
         {"name": "industry_id", "type": ["null", "long"], "default": null},
         {"name": "dt", "type":  ["null","string"], "default":  null}
     ]
   }
   ```
   3. Initial run works fine and the hudi table is created on HDFS/Local FS
   4. The schema is modified to add a new field with a default value
   ```avsc
   {
     "namespace": "com.ds.model.profile",
     "type": "record",
     "name": "WES",
     "fields": [
         {"name": "work_experience_id", "type": "long"},
         {"name": "profile_id", "type": "long", "default": -1},
         {"name": "primary_job", "type": "boolean"},
         {"name": "jobtitle", "type": "string"},
         {"name": "discipline_id", "type": ["null", "int"], "default": null},
         {"name": "industry_id", "type": ["null", "long"], "default": null},
         // this field below is added
         {"name": "industry_ids", "type": {"type": "array", "items": ["null", "long"] }, "default": []},
         //
         {"name": "dt", "type":  ["null","string"], "default":  null}
     ]
   }
   ```
   5. The deltastreamer job fails with: `Caused by: java.lang.NoSuchMethodError: 'org.codehaus.jackson.JsonNode org.apache.avro.Schema$Field.defaultValue()'`
   
   The timeline shows rollback markers as the attempt to commit fails.
   
   **Expected behavior**
   
   The schema change should be backward compatible, as we added a field with a default value. The deltastreamer should write data to the Hudi table conforming to the new schema.
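
The expectation above can be checked mechanically. The following is a minimal sketch, not part of the original report, that compares the two schemas from the reproduction steps and lists any newly added field that lacks a default. It covers only the "added field needs a default" rule of Avro schema resolution, not type promotion or other compatibility rules. The function name `added_fields_without_default` is hypothetical, and the `//` marker lines from the listing are left out because they are not valid JSON.

```python
import json

def added_fields_without_default(old_schema: dict, new_schema: dict) -> list:
    """Return names of fields that exist only in new_schema and have no default."""
    old_names = {field["name"] for field in old_schema["fields"]}
    return [
        field["name"]
        for field in new_schema["fields"]
        if field["name"] not in old_names and "default" not in field
    ]

# Schema registered before the change (copied from step 2).
OLD = json.loads("""
{
  "namespace": "com.ds.model.profile", "type": "record", "name": "WES",
  "fields": [
    {"name": "work_experience_id", "type": "long"},
    {"name": "profile_id", "type": "long", "default": -1},
    {"name": "primary_job", "type": "boolean"},
    {"name": "jobtitle", "type": "string"},
    {"name": "discipline_id", "type": ["null", "int"], "default": null},
    {"name": "industry_id", "type": ["null", "long"], "default": null},
    {"name": "dt", "type": ["null", "string"], "default": null}
  ]
}
""")

# Schema after the change (step 4): only industry_ids is inserted before dt.
NEW = json.loads(json.dumps(OLD))
NEW["fields"].insert(6, {
    "name": "industry_ids",
    "type": {"type": "array", "items": ["null", "long"]},
    "default": [],
})

# An empty list means every added field carries a default.
print(added_fields_without_default(OLD, NEW))
```

For these two schemas the list is empty, which is consistent with the claim that the change itself is backward compatible and that the failure lies elsewhere.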
   
   **Environment Description**
   
   * Hudi version : `0.11.1`
   
   * Spark version : `3.2.1`
   
   * Hive version : `3.1`
   
   * Hadoop version : `3.1.0.0-78`
   
   * Storage (HDFS/S3/GCS..) : HDFS/LocalFS
   
   * Running on Docker? (yes/no) : no
   
   
   **Additional context**
   
   The spark-submit for running the deltastreamer (to reproduce locally):
   ```bash
   #!/bin/bash
   spark-submit \
       --name "WesHudi" \
       --jars /Users/shaurya.rawat/Documents/jars/hudi-utilities-bundle_2.12-0.11.1.jar \
       --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
       --conf spark.hadoop.parquet.avro.write-old-list-structure=false \
       --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer /Users/shaurya.rawat/Documents/jars/hudi-utilities-bundle_2.12-0.11.1.jar  \
       --table-type COPY_ON_WRITE \
       --source-ordering-field dt \
       --source-class org.apache.hudi.utilities.sources.AvroKafkaSource \
       --schemaprovider-class org.apache.hudi.utilities.schema.SchemaRegistryProvider \
       --props file:///Users/shaurya.rawat/Documents/hudi-deltastreamer/config/wes-only.properties \
       --target-base-path file:///Users/shaurya.rawat/Documents/hudi-deltastreamer/data/wes \
       --target-table wes \
       --op UPSERT
   ```
   
   **Stacktrace**
   
   ```
   Caused by: org.apache.hudi.exception.HoodieException: operation has failed
   	at org.apache.hudi.common.util.queue.BoundedInMemoryQueue.throwExceptionIfFailed(BoundedInMemoryQueue.java:248)
   	at org.apache.hudi.common.util.queue.BoundedInMemoryQueue.readNextRecord(BoundedInMemoryQueue.java:226)
   	at org.apache.hudi.common.util.queue.BoundedInMemoryQueue.access$100(BoundedInMemoryQueue.java:52)
   	at org.apache.hudi.common.util.queue.BoundedInMemoryQueue$QueueIterator.hasNext(BoundedInMemoryQueue.java:278)
   	at org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:36)
   	at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:135)
   	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
   	... 3 more
   Caused by: java.lang.NoSuchMethodError: 'org.codehaus.jackson.JsonNode org.apache.avro.Schema$Field.defaultValue()'
   	at org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:168)
   	at org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:95)
   	at org.apache.parquet.avro.AvroRecordMaterializer.<init>(AvroRecordMaterializer.java:33)
   	at org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:138)
   	at org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:185)
   	at org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:156)
   	at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
   	at org.apache.hudi.common.util.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:48)
   	at org.apache.hudi.common.util.queue.IteratorBasedQueueProducer.produce(IteratorBasedQueueProducer.java:45)
   	at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$0(BoundedInMemoryExecutor.java:106)
   	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
   	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
   	... 4 more
   ```
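
The missing method in the trace returns `org.codehaus.jackson.JsonNode`, which suggests the caller was compiled against an Avro release that still exposed Jackson 1.x types while a different Avro release was loaded at runtime. One rough way to see which flavor of Avro a given jar carries is to look inside it. The sketch below is an assumption-laden heuristic, not a Hudi or Avro tool: the function name `avro_field_flavor` and its return labels are hypothetical, it only searches the raw class bytes for the legacy Jackson type name, and it will report "absent" for bundles that relocate (shade) Avro under another package.

```python
import zipfile

# Path of the class named in the NoSuchMethodError, as stored inside a jar.
FIELD_CLASS = "org/apache/avro/Schema$Field.class"
# Internal (slash-separated) name of the Jackson 1.x type from the error message.
LEGACY_JACKSON = b"org/codehaus/jackson/JsonNode"

def avro_field_flavor(jar_path: str) -> str:
    """Classify a jar by whether its Schema$Field class mentions Jackson 1.x.

    Returns "absent" if the class is not in the jar, "legacy-jackson" if the
    class bytes reference org.codehaus.jackson.JsonNode, and "newer" otherwise.
    This is a byte-level heuristic, not a real class-file parser.
    """
    with zipfile.ZipFile(jar_path) as jar:
        try:
            data = jar.read(FIELD_CLASS)
        except KeyError:  # ZipFile.read raises KeyError for a missing entry
            return "absent"
    return "legacy-jackson" if LEGACY_JACKSON in data else "newer"
```

Running `avro_field_flavor` over the utilities bundle and over the Avro jar shipped with the Spark installation would show whether the two disagree, which is the kind of mismatch the error hints at.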
   
   Any help is appreciated 🙂 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] ad1happy2go commented on issue #8486: [SUPPORT] AvroRecordConverter throws NoSuchMethodError(Avro defaultValue) on schema change

Posted by "ad1happy2go (via GitHub)" <gi...@apache.org>.
ad1happy2go commented on issue #8486:
URL: https://github.com/apache/hudi/issues/8486#issuecomment-1514601993

   Did you try this?
   `--packages org.apache.hudi:hudi-spark3.2-bundle_2.12:0.13.0`
   
   




[GitHub] [hudi] ad1happy2go commented on issue #8486: [SUPPORT] AvroRecordConverter throws NoSuchMethodError(Avro defaultValue) on schema change

Posted by "ad1happy2go (via GitHub)" <gi...@apache.org>.
ad1happy2go commented on issue #8486:
URL: https://github.com/apache/hudi/issues/8486#issuecomment-1513237248

   This was a known issue that got fixed here: https://github.com/apache/hudi/pull/4488
   
   Can you please check with this patch?




[GitHub] [hudi] ad1happy2go commented on issue #8486: [SUPPORT] AvroRecordConverter throws NoSuchMethodError(Avro defaultValue) on schema change

Posted by "ad1happy2go (via GitHub)" <gi...@apache.org>.
ad1happy2go commented on issue #8486:
URL: https://github.com/apache/hudi/issues/8486#issuecomment-1521967662

   @shaurya-nwse You can also try building the 0.13.0 code and using that. Ideally we should not use both bundles together, as the utilities bundle should already contain the spark bundle. Sorry for not pointing you to the utilities bundle last time. 
   
   The utilities bundle is also available on Maven:
   https://mvnrepository.com/artifact/org.apache.hudi/hudi-utilities-bundle_2.12/0.13.0
   
   If you still want to use 0.11, then try patching it with https://github.com/apache/hudi/pull/4488




[GitHub] [hudi] shaurya-nwse commented on issue #8486: [SUPPORT] AvroRecordConverter throws NoSuchMethodError(Avro defaultValue) on schema change

Posted by "shaurya-nwse (via GitHub)" <gi...@apache.org>.
shaurya-nwse commented on issue #8486:
URL: https://github.com/apache/hudi/issues/8486#issuecomment-1514790158

   Hi @ad1happy2go, the spark bundle is specifically for reading, if I'm not mistaken. I'm using the Hudi utilities bundle so I can use the deltastreamer to consume from Kafka and populate my Hudi table. Is there something similar for the utilities bundle built for Spark 3.2, or do I have to compile it myself? 




[GitHub] [hudi] shaurya-nwse commented on issue #8486: [SUPPORT] AvroRecordConverter throws NoSuchMethodError(Avro defaultValue) on schema change

Posted by "shaurya-nwse (via GitHub)" <gi...@apache.org>.
shaurya-nwse commented on issue #8486:
URL: https://github.com/apache/hudi/issues/8486#issuecomment-1516043059

   Update: The earlier exception goes away if I combine the `hudi-utilities-slim-bundle` and the `hudi-spark3.2-bundle`, as the classpath now has the dependencies for Spark 3.2. 
   ```
   --packages org.apache.hudi:hudi-utilities-slim-bundle_2.12:0.11.1,org.apache.hudi:hudi-spark3.2-bundle_2.12:0.11.1
   ```
   However, it now runs into this issue: 
   ```
   Caused by: org.apache.avro.AvroRuntimeException: Malformed data. Length is negative: -58
   	at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:308)
   	at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:322)
   	at org.apache.avro.io.ResolvingDecoder.readString(ResolvingDecoder.java:219)
   	at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:456)
   	at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:191)
   	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
   	at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:259)
   	at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
   	at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
   	at org.apache.avro.generic.GenericDatumReader.readArray(GenericDatumReader.java:298)
   	at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:183)
   	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
   	at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:259)
   	at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
   	at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
   	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
   	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
   	at org.apache.hudi.avro.HoodieAvroUtils.bytesToAvro(HoodieAvroUtils.java:156)
   	at org.apache.hudi.avro.HoodieAvroUtils.bytesToAvro(HoodieAvroUtils.java:146)
   	at org.apache.hudi.common.model.OverwriteWithLatestAvroPayload.getInsertValue(OverwriteWithLatestAvroPayload.java:75)
   	at org.apache.hudi.common.model.HoodieRecordPayload.getInsertValue(HoodieRecordPayload.java:105)
   	at org.apache.hudi.io.HoodieMergeHandle.writeInsertRecord(HoodieMergeHandle.java:278)
   	at org.apache.hudi.io.HoodieMergeHandle.writeIncomingRecords(HoodieMergeHandle.java:386)
   	at org.apache.hudi.io.HoodieMergeHandle.close(HoodieMergeHandle.java:394)
   	at org.apache.hudi.table.action.commit.HoodieMergeHelper.runMerge(HoodieMergeHelper.java:160)
   	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdateInternal(BaseSparkCommitActionExecutor.java:358)
   	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdate(BaseSparkCommitActionExecutor.java:349)
   	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:322)
   	... 28 more
   ```
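
A negative string length usually means the decoder is reading bytes at a position, or with a schema, that does not match how they were written. Whether that is what happened here is not established in this thread. The sketch below only illustrates the mechanism: Avro's binary encoding stores a string as a zigzag varint length followed by UTF-8 bytes, so a decoder that starts one byte off will interpret payload bytes as a length. The helper names and the sample value `"software"` are hypothetical, chosen so the number matches the message, and say nothing about the actual record contents.

```python
def write_zigzag_long(n: int) -> bytes:
    """Encode a 64-bit integer as an Avro-style zigzag varint."""
    raw = (n << 1) ^ (n >> 63)  # zigzag: small magnitudes map to small codes
    out = bytearray()
    while True:
        byte = raw & 0x7F
        raw >>= 7
        if raw:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def read_zigzag_long(buf: bytes, pos: int = 0):
    """Decode one zigzag varint starting at pos; return (value, next_pos)."""
    raw = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        raw |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return (raw >> 1) ^ -(raw & 1), pos

# A string is written as <length><utf-8 bytes>.
text = "software".encode("utf-8")  # hypothetical field value
payload = write_zigzag_long(len(text)) + text

aligned_length, _ = read_zigzag_long(payload, 0)     # reads the real length
misaligned_length, _ = read_zigzag_long(payload, 1)  # reads the byte for 's'

print(aligned_length)     # 8
print(misaligned_length)  # -58
```

The byte for `s` is `0x73`, which zigzag-decodes to -58, so a reader that is misaligned by a single byte would report exactly the kind of error shown above.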




[GitHub] [hudi] shaurya-nwse closed issue #8486: [SUPPORT] AvroRecordConverter throws NoSuchMethodError(Avro defaultValue) on schema change

Posted by "shaurya-nwse (via GitHub)" <gi...@apache.org>.
shaurya-nwse closed issue #8486: [SUPPORT] AvroRecordConverter throws NoSuchMethodError(Avro defaultValue) on schema change
URL: https://github.com/apache/hudi/issues/8486




[GitHub] [hudi] shaurya-nwse commented on issue #8486: [SUPPORT] AvroRecordConverter throws NoSuchMethodError(Avro defaultValue) on schema change

Posted by "shaurya-nwse (via GitHub)" <gi...@apache.org>.
shaurya-nwse commented on issue #8486:
URL: https://github.com/apache/hudi/issues/8486#issuecomment-1523155772

   Hi @ad1happy2go, we already have tables written using 0.11. Nonetheless, when I tried to write using the 0.13.0 utilities bundle, this is what I got: 
   ```
   Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.execution.datasources.Spark32NestedSchemaPruning
   	at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
   	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
   	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
   	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
   	at java.lang.Class.forName0(Native Method)
   	at java.lang.Class.forName(Class.java:264)
   	at org.apache.hudi.common.util.ReflectionUtils.getClass(ReflectionUtils.java:54)
   	... 22 more
   ```
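
A `ClassNotFoundException` like this one means no jar on the classpath provides the named class. A quick way to confirm which jars, if any, ship it is to scan their entries. This is a minimal sketch under stated assumptions: the function name `jars_containing` is hypothetical, it looks only at top-level `*.jar` files in one directory, and it does not follow nested jars or relocated packages.

```python
import zipfile
from pathlib import Path

def jars_containing(class_name: str, jar_dir: str) -> list:
    """Return the file names of jars in jar_dir that contain class_name.

    class_name is a fully qualified Java class name with dots, for example
    the one reported in the ClassNotFoundException.
    """
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar_path in sorted(Path(jar_dir).glob("*.jar")):
        with zipfile.ZipFile(jar_path) as jar:
            if entry in jar.namelist():
                hits.append(jar_path.name)
    return hits
```

Calling `jars_containing("org.apache.spark.sql.execution.datasources.Spark32NestedSchemaPruning", jar_dir)` on the directory holding the bundles in use would show whether any of them was built with the Spark 3.2 classes included.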




[GitHub] [hudi] shaurya-nwse commented on issue #8486: [SUPPORT] AvroRecordConverter throws NoSuchMethodError(Avro defaultValue) on schema change

Posted by "shaurya-nwse (via GitHub)" <gi...@apache.org>.
shaurya-nwse commented on issue #8486:
URL: https://github.com/apache/hudi/issues/8486#issuecomment-1523166426

   Some context: we have 3 topics being ingested via a multi-table deltastreamer, and 2 of them work fine. After the schema changed for the 3rd table, we ran into the problem with incompatible dependencies. That seems to go away with the 2 packages I mentioned above, except now we get this issue: https://github.com/apache/hudi/issues/8486#issuecomment-1516043059




[GitHub] [hudi] shaurya-nwse commented on issue #8486: [SUPPORT] AvroRecordConverter throws NoSuchMethodError(Avro defaultValue) on schema change

Posted by "shaurya-nwse (via GitHub)" <gi...@apache.org>.
shaurya-nwse commented on issue #8486:
URL: https://github.com/apache/hudi/issues/8486#issuecomment-1531257695

   Hi, we're currently testing whether we can upgrade to 0.13.0 after building it for Spark 3.2. I'll close this issue for now and open a new one if we face any issues after upgrading. Thanks for your help. 




[GitHub] [hudi] ad1happy2go commented on issue #8486: [SUPPORT] AvroRecordConverter throws NoSuchMethodError(Avro defaultValue) on schema change

Posted by "ad1happy2go (via GitHub)" <gi...@apache.org>.
ad1happy2go commented on issue #8486:
URL: https://github.com/apache/hudi/issues/8486#issuecomment-1523224879

   You need to build the code for Spark 3.2:
   `mvn clean package -T2C -DskipTests -Dspark.version=3.2`
   
   Can you provide the steps to reproduce this issue?




[GitHub] [hudi] shaurya-nwse commented on issue #8486: [SUPPORT] AvroRecordConverter throws NoSuchMethodError(Avro defaultValue) on schema change

Posted by "shaurya-nwse (via GitHub)" <gi...@apache.org>.
shaurya-nwse commented on issue #8486:
URL: https://github.com/apache/hudi/issues/8486#issuecomment-1514563969

   Hi @ad1happy2go, Thanks for pointing me in the right direction. 👍 
   I cloned and compiled the source with `spark3.2` and `scala-2.12` profiles and locally I can see it honors the schema evolution changes. 
   Is there someplace where the jar specifically compiled with these profiles is hosted? I couldn't find it on maven central. If not, then I would proceed to compile and host it someplace within our org. 

