Posted to commits@hudi.apache.org by "shaurya-nwse (via GitHub)" <gi...@apache.org> on 2023/04/18 09:10:33 UTC
[GitHub] [hudi] shaurya-nwse opened a new issue, #8486: [SUPPORT] AvroRecordConverter throws NoSuchMethodError(Avro defaultValue) on schema change
shaurya-nwse opened a new issue, #8486:
URL: https://github.com/apache/hudi/issues/8486
**_Tips before filing an issue_**
- Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
Yes
- Join the mailing list to engage in conversations and get faster support at dev-subscribe@hudi.apache.org.
- If you have triaged this as a bug, then file an [issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.
**Describe the problem you faced**
We've been running Hudi in production successfully for many of our tables. Our general use case is to create a Hive table from a Kafka topic and we use the Hudi deltastreamer for this.
Recently, one of our topics had an additional field added to it (with a default value), and deltastreamer started throwing exceptions. The stacktrace points to a probable version mismatch between the Avro and Jackson versions used by the `parquet-avro` package.
**To Reproduce**
We ran into this when saving the table on HDFS, but I could reproduce it locally as well.
Steps to reproduce the behavior:
1. Create a deltastreamer configuration to consume an Avro topic from Kafka and write to HDFS
```properties
hoodie.datasource.write.recordkey.field=work_experience_id
hoodie.datasource.write.partitionpath.field=dt
hoodie.datasource.write.precombine.field=dt
hoodie.datasource.write.operation=upsert
hoodie.table.name=wes
hoodie.index.type=GLOBAL_BLOOM
hoodie.deltastreamer.source.kafka.topic=wes
hoodie.deltastreamer.schemaprovider.registry.url=http://localhost:8081/subjects/wes-value/versions/latest
hoodie.metadata.enable=false
bootstrap.servers=localhost:9092
auto.offset.reset=earliest
schema.registry.url=http://0.0.0.0:8081
group.id=wes-only
hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.SimpleKeyGenerator
hoodie.deltastreamer.source.kafka.value.deserializer=io.confluent.kafka.serializers.KafkaAvroDeserializer
hoodie.datasource.write.reconcile.schema=true
```
2. The avro schema on the schema registry prior to changes is as follows:
```avsc
{
"namespace": "com.ds.model.profile",
"type": "record",
"name": "WES",
"fields": [
{"name": "work_experience_id", "type": "long"},
{"name": "profile_id", "type": "long", "default": -1},
{"name": "primary_job", "type": "boolean"},
{"name": "jobtitle", "type": "string"},
{"name": "discipline_id", "type": ["null", "int"], "default": null},
{"name": "industry_id", "type": ["null", "long"], "default": null},
{"name": "dt", "type": ["null","string"], "default": null}
]
}
```
3. Initial run works fine and the hudi table is created on HDFS/Local FS
4. The schema is modified to add a new field with a default value
```avsc
{
"namespace": "com.ds.model.profile",
"type": "record",
"name": "WES",
"fields": [
{"name": "work_experience_id", "type": "long"},
{"name": "profile_id", "type": "long", "default": -1},
{"name": "primary_job", "type": "boolean"},
{"name": "jobtitle", "type": "string"},
{"name": "discipline_id", "type": ["null", "int"], "default": null},
{"name": "industry_id", "type": ["null", "long"], "default": null},
// this field below is added
{"name": "industry_ids", "type": {"type": "array", "items": ["null", "long"] }, "default": []},
//
{"name": "dt", "type": ["null","string"], "default": null}
]
}
```
5. The deltastreamer job fails with: `Caused by: java.lang.NoSuchMethodError: 'org.codehaus.jackson.JsonNode org.apache.avro.Schema$Field.defaultValue()'`
The timeline shows rollback markers as the attempt to commit fails.
**Expected behavior**
The schema change should be backward compatible as we added a field with a default value. The deltastreamer writes data to the hudi table conforming to the new schema.
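The expectation above rests on Avro's schema resolution rule: a field present in the reader schema but absent from the writer schema is filled from its default, and resolution fails if no default exists. A minimal sketch of that rule, assuming only top-level record fields matter; the helper name `added_fields_without_default` is hypothetical and this is not how Hudi or the schema registry validates compatibility:

```python
def added_fields_without_default(writer_schema: dict, reader_schema: dict) -> list:
    """Return names of reader fields that the writer lacks and that have no default."""
    writer_names = {f["name"] for f in writer_schema["fields"]}
    return [
        f["name"]
        for f in reader_schema["fields"]
        if f["name"] not in writer_names and "default" not in f
    ]


# Abbreviated versions of the two schemas from the reproduction steps.
old_schema = {"type": "record", "name": "WES", "fields": [
    {"name": "work_experience_id", "type": "long"},
    {"name": "industry_id", "type": ["null", "long"], "default": None},
    {"name": "dt", "type": ["null", "string"], "default": None},
]}
new_schema = {"type": "record", "name": "WES", "fields": [
    {"name": "work_experience_id", "type": "long"},
    {"name": "industry_id", "type": ["null", "long"], "default": None},
    {"name": "industry_ids",
     "type": {"type": "array", "items": ["null", "long"]}, "default": []},
    {"name": "dt", "type": ["null", "string"], "default": None},
]}

# An empty list means every added field carries a default.
print(added_fields_without_default(old_schema, new_schema))  # []
```

By this rule the new `industry_ids` field is a compatible addition, which is consistent with the failure being a dependency problem rather than a schema problem.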
**Environment Description**
* Hudi version : `0.11.1`
* Spark version : `3.2.1`
* Hive version : `3.1`
* Hadoop version : `3.1.0.0-78`
* Storage (HDFS/S3/GCS..) : HDFS/LocalFS
* Running on Docker? (yes/no) : no
**Additional context**
The spark-submit for running the deltastreamer (to reproduce locally):
```bash
#!/bin/bash
spark-submit \
--name "WesHudi" \
--jars /Users/shaurya.rawat/Documents/jars/hudi-utilities-bundle_2.12-0.11.1.jar \
--conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
--conf spark.hadoop.parquet.avro.write-old-list-structure=false \
--class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer /Users/shaurya.rawat/Documents/jars/hudi-utilities-bundle_2.12-0.11.1.jar \
--table-type COPY_ON_WRITE \
--source-ordering-field dt \
--source-class org.apache.hudi.utilities.sources.AvroKafkaSource \
--schemaprovider-class org.apache.hudi.utilities.schema.SchemaRegistryProvider \
--props file:///Users/shaurya.rawat/Documents/hudi-deltastreamer/config/wes-only.properties \
--target-base-path file:///Users/shaurya.rawat/Documents/hudi-deltastreamer/data/wes \
--target-table wes \
--op UPSERT
```
**Stacktrace**
```
Caused by: org.apache.hudi.exception.HoodieException: operation has failed
at org.apache.hudi.common.util.queue.BoundedInMemoryQueue.throwExceptionIfFailed(BoundedInMemoryQueue.java:248)
at org.apache.hudi.common.util.queue.BoundedInMemoryQueue.readNextRecord(BoundedInMemoryQueue.java:226)
at org.apache.hudi.common.util.queue.BoundedInMemoryQueue.access$100(BoundedInMemoryQueue.java:52)
at org.apache.hudi.common.util.queue.BoundedInMemoryQueue$QueueIterator.hasNext(BoundedInMemoryQueue.java:278)
at org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:36)
at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:135)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
... 3 more
Caused by: java.lang.NoSuchMethodError: 'org.codehaus.jackson.JsonNode org.apache.avro.Schema$Field.defaultValue()'
at org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:168)
at org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:95)
at org.apache.parquet.avro.AvroRecordMaterializer.<init>(AvroRecordMaterializer.java:33)
at org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:138)
at org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:185)
at org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:156)
at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
at org.apache.hudi.common.util.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:48)
at org.apache.hudi.common.util.queue.IteratorBasedQueueProducer.produce(IteratorBasedQueueProducer.java:45)
at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$0(BoundedInMemoryExecutor.java:106)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
... 4 more
```
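A `NoSuchMethodError` of this kind generally means a class was compiled against a different library version than the one loaded at runtime. Here the missing method returns `org.codehaus.jackson.JsonNode` (Jackson 1.x), which suggests the `parquet-avro` classes on the classpath expect an older Avro than the one actually loaded. One way to investigate is to check which jars ship the classes involved. The sketch below is a hypothetical helper (`jars_containing` is not part of Hudi or Spark) that scans a directory of jars using only the Python standard library; the directory you point it at is an assumption about your setup.

```python
import zipfile
from pathlib import Path


def jars_containing(jar_dir, class_name):
    """Return the sorted names of jars under jar_dir that contain class_name."""
    # A class such as org.apache.avro.Schema is stored as org/apache/avro/Schema.class.
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in sorted(Path(jar_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(jar.name)
        except zipfile.BadZipFile:
            # Skip files that are not valid archives.
            continue
    return hits


# Example call (the path is a placeholder):
# jars_containing("/path/to/jars", "org.apache.parquet.avro.AvroRecordConverter")
```

If the same class shows up in more than one jar, the copy that wins depends on classpath order, which is one way such mismatches arise.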
Any help is appreciated 🙂
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [hudi] ad1happy2go commented on issue #8486: [SUPPORT] AvroRecordConverter throws NoSuchMethodError(Avro defaultValue) on schema change
Posted by "ad1happy2go (via GitHub)" <gi...@apache.org>.
ad1happy2go commented on issue #8486:
URL: https://github.com/apache/hudi/issues/8486#issuecomment-1514601993
Did you try this?
`--packages org.apache.hudi:hudi-spark3.2-bundle_2.12:0.13.0`
[GitHub] [hudi] ad1happy2go commented on issue #8486: [SUPPORT] AvroRecordConverter throws NoSuchMethodError(Avro defaultValue) on schema change
Posted by "ad1happy2go (via GitHub)" <gi...@apache.org>.
ad1happy2go commented on issue #8486:
URL: https://github.com/apache/hudi/issues/8486#issuecomment-1513237248
This was a known issue that was fixed here: https://github.com/apache/hudi/pull/4488
Can you please check with this patch?
[GitHub] [hudi] ad1happy2go commented on issue #8486: [SUPPORT] AvroRecordConverter throws NoSuchMethodError(Avro defaultValue) on schema change
Posted by "ad1happy2go (via GitHub)" <gi...@apache.org>.
ad1happy2go commented on issue #8486:
URL: https://github.com/apache/hudi/issues/8486#issuecomment-1521967662
@shaurya-nwse You can also try building the 0.13.0 code and using that. Ideally we should not need both bundles, since the utilities bundle should already contain the spark bundle. Sorry for not pointing you to the utilities bundle last time.
There is also a utilities bundle available on Maven:
https://mvnrepository.com/artifact/org.apache.hudi/hudi-utilities-bundle_2.12/0.13.0
If you still want to use 0.11, then try patching it with https://github.com/apache/hudi/pull/4488
[GitHub] [hudi] shaurya-nwse commented on issue #8486: [SUPPORT] AvroRecordConverter throws NoSuchMethodError(Avro defaultValue) on schema change
Posted by "shaurya-nwse (via GitHub)" <gi...@apache.org>.
shaurya-nwse commented on issue #8486:
URL: https://github.com/apache/hudi/issues/8486#issuecomment-1514790158
Hi @ad1happy2go, the spark bundle is specifically for reading, if I'm not mistaken. I'm using the hudi utilities bundle so that I can use the deltastreamer to consume from Kafka and populate my hudi table. Is there an equivalent utilities bundle built for Spark 3.2, or do I need to compile it myself?
[GitHub] [hudi] shaurya-nwse commented on issue #8486: [SUPPORT] AvroRecordConverter throws NoSuchMethodError(Avro defaultValue) on schema change
Posted by "shaurya-nwse (via GitHub)" <gi...@apache.org>.
shaurya-nwse commented on issue #8486:
URL: https://github.com/apache/hudi/issues/8486#issuecomment-1516043059
Update: The earlier exception goes away when I combine the `hudi-utilities-slim-bundle` with the `hudi-spark3.2-bundle`, as it now has the dependencies for spark3.2.
```
--packages org.apache.hudi:hudi-utilities-slim-bundle_2.12:0.11.1,org.apache.hudi:hudi-spark3.2-bundle_2.12:0.11.1
```
However now it runs into this issue:
```
Caused by: org.apache.avro.AvroRuntimeException: Malformed data. Length is negative: -58
at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:308)
at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:322)
at org.apache.avro.io.ResolvingDecoder.readString(ResolvingDecoder.java:219)
at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:456)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:191)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:259)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
at org.apache.avro.generic.GenericDatumReader.readArray(GenericDatumReader.java:298)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:183)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:259)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
at org.apache.hudi.avro.HoodieAvroUtils.bytesToAvro(HoodieAvroUtils.java:156)
at org.apache.hudi.avro.HoodieAvroUtils.bytesToAvro(HoodieAvroUtils.java:146)
at org.apache.hudi.common.model.OverwriteWithLatestAvroPayload.getInsertValue(OverwriteWithLatestAvroPayload.java:75)
at org.apache.hudi.common.model.HoodieRecordPayload.getInsertValue(HoodieRecordPayload.java:105)
at org.apache.hudi.io.HoodieMergeHandle.writeInsertRecord(HoodieMergeHandle.java:278)
at org.apache.hudi.io.HoodieMergeHandle.writeIncomingRecords(HoodieMergeHandle.java:386)
at org.apache.hudi.io.HoodieMergeHandle.close(HoodieMergeHandle.java:394)
at org.apache.hudi.table.action.commit.HoodieMergeHelper.runMerge(HoodieMergeHelper.java:160)
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdateInternal(BaseSparkCommitActionExecutor.java:358)
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpdate(BaseSparkCommitActionExecutor.java:349)
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:322)
... 28 more
```
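For context on the message itself: Avro's binary encoding prefixes a string with its byte length, written as a zigzag-encoded variable-length long, and `BinaryDecoder.readString` rejects a negative length. A negative value usually means the decoder is not positioned on a real length prefix, for example when bytes are read with a schema other than the one they were written with; whether that is the cause here is not established in this thread. The sketch below is an illustrative reimplementation of the decoding (not Avro's own code) showing that a single byte `0x73` decodes to -58.

```python
def read_zigzag_long(buf: bytes, pos: int = 0):
    """Decode one zigzag varint from buf starting at pos; return (value, next_pos)."""
    shift = 0
    raw = 0
    while True:
        byte = buf[pos]
        pos += 1
        # The low 7 bits carry data; the high bit signals that more bytes follow.
        raw |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    # Undo zigzag: even raw values map to non-negative numbers, odd to negative.
    return (raw >> 1) ^ -(raw & 1), pos


print(read_zigzag_long(b"\x06foo"))  # (3, 1): a valid length prefix for "foo"
print(read_zigzag_long(b"\x73"))     # (-58, 1): a stray byte read as a length
```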
[GitHub] [hudi] shaurya-nwse closed issue #8486: [SUPPORT] AvroRecordConverter throws NoSuchMethodError(Avro defaultValue) on schema change
Posted by "shaurya-nwse (via GitHub)" <gi...@apache.org>.
shaurya-nwse closed issue #8486: [SUPPORT] AvroRecordConverter throws NoSuchMethodError(Avro defaultValue) on schema change
URL: https://github.com/apache/hudi/issues/8486
[GitHub] [hudi] shaurya-nwse commented on issue #8486: [SUPPORT] AvroRecordConverter throws NoSuchMethodError(Avro defaultValue) on schema change
Posted by "shaurya-nwse (via GitHub)" <gi...@apache.org>.
shaurya-nwse commented on issue #8486:
URL: https://github.com/apache/hudi/issues/8486#issuecomment-1523155772
Hi @ad1happy2go We already have tables written using 0.11. Nonetheless when I tried to write using the 0.13.0 utilities, this is what I get:
```
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.execution.datasources.Spark32NestedSchemaPruning
at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.hudi.common.util.ReflectionUtils.getClass(ReflectionUtils.java:54)
... 22 more
```
[GitHub] [hudi] shaurya-nwse commented on issue #8486: [SUPPORT] AvroRecordConverter throws NoSuchMethodError(Avro defaultValue) on schema change
Posted by "shaurya-nwse (via GitHub)" <gi...@apache.org>.
shaurya-nwse commented on issue #8486:
URL: https://github.com/apache/hudi/issues/8486#issuecomment-1523166426
Some context: we have 3 topics being ingested via a multitable deltastreamer, and 2 of them work fine. After the schema changed for the 3rd table, we ran into the problem with incompatible dependencies, which seems to go away when using the 2 packages I mentioned above. However, we now get this issue: https://github.com/apache/hudi/issues/8486#issuecomment-1516043059
[GitHub] [hudi] shaurya-nwse commented on issue #8486: [SUPPORT] AvroRecordConverter throws NoSuchMethodError(Avro defaultValue) on schema change
Posted by "shaurya-nwse (via GitHub)" <gi...@apache.org>.
shaurya-nwse commented on issue #8486:
URL: https://github.com/apache/hudi/issues/8486#issuecomment-1531257695
Hi, we're currently testing whether we can upgrade to 0.13.0 after building it for spark3.2. I'll close this issue for now and open a new one if we face any issues after upgrading. Thanks for your help.
[GitHub] [hudi] ad1happy2go commented on issue #8486: [SUPPORT] AvroRecordConverter throws NoSuchMethodError(Avro defaultValue) on schema change
Posted by "ad1happy2go (via GitHub)" <gi...@apache.org>.
ad1happy2go commented on issue #8486:
URL: https://github.com/apache/hudi/issues/8486#issuecomment-1523224879
You need to build the code for Spark 3.2:
mvn clean package -T2C -DskipTests -Dspark.version=3.2
Can you provide the steps to reproduce this issue?
[GitHub] [hudi] shaurya-nwse commented on issue #8486: [SUPPORT] AvroRecordConverter throws NoSuchMethodError(Avro defaultValue) on schema change
Posted by "shaurya-nwse (via GitHub)" <gi...@apache.org>.
shaurya-nwse commented on issue #8486:
URL: https://github.com/apache/hudi/issues/8486#issuecomment-1514563969
Hi @ad1happy2go, Thanks for pointing me in the right direction. 👍
I cloned and compiled the source with `spark3.2` and `scala-2.12` profiles and locally I can see it honors the schema evolution changes.
Is there someplace where the jar specifically compiled with these profiles is hosted? I couldn't find it on maven central. If not, then I would proceed to compile and host it someplace within our org.