Posted to dev@avro.apache.org by "James Clarke (Jira)" <ji...@apache.org> on 2021/03/30 18:13:00 UTC

[jira] [Created] (AVRO-3101) Primitive number values are silently truncated in Java GenericDatumWriter

James Clarke created AVRO-3101:
----------------------------------

             Summary: Primitive number values are silently truncated in Java GenericDatumWriter
                 Key: AVRO-3101
                 URL: https://issues.apache.org/jira/browse/AVRO-3101
             Project: Apache Avro
          Issue Type: Bug
          Components: java
    Affects Versions: 1.10.2, 1.10.1, 1.10.0
            Reporter: James Clarke


Primitive Java numeric values are silently truncated in GenericDatumWriter.

Previously (1.9.2) a Type.LONG field with a double value set would cause a ClassCastException when serializing the datum.

Changes in AVRO-2070 cause a double value to be silently truncated.
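
As an illustration of what this truncation looks like, here is a minimal Kotlin sketch. It assumes the writer now narrows the value through a Number.longValue()-style conversion, which this report does not confirm; narrowToLong is a hypothetical name and is not part of Avro.

```kotlin
// Hypothetical helper, not an Avro API: mimics a Number -> long narrowing
// of the kind that Java's Number.longValue() performs.
fun narrowToLong(value: Number): Long = value.toLong()

fun main() {
    // The fractional part is dropped (truncation toward zero), with no error.
    println(narrowToLong(456.4))   // 456
    println(narrowToLong(-456.9))  // -456
    // A value that is already a Long passes through unchanged.
    println(narrowToLong(456L))    // 456
}
```

Under that assumption, 456.4 and 456 become indistinguishable once written.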

I don't know whether this is a bug or expected behavior: in 1.9.2 (and much earlier versions), values written to a Type.INT field were already silently truncated, but other numeric types were not.

My use case involves users generating data which conforms to a dynamically generated Avro schema. The current change provides type safety (for downstream consumers) but does not maintain data integrity. From my point of view, it would be better for users to get an explicit error such as a ClassCastException than to have corrupt data introduced silently.
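
A minimal sketch of the kind of strict check meant here, which callers could apply to a value before putting it into a Type.LONG field. requireLong is a hypothetical helper, not an Avro API, and accepting the smaller integral types is an assumption of this sketch (1.9.2 may well have rejected those too).

```kotlin
// Hypothetical pre-write guard, not part of Avro.
// Accepts only values that convert to Long without losing information
// and fails loudly for everything else (Double, Float, String, null, ...).
fun requireLong(value: Any?): Long = when (value) {
    is Long -> value
    is Int -> value.toLong()    // lossless widening
    is Short -> value.toLong()  // lossless widening
    is Byte -> value.toLong()   // lossless widening
    else -> throw ClassCastException(
        "${value?.javaClass?.name ?: "null"} cannot be written as long without possible data loss"
    )
}

fun main() {
    println(requireLong(456L))  // 456
    println(requireLong(7))     // 7
    try {
        requireLong(456.4)
    } catch (e: ClassCastException) {
        println("rejected: ${e.message}")
    }
}
```

With a guard like this, the 456.4 from the test case would be rejected up front instead of being stored as 456.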

Example test case (Kotlin), which throws a ClassCastException in 1.9.2 and prints 456 (not the 456.4 that was set) in 1.10.2:
{code:java}
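import org.apache.avro.Schema
import org.apache.avro.generic.GenericData
import org.apache.avro.generic.GenericDatumReader
import org.apache.avro.generic.GenericDatumWriter
import org.apache.avro.generic.GenericRecord
import org.apache.avro.io.DatumReader
import org.apache.avro.io.DatumWriter
import org.apache.avro.io.DecoderFactory
import org.apache.avro.io.EncoderFactory
import org.junit.Test // assumes JUnit 4; adjust for JUnit 5
import java.io.ByteArrayOutputStream

@Test
fun testWritingDoubleToLong() {
    // Record schema with a single field of type LONG
    val longType = Schema.create(Schema.Type.LONG)
    val field = Schema.Field("long", longType)
    val fields = listOf(field)
    val schema = Schema.createRecord("test", "doc", "", false, fields)
    val record: GenericRecord = GenericData.Record(schema)
    // Put a double into the LONG field
    record.put("long", 456.4)

    // Serialize: throws ClassCastException in 1.9.2, succeeds in 1.10.2
    val stream = ByteArrayOutputStream()
    val datumWriter: DatumWriter<GenericRecord> = GenericDatumWriter(schema)
    val encoder = EncoderFactory.get().binaryEncoder(stream, null)
    datumWriter.write(record, encoder)
    encoder.flush()

    // Read the record back and print the stored value
    val decoder = DecoderFactory.get().binaryDecoder(stream.toByteArray(), null)
    val datumReader: DatumReader<GenericRecord> = GenericDatumReader(schema)
    val output = datumReader.read(null, decoder)
    println(output["long"]) // prints 456 in 1.10.2
}{code}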

--
This message was sent by Atlassian Jira
(v8.3.4#803005)