Posted to dev@avro.apache.org by "Andy Coates (JIRA)" <ji...@apache.org> on 2017/09/26 15:45:00 UTC
[jira] [Comment Edited] (AVRO-2078) Avro does not enforce schema resolution rules for Decimal type
[ https://issues.apache.org/jira/browse/AVRO-2078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16180950#comment-16180950 ]
Andy Coates edited comment on AVRO-2078 at 9/26/17 3:44 PM:
------------------------------------------------------------
This is a particularly nasty bug, as it can easily lead to silent data corruption. If you write the decimal "1.2345" with a writer schema that has a scale of 4 and then deserialize with a reader schema that has a scale of 3, the value comes out as "12.345"!
{code:java}
import java.math.BigDecimal;
import java.nio.ByteBuffer;

import com.google.common.collect.ImmutableList;
import org.apache.avro.Conversions;
import org.apache.avro.LogicalTypes;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.IndexedRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.util.ByteBufferInputStream;
import org.apache.avro.util.ByteBufferOutputStream;
import org.junit.Test;

import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.is;

@Test
public void shouldThrowIfExistingFieldChangesType() throws Exception {
    GenericData genericData = new GenericData();
    genericData.addLogicalTypeConversion(new Conversions.DecimalConversion());

    // v1 declares decimal(3, 3); v2 declares decimal(6, 4)
    final Schema v1 = Schema.createRecord("thing", "", "namespace", false, ImmutableList.of(
        new Schema.Field("decimal", LogicalTypes.decimal(3, 3).addToSchema(Schema.create(Schema.Type.BYTES)), "", Schema.NULL_VALUE)
    ));
    final Schema v2 = Schema.createRecord("thing", "", "namespace", false, ImmutableList.of(
        new Schema.Field("decimal", LogicalTypes.decimal(6, 4).addToSchema(Schema.create(Schema.Type.BYTES)), "", Schema.NULL_VALUE)
    ));

    final GenericData.Record recordV2 = new GenericData.Record(v2);
    recordV2.put("decimal", new BigDecimal("1.2345"));

    ByteBuffer bytes = serialize(genericData, recordV2);

    final GenericRecord deserialized = deserialize(genericData, v1, v2, bytes);
    final Object result = deserialized.get("decimal");

    // Below fails because result is 'new BigDecimal("12.345")'
    assertThat(result, is(new BigDecimal("1.2345")));
}

private ByteBuffer serialize(final GenericData genericData, final GenericData.Record recordV2) throws java.io.IOException {
    ByteBufferOutputStream output = new ByteBufferOutputStream();
    BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(output, null);
    DatumWriter<IndexedRecord> datumWriter = genericData.createDatumWriter(recordV2.getSchema());
    datumWriter.write(recordV2, encoder);
    encoder.flush();
    return output.getBufferList().get(0);
}

private GenericRecord deserialize(final GenericData genericData, final Schema v1, final Schema v2, final ByteBuffer bytes) throws java.io.IOException {
    ByteBufferInputStream input = new ByteBufferInputStream(ImmutableList.of(bytes));
    final DatumReader<GenericRecord> datumReader = genericData.createDatumReader(v2, v1);
    return datumReader.read(new GenericData.Record(v1), DecoderFactory.get().binaryDecoder(input, null));
}
{code}
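The mechanism behind the corruption can be shown without any Avro machinery: the bytes on the wire hold only the two's-complement unscaled integer, and the scale is supplied entirely by whichever schema the reader happens to use. A minimal standalone sketch (class name is illustrative, not from the test above):

{code:java}
import java.math.BigDecimal;
import java.math.BigInteger;

public class DecimalScaleMismatchSketch {
    public static void main(String[] args) {
        // The serialized form carries only the unscaled value.
        BigInteger unscaled = new BigDecimal("1.2345").unscaledValue(); // 12345

        // Writer's view, scale 4:
        System.out.println(new BigDecimal(unscaled, 4)); // prints 1.2345

        // Reader's view, scale 3: same bytes, silently 10x larger
        System.out.println(new BigDecimal(unscaled, 3)); // prints 12.345
    }
}
{code}

Because the unscaled value round-trips unchanged, no exception can arise from the bytes themselves; only a schema-resolution check can catch the mismatch.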
was (Author: bigandy):
This is a particularly nasty bug as it can easily lead to data corruption. If you write decimal "1.2345" with a write schema with a scale of 4 and then deserialize with a scale of 3, the value comes out as "12.345"!!!!
> Avro does not enforce schema resolution rules for Decimal type
> --------------------------------------------------------------
>
> Key: AVRO-2078
> URL: https://issues.apache.org/jira/browse/AVRO-2078
> Project: Avro
> Issue Type: Bug
> Reporter: Anthony Hsu
> Assignee: Nandor Kollar
> Fix For: 1.8.2
>
> Attachments: dec.avro
>
>
> According to http://avro.apache.org/docs/1.8.2/spec.html#Decimal
> bq. For the purposes of schema resolution, two schemas that are {{decimal}} logical types _match_ if their scales and precisions match.
> This is not enforced.
> I wrote a file with (precision 5, scale 2) and tried to read it with a reader schema with (precision 3, scale 1). I expected an AvroTypeException to be thrown, but none was thrown.
> Test data file attached. The code to read it is:
> {noformat:title=ReadDecimal.java}
> import java.io.File;
>
> import org.apache.avro.Schema;
> import org.apache.avro.file.DataFileReader;
> import org.apache.avro.generic.GenericDatumReader;
> import org.apache.avro.generic.GenericRecord;
> import org.apache.avro.io.DatumReader;
>
> public class ReadDecimal {
>   public static void main(String[] args) throws Exception {
>     // Reader schema declares decimal(precision 3, scale 1)
>     Schema schema = new Schema.Parser().parse("{\n" + "  \"type\" : \"record\",\n" + "  \"name\" : \"some_schema\",\n"
>         + "  \"namespace\" : \"com.howdy\",\n" + "  \"fields\" : [ {\n" + "    \"name\" : \"name\",\n"
>         + "    \"type\" : \"string\"\n" + "  }, {\n" + "    \"name\" : \"value\",\n" + "    \"type\" : {\n"
>         + "      \"type\" : \"bytes\",\n" + "      \"logicalType\" : \"decimal\",\n" + "      \"precision\" : 3,\n"
>         + "      \"scale\" : 1\n" + "    }\n" + "  } ]\n" + "}");
>     DatumReader<GenericRecord> datumReader = new GenericDatumReader<>(schema);
>     // dec.avro has precision 5, scale 2
>     DataFileReader<GenericRecord> dataFileReader = new DataFileReader<>(
>         new File("/tmp/dec.avro"), datumReader);
>     GenericRecord foo = null;
>     while (dataFileReader.hasNext()) {
>       foo = dataFileReader.next(foo); // AvroTypeException expected due to change in scale/precision but none occurs
>     }
>   }
> }
> {noformat}
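The spec sentence quoted above implies the reader should refuse mismatched decimals before decoding anything. A hypothetical guard along those lines (method and parameter names are illustrative only, not part of the Avro API):

{code:java}
public class DecimalResolutionCheck {
    // Enforces the spec rule: decimal logical types match only if
    // both precision and scale are equal. Illustrative sketch; Avro
    // itself would be expected to raise AvroTypeException here.
    static void requireDecimalMatch(int writerPrecision, int writerScale,
                                    int readerPrecision, int readerScale) {
        if (writerPrecision != readerPrecision || writerScale != readerScale) {
            throw new IllegalStateException("decimal(" + writerPrecision + "," + writerScale
                    + ") does not resolve to decimal(" + readerPrecision + "," + readerScale + ")");
        }
    }

    public static void main(String[] args) {
        requireDecimalMatch(5, 2, 5, 2); // matching types: no error
        requireDecimalMatch(5, 2, 3, 1); // mismatch: throws
    }
}
{code}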
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)