Posted to dev@avro.apache.org by "Ryan Skraba (Jira)" <ji...@apache.org> on 2019/11/29 14:37:00 UTC

[jira] [Commented] (AVRO-2636) GenericData defaultValueCache caches mutable ByteBuffers

    [ https://issues.apache.org/jira/browse/AVRO-2636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16985049#comment-16985049 ] 

Ryan Skraba commented on AVRO-2636:
-----------------------------------

Hello!  It appears that every *internal* use of {{getDefaultValue(field)}} inside Avro makes a deep copy before using the result as a datum.  This isn't well documented as a requirement on the method, so I'm looking at the impact of either (1) not caching the default value for bytes, or (2) returning a read-only {{bb.duplicate()}} directly from this method.
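
To make option (2) concrete, here is a minimal JDK-only sketch; no Avro classes are involved. The names {{Cache}}, {{shared()}}, {{readOnlyView()}} and {{consume()}} are hypothetical and exist only for illustration, and {{consume()}} simply assumes that a caller reads the buffer with a relative get. The sketch uses {{asReadOnlyBuffer()}}, which, like {{duplicate()}}, hands the caller an independent position and limit over the same bytes.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;

class DefaultValueCacheSketch {

    /** Hypothetical stand-in for a cache that holds one bytes default value. */
    static final class Cache {
        private final ByteBuffer cached;

        Cache(byte[] value) {
            this.cached = ByteBuffer.wrap(value.clone());
        }

        /** Returns the cached instance itself: every caller shares its position. */
        ByteBuffer shared() {
            return cached;
        }

        /** Returns a read-only view with its own position and limit. */
        ByteBuffer readOnlyView() {
            return cached.asReadOnlyBuffer();
        }
    }

    /** Reads all remaining bytes with a relative get, which advances the position. */
    static BigInteger consume(ByteBuffer bb) {
        byte[] bytes = new byte[bb.remaining()];
        bb.get(bytes);
        return new BigInteger(bytes); // throws NumberFormatException on an empty array
    }

    public static void main(String[] args) {
        byte[] unscaled = new BigInteger("42000000000").toByteArray();

        // Sharing the cached buffer: the first read drains it, the second one fails.
        Cache unsafe = new Cache(unscaled);
        System.out.println("shared, first read:  " + consume(unsafe.shared()));
        try {
            consume(unsafe.shared());
        } catch (NumberFormatException e) {
            System.out.println("shared, second read failed: " + e.getMessage());
        }

        // Handing out a fresh view per call: each read starts at position 0.
        Cache safe = new Cache(unscaled);
        System.out.println("view, first read:    " + consume(safe.readOnlyView()));
        System.out.println("view, second read:   " + consume(safe.readOnlyView()));
    }
}
```

A view costs one small object per call and copies no bytes, so it would keep the cache while protecting the cached position. It only helps as long as nothing moves the position of the cached buffer itself, since each view starts from the cached buffer's current position.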

I just made a PR on the related AVRO-2592 to avoid modifying the ByteBuffer during decimal conversion, which would correct your unit test, but not the underlying problem.  If it is a problem... I'm thinking that a {{byte[]}} default value for fixed types will always be mutable.  At the very minimum, there should be a strong advisory/warning in the javadoc.

As a quick question, are you using the {{getDefaultValue}} method in a different way?

> GenericData defaultValueCache caches mutable ByteBuffers
> --------------------------------------------------------
>
>                 Key: AVRO-2636
>                 URL: https://issues.apache.org/jira/browse/AVRO-2636
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: java
>            Reporter: Valentin Nikotin
>            Priority: Minor
>
> It appears that the default value for a bytes field (and for a decimal logical type backed by bytes) returned by getDefaultValue is cached. This leads to bugs when the same value is read more than once (for example, when it is converted with DecimalConversion). In a single-threaded environment, a workaround is to reset the ByteBuffer after each read, but in a concurrent environment we should not cache mutable objects.
>  
> {code:java}
> @Test(expected=NumberFormatException.class)
> public void testReuse() {
>     Conversions.DecimalConversion decimalConversion =
>             new Conversions.DecimalConversion();
>     LogicalType logicalDecimal =
>             LogicalTypes.decimal(38, 9);
>     ByteBuffer defaultValue =
>             decimalConversion.toBytes(
>                     BigDecimal.valueOf(42L).setScale(9),
>                     null,
>                     logicalDecimal);
>     Schema schema = SchemaBuilder
>             .record("test")
>             .fields()
>             .name("decimal")
>             .type(logicalDecimal.addToSchema(SchemaBuilder.builder().bytesType()))
>             .withDefault(defaultValue)
>             .endRecord();
>     BigDecimal firstRead = decimalConversion
>             .fromBytes(
>                     (ByteBuffer) GenericData.get().getDefaultValue(schema.getField("decimal")),
>                     null,
>                     logicalDecimal);
>     BigDecimal secondRead = decimalConversion
>             .fromBytes(
>                     (ByteBuffer) GenericData.get().getDefaultValue(schema.getField("decimal")),
>                     null,
>                     logicalDecimal);
> }
> {code}


