Posted to dev@avro.apache.org by Csaba Galyo <ga...@cloudera.com.INVALID> on 2020/04/02 11:40:20 UTC
Re: Why is the set of logicalType Conversions empty by default for a GenericDatumReader?
+dev, maybe someone here can answer this.
On Thu, Apr 2, 2020 at 1:07 PM <ga...@cloudera.com> wrote:
> Hello,
>
> I have a question about a specific design decision in Avro. I have a schema
> with a field of logical type "decimal". When I deserialize it with
> SpecificDatumReader, the field is correctly read as a BigDecimal,
> because the set of conversions contains Conversions.DecimalConversion.
>
> When deserializing with a GenericDatumReader, the set of conversions is empty.
> Is there a reason why it's empty? Why are the default conversions not
> included?
>
> When a GenericDatumReader reads the field, the set of conversions comes from
> its GenericData object. So if I pass in a GenericData with the conversions
> registered, the field is converted to a BigDecimal. If no GenericData is
> passed to the GenericDatumReader's constructor, I get a ByteBuffer.
>
> Sample code below:
>
> car.avsc:
> {
>   "type": "record",
>   "namespace": "com.schwarzenegger",
>   "name": "Car",
>   "fields": [
>     { "name": "model", "type": "string" },
>     { "name": "engineCode",
>       "type": { "type": "bytes", "logicalType": "decimal", "precision": 8, "scale": 0 } }
>   ]
> }
>
> Test.java:
> import java.io.File;
> import java.io.FileInputStream;
> import org.apache.avro.Conversions;
> import org.apache.avro.Schema;
> import org.apache.avro.generic.GenericData;
> import org.apache.avro.generic.GenericDatumReader;
> import org.apache.avro.io.DecoderFactory;
>
> public class Test {
>
>   public static void main(String[] args) throws Exception {
>     Schema schema = new Schema.Parser().parse( ... );
>     System.out.println("LogicalType = "
>         + schema.getField("engineCode").schema().getLogicalType());
>
>     GenericData.Record record = null;
>     try (FileInputStream payloadInputStream =
>         new FileInputStream(new File("C:\\Temp\\car.txt"))) {
>       // Register the decimal conversion so bytes/decimal fields are read as BigDecimal.
>       GenericData genericData = new GenericData();
>       genericData.addLogicalTypeConversion(new Conversions.DecimalConversion());
>       GenericDatumReader<GenericData.Record> genericReader =
>           new GenericDatumReader<>(schema, schema, genericData);
>       record = genericReader.read(null,
>           DecoderFactory.get().binaryDecoder(payloadInputStream, null));
>     }
>     Object engineCode = record.get(record.getSchema().getField("engineCode").pos());
>     System.out.println(String.format("code = %s, class = %s",
>         engineCode, engineCode.getClass().getName()));
>   }
> }
>
> This will print out:
>
> LogicalType = org.apache.avro.LogicalTypes$Decimal@f8
> code = 12345678, class = java.math.BigDecimal
>
> If I remove the GenericData part and create the GenericDatumReader without
> it, I get back a ByteBuffer, because the conversions set is empty.
>
> Is there a reason why that is? If not, can we modify Avro and add the
> default conversions to the GenericDatumReader?
>
> This is the relevant code in GenericDatumReader and GenericData:
>
> ****************
> // Applies a conversion only if the GenericData model has one registered.
> protected Object read(Object old, Schema expected, ResolvingDecoder in)
>     throws IOException {
>   Object datum = this.readWithoutConversion(old, expected, in);
>   LogicalType logicalType = expected.getLogicalType();
>   if (logicalType != null) {
>     Conversion<?> conversion = this.getData().getConversionFor(logicalType);
>     if (conversion != null) {
>       return this.convert(datum, expected, logicalType, conversion);
>     }
>   }
>
>   return datum;
> }
>
> // Conversions are keyed by logical type name, e.g. "decimal".
> public Conversion<Object> getConversionFor(LogicalType logicalType) {
>   return logicalType == null ? null
>       : (Conversion) this.conversions.get(logicalType.getName());
> }
> ****************
>
> Thanks,
> Csaba
>
>
>
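The behavior described in this message can be reproduced without an input file. The following is a minimal sketch, assuming a recent Avro release (1.9 or later) on the classpath; the class `DecimalReadDemo`, its helper methods, and the sample values are hypothetical. It encodes one `Car` record with the raw binary encoding, then reads it back once through `GenericData.get()` (the model a `GenericDatumReader` falls back to when none is passed) and once through a `GenericData` with `Conversions.DecimalConversion` registered.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.math.BigDecimal;
import java.nio.ByteBuffer;

import org.apache.avro.Conversions;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class DecimalReadDemo {

  // Same schema as car.avsc, inlined so the example needs no files.
  static final String CAR_SCHEMA = "{\"type\": \"record\", \"namespace\": \"com.schwarzenegger\","
      + " \"name\": \"Car\", \"fields\": ["
      + " {\"name\": \"model\", \"type\": \"string\"},"
      + " {\"name\": \"engineCode\", \"type\": {\"type\": \"bytes\","
      + " \"logicalType\": \"decimal\", \"precision\": 8, \"scale\": 0}}]}";

  // Writes one Car record with the raw Avro binary encoding (no container file header).
  static byte[] encodeCar(Schema schema, BigDecimal engineCode) {
    Schema fieldSchema = schema.getField("engineCode").schema();
    // Convert BigDecimal -> ByteBuffer by hand so the writer needs no registered conversions.
    ByteBuffer bytes = new Conversions.DecimalConversion()
        .toBytes(engineCode, fieldSchema, fieldSchema.getLogicalType());
    GenericData.Record car = new GenericData.Record(schema);
    car.put("model", "T-800");
    car.put("engineCode", bytes);
    try {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
      new GenericDatumWriter<GenericRecord>(schema).write(car, encoder);
      encoder.flush();
      return out.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Reads the record back through the given GenericData model and returns engineCode.
  static Object readEngineCode(Schema schema, byte[] payload, GenericData model) {
    GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(schema, schema, model);
    try {
      GenericRecord car = reader.read(null, DecoderFactory.get().binaryDecoder(payload, null));
      return car.get("engineCode");
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Returns {value read with the default model, value read with DecimalConversion registered}.
  static Object[] readBothWays() {
    Schema schema = new Schema.Parser().parse(CAR_SCHEMA);
    byte[] payload = encodeCar(schema, new BigDecimal("12345678"));
    // GenericData.get() is the model GenericDatumReader uses when none is passed in.
    Object raw = readEngineCode(schema, payload, GenericData.get());
    GenericData withDecimal = new GenericData();
    withDecimal.addLogicalTypeConversion(new Conversions.DecimalConversion());
    Object converted = readEngineCode(schema, payload, withDecimal);
    return new Object[] {raw, converted};
  }

  public static void main(String[] args) {
    for (Object value : readBothWays()) {
      System.out.println(value.getClass().getName() + ": " + value);
    }
  }
}
```

The first value is expected to be a `java.nio.ByteBuffer` and the second a `java.math.BigDecimal`. Code that receives the raw buffer can still convert it on demand with `new Conversions.DecimalConversion().fromBytes(buffer, fieldSchema, fieldSchema.getLogicalType())`, where `fieldSchema` is the field's `bytes`/`decimal` schema.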
Re: Why is the set of logicalType Conversions empty by default for a GenericDatumReader?
Posted by Ryan Blue <rb...@netflix.com.INVALID>.
The set of logical type conversions is empty by default to avoid a breaking
behavior change. If we added default conversions, code that upgraded its Avro
dependency would get a different object back for the same field (for example,
a BigDecimal instead of a ByteBuffer) and would break at runtime with a
ClassCastException.
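The breakage described above can be illustrated with a small sketch, assuming Avro 1.9 or later. `BreakingChangeDemo`, `legacyByteLength`, `deliver`, `withDecimalConversion`, and `legacyCodeSurvives` are hypothetical names: `deliver` mirrors the conversion lookup in the quoted `GenericDatumReader.read()` code rather than calling the reader itself, and `withDecimalConversion` stands in for a GenericData that ships with default conversions.

```java
import java.math.BigDecimal;
import java.nio.ByteBuffer;

import org.apache.avro.Conversion;
import org.apache.avro.Conversions;
import org.apache.avro.LogicalType;
import org.apache.avro.LogicalTypes;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;

public class BreakingChangeDemo {

  // Hypothetical caller written against the current default: it assumes a
  // bytes/decimal field always arrives as a ByteBuffer.
  static int legacyByteLength(Object engineCode) {
    ByteBuffer buffer = (ByteBuffer) engineCode; // ClassCastException if a BigDecimal arrives
    return buffer.remaining();
  }

  // Mirrors the lookup in GenericDatumReader.read(): convert only if the model
  // has a conversion registered under the logical type's name ("decimal").
  static Object deliver(GenericData model, Schema fieldSchema, ByteBuffer rawValue) {
    LogicalType logicalType = fieldSchema.getLogicalType();
    Conversion<Object> conversion = model.getConversionFor(logicalType);
    if (conversion == null) {
      return rawValue; // empty conversion set: the raw bytes are handed back
    }
    return conversion.fromBytes(rawValue, fieldSchema, logicalType);
  }

  // Hypothetical stand-in for "default conversions": decimal pre-registered.
  static GenericData withDecimalConversion() {
    GenericData model = new GenericData();
    model.addLogicalTypeConversion(new Conversions.DecimalConversion());
    return model;
  }

  // True if the legacy caller still works when values come through `model`.
  static boolean legacyCodeSurvives(GenericData model) {
    Schema fieldSchema = LogicalTypes.decimal(8, 0).addToSchema(Schema.create(Schema.Type.BYTES));
    ByteBuffer raw = new Conversions.DecimalConversion()
        .toBytes(new BigDecimal("12345678"), fieldSchema, fieldSchema.getLogicalType());
    try {
      legacyByteLength(deliver(model, fieldSchema, raw));
      return true;
    } catch (ClassCastException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("empty conversion set, legacy code works: "
        + legacyCodeSurvives(new GenericData()));
    System.out.println("decimal pre-registered, legacy code works: "
        + legacyCodeSurvives(withDecimalConversion()));
  }
}
```

Run as a program, the first line is expected to report `true` and the second `false`: the same cast that works against an empty conversion set fails once a BigDecimal is delivered instead of a ByteBuffer. Keeping conversions opt-in means code like this only changes behavior when its author registers a conversion explicitly.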
--
Ryan Blue
Software Engineer
Netflix