Posted to user@avro.apache.org by ga...@cloudera.com on 2020/04/02 11:07:21 UTC

Why is the set of logicalType Conversions empty by default for a GenericDatumReader?

Hello,

I have a question about a specific design decision in Avro. I have a schema
with a "logicalType": "decimal" field. When I use SpecificDatumReader to
deserialize it, the field is correctly deserialized as a BigDecimal,
because the set of conversions contains the DecimalConversion.

When I read with a GenericDatumReader, the set of conversions is empty.
Is there a reason why it's empty? Why are the default conversions not
included?

When reading the field with a GenericDatumReader, the set of conversions is
provided by the GenericData object. So if I provide a GenericData with the
conversion registered, the field is converted to BigDecimal. If no GenericData
is passed to the GenericDatumReader's constructor, I get a ByteBuffer.

Sample code below:

car.avsc:
{
  "type": "record",
  "namespace": "com.schwarzenegger",
  "name": "Car",
  "fields": [
    { "name": "model", "type": "string" },
    { "name": "engineCode", "type": { "type": "bytes", "logicalType": "decimal", "precision": 8, "scale": 0 } }
  ]
}

Test.java:
import java.io.File;
import java.io.FileInputStream;

import org.apache.avro.Conversions;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.io.DecoderFactory;

public class Test {

    public static void main(String[] args) throws Exception {
        Schema schema = new Schema.Parser().parse( ... );
        System.out.println("LogicalType = " + schema.getField("engineCode").schema().getLogicalType());

        GenericData.Record record = null;
        try (FileInputStream payloadInputStream = new FileInputStream(new File("C:\\Temp\\car.txt"))) {
            // Register the decimal conversion on a GenericData instance and pass it to the reader.
            GenericData genericData = new GenericData();
            genericData.addLogicalTypeConversion(new Conversions.DecimalConversion());
            GenericDatumReader<GenericData.Record> genericReader =
                    new GenericDatumReader<>(schema, schema, genericData);
            record = genericReader.read(null, DecoderFactory.get().binaryDecoder(payloadInputStream, null));
        }
        Object engineCode = record.get(record.getSchema().getField("engineCode").pos());
        System.out.println(String.format("code = %s, class = %s", engineCode, engineCode.getClass().getName()));
    }
}

This will print out:

LogicalType = org.apache.avro.LogicalTypes$Decimal@f8
code = 12345678, class = java.math.BigDecimal

If I remove the GenericData part and create the GenericDatumReader without
it, then I get back a ByteBuffer because the set of conversions is empty.
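
For readers who end up with the raw ByteBuffer, the value can still be decoded by hand. The following is a minimal sketch, assuming the encoding described in the Avro specification for the decimal logical type (big-endian two's-complement unscaled value, with the scale taken from the schema). The class name DecimalBytes and the method toBigDecimal are hypothetical and not part of Avro.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.ByteBuffer;

public class DecimalBytes {

    // Hypothetical helper: turns the raw bytes of a "decimal" field into a BigDecimal.
    // The scale is not stored in the payload; it must come from the schema.
    static BigDecimal toBigDecimal(ByteBuffer buffer, int scale) {
        // Work on a duplicate so the caller's buffer position is left untouched.
        ByteBuffer copy = buffer.duplicate();
        byte[] bytes = new byte[copy.remaining()];
        copy.get(bytes);
        // BigInteger(byte[]) reads big-endian two's-complement bytes.
        return new BigDecimal(new BigInteger(bytes), scale);
    }

    public static void main(String[] args) {
        // Build the bytes for the unscaled value 12345678 (scale 0, as in car.avsc).
        ByteBuffer raw = ByteBuffer.wrap(BigInteger.valueOf(12345678L).toByteArray());
        BigDecimal engineCode = toBigDecimal(raw, 0);
        System.out.println("code = " + engineCode + ", class = " + engineCode.getClass().getName());
    }
}
```

Registering Conversions.DecimalConversion on the GenericData, as in Test.java above, avoids having to do this manually.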

Is there a reason why that is? If not, can we modify Avro and add the
default conversions to the GenericDatumReader?

In GenericDatumReader this is the relevant code:

****************
    // GenericDatumReader: a conversion is applied only if one is registered.
    protected Object read(Object old, Schema expected, ResolvingDecoder in) throws IOException {
        Object datum = this.readWithoutConversion(old, expected, in);
        LogicalType logicalType = expected.getLogicalType();
        if (logicalType != null) {
            Conversion<?> conversion = this.getData().getConversionFor(logicalType);
            if (conversion != null) {
                return this.convert(datum, expected, logicalType, conversion);
            }
        }

        return datum;
    }

    // GenericData: the conversion is looked up by logical type name.
    public Conversion<Object> getConversionFor(LogicalType logicalType) {
        return logicalType == null ? null : (Conversion)this.conversions.get(logicalType.getName());
    }
****************

Thanks,
Csaba



Re: Why is the set of logicalType Conversions empty by default for a GenericDatumReader?

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
Logical type conversions are empty by default to avoid a breaking behavior
change. If we added default conversions, you'd get a different object back
in code that updated an Avro dependency and it would break at runtime with
a ClassCastException.
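
To illustrate the kind of breakage described above, here is a minimal sketch that uses only the JDK and no Avro classes. The method readEngineCode is a hypothetical stand-in for a generic read that returns either the raw ByteBuffer or a converted BigDecimal, and legacyCallerSurvives stands in for existing user code that casts the result to ByteBuffer.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.ByteBuffer;

public class BreakingDefault {

    // Hypothetical stand-in for a generic read of the "engineCode" field.
    // Without a conversion the caller sees the raw bytes; with one, a BigDecimal.
    static Object readEngineCode(boolean conversionApplied) {
        byte[] unscaled = {0x00, (byte) 0xBC, 0x61, 0x4E}; // 12345678, big-endian two's complement
        if (conversionApplied) {
            return new BigDecimal(new BigInteger(unscaled), 0);
        }
        return ByteBuffer.wrap(unscaled);
    }

    // Existing caller code written against the old behaviour: it expects a ByteBuffer.
    static boolean legacyCallerSurvives(boolean conversionApplied) {
        try {
            ByteBuffer raw = (ByteBuffer) readEngineCode(conversionApplied);
            return raw.remaining() > 0;
        } catch (ClassCastException e) {
            // This is the runtime failure a silently changed default would cause.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("no default conversion:   " + legacyCallerSurvives(false));
        System.out.println("with default conversion: " + legacyCallerSurvives(true));
    }
}
```

The code compiles in both cases because the read returns Object; the failure only shows up at runtime, which is why the change would not be caught when upgrading the dependency.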

On Thu, Apr 2, 2020 at 4:40 AM Csaba Galyo <ga...@cloudera.com.invalid>
wrote:

> +dev maybe someone can answer on this


-- 
Ryan Blue
Software Engineer
Netflix

Re: Why is the set of logicalType Conversions empty by default for a GenericDatumReader?

Posted by Csaba Galyo <ga...@cloudera.com>.
+dev maybe someone can answer on this

