Posted to user@avro.apache.org by Fady <fa...@legsem.com> on 2014/11/08 10:09:40 UTC

How to get Specific classes to expose BigDecimal fields

Hello,

I am working on a project that aims at converting Mainframe data to Avro 
records (https://github.com/legsem/legstar.avro).

Mainframe data often contains Decimal types. For these, I would like the 
corresponding Avro records to expose BigDecimal fields.

Initially, I followed the recommendation here: 
http://avro.apache.org/docs/1.7.7/spec.html#Decimal. My schema contains 
for instance:

     {
       "name":"transactionAmount",
       "type":{
         "type":"bytes",
         "logicalType":"decimal",
         "precision":7,
         "scale":2
       }
     }

This works fine, but the Avro Specific record produced by the 
SpecificCompiler exposes a ByteBuffer for that field.

   @Deprecated public java.nio.ByteBuffer transactionAmount;

Not what I want.
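
For reference, with this spec-compliant encoding the ByteBuffer holds the 
two's-complement representation of the unscaled value, so converting to and 
from BigDecimal has to be done by hand, roughly along these lines (just a 
sketch; the DecimalBytes helper and its method names are mine, using the 
scale declared in the schema):

     import java.math.BigDecimal;
     import java.math.BigInteger;
     import java.nio.ByteBuffer;

     /** Sketch of the bytes <-> BigDecimal mapping implied by the spec. */
     public final class DecimalBytes {

         /** Decode the two's-complement unscaled value, given the schema's scale (2 above). */
         public static BigDecimal fromBytes(ByteBuffer buf, int scale) {
             byte[] unscaled = new byte[buf.remaining()];
             buf.duplicate().get(unscaled); // copy without moving the caller's position
             return new BigDecimal(new BigInteger(unscaled), scale);
         }

         /** Encode a BigDecimal back into its two's-complement unscaled value. */
         public static ByteBuffer toBytes(BigDecimal value, int scale) {
             return ByteBuffer.wrap(value.setScale(scale).unscaledValue().toByteArray());
         }
     }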

I tried this alternative:

     {
       "name":"transactionAmount",
       "type":{
         "type":"string",
         "java-class":"java.math.BigDecimal",
         "logicalType":"decimal",
         "precision":7,
         "scale":2
        }
      }

Now the SpecificCompiler produces the result I need:

   @Deprecated public java.math.BigDecimal transactionAmount;

There are two problems, though:

1. It is less efficient to serialize/deserialize a BigDecimal from a 
string than from its two's-complement representation.

2. The Specific Record obtained this way cannot be populated using a 
deep copy from a Generic Record.

To clarify the second point:

When I convert the mainframe data I do something like:

         GenericRecord genericRecord = new GenericData.Record(schema);
         ... populate genericRecord from Mainframe data ...
         return (D) SpecificData.get().deepCopy(schema, genericRecord);

This fails with:
         java.lang.ClassCastException: java.lang.String cannot be cast to java.math.BigDecimal
             at legstar.avro.test.specific.cusdat.Transaction.put(Transaction.java:47)
             at org.apache.avro.generic.GenericData.setField(GenericData.java:573)
             at org.apache.avro.generic.GenericData.setField(GenericData.java:590)
             at org.apache.avro.generic.GenericData.deepCopy(GenericData.java:972)
             at org.apache.avro.generic.GenericData.deepCopy(GenericData.java:926)
             at org.apache.avro.generic.GenericData.deepCopy(GenericData.java:970)
             at org.apache.avro.generic.GenericData.deepCopy(GenericData.java:970)


This is because the code in the Specific record assumes the value 
received is already a BigDecimal:

     case 1: transactionAmount = (java.math.BigDecimal)value$; break;

In other words, the java-class trick produces the right interface for 
Specific classes, but the internal data types are not consistent with the 
GenericRecord derived from the same schema.
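
The only workaround I see at this level is to convert such fields by hand 
instead of relying on deepCopy, along these lines (sketch only; Transaction 
is the generated Specific class from the stack trace above and 
setTransactionAmount its generated setter):

         // Sketch only: bypass deepCopy for the decimal field and convert by hand.
         static Transaction toSpecific(GenericRecord genericRecord) {
             Transaction specific = new Transaction();
             // The generic side holds a String/Utf8 for the "string" schema.
             Object raw = genericRecord.get("transactionAmount");
             specific.setTransactionAmount(new java.math.BigDecimal(raw.toString()));
             // ... copy the remaining, non-decimal fields as usual ...
             return specific;
         }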

So my question is: what would be a better approach for Specific classes 
to expose BigDecimal fields?


Re: How to get Specific classes to expose BigDecimal fields

Posted by Michael Pigott <mp...@gmail.com>.
You're welcome!  I'm glad I was able to help.  If you find a better
long-term solution, feel free to offer it to AVRO-1497!  -Mike


Re: How to get Specific classes to expose BigDecimal fields

Posted by Fady <fa...@legsem.com>.
Thank you, Mike, for taking the time to reply to this.

I looked at your code and applied the AVRO-457 patch you wrote. Indeed, you 
fixed a very similar problem: in your case, XML Schema delivers and expects 
BigDecimals, so you mapped them to ByteBuffer as specified in 
http://avro.apache.org/docs/1.7.7/spec.html#Decimal.

As for the SpecificCompiler, I ended up creating a custom compiler:

       // Imports assumed from Avro 1.7.x (Jackson 1.x JsonNode):
       import org.apache.avro.Schema;
       import org.apache.avro.Schema.Type;
       import org.apache.avro.compiler.specific.SpecificCompiler;
       import org.codehaus.jackson.JsonNode;

       /**
        * Temporary workaround for the lack of support for BigDecimal in the
        * Avro SpecificCompiler.
        * <p/>
        * The record.vm template is customized to expose BigDecimal getters
        * and setters.
        */
       public class CustomSpecificCompiler extends SpecificCompiler {

           private static final String TEMPLATES_PATH =
                   "/com/legstar/avro/generator/specific/templates/java/classic/";

           public CustomSpecificCompiler(Schema schema) {
               super(schema);
               setTemplateDir(TEMPLATES_PATH);
           }

           /**
            * In the case of BigDecimal there is an internal Java type
            * (ByteBuffer) and an external Java type for getters/setters.
            *
            * @param schema the field schema
            * @return the Java type exposed by getters and setters
            */
           public String externalJavaType(Schema schema) {
               return isBigDecimal(schema) ? "java.math.BigDecimal"
                       : super.javaType(schema);
           }

           /** Tests whether a field is to be externalized as a BigDecimal. */
           public static boolean isBigDecimal(Schema schema) {
               if (Type.BYTES == schema.getType()) {
                   JsonNode logicalTypeNode = schema.getJsonProp("logicalType");
                   if (logicalTypeNode != null
                           && "decimal".equals(logicalTypeNode.asText())) {
                       return true;
                   }
               }
               return false;
           }
       }
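
For completeness, I drive the custom compiler roughly like this (just a 
sketch; the schema path and output directory are made up):

       import java.io.File;
       import org.apache.avro.Schema;

       public class GenerateSpecific {
           public static void main(String[] args) throws Exception {
               // Parse the Avro schema, then run the customized compiler over it.
               File schemaFile = new File("src/main/avro/cusdat.avsc");
               Schema schema = new Schema.Parser().parse(schemaFile);
               new CustomSpecificCompiler(schema).compileToDestination(
                       schemaFile, new File("target/generated-sources/avro"));
           }
       }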

And then I changed the record.vm Velocity template like this:

     72c72
     <   public ${this.mangle($schema.getName())}(#foreach($field in $schema.getFields())${this.externalJavaType($field.schema())} ${this.mangle($field.name())}#if($velocityCount < $schema.getFields().size()), #end#end) {
     ---
     >   public ${this.mangle($schema.getName())}(#foreach($field in $schema.getFields())${this.javaType($field.schema())} ${this.mangle($field.name())}#if($velocityCount < $schema.getFields().size()), #end#end) {
     74c74
     <     ${this.generateSetMethod($schema, $field)}(${this.mangle($field.name())});
     ---
     >     this.${this.mangle($field.name())} = ${this.mangle($field.name())};
     110,113c110
     <   public ${this.externalJavaType($field.schema())} ${this.generateGetMethod($schema, $field)}() {
     < #if ($this.isBigDecimal($field.schema()))
     <     return new java.math.BigDecimal(new java.math.BigInteger(${this.mangle($field.name())}.array()), $field.schema().getJsonProp("scale"));
     < #else
     ---
     >   public ${this.javaType($field.schema())} ${this.generateGetMethod($schema, $field)}() {
     115d111
     < #end
     124,127c120
     <   public void ${this.generateSetMethod($schema, $field)}(${this.externalJavaType($field.schema())} value) {
     < #if ($this.isBigDecimal($field.schema()))
     <     this.${this.mangle($field.name(), $schema.isError())} = java.nio.ByteBuffer.wrap(value.unscaledValue().toByteArray());
     < #else
     ---
     >   public void ${this.generateSetMethod($schema, $field)}(${this.javaType($field.schema())} value) {
     129d121
     < #end

This fixes the issue for me but is not a good long-term solution. In 
particular, the builder part of the generated Specific class still 
exposes ByteBuffer instead of BigDecimal, which is inconsistent.

More generally, it seems to me a better solution would be to extend the 
"java-class" trick so that more complex conversions can occur between 
the Avro type and the Java type exposed by Specific classes. Right now, 
the Java type must be castable from the Avro type, which is limiting.
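
To illustrate the kind of extension I have in mind (purely hypothetical, 
not an existing Avro API), something like a per-field two-way converter 
would do:

       // Purely hypothetical sketch, not an existing Avro interface: a two-way
       // converter between the serialized Avro type and the Java type exposed
       // by the generated Specific class.
       public interface JavaTypeConverter<A, J> {

           /** e.g. ByteBuffer (two's-complement unscaled value) -> BigDecimal. */
           J fromAvro(A avroValue, org.apache.avro.Schema schema);

           /** e.g. BigDecimal -> ByteBuffer, using the schema's scale. */
           A toAvro(J javaValue, org.apache.avro.Schema schema);
       }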

Anyway, thanks again for your great insight.

Fady


Re: How to get Specific classes to expose BigDecimal fields

Posted by Michael Pigott <mp...@gmail.com>.
Hi Fady,
    Properly handling BigDecimal types in Java is still an open question.
AVRO-1402 [1] added BigDecimal types to the Avro spec, but the Java support
is an open ticket under AVRO-1497 [2].  When I added BigDecimal support to
AVRO-457 (XML <-> Avro support), I added support for the Avro decimal
logical type using Java BigDecimals.  You can see the conversion code [3]
as well as the writer [4] and reader [5] code in my GitHub repository, or
download the patch in AVRO-457 [6] and look for BigDecimal in the
Utils.java, XmlDatumWriter.java, and XmlDatumReader.java files,
respectively.

Good luck!
Mike

[1] https://issues.apache.org/jira/browse/AVRO-1402
[2] https://issues.apache.org/jira/browse/AVRO-1497
[3]
https://github.com/mikepigott/xml-to-avro/blob/master/avro-to-xml/src/main/java/org/apache/avro/xml/Utils.java#L537
[4]
https://github.com/mikepigott/xml-to-avro/blob/master/avro-to-xml/src/main/java/org/apache/avro/xml/XmlDatumWriter.java#L1150
[5]
https://github.com/mikepigott/xml-to-avro/blob/master/avro-to-xml/src/main/java/org/apache/avro/xml/XmlDatumReader.java#L998
[6] https://issues.apache.org/jira/browse/AVRO-457
