Posted to dev@avro.apache.org by "jason mathews (Jira)" <ji...@apache.org> on 2020/01/06 15:27:00 UTC

[jira] [Commented] (AVRO-2070) Tolerate any Number when writing primitive values in Java in GenericDatumWriter

    [ https://issues.apache.org/jira/browse/AVRO-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17008942#comment-17008942 ] 

jason mathews commented on AVRO-2070:
-------------------------------------

This issue should be categorized as a Bug, not an Improvement.

I'm running into this issue and have had to create a custom GenericDatumWriter class to allow mixed number type instances; this fix would eliminate the need to do so.

Using a mix of number types in Java (Short, Integer, Long, Float) when the schema type is Double results in a ClassCastException.

Java Example:

 
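{code:java}
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Collections;
import java.util.List;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;

public class MixedNumberExample {
  public static void main(String[] args) throws IOException {
    Schema doubleType = Schema.create(Schema.Type.DOUBLE);
    Schema.Field field = new Schema.Field("d", doubleType);
    List<Schema.Field> fields = Collections.singletonList(field);
    Schema schema = Schema.createRecord("test", "doc", "", false, fields);
    // serialize
    GenericDatumWriter<GenericData.Record> datumWriter = new GenericDatumWriter<>(schema);
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (DataFileWriter<GenericData.Record> dataWriter = new DataFileWriter<>(datumWriter)) {
      dataWriter.create(schema, bos);
      GenericData.Record r = new GenericData.Record(schema);
      r.put("d", 123.456); // Double matches the schema type
      dataWriter.append(r);

      r = new GenericData.Record(schema);
      r.put("d", 123); // try as Integer
      dataWriter.append(r); // throws exception
    }
  }
}
{code}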
Output:

 
{noformat}
Exception in thread "main" org.apache.avro.file.DataFileWriter$AppendWriteException:  java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Double
{noformat}
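
The failure comes down to a direct cast on the boxed value. The following is a minimal standalone sketch with no Avro dependency; the class and method names are hypothetical and only mirror the DOUBLE case quoted in the issue description, contrasting the strict cast with a conversion through Number:

```java
// Standalone illustration; NumberCoercionDemo and its methods are hypothetical names,
// not part of Avro.
final class NumberCoercionDemo {

    // Mirrors the strict cast used for DOUBLE in the snippet quoted in the issue.
    static double strictDouble(Object datum) {
        return (Double) datum; // ClassCastException for Integer, Long, Float, Short
    }

    // Mirrors the conversion the issue proposes: accept any Number.
    static double tolerantDouble(Object datum) {
        return ((Number) datum).doubleValue();
    }

    public static void main(String[] args) {
        Object datum = 123; // autoboxed to java.lang.Integer
        try {
            strictDouble(datum);
        } catch (ClassCastException e) {
            System.out.println("strict cast failed: " + e.getMessage());
        }
        System.out.println("tolerant conversion: " + tolerantDouble(datum));
    }
}
```

The tolerant version still rejects non-numeric values such as String, since the cast to Number fails for them.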
 

The Python fastavro implementation, by contrast, has no such restriction: a double schema type, for example, can contain a mix of floating-point and integer values.

Python Example:

 
{code:python}
from fastavro import json_writer, json_reader, parse_schema

schema = {
    "namespace": "",
    "type": "record",
    "name": "record",
    "fields": [
        { "name": "d", "type": "double" }
    ]
}
parsed_schema = parse_schema(schema)
records = [
    { u'd': 1.2345 },
    { u'd': 12345 }  # an int in a double field
]
with open('test.avro', 'w') as out:
    #fastavro.schemaless_writer(out, parsed_schema, { u'd': 1.2345 } )
    #fastavro.schemaless_writer(out, parsed_schema, { u'd': 12345 } )
    json_writer(out, parsed_schema, records)
with open('test.avro', 'r') as fo:
    avro_reader = json_reader(fo, schema)
    for record in avro_reader:
        print(record)
"""
output:
{'d': 1.2345}
{'d': 12345}
"""
{code}
 

> Tolerate any Number when writing primitive values in Java in GenericDatumWriter
> -------------------------------------------------------------------------------
>
>                 Key: AVRO-2070
>                 URL: https://issues.apache.org/jira/browse/AVRO-2070
>             Project: Apache Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Daniil Gitelson
>            Priority: Major
>
> Tolerating any Number (instead of a concrete Long, Double, or Float) makes it possible to use a mutable Number implementation for performance reasons (especially for primitive collection iterations).
> Currently, this works only for int:
> {code:java}
>       // Here it works
>       case INT:     out.writeInt(((Number)datum).intValue()); break;
>       // This should be replaced with ((Number)datum).longValue() etc
>       case LONG:    out.writeLong((Long)datum);       break;
>       case FLOAT:   out.writeFloat((Float)datum);     break;
>       case DOUBLE:  out.writeDouble((Double)datum);   break;
> {code}
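
The change described in the quoted snippet could be sketched roughly as below. This is a standalone illustration, not Avro's code: TolerantPrimitiveWriter and its Type enum are hypothetical names, and java.io.DataOutputStream stands in for Avro's Encoder, so the bytes produced are fixed-width big-endian rather than Avro's binary encoding. Only the Number-based dispatch is the point.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for the primitive branch of a datum writer.
final class TolerantPrimitiveWriter {

    enum Type { INT, LONG, FLOAT, DOUBLE }

    // Every numeric case goes through Number, as the INT case already does
    // in the snippet quoted in the issue.
    static void write(Type type, Object datum, DataOutputStream out) throws IOException {
        Number n = (Number) datum; // still fails fast for non-numeric values
        switch (type) {
            case INT:    out.writeInt(n.intValue());       break;
            case LONG:   out.writeLong(n.longValue());     break;
            case FLOAT:  out.writeFloat(n.floatValue());   break;
            case DOUBLE: out.writeDouble(n.doubleValue()); break;
            default: throw new IllegalArgumentException("Not a numeric type: " + type);
        }
    }

    // Convenience helper: encode a single value into a byte array.
    static byte[] encode(Type type, Object datum) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            write(type, datum, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }
}
```

With this dispatch, an Integer 123 and a Double 123.0 written as DOUBLE yield identical bytes. One trade-off to keep in mind is that narrowing conversions (for example a Double written as INT) silently truncate instead of raising an error.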


