You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@avro.apache.org by "Doug Cutting (JIRA)" <ji...@apache.org> on 2014/06/06 23:58:01 UTC
[jira] [Updated] (AVRO-1500) Unknown datum type exception during
union type resolution (no short to int conversion).
[ https://issues.apache.org/jira/browse/AVRO-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doug Cutting updated AVRO-1500:
-------------------------------
Resolution: Fixed
Fix Version/s: 1.7.7
Assignee: Michael Pershyn
Status: Resolved (was: Patch Available)
I committed this. Thanks, Michael!
> Unknown datum type exception during union type resolution (no short to int conversion).
> ---------------------------------------------------------------------------------------
>
> Key: AVRO-1500
> URL: https://issues.apache.org/jira/browse/AVRO-1500
> Project: Avro
> Issue Type: Bug
> Affects Versions: 1.7.4, 1.7.5, 1.7.6
> Environment: java API
> Reporter: Michael Pershyn
> Assignee: Michael Pershyn
> Labels: easyfix, newbie, patch
> Fix For: 1.7.7
>
> Attachments: AVRO-1500-README.patch, AVRO-1500-ThriftData.patch, AVRO-1500-unit-test.diff, AVRO-1500.patch
>
>
> There is a conversion for values of type {{short}} (and other numeric types) if in the schema they are declared as int:
> {code:title=/lang/java/avro/src/main/java/org/apache/avro/generic/GenericDatumWriter.java}
> protected void write(Schema schema, Object datum, Encoder out)
> throws IOException {
> try {
> switch (schema.getType()) {
> // ...
> case INT: out.writeInt(((Number)datum).intValue()); break;
> // ...
> {code}
> So, if a value of {{short}} type is passed to INT field, it will be converted and saved as INT in avro.
> But, when there is next field in schema:
> {code}
> ["null",{"type":"int","thrift":"short"}] {code}
> which is a union with int in it, and short is passed in (lets say 5), then we are having the next exception:
> {code}
> org.apache.avro.AvroRuntimeException: Unknown datum type: 5
> at org.apache.avro.generic.GenericData.getSchemaName(GenericData.java:593)
> at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:558)
> at org.apache.avro.generic.GenericDatumWriter.resolveUnion(GenericDatumWriter.java:144)
> {code}
> This happens because in {{org.apache.avro.generic.GenericData}} there is no check if the passed object has a type of {{java.lang.Short}}, and it is not get converted then in {{write}} method of {{GenericDatumWriter}}:
> {code:title=lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java}
> /** Return the schema full name for a datum. Called by {@link
> * #resolveUnion(Schema,Object)}. */
> protected String getSchemaName(Object datum) {
> /* ... */
> if (isInteger(datum))
> return Type.INT.getName();
> if (isLong(datum))
> return Type.LONG.getName();
> if (isFloat(datum))
> return Type.FLOAT.getName();
> if (isDouble(datum))
> return Type.DOUBLE.getName();
> if (isBoolean(datum))
> return Type.BOOLEAN.getName();
> throw new AvroRuntimeException("Unknown datum type: "+datum);
> {code}
> This error initially occured during thrift to avro conversion, when thrift obj had optional field of type i16.
> In thrift to avro schema converter, if the type is short in thrift (i16) it will be implicitly converted to int in avro-schema, so the values should be converted as well. This is already done if they are not in the union (not optional). Otherwise the exception is thrown.
> The snippet from schema conversion code is below:
> {code:title=lang/java/thrift/src/main/java/org/apache/avro/thrift/ThriftData.java}
> private Schema getSchema(FieldValueMetaData f) {
> switch (f.type) {
> /* ... */
> case TType.I16:
> Schema s = Schema.create(Schema.Type.INT);
> s.addProp(THRIFT_PROP, "short");
> return s;
> /* ... */
> {code}
> Proposal is to add isShort check to generic data, as well as isShort method implementation:
> {code:title=lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java}
> protected String getSchemaName(Object datum) {
> // ..
> if (isInteger(datum) || isShort(datum))
> return Type.INT.getName();
> // ..
> {code}
> or maybe even some kind of isNumeric method, so the behaviour will be same for INT fields and INT fields that are in Union.
--
This message was sent by Atlassian JIRA
(v6.2#6252)