Posted to dev@avro.apache.org by "David Mollitor (Jira)" <ji...@apache.org> on 2021/08/04 21:07:00 UTC
[jira] [Created] (AVRO-3184) Cache Datum Type Strings in Resolve Union
David Mollitor created AVRO-3184:
------------------------------------
Summary: Cache Datum Type Strings in Resolve Union
Key: AVRO-3184
URL: https://issues.apache.org/jira/browse/AVRO-3184
Project: Apache Avro
Issue Type: Improvement
Reporter: David Mollitor
Assignee: David Mollitor
{code:java|title=GenericData.java}
protected String getSchemaName(Object datum) {
  if (datum == null || datum == JsonProperties.NULL_VALUE)
    return Type.NULL.getName();
  if (isRecord(datum))
    return getRecordSchema(datum).getFullName();
  if (isEnum(datum))
    return getEnumSchema(datum).getFullName();
  if (isArray(datum))
    return Type.ARRAY.getName();
  if (isMap(datum))
    return Type.MAP.getName();
  if (isFixed(datum))
    return getFixedSchema(datum).getFullName();
  if (isString(datum))
    return Type.STRING.getName();
  if (isBytes(datum))
    return Type.BYTES.getName();
  if (isInteger(datum))
    return Type.INT.getName();
  if (isLong(datum))
    return Type.LONG.getName();
  if (isFloat(datum))
    return Type.FLOAT.getName();
  if (isDouble(datum))
    return Type.DOUBLE.getName();
  if (isBoolean(datum))
    return Type.BOOLEAN.getName();
  throw new AvroRuntimeException(String.format("Unknown datum type %s: %s", datum.getClass().getName(), datum));
}
{code}
This is a lot of work for the simple native types (Long, Float, Double, etc.), because they are the last cases checked: a Boolean datum, for example, must fail every preceding check before it matches. Add a cache for these simple cases.
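A minimal, self-contained sketch of such a cache, assuming a fixed map keyed by the exact class of each simple native type and consulted before the ordered chain of checks. The class `SchemaNameSketch` and the method `slowPath` are hypothetical; type names are written as string literals instead of `Type.X.getName()`, and record, enum and fixed handling is omitted, so the example runs without an Avro dependency. This is not necessarily the change that was made for AVRO-3184.

```java
import java.nio.ByteBuffer;
import java.util.Collection;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Map;

/**
 * Hypothetical sketch of a fast path for getSchemaName(): the exact class of a
 * simple native datum is looked up in a fixed map before falling back to the
 * ordered chain of type checks.
 */
final class SchemaNameSketch {

  // Exact-class cache for simple native types. All keys are final JDK classes,
  // so an exact-class match cannot accidentally capture a user-defined subclass.
  private static final Map<Class<?>, String> SIMPLE_TYPE_NAMES;

  static {
    Map<Class<?>, String> names = new IdentityHashMap<>();
    names.put(String.class, "string");
    names.put(Integer.class, "int");
    names.put(Long.class, "long");
    names.put(Float.class, "float");
    names.put(Double.class, "double");
    names.put(Boolean.class, "boolean");
    // Built once in the static initializer and never modified afterwards.
    SIMPLE_TYPE_NAMES = Collections.unmodifiableMap(names);
  }

  static String getSchemaName(Object datum) {
    if (datum == null) {
      return "null";
    }
    // Fast path: one map lookup instead of up to a dozen instanceof checks.
    String cached = SIMPLE_TYPE_NAMES.get(datum.getClass());
    if (cached != null) {
      return cached;
    }
    return slowPath(datum);
  }

  // Stand-in for the original ordered chain (records, enums and fixed omitted).
  // Covers types matched through an interface or abstract class, such as other
  // CharSequence implementations and ByteBuffer subclasses, which an exact-class
  // cache cannot handle.
  private static String slowPath(Object datum) {
    if (datum instanceof Collection) return "array";
    if (datum instanceof Map) return "map";
    if (datum instanceof CharSequence) return "string";
    if (datum instanceof ByteBuffer) return "bytes";
    if (datum instanceof Integer) return "int";
    if (datum instanceof Long) return "long";
    if (datum instanceof Float) return "float";
    if (datum instanceof Double) return "double";
    if (datum instanceof Boolean) return "boolean";
    // The real method throws AvroRuntimeException; a JDK exception keeps this self-contained.
    throw new IllegalArgumentException("Unknown datum type " + datum.getClass().getName());
  }
}
```

Keying on the exact class is only safe here because String, Integer, Long, Float, Double and Boolean are final; anything matched through an interface or abstract class still takes the ordered checks. A GenericData subclass that overrides `isString`, `isInteger` or similar would also need to keep such a map consistent with its overrides.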
I came across this while examining the performance of Apache ORC, which includes an Avro benchmark for comparison. The charts show the results with the change implemented.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)