Posted to dev@avro.apache.org by "David Mollitor (Jira)" <ji...@apache.org> on 2021/08/04 21:07:00 UTC

[jira] [Created] (AVRO-3184) Cache Datum Type Strings in Resolve Union

David Mollitor created AVRO-3184:
------------------------------------

             Summary: Cache Datum Type Strings in Resolve Union
                 Key: AVRO-3184
                 URL: https://issues.apache.org/jira/browse/AVRO-3184
             Project: Apache Avro
          Issue Type: Improvement
            Reporter: David Mollitor
            Assignee: David Mollitor


{code:java|title=GenericData.java}
  protected String getSchemaName(Object datum) {
    if (datum == null || datum == JsonProperties.NULL_VALUE)
      return Type.NULL.getName();
    if (isRecord(datum))
      return getRecordSchema(datum).getFullName();
    if (isEnum(datum))
      return getEnumSchema(datum).getFullName();
    if (isArray(datum))
      return Type.ARRAY.getName();
    if (isMap(datum))
      return Type.MAP.getName();
    if (isFixed(datum))
      return getFixedSchema(datum).getFullName();
    if (isString(datum))
      return Type.STRING.getName();
    if (isBytes(datum))
      return Type.BYTES.getName();
    if (isInteger(datum))
      return Type.INT.getName();
    if (isLong(datum))
      return Type.LONG.getName();
    if (isFloat(datum))
      return Type.FLOAT.getName();
    if (isDouble(datum))
      return Type.DOUBLE.getName();
    if (isBoolean(datum))
      return Type.BOOLEAN.getName();
    throw new AvroRuntimeException(String.format("Unknown datum type %s: %s", datum.getClass().getName(), datum));
  }
{code}

This is a lot of work for the simple native types (Long, Float, Double, etc.), which are the last things checked.  Add a cache for these simple cases.
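
As a rough illustration of the idea (not the actual patch), the sketch below keeps a small map from boxed Java classes to Avro type names so that a Long or Double can be resolved with a single lookup instead of falling through every check above. The class name DatumTypeCache and the method lookupSimpleTypeName are hypothetical and exist only for this example. The string values mirror what Type.INT.getName(), Type.LONG.getName(), etc. return. It is standalone, so it does not depend on Avro itself.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Avro.
final class DatumTypeCache {

  // Maps the exact runtime class of a simple datum to its Avro type name.
  // Populated once in the static initializer and only read afterwards.
  private static final Map<Class<?>, String> SIMPLE_TYPE_NAMES = new HashMap<>();

  static {
    SIMPLE_TYPE_NAMES.put(Integer.class, "int");
    SIMPLE_TYPE_NAMES.put(Long.class, "long");
    SIMPLE_TYPE_NAMES.put(Float.class, "float");
    SIMPLE_TYPE_NAMES.put(Double.class, "double");
    SIMPLE_TYPE_NAMES.put(Boolean.class, "boolean");
  }

  private DatumTypeCache() {
  }

  // Returns the cached type name, or null on a cache miss so that the
  // caller can fall back to the full chain of isRecord/isEnum/... checks.
  static String lookupSimpleTypeName(Object datum) {
    if (datum == null) {
      return "null";
    }
    return SIMPLE_TYPE_NAMES.get(datum.getClass());
  }
}
```

In getSchemaName, such a lookup would presumably sit right after the null check, with the existing chain left in place for cache misses. Only final boxed types are cached here, on the assumption that subclasses overriding isRecord, isString and similar methods would not treat those classes differently.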

I came across this while examining the performance of Apache ORC, which includes an Avro benchmark for comparison.  You can see the charts with the change implemented.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)