Posted to dev@avro.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2021/09/15 08:16:00 UTC

[jira] [Commented] (AVRO-3184) Cache Datum Type Strings in Resolve Union

    [ https://issues.apache.org/jira/browse/AVRO-3184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17415376#comment-17415376 ] 

ASF subversion and git services commented on AVRO-3184:
-------------------------------------------------------

Commit 97d4dd72d3102233721d95a85bb5000dcf32396b in avro's branch refs/heads/master from belugabehr
[ https://gitbox.apache.org/repos/asf?p=avro.git;h=97d4dd7 ]

AVRO-3184: Cache Datum Type Strings in Resolve Union (#1301)

* AVRO-3184: Cache Datum Type Strings in Resolve Union

* Fix typo

* Address Long type

* Updated based on GitHub feedback

* Revert Map cache entry

* Remove testing artifact

* Add Avro UTF8 class to primitive cache

> Cache Datum Type Strings in Resolve Union
> -----------------------------------------
>
>                 Key: AVRO-3184
>                 URL: https://issues.apache.org/jira/browse/AVRO-3184
>             Project: Apache Avro
>          Issue Type: Improvement
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.11.0
>
>         Attachments: AVRO-3184.JPG, AVRO-master.JPG
>
>          Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> {code:java|title=GenericData.java}
>   protected String getSchemaName(Object datum) {
>     if (datum == null || datum == JsonProperties.NULL_VALUE)
>       return Type.NULL.getName();
>     if (isRecord(datum))
>       return getRecordSchema(datum).getFullName();
>     if (isEnum(datum))
>       return getEnumSchema(datum).getFullName();
>     if (isArray(datum))
>       return Type.ARRAY.getName();
>     if (isMap(datum))
>       return Type.MAP.getName();
>     if (isFixed(datum))
>       return getFixedSchema(datum).getFullName();
>     if (isString(datum))
>       return Type.STRING.getName();
>     if (isBytes(datum))
>       return Type.BYTES.getName();
>     if (isInteger(datum))
>       return Type.INT.getName();
>     if (isLong(datum))
>       return Type.LONG.getName();
>     if (isFloat(datum))
>       return Type.FLOAT.getName();
>     if (isDouble(datum))
>       return Type.DOUBLE.getName();
>     if (isBoolean(datum))
>       return Type.BOOLEAN.getName();
>     throw new AvroRuntimeException(String.format("Unknown datum type %s: %s", datum.getClass().getName(), datum));
>   }
> {code}
> This is a lot of work for the simple native types (Long, Float, Double, etc.), which are the last ones checked: a datum of one of these types must first fail every preceding test.  Add a cache for these simple cases.
> I came across this while examining the performance of Apache ORC, which includes an Avro benchmark for comparison.  The attached charts show the results with the change implemented.
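
The caching idea described above can be sketched roughly as follows. This is a minimal illustration, not the code committed for AVRO-3184: the class name DatumTypeNameCache and the method lookup are hypothetical, and the type names are written as plain string literals instead of Type.X.getName() so that the example compiles without Avro on the classpath. The commit log also mentions adding the Avro Utf8 class to the cache; it is omitted here for the same reason.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not the class added by the commit.
final class DatumTypeNameCache {

  // Class objects can be compared by identity, so an IdentityHashMap
  // avoids calling equals()/hashCode() on the key during lookup.
  private static final Map<Class<?>, String> PRIMITIVE_NAMES = new IdentityHashMap<>();

  static {
    // Populated once during class initialization and only read afterwards.
    PRIMITIVE_NAMES.put(Integer.class, "int");
    PRIMITIVE_NAMES.put(Long.class, "long");
    PRIMITIVE_NAMES.put(Float.class, "float");
    PRIMITIVE_NAMES.put(Double.class, "double");
    PRIMITIVE_NAMES.put(Boolean.class, "boolean");
    PRIMITIVE_NAMES.put(String.class, "string");
  }

  private DatumTypeNameCache() {
  }

  /**
   * Returns the schema type name for a null or simple datum, or null
   * when the caller should fall back to the full chain of isX() checks.
   */
  static String lookup(Object datum) {
    if (datum == null) {
      return "null";
    }
    return PRIMITIVE_NAMES.get(datum.getClass());
  }
}
```

A method such as getSchemaName could consult a lookup like this first and run the existing chain of checks only when it returns null. Because the lookup matches the exact class of the datum, records, enums, arrays, maps and any subclassed types miss the cache and still go through the original path.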



--
This message was sent by Atlassian Jira
(v8.3.4#803005)