Posted to dev@avro.apache.org by "Ryan Skraba (Jira)" <ji...@apache.org> on 2021/05/03 13:51:00 UTC

[jira] [Commented] (AVRO-3048) Using builders leads to performance degradation

    [ https://issues.apache.org/jira/browse/AVRO-3048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17338379#comment-17338379 ] 

Ryan Skraba commented on AVRO-3048:
-----------------------------------

I tend to agree: we should use the *{{MODEL$}}* that's already there directly, whenever it's available.

Thanks for the fix for newly generated code; I think that's the most obvious and important solution for now.
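The difference between resolving the model once per builder and reusing a model the class already holds can be sketched in plain Java. This is not Avro code: {{Model}}, {{BuilderBase}} and {{SimpleRecordSketch}} are hypothetical stand-ins for {{SpecificData}}, the builder base class and a generated record, and a counter replaces the real cost of the lookup. The new-style builder assumes the fix works by handing the class's static model (like *{{MODEL$}}*) to the builder base.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a data model such as SpecificData (not Avro's class).
final class Model {
  // Counts how often the "expensive" lookup runs, so the cost is observable.
  static final AtomicInteger LOOKUPS = new AtomicInteger();

  // Stands in for a costly resolution step such as a Class.forName call.
  static Model resolve(String className) {
    LOOKUPS.incrementAndGet();
    return new Model();
  }
}

// Hypothetical stand-in for a builder base class.
abstract class BuilderBase {
  final Model model;

  // Old style: every builder resolves its model again.
  BuilderBase(String className) {
    this.model = Model.resolve(className);
  }

  // New style: the caller passes a model it already holds.
  BuilderBase(Model model) {
    this.model = model;
  }
}

// Hypothetical "generated" record class keeping its model in a static field, like MODEL$.
final class SimpleRecordSketch {
  static final Model MODEL = new Model(); // created once, when the class is initialized

  static final class OldBuilder extends BuilderBase {
    OldBuilder() {
      super("SimpleRecordSketch");
    }
  }

  static final class NewBuilder extends BuilderBase {
    NewBuilder() {
      super(MODEL);
    }
  }
}

public class ModelReuseDemo {
  public static void main(String[] args) {
    for (int i = 0; i < 1000; i++) {
      new SimpleRecordSketch.OldBuilder();
    }
    System.out.println("lookups after old-style builders: " + Model.LOOKUPS.get());
    for (int i = 0; i < 1000; i++) {
      new SimpleRecordSketch.NewBuilder();
    }
    System.out.println("lookups after new-style builders: " + Model.LOOKUPS.get());
  }
}
```

Run on its own, both lines print 1000: the thousand new-style builders add no lookups at all.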

For specific records generated by older versions, there's already a cache in place (keyed on the SpecificRecord class), but it looks like the {{*Class.forName(className)*}} call is what's really expensive.

Even for records generated by older versions, we can probably work around the {{*getForClass*}} overhead by creating one initial builder as a singleton and then creating all new builders from it.
{code:java}
// SimpleRecord stands for any Avro-generated SpecificRecord class.
// Create this once, because it makes an expensive classloader call:
private static final SimpleRecord.Builder expensive = SimpleRecord.newBuilder();

// All other Builder instances are copied from the expensive one:
SimpleRecord.Builder cheap = SimpleRecord.newBuilder(expensive);
{code}
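The template-builder workaround can be tried out without Avro in the self-contained sketch below. {{Rec}} and its {{Builder}} are hypothetical stand-ins for a generated record such as {{SimpleRecord}}: the no-argument factory increments a counter in place of the expensive lookup, while the copy factory reuses the template's model instead of looking it up again.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a generated record whose no-arg builder factory is expensive.
final class Rec {
  static final AtomicInteger EXPENSIVE_CALLS = new AtomicInteger();

  final String name;

  Rec(String name) {
    this.name = name;
  }

  static Builder newBuilder() {
    return new Builder();
  }

  static Builder newBuilder(Builder other) {
    return new Builder(other);
  }

  static final class Builder {
    private final Object model; // whatever the expensive lookup produced
    private String name;

    private Builder() {
      EXPENSIVE_CALLS.incrementAndGet(); // stands in for getForSchema / Class.forName
      this.model = new Object();
    }

    private Builder(Builder other) {
      this.model = other.model; // reused, so no new lookup
      this.name = other.name;   // the copy also takes over any field values
    }

    Builder setName(String name) {
      this.name = name;
      return this;
    }

    Rec build() {
      return new Rec(name);
    }
  }
}

public class TemplateBuilderDemo {
  // Created once: the only builder that pays for the expensive lookup.
  private static final Rec.Builder TEMPLATE = Rec.newBuilder();

  static Rec make(String name) {
    // Every other builder is copied from the template.
    return Rec.newBuilder(TEMPLATE).setName(name).build();
  }

  public static void main(String[] args) {
    for (int i = 0; i < 1000; i++) {
      make("record-" + i);
    }
    System.out.println("expensive calls: " + Rec.EXPENSIVE_CALLS.get());
  }
}
```

Running {{TemplateBuilderDemo}} reports a single expensive call for 1,000 records. One caution, assuming generated copy builders behave like this sketch: the copy also takes over any field values already set on the source, so the template itself should never be populated.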
A quick question, though: I can't see how your PR reduces any calls to {{*newInstance(...)*}} (which is also expensive). Am I mistaken?

> Using builders leads to performance degradation
> -----------------------------------------------
>
>                 Key: AVRO-3048
>                 URL: https://issues.apache.org/jira/browse/AVRO-3048
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.9.2, 1.10.1
>            Reporter: Peter
>            Assignee: Martin Jubelgas
>            Priority: Major
>
> When you call .newBuilder() on Avro-generated classes, it calls
> org.apache.avro.specific.SpecificData.getForSchema:
>  
> {code:java}
> public static SpecificData getForSchema(Schema reader) {
>   if (reader.getType() == Type.RECORD) {
>     final String className = getClassName(reader);
>     if (className != null) {
>       final Class<?> clazz;
>       try {
>         clazz = Class.forName(className);
>         return getForClass(clazz);
>       } catch (ClassNotFoundException e) {
>         return SpecificData.get();
>       }
>     }
>   }
>   return SpecificData.get();
> }
> {code}
>  
> The {{Class.forName(className)}} call inside the {{try}} then seems to seldom find the class, so a lot of ClassNotFoundExceptions are thrown.
> Throwing exceptions internally carries a heavy performance penalty, and in practice users of Avro 1.9.x and 1.10.x in high-performance applications are forced to avoid builders.
>  
> The same problem is also described at:
> [https://forums.databricks.com/questions/50803/orgapacheavrospecificspecificdatagetforschema-sear.html]
> The problem exists in at least 1.9.2 and 1.10.1 (but not in 1.7.x) in an OSGi environment.
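One possible mitigation for the repeated {{ClassNotFoundException}} in the issue description is to remember failed lookups as well as successful ones. The sketch below is a hypothetical helper in plain Java, not Avro's code and not the fix in the PR: {{ClassLookupCache}} memoizes {{Class.forName}} results per class name, so a missing class throws at most once per name. The class name {{com.example.Missing}} is made up for the demo.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper (not part of Avro): caches Class.forName results by class name,
// including misses, so a missing class throws ClassNotFoundException at most once per name.
final class ClassLookupCache {
  private final Map<String, Optional<Class<?>>> cache = new ConcurrentHashMap<>();
  private final AtomicInteger loadAttempts = new AtomicInteger();
  private final ClassLoader loader;

  ClassLookupCache(ClassLoader loader) {
    this.loader = loader;
  }

  // Returns the class if it can be loaded, Optional.empty() if it cannot.
  Optional<Class<?>> find(String className) {
    return cache.computeIfAbsent(className, this::load);
  }

  // Number of real Class.forName attempts made so far (to observe the cache).
  int loadAttempts() {
    return loadAttempts.get();
  }

  private Optional<Class<?>> load(String className) {
    loadAttempts.incrementAndGet();
    try {
      // initialize = false: only locate the class, don't run its static initializers.
      return Optional.of(Class.forName(className, false, loader));
    } catch (ClassNotFoundException e) {
      // The miss is cached too, so this exception is paid for only once per name.
      return Optional.empty();
    }
  }
}

public class ClassLookupDemo {
  public static void main(String[] args) {
    ClassLookupCache lookups = new ClassLookupCache(ClassLookupDemo.class.getClassLoader());
    for (int i = 0; i < 1000; i++) {
      lookups.find("com.example.Missing"); // hypothetical name that does not exist
    }
    System.out.println("load attempts: " + lookups.loadAttempts());
    System.out.println("String found: " + lookups.find("java.lang.String").isPresent());
  }
}
```

Running {{ClassLookupDemo}} reports a single load attempt for the 1,000 lookups of the missing name. The trade-off is that a cached miss is permanent: in a dynamic environment such as OSGi, where a bundle providing the class may be installed later, a real fix would need to invalidate or bound that cache.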



--
This message was sent by Atlassian Jira
(v8.3.4#803005)