Posted to issues@flink.apache.org by "Dawid Wysakowicz (Jira)" <ji...@apache.org> on 2020/10/08 14:57:00 UTC
[jira] [Comment Edited] (FLINK-19440) Performance regression on 15.09.2020

    [ https://issues.apache.org/jira/browse/FLINK-19440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210259#comment-17210259 ] 

Dawid Wysakowicz edited comment on FLINK-19440 at 10/8/20, 2:56 PM:
--------------------------------------------------------------------

It's not easy to tell. Increasing the number of records will increase the measured throughput, because the one-time initialization cost paid on the first record is amortized over more records.
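
To make the amortization argument concrete, here is a minimal sketch. The class name, the 50 ms initialization cost and the 1 microsecond per-record cost are made-up numbers for illustration, not measurements from the benchmark.

```java
public class AmortizedThroughput {

    /**
     * Records per second when a fixed setup cost is paid once,
     * before the first record, and every record then costs the same.
     */
    static double throughput(long records, double initSeconds, double perRecordSeconds) {
        return records / (initSeconds + records * perRecordSeconds);
    }

    public static void main(String[] args) {
        double init = 0.05;       // hypothetical one-time initialization: 50 ms
        double perRecord = 1e-6;  // hypothetical steady-state cost: 1 microsecond per record

        // The more records we push through, the smaller the share of the
        // fixed setup cost, so the reported throughput rises towards
        // 1 / perRecord without ever reaching it.
        for (long n : new long[] {1_000, 100_000, 10_000_000}) {
            System.out.printf("%,d records -> %.0f records/s%n",
                    n, throughput(n, init, perRecord));
        }
    }
}
```

Under these assumptions a higher record count only hides the setup cost. It says nothing about whether the per-record cost itself changed, which is why the record count alone cannot settle the question.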

However, from my investigation of the code, the hot path in Avro 1.10 also has some new code that will result in slower performance (additional checks and more complex instantiation logic for SpecificRecords; I checked that under a profiler).
Therefore, as far as I can tell, there is a regression on the hot path in Avro 1.10.

The bit where I saw the biggest difference between 1.10 and 1.8.2 on the hot path is in the {{SpecificData#getClass}} method, when the type of the passed {{Schema}} is {{RECORD}}:

1.10:
{code}
    case FIXED:
    case RECORD:
    case ENUM:
      String name = schema.getFullName();
      if (name == null)
        return null;
      Class c = classCache.computeIfAbsent(name, n -> {
        try {
          return ClassUtils.forName(getClassLoader(), getClassName(schema));
        } catch (ClassNotFoundException e) {
          try { // nested class?
            return ClassUtils.forName(getClassLoader(), getNestedClassName(schema));
          } catch (ClassNotFoundException ex) {
            return NO_CLASS;
          }
        }
      });
      return c == NO_CLASS ? null : c;
{code}

1.8.2:
{code}
    case FIXED:
    case RECORD:
    case ENUM:
      String name = schema.getFullName();
      if (name == null) return null;
      Class c = classCache.get(name);
      if (c == null) {
        try {
          c = ClassUtils.forName(getClassLoader(), getClassName(schema));
        } catch (ClassNotFoundException e) {
          c = NO_CLASS;
        }
        classCache.put(name, c);
      }
      return c == NO_CLASS ? null : c;
{code}
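
For anyone who wants to experiment with the two lookup styles in isolation, below is a minimal, self-contained sketch. It is not Avro code: {{ClassCacheSketch}}, {{load}}, the {{loads}} counter and the {{Void.class}} sentinel are stand-ins invented for this example, and it uses plain {{Class.forName}} instead of Avro's {{ClassUtils}}. It only shows that both styles return the same result and load each class once, so any difference is in how a cache hit is served.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ClassCacheSketch {

    // Stand-in sentinel meaning "we looked and found no such class".
    private static final Class<?> NO_CLASS = Void.class;

    private final Map<String, Class<?>> cache = new ConcurrentHashMap<>();

    // Counts how often the slow path (actual class loading) is taken.
    int loads = 0;

    private Class<?> load(String name) {
        loads++;
        try {
            return Class.forName(name);
        } catch (ClassNotFoundException e) {
            return NO_CLASS;
        }
    }

    // 1.8.2 style: a plain get() on the hot path, put() only on a miss.
    Class<?> lookupGetPut(String name) {
        Class<?> c = cache.get(name);
        if (c == null) {
            c = load(name);
            cache.put(name, c);
        }
        return c == NO_CLASS ? null : c;
    }

    // 1.10 style: every call, hit or miss, goes through computeIfAbsent().
    Class<?> lookupComputeIfAbsent(String name) {
        Class<?> c = cache.computeIfAbsent(name, this::load);
        return c == NO_CLASS ? null : c;
    }

    public static void main(String[] args) {
        ClassCacheSketch sketch = new ClassCacheSketch();
        for (int i = 0; i < 3; i++) {
            System.out.println(sketch.lookupGetPut("java.lang.String"));
            System.out.println(sketch.lookupComputeIfAbsent("java.lang.Integer"));
            System.out.println(sketch.lookupGetPut("no.such.Type"));
        }
        System.out.println("slow-path loads: " + sketch.loads);
    }
}
```

The sketch makes no claim about which variant is faster. To compare the hit paths, both methods would have to be run under a proper harness such as JMH, on the same JDK version as the benchmark machine.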


was (Author: dawidwys):
It's not easy to tell. Increasing the number of records will increase the throughput, because of the initialization in the first record. 

However, from my investigation of the code, the hot path in Avro 1.10 also has some new code that will result in slower performance (additional checks and more complex instantiation logic for SpecificRecords; I checked that under a profiler).
Therefore, as far as I can tell, there is a regression on the hot path in Avro 1.10.

> Performance regression on 15.09.2020
> ------------------------------------
>
>                 Key: FLINK-19440
>                 URL: https://issues.apache.org/jira/browse/FLINK-19440
>             Project: Flink
>          Issue Type: Bug
>          Components: API / Type Serialization System, Benchmarks
>    Affects Versions: 1.12.0
>            Reporter: Piotr Nowojski
>            Assignee: Dawid Wysakowicz
>            Priority: Blocker
>         Attachments: jmh-result.csv
>
>
> {{serializerAvro}} benchmark is showing a regression on 15.09.2020:
> http://codespeed.dak8s.net:8000/timeline/?ben=serializerAvro&env=2


