Posted to dev@parquet.apache.org by "Andrei Pangin (Jira)" <ji...@apache.org> on 2022/10/09 22:58:00 UTC

[jira] [Created] (PARQUET-2202) Redundant String allocation on the hot path in CapacityByteArrayOutputStream.setByte

Andrei Pangin created PARQUET-2202:
--------------------------------------

             Summary: Redundant String allocation on the hot path in CapacityByteArrayOutputStream.setByte
                 Key: PARQUET-2202
                 URL: https://issues.apache.org/jira/browse/PARQUET-2202
             Project: Parquet
          Issue Type: Bug
          Components: parquet-mr
    Affects Versions: 1.12.3
            Reporter: Andrei Pangin
         Attachments: profile-alloc.png, profile-cpu.png

Profiling of a Spark application revealed a performance issue in production:

{{CapacityByteArrayOutputStream.setByte}} consumed 2.2% of total CPU time and accounted for 4.6% of total allocations. In the normal case, however, this method should not allocate anything at all.

Here is an excerpt from the async-profiler report.

CPU profile:

!profile-cpu.png!

Allocation profile:

!profile-alloc.png!

The cause is a {{checkArgument()}} call whose dynamic message String is constructed unconditionally, i.e. on every call, even when the check passes:

[https://github.com/apache/parquet-mr/blob/62b774cd0f0c60cfbe540bbfa60bee15929af5d4/parquet-common/src/main/java/org/apache/parquet/bytes/CapacityByteArrayOutputStream.java#L303]
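
To illustrate the cause, here is a minimal, self-contained sketch. All names in it are hypothetical and it is not Parquet's actual code. Java evaluates every method argument before the call is made, so a message passed to a precondition helper is concatenated on every invocation, even when the check passes. The {{describe()}} helper stands in for the inline concatenation and counts how often it runs; the two-argument {{checkArgument(boolean, String)}} shape and the {{long}} parameter types are assumptions for illustration.

{code:java}
// Hypothetical demo of the eager pattern; names are illustrative only.
final class EagerCheckDemo {
    // Number of times an error message String was actually built.
    static int messagesBuilt = 0;

    // Assumed shape of a simple precondition helper: condition + message.
    static void checkArgument(boolean isValid, String message) {
        if (!isValid) {
            throw new IllegalArgumentException(message);
        }
    }

    // Stands in for the inline "Index: " + index + ... concatenation.
    static String describe(long index, long bytesUsed) {
        messagesBuilt++;
        return "Index: " + index + " is >= the current size of: " + bytesUsed;
    }

    // Mirrors the problematic call: describe() runs before checkArgument()
    // is entered, so the String is built even when index < bytesUsed.
    static void checkIndex(long index, long bytesUsed) {
        checkArgument(index < bytesUsed, describe(index, bytesUsed));
    }

    public static void main(String[] args) {
        for (long i = 0; i < 1_000; i++) {
            checkIndex(i, 1_000); // every call is valid
        }
        System.out.println(messagesBuilt); // prints 1000
    }
}
{code}

Every one of the 1,000 valid calls still builds a message that is immediately discarded.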

The suggested fix is to move the String construction inside the condition, so that it happens only when the check fails:
{code:java}
if (index >= bytesUsed) {
    throw new IllegalArgumentException("Index: " + index + " is >= the current size of: " + bytesUsed);
}{code}
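
A sketch of the suggested fix in context, assuming a hypothetical fixed-capacity buffer class {{LazyCheckBuffer}}. This is not the real {{CapacityByteArrayOutputStream}}, which is considerably more involved. Only the bounds check in {{setByte}} mirrors the proposed change; the backing array, the {{write}}/{{getByte}} helpers, and the {{long}} index type are assumptions made to keep the example runnable.

{code:java}
// Hypothetical fixed-capacity buffer; only setByte's guard mirrors the fix.
final class LazyCheckBuffer {
    private final byte[] data;
    private long bytesUsed; // number of bytes written so far

    LazyCheckBuffer(int capacity) {
        this.data = new byte[capacity];
    }

    // Appends one byte; this sketch does not grow, so writing past the
    // capacity throws ArrayIndexOutOfBoundsException.
    void write(byte b) {
        data[(int) bytesUsed] = b;
        bytesUsed++;
    }

    byte getByte(long index) {
        return data[(int) index];
    }

    // Overwrites a byte that has already been written.
    void setByte(long index, byte value) {
        // The message is concatenated only on the failure path, so a
        // valid call builds no String at all.
        if (index >= bytesUsed) {
            throw new IllegalArgumentException(
                "Index: " + index + " is >= the current size of: " + bytesUsed);
        }
        data[(int) index] = value;
    }

    public static void main(String[] args) {
        LazyCheckBuffer buf = new LazyCheckBuffer(4);
        buf.write((byte) 1);
        buf.write((byte) 2);
        buf.setByte(1, (byte) 9);           // valid: no String built
        System.out.println(buf.getByte(1)); // prints 9
        try {
            buf.setByte(2, (byte) 0);       // invalid: only 2 bytes used
        } catch (IllegalArgumentException e) {
            // prints "Index: 2 is >= the current size of: 2"
            System.out.println(e.getMessage());
        }
    }
}
{code}

On the design choice: a formatting overload of the form {{checkArgument(boolean, String, Object...)}} would also defer the concatenation, but as written it still creates a varargs {{Object[]}} and may box primitive arguments on every call unless the JIT happens to eliminate them. The explicit {{if}} is the simplest way to keep the happy path allocation-free.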


