Posted to issues@beam.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/03/19 15:10:00 UTC

[jira] [Work logged] (BEAM-14134) Many coders cause significant unnecessary allocations

     [ https://issues.apache.org/jira/browse/BEAM-14134?focusedWorklogId=744638&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-744638 ]

ASF GitHub Bot logged work on BEAM-14134:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 19/Mar/22 15:09
            Start Date: 19/Mar/22 15:09
    Worklog Time Spent: 10m 
      Work Description: steveniemitz opened a new pull request #17134:
URL: https://github.com/apache/beam/pull/17134


   Many coders have significant overhead due to their use of `DataInputStream`. Each `DataInputStream` allocates several internal buffers when it is instantiated, which adds unnecessary overhead to very simple operations such as decoding a big-endian long.
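   
   For illustration, a minimal sketch of the pattern described above: a decode method that wraps the incoming stream in a new `DataInputStream` on every call just to read one big-endian long. The class name `DataInputStreamLongDecoder` is hypothetical, and this is a simplification, not Beam's actual coder code.
   
   ```java
   import java.io.ByteArrayInputStream;
   import java.io.DataInputStream;
   import java.io.IOException;
   import java.io.InputStream;
   
   // Hypothetical, simplified decode path; not Beam's real BigEndianLongCoder.
   final class DataInputStreamLongDecoder {
     private DataInputStreamLongDecoder() {}
   
     static long decode(InputStream in) throws IOException {
       // A new DataInputStream (and its internal buffers) is created for every
       // decoded value, although only 8 bytes are read. It is deliberately not
       // closed, because closing it would also close the caller's stream.
       return new DataInputStream(in).readLong();
     }
   
     public static void main(String[] args) throws IOException {
       byte[] encoded = {0, 0, 0, 0, 0, 0, 1, 2}; // big-endian encoding of 258
       System.out.println(decode(new ByteArrayInputStream(encoded))); // prints 258
     }
   }
   ```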
   
   This PR changes most coders that use `DataInputStream` internally to use a more efficient big-endian decoder. I benchmarked three different options; the one I settled on offered the best balance of throughput and allocations.
   
   ```
   Benchmark                Mode  Cnt          Score         Error  Units
   readLongViaLocalBuffer  thrpt   10  204364633.343 ± 7412002.528  ops/s
   readLongViaTLBuffer     thrpt   10  108663164.381 ±  229471.991  ops/s
   readLongViaReadCalls    thrpt   10  160694853.195 ± 5272248.704  ops/s
   ```
   
   - `readLongViaLocalBuffer` allocates an 8-byte buffer per call and fills it with a single `read()` call.
   - `readLongViaTLBuffer` does the same, but reuses a thread-local buffer instead of allocating a new one on each call.
   - `readLongViaReadCalls` simply calls `read()` 8 times, storing the results in temporary variables.
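   
   The three strategies can be sketched as follows. This is a minimal reconstruction assuming only the one-line descriptions above; the class `BigEndianLongReaders` and its helpers are hypothetical, and the JMH harness that produced the numbers is not shown. One deliberate difference: the buffer-based variants loop until all 8 bytes arrive instead of relying on a single `read()` call, because `InputStream.read(byte[], int, int)` is allowed to return fewer bytes than requested.
   
   ```java
   import java.io.EOFException;
   import java.io.IOException;
   import java.io.InputStream;
   
   // Hypothetical reconstructions of the three benchmarked strategies, based only on
   // their one-line descriptions; not the PR's actual code or benchmark harness.
   final class BigEndianLongReaders {
     private BigEndianLongReaders() {}
   
     // One reusable 8-byte buffer per thread, used by readLongViaTLBuffer.
     private static final ThreadLocal<byte[]> TL_BUFFER =
         ThreadLocal.withInitial(() -> new byte[8]);
   
     // Fills buf completely. A single read(byte[], int, int) may return fewer bytes
     // than requested, so keep reading until the buffer is full or the stream ends.
     private static void readFully(InputStream in, byte[] buf) throws IOException {
       int off = 0;
       while (off < buf.length) {
         int n = in.read(buf, off, buf.length - off);
         if (n < 0) {
           throw new EOFException();
         }
         off += n;
       }
     }
   
     // Combines 8 bytes, most significant first, into a long.
     private static long toLong(byte[] b) {
       long v = 0;
       for (int i = 0; i < 8; i++) {
         v = (v << 8) | (b[i] & 0xFFL);
       }
       return v;
     }
   
     // Option 1: allocate a fresh 8-byte buffer on every call.
     static long readLongViaLocalBuffer(InputStream in) throws IOException {
       byte[] buf = new byte[8];
       readFully(in, buf);
       return toLong(buf);
     }
   
     // Option 2: reuse the calling thread's buffer instead of allocating.
     static long readLongViaTLBuffer(InputStream in) throws IOException {
       byte[] buf = TL_BUFFER.get();
       readFully(in, buf);
       return toLong(buf);
     }
   
     // Option 3: eight single-byte read() calls held in temporaries; no buffer at all.
     static long readLongViaReadCalls(InputStream in) throws IOException {
       int b0 = in.read();
       int b1 = in.read();
       int b2 = in.read();
       int b3 = in.read();
       int b4 = in.read();
       int b5 = in.read();
       int b6 = in.read();
       int b7 = in.read();
       // read() returns -1 at end of stream, so any negative value means EOF.
       if ((b0 | b1 | b2 | b3 | b4 | b5 | b6 | b7) < 0) {
         throw new EOFException();
       }
       return ((long) b0 << 56)
           | ((long) b1 << 48)
           | ((long) b2 << 40)
           | ((long) b3 << 32)
           | ((long) b4 << 24)
           | ((long) b5 << 16)
           | ((long) b6 << 8)
           | b7;
     }
   }
   ```
   
   The variants differ in where the cost goes: `readLongViaLocalBuffer` allocates a small array per call, `readLongViaTLBuffer` avoids that allocation but performs a `ThreadLocal` lookup on every call, and `readLongViaReadCalls` allocates nothing but makes eight separate `read()` calls.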
   
   R: @lukecwik maybe? I'm not sure who is best placed to review this.
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [x] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
    - [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [x] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 744638)
    Remaining Estimate: 0h
            Time Spent: 10m

> Many coders cause significant unnecessary allocations
> -----------------------------------------------------
>
>                 Key: BEAM-14134
>                 URL: https://issues.apache.org/jira/browse/BEAM-14134
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-core
>            Reporter: Steve Niemitz
>            Assignee: Steve Niemitz
>            Priority: P2
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Many coders (BigEndian*, Map, Iterable, Instant) use DataInputStream to read longs, ints, and shorts. Each DataInputStream allocates roughly 200 bytes of internal buffers when instantiated, so decoding every single long, int, short, etc. allocates roughly 200 bytes.
> We should eliminate all uses of DataInputStream in hot paths and replace them with something more efficient.
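
The issue mentions ints and shorts as well as longs. As a hedged sketch, assuming the same approach as `readLongViaReadCalls` in the PR description, the narrower types could be read without a DataInputStream like this. `BigEndianIntReads` is a hypothetical name, not Beam's API; the methods are written to return the same values as `DataInputStream.readInt()` and `DataInputStream.readShort()`.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper; names are illustrative, not Beam's API.
final class BigEndianIntReads {
  private BigEndianIntReads() {}

  // Same result as DataInputStream.readInt(), without constructing a DataInputStream.
  static int readInt(InputStream in) throws IOException {
    int b0 = in.read();
    int b1 = in.read();
    int b2 = in.read();
    int b3 = in.read();
    // read() returns -1 at end of stream, so any negative value means EOF.
    if ((b0 | b1 | b2 | b3) < 0) {
      throw new EOFException();
    }
    return (b0 << 24) | (b1 << 16) | (b2 << 8) | b3;
  }

  // Same result as DataInputStream.readShort(): two bytes, sign-extended to short.
  static short readShort(InputStream in) throws IOException {
    int b0 = in.read();
    int b1 = in.read();
    if ((b0 | b1) < 0) {
      throw new EOFException();
    }
    return (short) ((b0 << 8) | b1);
  }
}
```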



--
This message was sent by Atlassian Jira
(v8.20.1#820001)