Posted to github@arrow.apache.org by "tustvold (via GitHub)" <gi...@apache.org> on 2023/06/07 13:36:10 UTC

[GitHub] [arrow-rs] tustvold commented on pull request #4376: Move record delimiting into ColumnReader (#4365)

tustvold commented on PR #4376:
URL: https://github.com/apache/arrow-rs/pull/4376#issuecomment-1580839944

   The benchmarks in #4378 show that this change gives a minor performance benefit (about 8.6% and 2.9% on the two ListArray cases below), likely because it no longer needs to buffer and split off definition levels and values.
   
   ```
   arrow_array_reader/ListArray/plain encoded optional strings no NULLs
                           time:   [1.5840 ms 1.5868 ms 1.5903 ms]
                           change: [-8.9378% -8.6442% -8.3995%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 13 outliers among 100 measurements (13.00%)
     2 (2.00%) low mild
     4 (4.00%) high mild
     7 (7.00%) high severe
   Benchmarking arrow_array_reader/ListArray/plain encoded optional strings half NULLs: Warming up for 3.0000 s
   Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.1s, enable flat sampling, or reduce sample count to 60.
   arrow_array_reader/ListArray/plain encoded optional strings half NULLs
                           time:   [1.2136 ms 1.2143 ms 1.2150 ms]
                           change: [-2.9329% -2.8874% -2.8359%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high severe
   ```
   
   Looking at the flamegraph for this PR, reading the repetition levels is a relatively small portion of the runtime, at least compared to the overheads of stripping empty lists and padding nulls, which makes the improvement even more impressive.
   
   ![image](https://github.com/apache/arrow-rs/assets/1781103/8618b72d-9055-43fa-94bb-fd3eec62bced)
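   
   For readers unfamiliar with record delimiting, here is a minimal sketch of the general idea. It is not the code in this PR. In Parquet, a repetition level of 0 marks the start of a new record, so records can be counted while the levels are scanned rather than in a separate buffering step. The function name `delimit_records`, its parameters, and the `at_end` flag are hypothetical, and the sketch assumes the slice begins at a record boundary.
   
   ```rust
   /// Hypothetical helper, not part of the parquet crate.
   /// Scans `rep_levels` (assumed to start at a record boundary) and returns
   /// `(records, levels_consumed)` for at most `max_records` complete records.
   /// `at_end` signals that no further levels follow, so the trailing record
   /// can be treated as complete.
   fn delimit_records(rep_levels: &[i16], max_records: usize, at_end: bool) -> (usize, usize) {
       if max_records == 0 {
           return (0, 0);
       }
       let mut records = 0;
       let mut consumed = 0;
       for (idx, &level) in rep_levels.iter().enumerate() {
           // A repetition level of 0 after the first position closes the
           // record that precedes it.
           if level == 0 && idx != 0 {
               records += 1;
               consumed = idx;
               if records == max_records {
                   return (records, consumed);
               }
           }
       }
       // Without a following level we cannot know the last record is finished,
       // unless the caller tells us the data has ended.
       if at_end && !rep_levels.is_empty() {
           records += 1;
           consumed = rep_levels.len();
       }
       (records, consumed)
   }
   ```
   
   With the levels `[0, 1, 1, 0, 0, 1]`, asking for two records consumes the first four levels. The trailing `[0, 1]` is only counted as a third record when `at_end` is true.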
   
   

