You are viewing a plain text version of this content. The canonical link for it is here.

Posted to github@arrow.apache.org by "spebern (via GitHub)" <gi...@apache.org> on 2023/03/25 11:16:46 UTC

[GitHub] [arrow-rs] spebern opened a new pull request, #3939: Faster decimal parsing

spebern opened a new pull request, #3939:
URL: https://github.com/apache/arrow-rs/pull/3939

   # Which issue does this PR close?
   
   # Rationale for this change
   
   Improves decimal parsing performance. For me, this is now faster than attempting to split at the decimal point and
   using `lexical` for the whole and fraction (of course that might also be how I implemented it).
   
   For `Decimal128`:
   
   ```
   123.123                 time:   [7.3252 ns 7.4176 ns 7.5170 ns]
                           change: [-58.812% -58.303% -57.768%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   123.1234                time:   [10.979 ns 11.243 ns 11.486 ns]
                           change: [-38.788% -37.112% -35.198%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   123.1                   time:   [8.5550 ns 8.5617 ns 8.5689 ns]
                           change: [-55.004% -54.842% -54.705%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 8 outliers among 100 measurements (8.00%)
     6 (6.00%) high mild
     2 (2.00%) high severe
   
   123                     time:   [7.2431 ns 7.2494 ns 7.2564 ns]
                           change: [-54.113% -53.909% -53.713%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 6 outliers among 100 measurements (6.00%)
     4 (4.00%) high mild
     2 (2.00%) high severe
   
   -123.123                time:   [7.3594 ns 7.4514 ns 7.5416 ns]
                           change: [-59.417% -58.872% -58.321%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   -123.1234               time:   [12.149 ns 12.369 ns 12.582 ns]
                           change: [-32.216% -30.822% -29.550%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 2 outliers among 100 measurements (2.00%)
     2 (2.00%) low mild
   
   -123.1                  time:   [8.6034 ns 8.6088 ns 8.6145 ns]
                           change: [-57.670% -57.505% -57.343%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     3 (3.00%) high mild
   
   -123                    time:   [7.2707 ns 7.2769 ns 7.2844 ns]
                           change: [-57.830% -57.731% -57.637%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 6 outliers among 100 measurements (6.00%)
     4 (4.00%) high mild
     2 (2.00%) high severe
   
   0.0000123               time:   [6.6341 ns 6.6384 ns 6.6433 ns]
                           change: [-53.732% -53.613% -53.519%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 10 outliers among 100 measurements (10.00%)
     6 (6.00%) high mild
     4 (4.00%) high severe
   
   12.                     time:   [6.1916 ns 6.1966 ns 6.2021 ns]
                           change: [-57.541% -57.482% -57.423%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 5 outliers among 100 measurements (5.00%)
     4 (4.00%) high mild
     1 (1.00%) high severe
   
   -12.                    time:   [6.2060 ns 6.2099 ns 6.2141 ns]
                           change: [-60.478% -60.326% -60.199%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 6 outliers among 100 measurements (6.00%)
     1 (1.00%) high mild
     5 (5.00%) high severe
   
   00.1                    time:   [7.3312 ns 7.3386 ns 7.3471 ns]
                           change: [-56.612% -56.520% -56.428%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 6 outliers among 100 measurements (6.00%)
     4 (4.00%) high mild
     2 (2.00%) high severe
   
   -00.1                   time:   [7.3639 ns 7.3713 ns 7.3798 ns]
                           change: [-58.489% -58.416% -58.351%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 7 outliers among 100 measurements (7.00%)
     5 (5.00%) high mild
     2 (2.00%) high severe
   
   12345678912345678.1234  time:   [21.708 ns 21.787 ns 21.865 ns]
                           change: [-52.333% -52.150% -51.979%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   -12345678912345678.1234 time:   [18.831 ns 18.843 ns 18.856 ns]
                           change: [-60.041% -59.963% -59.884%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 9 outliers among 100 measurements (9.00%)
     4 (4.00%) high mild
     5 (5.00%) high severe
   
   99999999999999999.999   time:   [19.872 ns 19.885 ns 19.899 ns]
                           change: [-55.287% -55.227% -55.161%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 11 outliers among 100 measurements (11.00%)
     6 (6.00%) high mild
     5 (5.00%) high severe
   
   -99999999999999999.999  time:   [19.985 ns 20.033 ns 20.113 ns]
                           change: [-56.555% -56.478% -56.387%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 9 outliers among 100 measurements (9.00%)
     4 (4.00%) high mild
     5 (5.00%) high severe
   
   .123                    time:   [7.1665 ns 7.4205 ns 7.6506 ns]
                           change: [-44.271% -42.572% -40.824%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 9 outliers among 100 measurements (9.00%)
     7 (7.00%) low mild
     2 (2.00%) high mild
   
   -.123                   time:   [7.3686 ns 7.6408 ns 7.9085 ns]
                           change: [-43.858% -41.981% -39.919%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   123.                    time:   [7.3530 ns 7.3858 ns 7.4187 ns]
                           change: [-57.254% -57.128% -57.008%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 13 outliers among 100 measurements (13.00%)
     7 (7.00%) high mild
     6 (6.00%) high severe
   
   -123.                   time:   [7.3742 ns 7.4001 ns 7.4423 ns]
                           change: [-58.124% -57.968% -57.802%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 2 outliers among 100 measurements (2.00%)
     1 (1.00%) high mild
     1 (1.00%) high severe
   ```
   
   For `Decimal256`:
   
   ```
   123.123                 time:   [37.015 ns 37.066 ns 37.130 ns]
                           change: [-61.511% -61.300% -61.155%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 8 outliers among 100 measurements (8.00%)
     7 (7.00%) high mild
     1 (1.00%) high severe
   
   123.1234                time:   [38.021 ns 38.200 ns 38.398 ns]
                           change: [-60.331% -60.106% -59.924%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     2 (2.00%) high mild
     1 (1.00%) high severe
   
   123.1                   time:   [37.363 ns 37.495 ns 37.649 ns]
                           change: [-58.708% -58.482% -58.274%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 5 outliers among 100 measurements (5.00%)
     5 (5.00%) high mild
   
   123                     time:   [34.478 ns 34.800 ns 35.222 ns]
                           change: [-57.829% -57.427% -56.948%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 8 outliers among 100 measurements (8.00%)
     2 (2.00%) high mild
     6 (6.00%) high severe
   
   -123.123                time:   [37.421 ns 37.523 ns 37.630 ns]
                           change: [-60.613% -60.378% -60.201%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   -123.1234               time:   [37.633 ns 37.718 ns 37.815 ns]
                           change: [-61.822% -61.482% -61.174%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     1 (1.00%) high mild
     2 (2.00%) high severe
   
   -123.1                  time:   [36.221 ns 36.319 ns 36.418 ns]
                           change: [-60.366% -60.193% -60.045%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   -123                    time:   [34.028 ns 34.070 ns 34.111 ns]
                           change: [-59.304% -59.076% -58.900%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 2 outliers among 100 measurements (2.00%)
     1 (1.00%) low mild
     1 (1.00%) high mild
   
   0.0000123               time:   [26.564 ns 26.592 ns 26.622 ns]
                           change: [-59.787% -59.547% -59.385%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     3 (3.00%) high mild
   
   12.                     time:   [28.850 ns 28.884 ns 28.921 ns]
                           change: [-57.897% -57.651% -57.457%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 11 outliers among 100 measurements (11.00%)
     2 (2.00%) low mild
     7 (7.00%) high mild
     2 (2.00%) high severe
   
   -12.                    time:   [28.945 ns 28.975 ns 29.006 ns]
                           change: [-58.174% -57.898% -57.663%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 2 outliers among 100 measurements (2.00%)
     1 (1.00%) high mild
     1 (1.00%) high severe
   
   00.1                    time:   [30.927 ns 30.965 ns 31.005 ns]
                           change: [-59.556% -59.468% -59.381%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   -00.1                   time:   [31.071 ns 31.116 ns 31.156 ns]
                           change: [-59.174% -59.071% -58.969%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 10 outliers among 100 measurements (10.00%)
     8 (8.00%) low mild
     2 (2.00%) high mild
   
   12345678912345678.1234  time:   [115.78 ns 115.90 ns 116.03 ns]
                           change: [-61.270% -61.066% -60.927%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   -12345678912345678.1234 time:   [116.61 ns 116.88 ns 117.24 ns]
                           change: [-60.717% -60.632% -60.534%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 4 outliers among 100 measurements (4.00%)
     2 (2.00%) high mild
     2 (2.00%) high severe
   
   99999999999999999.999   time:   [115.75 ns 115.90 ns 116.05 ns]
                           change: [-61.206% -61.106% -61.017%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 5 outliers among 100 measurements (5.00%)
     4 (4.00%) high mild
     1 (1.00%) high severe
   
   -99999999999999999.999  time:   [116.15 ns 116.25 ns 116.35 ns]
                           change: [-60.777% -60.562% -60.416%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 8 outliers among 100 measurements (8.00%)
     2 (2.00%) low mild
     3 (3.00%) high mild
     3 (3.00%) high severe
   
   .123                    time:   [20.929 ns 20.994 ns 21.057 ns]
                           change: [-59.013% -58.820% -58.635%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 2 outliers among 100 measurements (2.00%)
     2 (2.00%) low mild
   
   -.123                   time:   [21.542 ns 21.601 ns 21.660 ns]
                           change: [-57.871% -57.770% -57.670%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 5 outliers among 100 measurements (5.00%)
     5 (5.00%) high mild
   
   123.                    time:   [34.390 ns 34.606 ns 34.857 ns]
                           change: [-58.918% -58.698% -58.457%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 9 outliers among 100 measurements (9.00%)
     5 (5.00%) high mild
     4 (4.00%) high severe
   
   -123.                   time:   [34.592 ns 34.951 ns 35.429 ns]
                           change: [-56.946% -55.936% -54.987%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 4 outliers among 100 measurements (4.00%)
     2 (2.00%) high mild
     2 (2.00%) high severe
   ```
    
   # What changes are included in this PR?
   
   - Move sign check out of loop (as suggested by @tustvold previously)
   - Use wrapping operations instead of checked
     The decimal precision that is supported by arrow ensures that there cannot be any overflows.
   
   # Are there any user-facing changes?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org