You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "spebern (via GitHub)" <gi...@apache.org> on 2023/03/25 11:16:46 UTC

[GitHub] [arrow-rs] spebern opened a new pull request, #3939: Faster decimal parsing

spebern opened a new pull request, #3939:
URL: https://github.com/apache/arrow-rs/pull/3939

   # Which issue does this PR close?
   
   # Rationale for this change
   
   Improves decimal parsing performance. For me, this is now faster than attempting to split at the decimal point and
   using `lexical` for the whole and fraction (of course that might also be how I implemented it).
   
   For `Decimal128`:
   
   ```
   123.123                 time:   [7.3252 ns 7.4176 ns 7.5170 ns]
                           change: [-58.812% -58.303% -57.768%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   123.1234                time:   [10.979 ns 11.243 ns 11.486 ns]
                           change: [-38.788% -37.112% -35.198%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   123.1                   time:   [8.5550 ns 8.5617 ns 8.5689 ns]
                           change: [-55.004% -54.842% -54.705%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 8 outliers among 100 measurements (8.00%)
     6 (6.00%) high mild
     2 (2.00%) high severe
   
   123                     time:   [7.2431 ns 7.2494 ns 7.2564 ns]
                           change: [-54.113% -53.909% -53.713%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 6 outliers among 100 measurements (6.00%)
     4 (4.00%) high mild
     2 (2.00%) high severe
   
   -123.123                time:   [7.3594 ns 7.4514 ns 7.5416 ns]
                           change: [-59.417% -58.872% -58.321%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   -123.1234               time:   [12.149 ns 12.369 ns 12.582 ns]
                           change: [-32.216% -30.822% -29.550%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 2 outliers among 100 measurements (2.00%)
     2 (2.00%) low mild
   
   -123.1                  time:   [8.6034 ns 8.6088 ns 8.6145 ns]
                           change: [-57.670% -57.505% -57.343%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     3 (3.00%) high mild
   
   -123                    time:   [7.2707 ns 7.2769 ns 7.2844 ns]
                           change: [-57.830% -57.731% -57.637%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 6 outliers among 100 measurements (6.00%)
     4 (4.00%) high mild
     2 (2.00%) high severe
   
   0.0000123               time:   [6.6341 ns 6.6384 ns 6.6433 ns]
                           change: [-53.732% -53.613% -53.519%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 10 outliers among 100 measurements (10.00%)
     6 (6.00%) high mild
     4 (4.00%) high severe
   
   12.                     time:   [6.1916 ns 6.1966 ns 6.2021 ns]
                           change: [-57.541% -57.482% -57.423%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 5 outliers among 100 measurements (5.00%)
     4 (4.00%) high mild
     1 (1.00%) high severe
   
   -12.                    time:   [6.2060 ns 6.2099 ns 6.2141 ns]
                           change: [-60.478% -60.326% -60.199%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 6 outliers among 100 measurements (6.00%)
     1 (1.00%) high mild
     5 (5.00%) high severe
   
   00.1                    time:   [7.3312 ns 7.3386 ns 7.3471 ns]
                           change: [-56.612% -56.520% -56.428%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 6 outliers among 100 measurements (6.00%)
     4 (4.00%) high mild
     2 (2.00%) high severe
   
   -00.1                   time:   [7.3639 ns 7.3713 ns 7.3798 ns]
                           change: [-58.489% -58.416% -58.351%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 7 outliers among 100 measurements (7.00%)
     5 (5.00%) high mild
     2 (2.00%) high severe
   
   12345678912345678.1234  time:   [21.708 ns 21.787 ns 21.865 ns]
                           change: [-52.333% -52.150% -51.979%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   -12345678912345678.1234 time:   [18.831 ns 18.843 ns 18.856 ns]
                           change: [-60.041% -59.963% -59.884%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 9 outliers among 100 measurements (9.00%)
     4 (4.00%) high mild
     5 (5.00%) high severe
   
   99999999999999999.999   time:   [19.872 ns 19.885 ns 19.899 ns]
                           change: [-55.287% -55.227% -55.161%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 11 outliers among 100 measurements (11.00%)
     6 (6.00%) high mild
     5 (5.00%) high severe
   
   -99999999999999999.999  time:   [19.985 ns 20.033 ns 20.113 ns]
                           change: [-56.555% -56.478% -56.387%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 9 outliers among 100 measurements (9.00%)
     4 (4.00%) high mild
     5 (5.00%) high severe
   
   .123                    time:   [7.1665 ns 7.4205 ns 7.6506 ns]
                           change: [-44.271% -42.572% -40.824%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 9 outliers among 100 measurements (9.00%)
     7 (7.00%) low mild
     2 (2.00%) high mild
   
   -.123                   time:   [7.3686 ns 7.6408 ns 7.9085 ns]
                           change: [-43.858% -41.981% -39.919%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   123.                    time:   [7.3530 ns 7.3858 ns 7.4187 ns]
                           change: [-57.254% -57.128% -57.008%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 13 outliers among 100 measurements (13.00%)
     7 (7.00%) high mild
     6 (6.00%) high severe
   
   -123.                   time:   [7.3742 ns 7.4001 ns 7.4423 ns]
                           change: [-58.124% -57.968% -57.802%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 2 outliers among 100 measurements (2.00%)
     1 (1.00%) high mild
     1 (1.00%) high severe
   ```
   
   For `Decimal256`:
   
   ```
   123.123                 time:   [37.015 ns 37.066 ns 37.130 ns]
                           change: [-61.511% -61.300% -61.155%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 8 outliers among 100 measurements (8.00%)
     7 (7.00%) high mild
     1 (1.00%) high severe
   
   123.1234                time:   [38.021 ns 38.200 ns 38.398 ns]
                           change: [-60.331% -60.106% -59.924%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     2 (2.00%) high mild
     1 (1.00%) high severe
   
   123.1                   time:   [37.363 ns 37.495 ns 37.649 ns]
                           change: [-58.708% -58.482% -58.274%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 5 outliers among 100 measurements (5.00%)
     5 (5.00%) high mild
   
   123                     time:   [34.478 ns 34.800 ns 35.222 ns]
                           change: [-57.829% -57.427% -56.948%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 8 outliers among 100 measurements (8.00%)
     2 (2.00%) high mild
     6 (6.00%) high severe
   
   -123.123                time:   [37.421 ns 37.523 ns 37.630 ns]
                           change: [-60.613% -60.378% -60.201%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   -123.1234               time:   [37.633 ns 37.718 ns 37.815 ns]
                           change: [-61.822% -61.482% -61.174%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     1 (1.00%) high mild
     2 (2.00%) high severe
   
   -123.1                  time:   [36.221 ns 36.319 ns 36.418 ns]
                           change: [-60.366% -60.193% -60.045%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   -123                    time:   [34.028 ns 34.070 ns 34.111 ns]
                           change: [-59.304% -59.076% -58.900%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 2 outliers among 100 measurements (2.00%)
     1 (1.00%) low mild
     1 (1.00%) high mild
   
   0.0000123               time:   [26.564 ns 26.592 ns 26.622 ns]
                           change: [-59.787% -59.547% -59.385%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     3 (3.00%) high mild
   
   12.                     time:   [28.850 ns 28.884 ns 28.921 ns]
                           change: [-57.897% -57.651% -57.457%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 11 outliers among 100 measurements (11.00%)
     2 (2.00%) low mild
     7 (7.00%) high mild
     2 (2.00%) high severe
   
   -12.                    time:   [28.945 ns 28.975 ns 29.006 ns]
                           change: [-58.174% -57.898% -57.663%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 2 outliers among 100 measurements (2.00%)
     1 (1.00%) high mild
     1 (1.00%) high severe
   
   00.1                    time:   [30.927 ns 30.965 ns 31.005 ns]
                           change: [-59.556% -59.468% -59.381%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   -00.1                   time:   [31.071 ns 31.116 ns 31.156 ns]
                           change: [-59.174% -59.071% -58.969%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 10 outliers among 100 measurements (10.00%)
     8 (8.00%) low mild
     2 (2.00%) high mild
   
   12345678912345678.1234  time:   [115.78 ns 115.90 ns 116.03 ns]
                           change: [-61.270% -61.066% -60.927%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   -12345678912345678.1234 time:   [116.61 ns 116.88 ns 117.24 ns]
                           change: [-60.717% -60.632% -60.534%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 4 outliers among 100 measurements (4.00%)
     2 (2.00%) high mild
     2 (2.00%) high severe
   
   99999999999999999.999   time:   [115.75 ns 115.90 ns 116.05 ns]
                           change: [-61.206% -61.106% -61.017%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 5 outliers among 100 measurements (5.00%)
     4 (4.00%) high mild
     1 (1.00%) high severe
   
   -99999999999999999.999  time:   [116.15 ns 116.25 ns 116.35 ns]
                           change: [-60.777% -60.562% -60.416%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 8 outliers among 100 measurements (8.00%)
     2 (2.00%) low mild
     3 (3.00%) high mild
     3 (3.00%) high severe
   
   .123                    time:   [20.929 ns 20.994 ns 21.057 ns]
                           change: [-59.013% -58.820% -58.635%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 2 outliers among 100 measurements (2.00%)
     2 (2.00%) low mild
   
   -.123                   time:   [21.542 ns 21.601 ns 21.660 ns]
                           change: [-57.871% -57.770% -57.670%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 5 outliers among 100 measurements (5.00%)
     5 (5.00%) high mild
   
   123.                    time:   [34.390 ns 34.606 ns 34.857 ns]
                           change: [-58.918% -58.698% -58.457%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 9 outliers among 100 measurements (9.00%)
     5 (5.00%) high mild
     4 (4.00%) high severe
   
   -123.                   time:   [34.592 ns 34.951 ns 35.429 ns]
                           change: [-56.946% -55.936% -54.987%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 4 outliers among 100 measurements (4.00%)
     2 (2.00%) high mild
     2 (2.00%) high severe
   ```
    
   # What changes are included in this PR?
   
   - Move sign check out of loop (as suggested by @tustvold previously)
   - Use wrapping operations instead of checked
     The decimal precision that is supported by arrow ensures that there cannot be any overflows.
   
   # Are there any user-facing changes?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] spebern commented on a diff in pull request #3939: Faster decimal parsing (30-60%)

Posted by "spebern (via GitHub)" <gi...@apache.org>.
spebern commented on code in PR #3939:
URL: https://github.com/apache/arrow-rs/pull/3939#discussion_r1148369339


##########
arrow-cast/src/parse.rs:
##########
@@ -1705,5 +1695,58 @@ mod tests {
                 "actual: '{actual_256}', expected: '{expected_256}'"
             );
         }
+
+        let edge_tests_128 = [
+            (
+                "99999999999999999999999999999999999999",
+                99999999999999999999999999999999999999i128,
+                0,
+            ),
+            (
+                "999999999999999999999999999999999999.99",
+                99999999999999999999999999999999999999i128,
+                2,
+            ),
+            (
+                "9999999999999999999999999.9999999999999",
+                99999999999999999999999999999999999999i128,
+                13,
+            ),
+            (
+                "9999999999999999999999999",
+                99999999999999999999999990000000000000i128,
+                13,
+            ),

Review Comment:
   Oh, ok I just saw the tests for it...



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3939: Faster decimal parsing (30-60%)

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #3939:
URL: https://github.com/apache/arrow-rs/pull/3939#discussion_r1148355429


##########
arrow-cast/src/parse.rs:
##########
@@ -691,12 +681,12 @@ pub fn parse_decimal<T: DecimalType>(
         if exp as u8 + digits > precision {
             return Err(ArrowError::ParseError("parse decimal overflow".to_string()));
         }
-        let mul = base.pow_checked(exp as _)?;
-        result = result.mul_checked(mul)?;
+        let mul = base.pow_wrapping(exp as _);

Review Comment:
   I'm fairly certain this is correct, but could we possibly get some tests of numbers on the edge of overflowing? E.g. numbers with 38 digits and a corresponding precision?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] spebern commented on a diff in pull request #3939: Faster decimal parsing (30-60%)

Posted by "spebern (via GitHub)" <gi...@apache.org>.
spebern commented on code in PR #3939:
URL: https://github.com/apache/arrow-rs/pull/3939#discussion_r1148375953


##########
arrow-cast/src/parse.rs:
##########
@@ -614,14 +614,21 @@ pub fn parse_decimal<T: DecimalType>(
     scale: i8,
 ) -> Result<T::Native, ArrowError> {
     let mut seen_dot = false;

Review Comment:
   Thanks for the idea, it improved!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold merged pull request #3939: Faster decimal parsing (30-60%)

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold merged PR #3939:
URL: https://github.com/apache/arrow-rs/pull/3939


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] Dandandan commented on a diff in pull request #3939: Faster decimal parsing (30-60%)

Posted by "Dandandan (via GitHub)" <gi...@apache.org>.
Dandandan commented on code in PR #3939:
URL: https://github.com/apache/arrow-rs/pull/3939#discussion_r1148354851


##########
arrow-cast/src/parse.rs:
##########
@@ -614,14 +614,21 @@ pub fn parse_decimal<T: DecimalType>(
     scale: i8,
 ) -> Result<T::Native, ArrowError> {
     let mut seen_dot = false;

Review Comment:
   Instead of tracking `seen_dot` might also be more efficient to have 2 code paths for handling the part before / after dot separately.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3939: Faster decimal parsing (30-60%)

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #3939:
URL: https://github.com/apache/arrow-rs/pull/3939#discussion_r1148374216


##########
arrow-cast/src/parse.rs:
##########
@@ -1705,5 +1695,58 @@ mod tests {
                 "actual: '{actual_256}', expected: '{expected_256}'"
             );
         }
+
+        let edge_tests_128 = [
+            (
+                "99999999999999999999999999999999999999",
+                99999999999999999999999999999999999999i128,
+                0,
+            ),
+            (
+                "999999999999999999999999999999999999.99",
+                99999999999999999999999999999999999999i128,
+                2,
+            ),
+            (
+                "9999999999999999999999999.9999999999999",
+                99999999999999999999999999999999999999i128,
+                13,
+            ),
+            (
+                "9999999999999999999999999",
+                99999999999999999999999990000000000000i128,
+                13,
+            ),

Review Comment:
   Yes, I think we should ignore leading zeroes



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] spebern commented on a diff in pull request #3939: Faster decimal parsing (30-60%)

Posted by "spebern (via GitHub)" <gi...@apache.org>.
spebern commented on code in PR #3939:
URL: https://github.com/apache/arrow-rs/pull/3939#discussion_r1148360652


##########
arrow-cast/src/parse.rs:
##########
@@ -614,14 +614,21 @@ pub fn parse_decimal<T: DecimalType>(
     scale: i8,
 ) -> Result<T::Native, ArrowError> {
     let mut seen_dot = false;

Review Comment:
   I will check



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3939: Faster decimal parsing (30-60%)

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #3939:
URL: https://github.com/apache/arrow-rs/pull/3939#discussion_r1148365876


##########
arrow-cast/src/parse.rs:
##########
@@ -1705,5 +1695,58 @@ mod tests {
                 "actual: '{actual_256}', expected: '{expected_256}'"
             );
         }
+
+        let edge_tests_128 = [
+            (
+                "99999999999999999999999999999999999999",
+                99999999999999999999999999999999999999i128,
+                0,
+            ),
+            (
+                "999999999999999999999999999999999999.99",
+                99999999999999999999999999999999999999i128,
+                2,
+            ),
+            (
+                "9999999999999999999999999.9999999999999",
+                99999999999999999999999999999999999999i128,
+                13,
+            ),
+            (
+                "9999999999999999999999999",
+                99999999999999999999999990000000000000i128,
+                13,
+            ),

Review Comment:
   This test appears to be failing, I think a 0 digit in front of a decimal shouldn't count to the precision



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] spebern commented on a diff in pull request #3939: Faster decimal parsing (30-60%)

Posted by "spebern (via GitHub)" <gi...@apache.org>.
spebern commented on code in PR #3939:
URL: https://github.com/apache/arrow-rs/pull/3939#discussion_r1148375014


##########
arrow-cast/src/parse.rs:
##########
@@ -1705,5 +1695,58 @@ mod tests {
                 "actual: '{actual_256}', expected: '{expected_256}'"
             );
         }
+
+        let edge_tests_128 = [
+            (
+                "99999999999999999999999999999999999999",
+                99999999999999999999999999999999999999i128,
+                0,
+            ),
+            (
+                "999999999999999999999999999999999999.99",
+                99999999999999999999999999999999999999i128,
+                2,
+            ),
+            (
+                "9999999999999999999999999.9999999999999",
+                99999999999999999999999999999999999999i128,
+                13,
+            ),
+            (
+                "9999999999999999999999999",
+                99999999999999999999999990000000000000i128,
+                13,
+            ),

Review Comment:
   Thanks for finding this!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on pull request #3939: Faster decimal parsing (30-60%)

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on PR #3939:
URL: https://github.com/apache/arrow-rs/pull/3939#issuecomment-1491955032

   ```
   123.123                 time:   [42.035 ns 42.094 ns 42.183 ns]
                           change: [-62.228% -61.918% -61.696%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     2 (2.00%) high mild
     1 (1.00%) high severe
   
   123.1234                time:   [42.248 ns 42.279 ns 42.312 ns]
                           change: [-61.786% -61.694% -61.616%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high severe
   
   123.1                   time:   [40.929 ns 40.986 ns 41.045 ns]
                           change: [-56.455% -56.286% -56.152%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 2 outliers among 100 measurements (2.00%)
     1 (1.00%) low mild
     1 (1.00%) high mild
   
   123                     time:   [37.682 ns 37.729 ns 37.778 ns]
                           change: [-54.921% -54.837% -54.758%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   -123.123                time:   [42.763 ns 42.796 ns 42.833 ns]
                           change: [-61.283% -61.254% -61.223%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 6 outliers among 100 measurements (6.00%)
     1 (1.00%) low severe
     2 (2.00%) low mild
     3 (3.00%) high mild
   
   -123.1234               time:   [42.858 ns 42.882 ns 42.906 ns]
                           change: [-61.322% -61.283% -61.244%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high severe
   
   -123.1                  time:   [42.256 ns 42.308 ns 42.364 ns]
                           change: [-55.400% -55.347% -55.289%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 2 outliers among 100 measurements (2.00%)
     2 (2.00%) high mild
   
   -123                    time:   [39.039 ns 39.112 ns 39.184 ns]
                           change: [-53.739% -53.670% -53.595%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   0.0000123               time:   [25.752 ns 25.782 ns 25.814 ns]
                           change: [-61.473% -61.428% -61.380%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 2 outliers among 100 measurements (2.00%)
     2 (2.00%) high mild
   
   12.                     time:   [32.498 ns 32.551 ns 32.605 ns]
                           change: [-51.983% -51.800% -51.642%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high severe
   
   -12.                    time:   [33.481 ns 33.546 ns 33.620 ns]
                           change: [-50.743% -50.662% -50.578%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 14 outliers among 100 measurements (14.00%)
     7 (7.00%) low mild
     5 (5.00%) high mild
     2 (2.00%) high severe
   
   00.1                    time:   [23.664 ns 23.690 ns 23.719 ns]
                           change: [-65.650% -65.560% -65.475%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 11 outliers among 100 measurements (11.00%)
     7 (7.00%) high mild
     4 (4.00%) high severe
   
   -00.1                   time:   [24.072 ns 24.082 ns 24.094 ns]
                           change: [-65.453% -65.421% -65.390%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 4 outliers among 100 measurements (4.00%)
     3 (3.00%) high mild
     1 (1.00%) high severe
   
   12345678912345678.1234  time:   [116.16 ns 116.25 ns 116.34 ns]
                           change: [-66.102% -65.917% -65.785%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) low mild
   
   -12345678912345678.1234 time:   [116.72 ns 116.82 ns 116.92 ns]
                           change: [-65.614% -65.559% -65.501%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 4 outliers among 100 measurements (4.00%)
     2 (2.00%) low mild
     2 (2.00%) high mild
   
   99999999999999999.999   time:   [115.77 ns 115.87 ns 115.97 ns]
                           change: [-66.087% -66.017% -65.949%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 8 outliers among 100 measurements (8.00%)
     4 (4.00%) low mild
     4 (4.00%) high mild
   
   -99999999999999999.999  time:   [116.36 ns 116.47 ns 116.59 ns]
                           change: [-65.727% -65.691% -65.654%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   .123                    time:   [25.459 ns 25.484 ns 25.512 ns]
                           change: [-56.754% -56.701% -56.647%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 5 outliers among 100 measurements (5.00%)
     4 (4.00%) high mild
     1 (1.00%) high severe
   
   -.123                   time:   [27.129 ns 27.257 ns 27.384 ns]
                           change: [-54.647% -54.463% -54.300%] (p = 0.00 < 0.05)
                           Performance has improved.
   
   123.                    time:   [38.325 ns 38.371 ns 38.420 ns]
                           change: [-54.232% -54.172% -54.108%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 2 outliers among 100 measurements (2.00%)
     1 (1.00%) low mild
     1 (1.00%) high mild
   
   -123.                   time:   [39.456 ns 39.516 ns 39.576 ns]
                           change: [-53.316% -53.214% -53.105%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 8 outliers among 100 measurements (8.00%)
     4 (4.00%) high mild
     4 (4.00%) high severe
   ```
   
   Nice work :+1:


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3939: Faster decimal parsing (30-60%)

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #3939:
URL: https://github.com/apache/arrow-rs/pull/3939#discussion_r1148360906


##########
arrow-cast/src/parse.rs:
##########
@@ -1705,5 +1695,58 @@ mod tests {
                 "actual: '{actual_256}', expected: '{expected_256}'"
             );
         }
+
+        let edge_tests_128 = [
+            (
+                "99999999999999999999999999999999999999",
+                99999999999999999999999999999999999999i128,
+                0,
+            ),
+            (
+                "999999999999999999999999999999999999.99",
+                99999999999999999999999999999999999999i128,
+                2,
+            ),
+            (
+                "9999999999999999999999999.9999999999999",
+                99999999999999999999999999999999999999i128,
+                13,
+            ),
+            (
+                "9999999999999999999999999",
+                99999999999999999999999990000000000000i128,
+                13,
+            ),

Review Comment:
   ```suggestion
               ),
               (
                   "0.99999999999999999999999999999999999999",
                   99999999999999999999999999999999999999i128,
                   38,
               ),
   ```
   
   I _think_ this is correct?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3939: Faster decimal parsing (30-60%)

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #3939:
URL: https://github.com/apache/arrow-rs/pull/3939#discussion_r1148360906


##########
arrow-cast/src/parse.rs:
##########
@@ -1705,5 +1695,58 @@ mod tests {
                 "actual: '{actual_256}', expected: '{expected_256}'"
             );
         }
+
+        let edge_tests_128 = [
+            (
+                "99999999999999999999999999999999999999",
+                99999999999999999999999999999999999999i128,
+                0,
+            ),
+            (
+                "999999999999999999999999999999999999.99",
+                99999999999999999999999999999999999999i128,
+                2,
+            ),
+            (
+                "9999999999999999999999999.9999999999999",
+                99999999999999999999999999999999999999i128,
+                13,
+            ),
+            (
+                "9999999999999999999999999",
+                99999999999999999999999990000000000000i128,
+                13,
+            ),

Review Comment:
   ```suggestion
               ),
               (
                   "0.99999999999999999999999999999999999999",
                   99999999999999999999999999999999999999i128,
                   38,
               ),
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3939: Faster decimal parsing (30-60%)

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #3939:
URL: https://github.com/apache/arrow-rs/pull/3939#discussion_r1148355429


##########
arrow-cast/src/parse.rs:
##########
@@ -691,12 +681,12 @@ pub fn parse_decimal<T: DecimalType>(
         if exp as u8 + digits > precision {
             return Err(ArrowError::ParseError("parse decimal overflow".to_string()));
         }
-        let mul = base.pow_checked(exp as _)?;
-        result = result.mul_checked(mul)?;
+        let mul = base.pow_wrapping(exp as _);

Review Comment:
   I'm fairly certain this is correct, but could we possibly get some tests of numbers on the edge of overflowing?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] spebern commented on a diff in pull request #3939: Faster decimal parsing (30-60%)

Posted by "spebern (via GitHub)" <gi...@apache.org>.
spebern commented on code in PR #3939:
URL: https://github.com/apache/arrow-rs/pull/3939#discussion_r1148368663


##########
arrow-cast/src/parse.rs:
##########
@@ -1705,5 +1695,58 @@ mod tests {
                 "actual: '{actual_256}', expected: '{expected_256}'"
             );
         }
+
+        let edge_tests_128 = [
+            (
+                "99999999999999999999999999999999999999",
+                99999999999999999999999999999999999999i128,
+                0,
+            ),
+            (
+                "999999999999999999999999999999999999.99",
+                99999999999999999999999999999999999999i128,
+                2,
+            ),
+            (
+                "9999999999999999999999999.9999999999999",
+                99999999999999999999999999999999999999i128,
+                13,
+            ),
+            (
+                "9999999999999999999999999",
+                99999999999999999999999990000000000000i128,
+                13,
+            ),

Review Comment:
   Should we ignore leading zeros in general?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] spebern commented on pull request #3939: Faster decimal parsing (30-60%)

Posted by "spebern (via GitHub)" <gi...@apache.org>.
spebern commented on PR #3939:
URL: https://github.com/apache/arrow-rs/pull/3939#issuecomment-1483845442

   I added some failing tests for "+", "-" and "". However, any attempt of fixing this (such as a simple "is_empty" check after handling the sign) has a big performance impact for up to up to 70%, that I cannot explain yet.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org