You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/10/24 21:59:40 UTC

[GitHub] [arrow-rs] jhorstmann opened a new pull request, #2920: Use portable simd instead of packed_simd dependency

jhorstmann opened a new pull request, #2920:
URL: https://github.com/apache/arrow-rs/pull/2920

   # Which issue does this PR close?
   
   <!--
   We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123.
   -->
   
   Closes #2856.
   
   # Rationale for this change
   
   The `packed_simd` crate is no longer actively maintained and now also causes some issues when run under miri.
   
   Performance numbers are inconclusive. The `sum` aggregation got a bit faster, but `min`/`max` got slower instead.
   
   # What changes are included in this PR?
   
   
   
   <!--
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   # Are there any user-facing changes?
   
   - `IntervalMonthDayNanoType` is no longer a numeric type because of missing support for `i128` lanes in `portable_simd`. The arithmetic kernels on this type should work the same as before since they did not really rely on the numeric type and could instead work with any primitive type. Comparison kernels for this type are no longer implemented though.
   - `ArrowFloatNumericType` no longer contains a `pow` method because this is not yet implemented in `portable_simd`. It seems this was only enabled in the simd version of the trait, so it is unlikely there are many users.
   
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!---
   If there are any breaking changes to public APIs, please add the `breaking change` label.
   -->
   
   # Aggregation benchmark results
   
   ```
   $ RUSTFLAGS="-Ctarget-cpu=native -Copt-level=3" perf record cargo +stage1 bench --features simd,test_utils --bench aggregate_kernels -- --baseline base-avx512
   Finished bench [optimized] target(s) in 48.15s
        Running benches/aggregate_kernels.rs (/home/jhorstmann/Source/github/apache/arrow-rs/target/release/deps/aggregate_kernels-1fb22fbe1680a1da)
   float/sum 4096          time:   [269.73 ns 269.82 ns 269.93 ns]
                           thrpt:  [56.528 GiB/s 56.552 GiB/s 56.571 GiB/s]
                    change:
                           time:   [-5.4739% -5.4122% -5.3486%] (p = 0.00 < 0.05)
                           thrpt:  [+5.6508% +5.7219% +5.7909%]
                           Performance has improved.
   Found 8 outliers among 100 measurements (8.00%)
     4 (4.00%) high mild
     4 (4.00%) high severe
   float/min 4096          time:   [488.84 ns 488.98 ns 489.15 ns]
                           thrpt:  [31.194 GiB/s 31.205 GiB/s 31.214 GiB/s]
                    change:
                           time:   [+1.5681% +1.6031% +1.6388%] (p = 0.00 < 0.05)
                           thrpt:  [-1.6124% -1.5778% -1.5439%]
                           Performance has regressed.
   Found 14 outliers among 100 measurements (14.00%)
     1 (1.00%) low mild
     4 (4.00%) high mild
     9 (9.00%) high severe
   float/max 4096          time:   [429.91 ns 429.96 ns 430.03 ns]
                           thrpt:  [35.483 GiB/s 35.489 GiB/s 35.493 GiB/s]
                    change:
                           time:   [+1.1078% +1.1743% +1.2240%] (p = 0.00 < 0.05)
                           thrpt:  [-1.2092% -1.1606% -1.0957%]
                           Performance has regressed.
   Found 3 outliers among 100 measurements (3.00%)
     1 (1.00%) low mild
     2 (2.00%) high severe
   float/sum nulls 4096    time:   [350.29 ns 350.31 ns 350.33 ns]
                           thrpt:  [43.556 GiB/s 43.558 GiB/s 43.560 GiB/s]
                    change:
                           time:   [-26.221% -26.194% -26.173%] (p = 0.00 < 0.05)
                           thrpt:  [+35.452% +35.490% +35.540%]
                           Performance has improved.
   Found 7 outliers among 100 measurements (7.00%)
     1 (1.00%) low mild
     3 (3.00%) high mild
     3 (3.00%) high severe
   float/min nulls 4096    time:   [1.2936 µs 1.2939 µs 1.2943 µs]
                           thrpt:  [11.789 GiB/s 11.793 GiB/s 11.795 GiB/s]
                    change:
                           time:   [+18.173% +18.230% +18.275%] (p = 0.00 < 0.05)
                           thrpt:  [-15.451% -15.419% -15.378%]
                           Performance has regressed.
   Found 7 outliers among 100 measurements (7.00%)
     6 (6.00%) high mild
     1 (1.00%) high severe
   float/max nulls 4096    time:   [1.1110 µs 1.1111 µs 1.1112 µs]
                           thrpt:  [13.732 GiB/s 13.733 GiB/s 13.734 GiB/s]
                    change:
                           time:   [+3.3113% +3.3532% +3.3820%] (p = 0.00 < 0.05)
                           thrpt:  [-3.2714% -3.2444% -3.2052%]
                           Performance has regressed.
   Found 9 outliers among 100 measurements (9.00%)
     2 (2.00%) low mild
     5 (5.00%) high mild
     2 (2.00%) high severe
   
   double/sum 4096         time:   [568.14 ns 568.17 ns 568.20 ns]
                           thrpt:  [53.709 GiB/s 53.712 GiB/s 53.715 GiB/s]
                    change:
                           time:   [-2.0800% -1.9542% -1.8825%] (p = 0.00 < 0.05)
                           thrpt:  [+1.9186% +1.9931% +2.1241%]
                           Performance has improved.
   Found 7 outliers among 100 measurements (7.00%)
     2 (2.00%) low mild
     1 (1.00%) high mild
     4 (4.00%) high severe
   double/min 4096         time:   [948.68 ns 948.72 ns 948.76 ns]
                           thrpt:  [32.166 GiB/s 32.167 GiB/s 32.169 GiB/s]
                    change:
                           time:   [+1.3964% +1.4045% +1.4134%] (p = 0.00 < 0.05)
                           thrpt:  [-1.3937% -1.3850% -1.3772%]
                           Performance has regressed.
   Found 7 outliers among 100 measurements (7.00%)
     2 (2.00%) low mild
     4 (4.00%) high mild
     1 (1.00%) high severe
   double/max 4096         time:   [832.17 ns 832.27 ns 832.40 ns]
                           thrpt:  [36.662 GiB/s 36.668 GiB/s 36.672 GiB/s]
                    change:
                           time:   [+1.9066% +1.9232% +1.9399%] (p = 0.00 < 0.05)
                           thrpt:  [-1.9029% -1.8869% -1.8709%]
                           Performance has regressed.
   Found 9 outliers among 100 measurements (9.00%)
     4 (4.00%) high mild
     5 (5.00%) high severe
   double/sum nulls 4096   time:   [654.15 ns 654.19 ns 654.24 ns]
                           thrpt:  [46.646 GiB/s 46.649 GiB/s 46.653 GiB/s]
                    change:
                           time:   [-16.146% -16.137% -16.129%] (p = 0.00 < 0.05)
                           thrpt:  [+19.231% +19.243% +19.255%]
                           Performance has improved.
   Found 5 outliers among 100 measurements (5.00%)
     2 (2.00%) low mild
     2 (2.00%) high mild
     1 (1.00%) high severe
   double/min nulls 4096   time:   [2.5084 µs 2.5086 µs 2.5090 µs]
                           thrpt:  [12.163 GiB/s 12.165 GiB/s 12.166 GiB/s]
                    change:
                           time:   [+18.631% +18.650% +18.672%] (p = 0.00 < 0.05)
                           thrpt:  [-15.734% -15.719% -15.705%]
                           Performance has regressed.
   Found 10 outliers among 100 measurements (10.00%)
     2 (2.00%) low mild
     4 (4.00%) high mild
     4 (4.00%) high severe
   double/max nulls 4096   time:   [2.1508 µs 2.1509 µs 2.1510 µs]
                           thrpt:  [14.188 GiB/s 14.189 GiB/s 14.189 GiB/s]
                    change:
                           time:   [+0.9921% +1.0044% +1.0168%] (p = 0.00 < 0.05)
                           thrpt:  [-1.0065% -0.9945% -0.9824%]
                           Change within noise threshold.
   Found 9 outliers among 100 measurements (9.00%)
     1 (1.00%) low mild
     3 (3.00%) high mild
     5 (5.00%) high severe
   
   string/min string 4096  time:   [11.783 µs 11.786 µs 11.789 µs]
                           thrpt:  [2.5887 GiB/s 2.5894 GiB/s 2.5901 GiB/s]
                    change:
                           time:   [-0.4596% -0.4101% -0.3598%] (p = 0.00 < 0.05)
                           thrpt:  [+0.3611% +0.4118% +0.4617%]
                           Change within noise threshold.
   Found 3 outliers among 100 measurements (3.00%)
   ```
   
   The `portable_simd` crate has an internal switch to a bit-based mask representation when the avx512 target feature is enabled. Since target features do not apply to the standard library this can only be used via `Zbuild-std` or by using the crate as a regular dependency. I'm still looking into the performance with that feature activated (see https://github.com/rust-lang/portable-simd/issues/312 and https://github.com/llvm/llvm-project/issues/58546).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] jhorstmann commented on pull request #2920: Use portable simd instead of packed_simd dependency

Posted by "jhorstmann (via GitHub)" <gi...@apache.org>.
jhorstmann commented on PR #2920:
URL: https://github.com/apache/arrow-rs/pull/2920#issuecomment-1495653899

   Closing due to lack of support for 128 or 256-bit types. We probably need to refactor which types support simd operations, or make decimal types no longer primitive types.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] jhorstmann commented on pull request #2920: Use portable simd instead of packed_simd dependency

Posted by GitBox <gi...@apache.org>.
jhorstmann commented on PR #2920:
URL: https://github.com/apache/arrow-rs/pull/2920#issuecomment-1314493955

   Seems that there was already a [workaround required for `Decimal256Type`](https://github.com/apache/arrow-rs/blob/19f372d82315e0df29c9b545bb011b0cbd45ceda/arrow/src/datatypes/numeric.rs#L470). Would be really nice to decouple the simd support from numeric types. Maybe specialization could help? Haven't really used that before, and it's still unstable. But so is `packed_simd` or `portable_simd`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #2920: Use portable simd instead of packed_simd dependency

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #2920:
URL: https://github.com/apache/arrow-rs/pull/2920#discussion_r1003816920


##########
arrow/src/datatypes/numeric.rs:
##########
@@ -337,34 +240,34 @@ macro_rules! make_numeric_type {
     };
 }
 
-make_numeric_type!(Int8Type, i8, i8x64, m8x64);
-make_numeric_type!(Int16Type, i16, i16x32, m16x32);
-make_numeric_type!(Int32Type, i32, i32x16, m32x16);
-make_numeric_type!(Int64Type, i64, i64x8, m64x8);
-make_numeric_type!(UInt8Type, u8, u8x64, m8x64);
-make_numeric_type!(UInt16Type, u16, u16x32, m16x32);
-make_numeric_type!(UInt32Type, u32, u32x16, m32x16);
-make_numeric_type!(UInt64Type, u64, u64x8, m64x8);
-make_numeric_type!(Float32Type, f32, f32x16, m32x16);
-make_numeric_type!(Float64Type, f64, f64x8, m64x8);
-
-make_numeric_type!(TimestampSecondType, i64, i64x8, m64x8);
-make_numeric_type!(TimestampMillisecondType, i64, i64x8, m64x8);
-make_numeric_type!(TimestampMicrosecondType, i64, i64x8, m64x8);
-make_numeric_type!(TimestampNanosecondType, i64, i64x8, m64x8);
-make_numeric_type!(Date32Type, i32, i32x16, m32x16);
-make_numeric_type!(Date64Type, i64, i64x8, m64x8);
-make_numeric_type!(Time32SecondType, i32, i32x16, m32x16);
-make_numeric_type!(Time32MillisecondType, i32, i32x16, m32x16);
-make_numeric_type!(Time64MicrosecondType, i64, i64x8, m64x8);
-make_numeric_type!(Time64NanosecondType, i64, i64x8, m64x8);
-make_numeric_type!(IntervalYearMonthType, i32, i32x16, m32x16);
-make_numeric_type!(IntervalDayTimeType, i64, i64x8, m64x8);
-make_numeric_type!(IntervalMonthDayNanoType, i128, i128x4, m128x4);
-make_numeric_type!(DurationSecondType, i64, i64x8, m64x8);
-make_numeric_type!(DurationMillisecondType, i64, i64x8, m64x8);
-make_numeric_type!(DurationMicrosecondType, i64, i64x8, m64x8);
-make_numeric_type!(DurationNanosecondType, i64, i64x8, m64x8);
+make_numeric_type!(Int8Type, i8, i8x64, mask8x64);
+make_numeric_type!(Int16Type, i16, i16x32, mask16x32);
+make_numeric_type!(Int32Type, i32, i32x16, mask32x16);
+make_numeric_type!(Int64Type, i64, i64x8, mask64x8);
+make_numeric_type!(UInt8Type, u8, u8x64, mask8x64);
+make_numeric_type!(UInt16Type, u16, u16x32, mask16x32);
+make_numeric_type!(UInt32Type, u32, u32x16, mask32x16);
+make_numeric_type!(UInt64Type, u64, u64x8, mask64x8);
+make_numeric_type!(Float32Type, f32, f32x16, mask32x16);
+make_numeric_type!(Float64Type, f64, f64x8, mask64x8);
+
+make_numeric_type!(TimestampSecondType, i64, i64x8, mask64x8);
+make_numeric_type!(TimestampMillisecondType, i64, i64x8, mask64x8);
+make_numeric_type!(TimestampMicrosecondType, i64, i64x8, mask64x8);
+make_numeric_type!(TimestampNanosecondType, i64, i64x8, mask64x8);
+make_numeric_type!(Date32Type, i32, i32x16, mask32x16);
+make_numeric_type!(Date64Type, i64, i64x8, mask64x8);
+make_numeric_type!(Time32SecondType, i32, i32x16, mask32x16);
+make_numeric_type!(Time32MillisecondType, i32, i32x16, mask32x16);
+make_numeric_type!(Time64MicrosecondType, i64, i64x8, mask64x8);
+make_numeric_type!(Time64NanosecondType, i64, i64x8, mask64x8);
+make_numeric_type!(IntervalYearMonthType, i32, i32x16, mask32x16);
+make_numeric_type!(IntervalDayTimeType, i64, i64x8, mask64x8);
+// make_numeric_type!(IntervalMonthDayNanoType, i128, i32x4, mask32x4); // TODO: simd types are wrong since i128 is not supported (https://github.com/rust-lang/portable-simd/issues/108)

Review Comment:
   This is likely to cause problems for decimals as well - see #2881. I wonder if we might be able to find a better way to fallback to non-SIMD kernels for such types :thinking: 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] jhorstmann closed pull request #2920: Use portable simd instead of packed_simd dependency

Posted by "jhorstmann (via GitHub)" <gi...@apache.org>.
jhorstmann closed pull request #2920: Use portable simd instead of packed_simd dependency
URL: https://github.com/apache/arrow-rs/pull/2920


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org