
Posted to issues@arrow.apache.org by "Andy Thomason (Jira)" <ji...@apache.org> on 2019/12/09 14:48:00 UTC

[jira] [Commented] (ARROW-5303) [Rust] Add SIMD vectorization of numeric casts

    [ https://issues.apache.org/jira/browse/ARROW-5303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16991657#comment-16991657 ] 

Andy Thomason commented on ARROW-5303:
--------------------------------------

It should be noted that LLVM autovectorisation does a pretty good job of generating SIMD code from cast loops:

[https://godbolt.org/z/-EUSws]

The following loop consumes 32 bytes of u8 and emits 128 bytes of u32 per iteration on znver2, and much more on skylake-avx512 (although this is a stunt). It is hard to see packed_simd doing any better, and in fact it may do worse.
 
pub fn cast(dest: &mut [u32], src: &[u8]) {
    dest.iter_mut().zip(src.iter()).for_each(|(d, s)| *d = *s as u32);
}
 
In practice, you will become memory-bandwidth limited quite quickly, especially as Rust (and LLVM) do not properly support vectorisation of non-temporal stores.
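
For anyone who wants to try the loop above outside of Godbolt, here is a minimal, self-contained sketch. It uses `u32::from` rather than `as` (equivalent for this widening cast), and `cast_to_vec` is a hypothetical helper added only for illustration; it is not part of Arrow or of the original snippet.

```rust
/// The cast loop from the comment, written with the lossless `u32::from`.
/// `zip` stops at the shorter of the two slices, so any extra elements in
/// `dest` are left untouched and no indexing is done inside the loop.
pub fn cast(dest: &mut [u32], src: &[u8]) {
    dest.iter_mut()
        .zip(src.iter())
        .for_each(|(d, s)| *d = u32::from(*s));
}

/// Hypothetical convenience wrapper (an assumption for this example):
/// allocates the output buffer and runs the cast over the whole input.
pub fn cast_to_vec(src: &[u8]) -> Vec<u32> {
    let mut dest = vec![0u32; src.len()];
    cast(&mut dest, src);
    dest
}
```

Calling `cast_to_vec(&[1, 2, 255])` should give `[1, 2, 255]` as u32 values. Whether the loop is actually vectorised depends on the compiler version, optimisation level and target CPU flags, so it is worth checking the generated assembly for your own build.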
 

> [Rust] Add SIMD vectorization of numeric casts
> ----------------------------------------------
>
>                 Key: ARROW-5303
>                 URL: https://issues.apache.org/jira/browse/ARROW-5303
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Rust
>    Affects Versions: 0.13.0
>            Reporter: Neville Dipale
>            Priority: Minor
>
> To improve the performance of cast kernels, we need SIMD support in numeric casts.
> An initial exploration shows that we can't trivially add SIMD casts between our Arrow T::Simd types, because `packed_simd` only supports a cast between T::Simd types that have the same number of lanes.
> This means that adding casts from f64 to i64 (same lane count) satisfies the trait bound `where TO::Simd : packed_simd::FromCast<FROM::Simd>`, but f64 to i32 (different lane count) doesn't.
> We would benefit from investigating work-arounds to this limitation. Please see github::nevi_me::arrow/{branch:simd-cast}/../kernels/cast.rs (https://github.com/nevi-me/arrow/blob/simd-cast/rust/arrow/src/compute/kernels/cast.rs#L601) for an example implementation that's limited by the differences in lane count.
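
One way to picture the lane-count limitation described in the issue is to fix the number of elements per block rather than the register width. The sketch below is plain Rust with no `packed_simd` dependency. The `LANES` constant and the `cast_f64_to_i32` function are hypothetical names chosen for this illustration, and this is not the approach taken in the linked branch. It relies on the compiler to vectorise the fixed-size inner loop, which may or may not happen on a given target.

```rust
/// Hypothetical block size, chosen for illustration only.
const LANES: usize = 8;

/// Casts f64 to i32 in blocks of `LANES` elements, then handles the tail.
/// Input and output blocks have the same lane count but different byte
/// widths (64 bytes of f64 in, 32 bytes of i32 out when LANES is 8).
pub fn cast_f64_to_i32(dest: &mut [i32], src: &[f64]) {
    // Work on the common prefix so both slices split into equal block counts.
    let n = dest.len().min(src.len());
    let (dest, src) = (&mut dest[..n], &src[..n]);

    let mut d_chunks = dest.chunks_exact_mut(LANES);
    let mut s_chunks = src.chunks_exact(LANES);
    for (d, s) in d_chunks.by_ref().zip(s_chunks.by_ref()) {
        // Fixed trip count; in current Rust, `as` truncates toward zero,
        // saturates out-of-range values and maps NaN to 0.
        for i in 0..LANES {
            d[i] = s[i] as i32;
        }
    }

    // Scalar loop over the elements that did not fill a whole block.
    for (d, s) in d_chunks.into_remainder().iter_mut().zip(s_chunks.remainder()) {
        *d = *s as i32;
    }
}
```

The design choice here is that the element count, not the vector width, is what the two sides agree on, which is the same constraint the `FromCast` bound expresses.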



--
This message was sent by Atlassian Jira
(v8.3.4#803005)