Posted to github@arrow.apache.org by "jorisvandenbossche (via GitHub)" <gi...@apache.org> on 2024/03/28 15:33:27 UTC

Re: [I] [C++] Performance of numeric casts [arrow]

jorisvandenbossche commented on issue #40874:
URL: https://github.com/apache/arrow/issues/40874#issuecomment-2025511001

   Looking into the inner loops of both Arrow and numpy, they seem quite similar. In numpy, almost all of the time is spent in `_aligned_contig_cast_long_to_double`, which essentially boils down to:
   
   ```c
       /* From numpy's strided inner-loop convention: args[0]/args[1] are the
          source/destination buffers and dimensions[0] is the element count. */
       npy_intp N = dimensions[0];
       char *src = args[0], *dst = args[1];

       while (N--) {
           *(npy_double *)dst = ((npy_double)(*(npy_long *)src));
           dst += sizeof(npy_double);
           src += sizeof(npy_long);
       }
   ```
   
   and in Arrow, almost all of the time is spent in `CastPrimitive`, which essentially does:
   
   https://github.com/apache/arrow/blob/cf832b8b5dd91ca1b70519fa544f0a44ebdb3bce/cpp/src/arrow/compute/kernels/scalar_cast_internal.cc#L40-L46
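
   (For anyone reading the plain-text archive where the permalink does not render: the code there is essentially a templated element-wise cast loop. A rough paraphrase, not the exact Arrow source, using `CastNumericLoop` as a placeholder name:)

   ```cpp
   // Rough paraphrase of the loop behind the permalink above; the actual code
   // lives in CastPrimitive in scalar_cast_internal.cc.
   template <typename OutT, typename InT>
   void CastNumericLoop(const InT* in_values, OutT* out_values, int64_t length) {
     for (int64_t i = 0; i < length; ++i) {
       out_values[i] = static_cast<OutT>(in_values[i]);
     }
   }
   ```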
   
   Does anybody have any insight into why our templated C++ code is so much slower than numpy's C code? Logically it looks very similar; is there something in our code that prevents optimizations?
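
   One way to sanity-check whether the templating itself is the problem would be to benchmark both loop shapes on the same buffer outside of Arrow and numpy. A minimal standalone sketch (not from the issue; all names below are made up):

   ```cpp
   // Hypothetical micro-benchmark comparing the two loop shapes, e.g. built
   // with `g++ -O3 -std=c++17 cast_bench.cc -o cast_bench`.
   #include <chrono>
   #include <cstdint>
   #include <cstdio>
   #include <vector>

   // numpy-style loop: raw char pointers advanced by sizeof().
   void CastCStyle(const char* src, char* dst, int64_t n) {
     while (n--) {
       *(double*)dst = (double)(*(const int64_t*)src);
       dst += sizeof(double);
       src += sizeof(int64_t);
     }
   }

   // Arrow-style loop: templated element-wise static_cast over typed pointers.
   template <typename OutT, typename InT>
   void CastTemplated(const InT* src, OutT* dst, int64_t n) {
     for (int64_t i = 0; i < n; ++i) {
       dst[i] = static_cast<OutT>(src[i]);
     }
   }

   int main() {
     const int64_t n = int64_t(1) << 24;  // ~16.7M int64 -> double casts
     std::vector<int64_t> src(n, 42);
     std::vector<double> dst(n);

     auto time_it = [&](auto&& fn, const char* label) {
       const auto start = std::chrono::steady_clock::now();
       fn();
       const auto stop = std::chrono::steady_clock::now();
       std::printf("%-10s %8.3f ms\n", label,
                   std::chrono::duration<double, std::milli>(stop - start).count());
     };

     time_it([&] { CastCStyle(reinterpret_cast<const char*>(src.data()),
                              reinterpret_cast<char*>(dst.data()), n); },
             "c-style");
     time_it([&] { CastTemplated(src.data(), dst.data(), n); }, "templated");
     return 0;
   }
   ```

   If both versions come out equally fast with the same flags, the gap reported here is probably not in the inner loop itself but somewhere around it (per-chunk overhead, dispatch, allocation), which might be worth profiling separately.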


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org