Posted to jira@arrow.apache.org by "Rob Ambalu (Jira)" <ji...@apache.org> on 2021/11/24 14:35:00 UTC

[jira] [Created] (ARROW-14838) Fix issue with valgrind unrecognized instruction

Rob Ambalu created ARROW-14838:
----------------------------------

             Summary: Fix issue with valgrind unrecognized instruction
                 Key: ARROW-14838
                 URL: https://issues.apache.org/jira/browse/ARROW-14838
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++
            Reporter: Rob Ambalu


This is the same issue as ARROW-9851, but I don't believe the correct fix was implemented.

The merged fix lets you control whether AVX512 is used via a compile flag, but that is sub-optimal because:

a) It forces you to avoid AVX512 instructions solely in order to be able to run under valgrind

b) You don't always have control of the compile flags, e.g. if you are pip installing pyarrow

 

I actually reached out to the valgrind community, and they insisted that pyarrow is at fault for not checking the CPU feature flags. Apparently valgrind emulates the CPUID instruction when you run under it, and the emulated result shows that AVX512 is not supported, so pyarrow should avoid using it (I assume arrow would hit a similar issue if run on an actual CPU without AVX512 support).

Please look into dynamically checking the CPU flags to avoid this issue. It has become a major problem for us: we can't valgrind our processes anymore now that we import arrow / pyarrow in many places.
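
To illustrate the kind of dynamic check being requested, here is a minimal sketch of run-time dispatch. It is not Arrow's actual code: the names SumScalar, SumAvx512, SelectSum and Sum are hypothetical, and it assumes GCC or Clang on x86-64, where the __builtin_cpu_supports builtin and the target function attribute are available. Only the one function marked with the target attribute is compiled with AVX512 enabled, and it is only selected if the CPU (or valgrind's emulated CPUID) reports AVX512F support.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: GCC or Clang on x86-64. On other platforms only the scalar
// path is compiled.
#if (defined(__GNUC__) || defined(__clang__)) && defined(__x86_64__)
#include <immintrin.h>
#define SKETCH_HAVE_X86_DISPATCH 1
#endif

// Portable fallback; uses no special instructions.
int64_t SumScalar(const int64_t* values, std::size_t n) {
  int64_t total = 0;
  for (std::size_t i = 0; i < n; ++i) total += values[i];
  return total;
}

#ifdef SKETCH_HAVE_X86_DISPATCH
// Hypothetical AVX512 variant. The target attribute enables AVX512F for this
// function only, so the rest of the binary stays free of AVX512 instructions.
__attribute__((target("avx512f")))
int64_t SumAvx512(const int64_t* values, std::size_t n) {
  __m512i acc = _mm512_setzero_si512();
  std::size_t i = 0;
  // Add eight 64-bit lanes per iteration.
  for (; i + 8 <= n; i += 8) {
    acc = _mm512_add_epi64(acc, _mm512_loadu_si512(values + i));
  }
  int64_t total = _mm512_reduce_add_epi64(acc);
  // Handle the remaining tail elements one at a time.
  for (; i < n; ++i) total += values[i];
  return total;
}
#endif

using SumFn = int64_t (*)(const int64_t*, std::size_t);

// Pick an implementation based on what the CPU reports at run time.
SumFn SelectSum() {
#ifdef SKETCH_HAVE_X86_DISPATCH
  __builtin_cpu_init();
  if (__builtin_cpu_supports("avx512f")) return SumAvx512;
#endif
  return SumScalar;
}

// Public entry point: the selection runs once, on first use.
int64_t Sum(const int64_t* values, std::size_t n) {
  static const SumFn impl = SelectSum();
  return impl(values, n);
}
```

A caller would simply use Sum(data, length). Under valgrind the builtin should see the emulated feature flags and fall back to SumScalar, assuming the reply quoted below is accurate about how valgrind reports CPUID.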

 

This is the error from valgrind:

 
{noformat}
vex amd64->IR: unhandled instruction bytes: 0x62 0xF1 0xFD 0x8 0x6F 0x44 0x24 0xA 0xC7 0x43
vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR:   VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
vex amd64->IR:   PFX.66=0 PFX.F2=0 PFX.F3=0
==32035== valgrind: Unrecognised instruction at address 0x1085ea85.
==32035==    at 0x1085EA85: arrow::compute::internal::RegisterScalarAggregateSumAvx512(arrow::compute::FunctionRegistry*) (in /home/ra7293/user/hfalgo_csp/.venvs/PY36/build_tools_venv/lib/libarrow.so.100.1.0)
==32035==    by 0x1067967C: arrow::compute::GetFunctionRegistry() (in /home/ra7293/user/hfalgo_csp/.venvs/PY36/build_tools_venv/lib/libarrow.so.100.1.0)
{noformat}
 

 

Response from the valgrind dev community:

 
{noformat}
> 0x62 0xF1 0xFD 0x8 0x6F 0x44 0x24 0xA 0xC7 0x43
 
This is  vmovdqa64 0xa0(%rsp),%xmm0  which requires CPU feature flag
AVX512VL/AVX512F which is not supported by valgrind 3.18.1.
 
In the results of the emulated CPUID instruction, valgrind tells
the running application that AVX512 is not supported.
It is a bug in pyarrow ( python ) that its initialization routine
does not check the cpu feature flags, and switch its later run-time
instructions appropriately.  Please report this bug to the maintainers
of pyarrow.  Modern run-time libraries use the STT_GNU_IFUNC feature
to choose at run time which instruction set is available and may be used.
pyarrow (and/or python3 itself) must upgrade.
 
Gcc supports compile-time feature flags to avoid generating code
that uses certain instructions.  Run "info gcc" and search for "avx512",
paying attention to the flags beginning with "-mavx512", and in particular
the "-mno-avx512" variants.
 
Some software administration systems for installing and maintaining
software packages can limit the hardware features that packages
may assume.  You may be able to work-around valgrind's non-support
for AVX512 by telling the packaging system to avoid the package
variants which require AVX512.
{noformat}
 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)