Posted to jira@arrow.apache.org by "Antoine Pitrou (Jira)" <ji...@apache.org> on 2021/11/24 14:37:00 UTC

[jira] [Commented] (ARROW-14838) Fix issue with valgrind unrecognized instruction

    [ https://issues.apache.org/jira/browse/ARROW-14838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17448650#comment-17448650 ] 

Antoine Pitrou commented on ARROW-14838:
----------------------------------------

We already check for CPU flags before running AVX512 code, so I'm not sure what more we're supposed to do?
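
For illustration, here is a minimal sketch of that kind of runtime check. It is not Arrow's actual code: the names SumFn, SumScalar, SumAvx512StandIn, CpuReportsAvx512F and SelectSum are hypothetical, and the "AVX512" kernel is a stand-in that simply calls the scalar path so the example runs on any machine. It assumes GCC or Clang on x86, where the __builtin_cpu_supports builtin is available; on other toolchains the sketch falls back to the scalar path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical kernel signature, for illustration only (not Arrow's API).
using SumFn = std::int64_t (*)(const std::int64_t* values, std::size_t length);

// Portable fallback that is safe to run on every CPU.
std::int64_t SumScalar(const std::int64_t* values, std::size_t length) {
  assert(values != nullptr || length == 0);
  std::int64_t total = 0;
  for (std::size_t i = 0; i < length; ++i) total += values[i];
  return total;
}

// Stand-in for an AVX512 variant. In a real build this would live in a
// separate translation unit compiled with AVX512 flags; here it delegates
// to the scalar code so the sketch contains no AVX512 instructions.
std::int64_t SumAvx512StandIn(const std::int64_t* values, std::size_t length) {
  return SumScalar(values, length);
}

// Ask the CPU (or whatever emulates CPUID) whether AVX512F is reported.
bool CpuReportsAvx512F() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
  __builtin_cpu_init();
  return __builtin_cpu_supports("avx512f") != 0;
#else
  return false;  // Assumption: unknown toolchain or CPU, use the fallback.
#endif
}

// Pick the kernel once, based on the runtime check rather than build flags.
SumFn SelectSum(bool has_avx512f) {
  return has_avx512f ? &SumAvx512StandIn : &SumScalar;
}
```

A caller would do something like SelectSum(CpuReportsAvx512F())(data, n). The important caveat is that the check only protects code reached through the selected pointer: any AVX512 instruction that the compiler emits outside the dispatched kernels (for example in a registration function built with AVX512 flags) would still execute regardless of what CPUID reports.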

> Fix issue with valgrind unrecognized instruction
> ------------------------------------------------
>
>                 Key: ARROW-14838
>                 URL: https://issues.apache.org/jira/browse/ARROW-14838
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>            Reporter: Rob Ambalu
>            Priority: Major
>
> This is the same issue as ARROW-9851, but I don't believe the correct fix was implemented.
> The merged fix lets you control whether AVX512 is used via a compile flag, but that is suboptimal because:
> a) It forces you to avoid AVX512 instructions solely in order to be able to run under valgrind
> b) You don't always have control of the compile flags, e.g. if you are pip installing pyarrow
>  
> I actually reached out to the valgrind community and they insisted that pyarrow is at fault for not checking the CPU feature flags. Apparently valgrind emulates the CPU feature flags when you run under it, and they would show that AVX512 is not supported, so pyarrow should avoid using it (I assume arrow would hit a similar issue if run on an actual CPU without AVX512 support).
> Please look into dynamically checking the CPU flags to avoid this issue, as it has become a major problem for us: we can't valgrind our processes anymore now that we import arrow / pyarrow in many places.
>  
> This is the error from valgrind:
>  
> {noformat}
> vex amd64->IR: unhandled instruction bytes: 0x62 0xF1 0xFD 0x8 0x6F 0x44 0x24 0xA 0xC7 0x43
> vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
> vex amd64->IR:   VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
> vex amd64->IR:   PFX.66=0 PFX.F2=0 PFX.F3=0
> ==32035== valgrind: Unrecognised instruction at address 0x1085ea85.
> ==32035==    at 0x1085EA85: arrow::compute::internal::RegisterScalarAggregateSumAvx512(arrow::compute::FunctionRegistry*) (in /home/ra7293/user/hfalgo_csp/.venvs/PY36/build_tools_venv/lib/libarrow.so.100.1.0)
> ==32035==    by 0x1067967C: arrow::compute::GetFunctionRegistry() (in /home/ra7293/user/hfalgo_csp/.venvs/PY36/build_tools_venv/lib/libarrow.so.100.1.0)
> {noformat}
>  
>  
> Response from the valgrind dev community:
>  
> {noformat}
> > 0x62 0xF1 0xFD 0x8 0x6F 0x44 0x24 0xA 0xC7 0x43
>  
> This is  vmovdqa64 0xa0(%rsp),%xmm0  which requires CPU feature flag
> AVX512VL/AVX512F which is not supported by valgrind 3.18.1.
>  
> In the results of the emulated CPUID instruction, valgrind tells
> the running application that AVX512 is not supported.
> It is a bug in pyarrow ( python ) that its initialization routine
> does not check the cpu feature flags, and switch its later run-time
> instructions appropriately.  Please report this bug to the maintainers
> of pyarrow.  Modern run-time libraries use the STT_GNU_IFUNC feature
> to choose at run time which instruction set is available and may be used.
> pyarrow (and/or python3 itself) must upgrade.
>  
> Gcc supports compile-time feature flags to avoid generating code
> that uses certain instructions.  Run "info gcc" and search for "avx512",
> paying attention to the flags beginning with "-mavx512", and in particular
> the "-mno-avx512" variants.
>  
> Some software administration systems for installing and maintaining
> software packages can limit the hardware features that packages
> may assume.  You may be able to work-around valgrind's non-support
> for AVX512 by telling the packaging system to avoid the package
> variants which require AVX512.
> {noformat}
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)