You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "zeroshade (via GitHub)" <gi...@apache.org> on 2023/09/18 21:40:03 UTC

[GitHub] [arrow] zeroshade opened a new pull request, #37785: GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits

zeroshade opened a new pull request, #37785:
URL: https://github.com/apache/arrow/pull/37785

   <!--
   Thanks for opening a pull request!
   If this is your first pull request you can find detailed information on how 
   to contribute here:
     * [New Contributor's Guide](https://arrow.apache.org/docs/dev/developers/guide/step_by_step/pr_lifecycle.html#reviews-and-merge-of-the-pull-request)
     * [Contributing Overview](https://arrow.apache.org/docs/dev/developers/overview.html)
   
   
   If this is not a [minor PR](https://github.com/apache/arrow/blob/main/CONTRIBUTING.md#Minor-Fixes). Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose
   
   Opening GitHub issues ahead of time contributes to the [Openness](http://theapacheway.com/open/#:~:text=Openness%20allows%20new%20users%20the,must%20happen%20in%20the%20open.) of the Apache Arrow project.
   
   Then could you also rename the pull request title in the following format?
   
       GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}
   
   or
   
       MINOR: [${COMPONENT}] ${SUMMARY}
   
   In the case of PARQUET issues on JIRA the title also supports:
   
       PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}
   
   -->
   
   ### Rationale for this change
   The linked issue discovered a problem with the ARM64 implementation of the extractBits functionality which was being hidden because until `go1.21`, it looks like `golang.org/x/sys/cpu` wasn't properly detecting the ASIMD bit flag on the ARM processors and so it was using the pure go implementation. So we fix the assembly and add a test which was able to reproduce the failure without the fix on an ARM64 machine. 
   
   <!--
    Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
    Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.  
   -->
   
   ### What changes are included in this PR?
   There was a problem with the assembly as it existed as the short circuit case where the selection bitmap is 0 wasn't actually setting the 0 value back onto the stack as a return. The assembly which initialized that register to 0 only occurred AFTER the `CBZ` instruction which would jump to the bottom and return. This means that we *actually* returned whatever happened to be on the stack at that location (i.e. garbage!). This only occurred if you were using the ARM64-specific assembly implementation, not the amd64 or pure go versions.
   
   We also include a test to ensure we get the correct values for a variety of bitmaps and selections, along with this ensuring there's enough data on the stack so that if this problem re-occured we would catch it in this test.
   
   ### Are these changes tested?
   Test is added.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on a diff in pull request #37785: GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on code in PR #37785:
URL: https://github.com/apache/arrow/pull/37785#discussion_r1331777665


##########
go/parquet/internal/bmi/bitmap_neon_arm64.s:
##########
@@ -27,8 +27,8 @@ LBB0_2:
     MOVD R0, res+16(FP)
     RET
 LBB0_4:
-    WORD $0xaa1f03e0 // mov    x0, xzr
     WORD $0xa8c17bfd // ldp    x29, x30, [sp], #16
+    MOVD ZR, res+16(FP)

Review Comment:
   The assembly wasn't *generated* by hand, but this fix was absolutely written by hand (me) by way of looking at the code, the generated assembly, and stepping through the instructions to figure out what was happening incorrectly.
   
   In this case, the problem was that the instructions which zero-out the return value got placed *after* the loop condition (checking if the selection bitmap was zero), so in that scenario we ended up returning whatever happened to be the next 8 bytes after the memory location the arguments were placed. I had to look up a bunch of stuff about how Go plan9 assembly works and also ARM64 assembly, to work out that zero'ing out one area was able to be done by simply storing from `ZR` the zero-register.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #37785: GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #37785:
URL: https://github.com/apache/arrow/pull/37785#discussion_r1339059574


##########
go/parquet/internal/utils/clib.go:
##########
@@ -0,0 +1,30 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+//go:build !noasm && !appengine && (amd64 || arm64)
+// +build !noasm
+// +build !appengine
+// +build amd64 arm64
+
+package utils
+
+import "unsafe"
+
+//go:noescape
+func _ClibMemcpy(dst, src unsafe.Pointer, n uint) unsafe.Pointer
+
+//go:noescape
+func _ClibMemset(dst unsafe.Pointer, c int, n uint) unsafe.Pointer

Review Comment:
   By the looks of it, the calls to memcpy/memset could easily be replaced by the equivalent `for` loop. Given the size is small and constant, I don't think calling a generic version of memcpy/memset would be a significant win.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits [arrow]

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on code in PR #37785:
URL: https://github.com/apache/arrow/pull/37785#discussion_r1343242528


##########
go/parquet/internal/utils/_lib/README.md:
##########
@@ -0,0 +1,148 @@
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# SIMD Bit Packing Implementation
+
+Go doesn't have any SIMD intrinsics so for some low-level optimizations we can 
+leverage auto-vectorization and the fact that Go lets you specify the body of a
+function in assembly to benefit from SIMD.
+
+In here we have implementations using SIMD intrinsics for AVX (amd64) and NEON (arm64).
+
+## Generating the Go assembly
+
+c2goasm and asm2plan9s are two projects which can be used in conjunction to generate
+compatible Go assembly from C assembly.
+
+First the tools need to be installed:
+
+```bash
+go install github.com/klauspost/asmfmt/cmd/asmfmt@latest
+go install github.com/minio/asm2plan9s@latest
+go install github.com/minio/c2goasm@latest
+```
+
+### Generating for amd64
+
+The Makefile in the directory above will work for amd64. `make assembly` will compile
+the c sources and then call `c2goasm` to generate the Go assembly for amd64 
+architectures.
+
+### Generating for arm64
+
+Unfortunately there are some caveats for arm64. c2goasm / asm2plan9s doesn't fully
+support arm64 correctly. However, proper assembly can be created with some slight
+manipulation of the result.
+
+The Makefile has the NEON flags for compiling the assembly by using 
+`make _lib/bit_packing_neon.s` and `make _lib/unpack_bool_neon.s` to generate the
+raw assembly sources. 
+
+Before calling `c2goasm` there's a few things that need to be modified in the assembly:

Review Comment:
   I've added a `script.sed` file which brings together *some* of the modifications to the assembly on the result of `c2goasm`, should make generating the file easier in the future if it comes to it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #37785: GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #37785:
URL: https://github.com/apache/arrow/pull/37785#discussion_r1337402138


##########
go/parquet/internal/utils/clib.go:
##########
@@ -0,0 +1,30 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+//go:build !noasm && !appengine && (amd64 || arm64)
+// +build !noasm
+// +build !appengine
+// +build amd64 arm64
+
+package utils
+
+import "unsafe"
+
+//go:noescape
+func _ClibMemcpy(dst, src unsafe.Pointer, n uint) unsafe.Pointer
+
+//go:noescape
+func _ClibMemset(dst unsafe.Pointer, c int, n uint) unsafe.Pointer

Review Comment:
   Yuck, is there a way to avoid the calls to memset and memcpy?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits [arrow]

Posted by "tschaub (via GitHub)" <gi...@apache.org>.
tschaub commented on PR #37785:
URL: https://github.com/apache/arrow/pull/37785#issuecomment-1743241303

   I just came across this and confirmed that it fixes the panic I found in #37968.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits [arrow]

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on PR #37785:
URL: https://github.com/apache/arrow/pull/37785#issuecomment-1745200775

   Thanks for checking @tschaub! looks great


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits [arrow]

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on code in PR #37785:
URL: https://github.com/apache/arrow/pull/37785#discussion_r1344292341


##########
go/parquet/internal/utils/clib_amd64.s:
##########
@@ -44,6 +44,15 @@ MEMCPY_BYTE_DONE:
 	POPQ R8
 	RET
 
+// func _ClibMemcpy(dst, src unsafe.Pointer, n uint) unsafe.Pointer
+TEXT ·_ClibMemcpy(SB), NOSPLIT|NOFRAME, $16-24

Review Comment:
   I didn't make any changes to the amd64 implementation which still uses `memcpy`. Since there's no issues or problems with the amd64 implementation, I didn't want to change or remove anything with it. I just left the addition the Go function definition there for easier testing in the future if we need it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #37785: GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #37785:
URL: https://github.com/apache/arrow/pull/37785#discussion_r1331788781


##########
go/parquet/internal/bmi/bitmap_neon_arm64.s:
##########
@@ -27,8 +27,8 @@ LBB0_2:
     MOVD R0, res+16(FP)
     RET
 LBB0_4:
-    WORD $0xaa1f03e0 // mov    x0, xzr
     WORD $0xa8c17bfd // ldp    x29, x30, [sp], #16
+    MOVD ZR, res+16(FP)

Review Comment:
   Do you think it's a C/C++ compiler bug? That sounds a bit unlikely to me, given how simple the `extract_bits` function is.
   
   Also, why are you compiling this code for Neon? I find it unlikely that Neon can be used to speed up your SW emulation of `PEXT`.
   
   Also, we have a SW emulation in Parquet C++ that processes 5 bits at a time rather than one. You could perhaps try to compile that one (or rewrite it in Go) :-)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on pull request #37785: GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on PR #37785:
URL: https://github.com/apache/arrow/pull/37785#issuecomment-1730418300

   @cyb70289 @pitrou  holy crap i got it to work! :grin: now i just gotta document what i did before i forget so i never have to do this again lol


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits [arrow]

Posted by "conbench-apache-arrow[bot] (via GitHub)" <gi...@apache.org>.
conbench-apache-arrow[bot] commented on PR #37785:
URL: https://github.com/apache/arrow/pull/37785#issuecomment-1749940424

   After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit f7530de9fe20f8a3731b465af07a79a56640fe79.
   
   There were no benchmark performance regressions. 🎉
   
   The [full Conbench report](https://github.com/apache/arrow/runs/17452487423) has more details. It also includes information about 2 possible false positives for unstable benchmarks that are known to sometimes produce them.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #37785: GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #37785:
URL: https://github.com/apache/arrow/pull/37785#issuecomment-1724494311

   :warning: GitHub issue #37712 **has been automatically assigned in GitHub** to PR creator.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on a diff in pull request #37785: GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on code in PR #37785:
URL: https://github.com/apache/arrow/pull/37785#discussion_r1337352462


##########
go/parquet/internal/bmi/bmi_arm64.go:
##########
@@ -14,44 +14,51 @@
 // See the License for the specific language governing permissions and
 // limitations under the License.
 
+//go:build !noasm
 // +build !noasm
 
 package bmi
 
 import (
+	"fmt"
 	"os"
 	"strings"
-)
-import (
-	"golang.org/x/sys/cpu"

Review Comment:
   From what I could tell in my testing, it appears that on `golang1.20` and below, it was ending up false on arm64 platforms when it shouldn't. Try testing it on `golang1.20` instead of 1.21 and let me know, because that's what I was observing in my tests (and explains why we never caught this issue before now)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits [arrow]

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on code in PR #37785:
URL: https://github.com/apache/arrow/pull/37785#discussion_r1343242078


##########
go/parquet/internal/utils/clib.go:
##########
@@ -0,0 +1,30 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+//go:build !noasm && !appengine && (amd64 || arm64)
+// +build !noasm
+// +build !appengine
+// +build amd64 arm64
+
+package utils
+
+import "unsafe"
+
+//go:noescape
+func _ClibMemcpy(dst, src unsafe.Pointer, n uint) unsafe.Pointer
+
+//go:noescape
+func _ClibMemset(dst unsafe.Pointer, c int, n uint) unsafe.Pointer

Review Comment:
   Latest update has removed the `memset`/`memcpy` and replaced them with simple fixed size loops. My local testing showed all the tests passing, so assuming the CI is happy with things then we should be good here if you're happy with the results @pitrou. Let me know please when you get a chance.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits [arrow]

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on PR #37785:
URL: https://github.com/apache/arrow/pull/37785#issuecomment-1743894281

   @tschaub Can you please try testing with the latest version of this branch/PR and confirm for me that everything is still working on your end and there's no performance degradation vs the previous fix? I agree with @pitrou's statement that it's unlikely the generic `memset`/`memcpy` were giving us much of a benefit over a simple fixed length loop, but it'd be nice to confirm that there's no negative effects


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on a diff in pull request #37785: GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on code in PR #37785:
URL: https://github.com/apache/arrow/pull/37785#discussion_r1339065786


##########
go/parquet/internal/utils/clib.go:
##########
@@ -0,0 +1,30 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+//go:build !noasm && !appengine && (amd64 || arm64)
+// +build !noasm
+// +build !appengine
+// +build amd64 arm64
+
+package utils
+
+import "unsafe"
+
+//go:noescape
+func _ClibMemcpy(dst, src unsafe.Pointer, n uint) unsafe.Pointer
+
+//go:noescape
+func _ClibMemset(dst unsafe.Pointer, c int, n uint) unsafe.Pointer

Review Comment:
   I'll give it a try



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on pull request #37785: GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on PR #37785:
URL: https://github.com/apache/arrow/pull/37785#issuecomment-1724509226

   Ugh, looks like ALL the neon assembly is borked, it just wasn't ACTUALLY being run because `golang.org/x/sys/cpu` wasn't registering the ASIMD bit, but `github.com/klauspost/cpuid/v2` DOES correctly recognize it and now we're seeing it die. 
   
   I guess I'll take a stab at fixing and debugging that tomorrow.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits [arrow]

Posted by "tschaub (via GitHub)" <gi...@apache.org>.
tschaub commented on PR #37785:
URL: https://github.com/apache/arrow/pull/37785#issuecomment-1744224804

   @zeroshade - I only tested a couple different cases, but I don't see any negative effect with the latest changes.
   
   Here are some benchmarks transforming a `small.parquet` file (2400 rows, 20 physical columns, uncompressed):
   ```
   Benchmark 1: ./gpq-noasm convert small.parquet small-noasm.parquet
     Time (mean ± σ):      72.2 ms ±   2.2 ms    [User: 103.9 ms, System: 25.2 ms]
     Range (min … max):    68.0 ms …  77.4 ms    41 runs
    
   Benchmark 2: ./gpq-before convert small.parquet small-before.parquet
     Time (mean ± σ):      68.6 ms ±   2.2 ms    [User: 99.9 ms, System: 25.1 ms]
     Range (min … max):    63.1 ms …  74.9 ms    42 runs
    
   Benchmark 3: ./gpq-after convert small.parquet small-after.parquet
     Time (mean ± σ):      68.5 ms ±   2.6 ms    [User: 99.0 ms, System: 24.5 ms]
     Range (min … max):    63.5 ms …  73.5 ms    40 runs
    
   Summary
     ./gpq-after convert small.parquet small-after.parquet ran
       1.00 ± 0.05 times faster than ./gpq-before convert small.parquet small-before.parquet
       1.05 ± 0.05 times faster than ./gpq-noasm convert small.parquet small-noasm.parquet
   ```
   
   And here are some numbers from a `medium.parquet` file (162709 rows, 20 physical columns, uncompressed):
   ```
   Benchmark 1: ./gpq-noasm convert medium.parquet medium-noasm.parquet
     Time (mean ± σ):     986.3 ms ±  12.3 ms    [User: 1456.6 ms, System: 281.5 ms]
     Range (min … max):   965.2 ms … 1007.7 ms    10 runs
    
   Benchmark 2: ./gpq-before convert medium.parquet medium-before.parquet
     Time (mean ± σ):     878.9 ms ±   7.2 ms    [User: 1343.1 ms, System: 269.5 ms]
     Range (min … max):   865.4 ms … 888.1 ms    10 runs
    
   Benchmark 3: ./gpq-after convert medium.parquet medium-after.parquet
     Time (mean ± σ):     879.1 ms ±   9.7 ms    [User: 1341.6 ms, System: 268.6 ms]
     Range (min … max):   858.7 ms … 891.8 ms    10 runs
    
   Summary
     ./gpq-before convert medium.parquet medium-before.parquet ran
       1.00 ± 0.01 times faster than ./gpq-after convert medium.parquet medium-after.parquet
       1.12 ± 0.02 times faster than ./gpq-noasm convert medium.parquet medium-noasm.parquet
   ```
   
   So no significant difference in those between the before (c9693c5f9b33) and after (9d4c29b390a2) cases.  The other case is a build with `-tags noasm`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on a diff in pull request #37785: GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on code in PR #37785:
URL: https://github.com/apache/arrow/pull/37785#discussion_r1331768871


##########
go/parquet/internal/bmi/bmi_arm64.go:
##########
@@ -14,40 +14,38 @@
 // See the License for the specific language governing permissions and
 // limitations under the License.
 
+//go:build !noasm
 // +build !noasm
 
 package bmi
 
 import (
 	"os"
 	"strings"
-)
-import (
-	"golang.org/x/sys/cpu"
+
+	"github.com/klauspost/cpuid/v2"
 )
 
 func init() {
-    // Added ability to enable extension via environment:
+	// Added ability to enable extension via environment:
 	// ARM_ENABLE_EXT=NEON go test
 	if ext, ok := os.LookupEnv("ARM_ENABLE_EXT"); ok {
 		exts := strings.Split(ext, ",")
 
 		for _, x := range exts {
 			switch x {
 			case "NEON":
-				cpu.ARM64.HasASIMD = true
+				cpuid.CPU.Enable(cpuid.ASIMD)
 			case "AES":
-				cpu.ARM64.HasAES = true
+				cpuid.CPU.Enable(cpuid.AESARM)
 			case "PMULL":
-				cpu.ARM64.HasPMULL = true
+				cpuid.CPU.Enable(cpuid.PMULL)
 			default:
-				cpu.ARM64.HasASIMD = false
-				cpu.ARM64.HasAES = false
-				cpu.ARM64.HasPMULL = false
+				cpuid.CPU.Disable(cpuid.ASIMD, cpuid.AESARM, cpuid.PMULL)

Review Comment:
   technically we aren't actually *disabling* the extensions on the CPU, we're just telling the library that anything that wants to check for those extensions should consider them disabled and act accordingly. For example if an invalid value is passed we behave as if those extensions weren't enabled regardless and default to the pure go implementation.
   
   That said, I agree it would be more user-friendly to have an explicit "DISABLE" value, and then just print a warning for invalid values. I'll make that change.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on pull request #37785: GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on PR #37785:
URL: https://github.com/apache/arrow/pull/37785#issuecomment-1735822645

   @pitrou fair point, I'll update / write the README stuff for this today. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #37785: GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #37785:
URL: https://github.com/apache/arrow/pull/37785#discussion_r1340286455


##########
go/parquet/internal/utils/_lib/README.md:
##########
@@ -0,0 +1,148 @@
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# SIMD Bit Packing Implementation
+
+Go doesn't have any SIMD intrinsics so for some low-level optimizations we can 
+leverage auto-vectorization and the fact that Go lets you specify the body of a
+function in assembly to benefit from SIMD.
+
+In here we have implementations using SIMD intrinsics for AVX (amd64) and NEON (arm64).
+
+## Generating the Go assembly
+
+c2goasm and asm2plan9s are two projects which can be used in conjunction to generate
+compatible Go assembly from C assembly.
+
+First the tools need to be installed:
+
+```bash
+go install github.com/klauspost/asmfmt/cmd/asmfmt@latest
+go install github.com/minio/asm2plan9s@latest
+go install github.com/minio/c2goasm@latest
+```
+
+### Generating for amd64
+
+The Makefile in the directory above will work for amd64. `make assembly` will compile
+the c sources and then call `c2goasm` to generate the Go assembly for amd64 
+architectures.
+
+### Generating for arm64
+
+Unfortunately there are some caveats for arm64. c2goasm / asm2plan9s doesn't fully
+support arm64 correctly. However, proper assembly can be created with some slight
+manipulation of the result.
+
+The Makefile has the NEON flags for compiling the assembly by using 
+`make _lib/bit_packing_neon.s` and `make _lib/unpack_bool_neon.s` to generate the
+raw assembly sources. 
+
+Before calling `c2goasm` there's a few things that need to be modified in the assembly:

Review Comment:
   Are these done by hand?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits [arrow]

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #37785:
URL: https://github.com/apache/arrow/pull/37785#discussion_r1343716370


##########
go/parquet/internal/utils/clib_amd64.s:
##########
@@ -44,6 +44,15 @@ MEMCPY_BYTE_DONE:
 	POPQ R8
 	RET
 
+// func _ClibMemcpy(dst, src unsafe.Pointer, n uint) unsafe.Pointer
+TEXT ·_ClibMemcpy(SB), NOSPLIT|NOFRAME, $16-24

Review Comment:
   Hmm, so is this file still useful?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits [arrow]

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on code in PR #37785:
URL: https://github.com/apache/arrow/pull/37785#discussion_r1343241161


##########
go/parquet/internal/bmi/bmi_arm64.go:
##########
@@ -14,44 +14,51 @@
 // See the License for the specific language governing permissions and
 // limitations under the License.
 
+//go:build !noasm
 // +build !noasm
 
 package bmi
 
 import (
+	"fmt"
 	"os"
 	"strings"
-)
-import (
-	"golang.org/x/sys/cpu"

Review Comment:
   @cyb70289 Here is the output of `lscpu` on the AWS EC2 instance that I've been using for testing:
   
   ```shell
   admin@ip-172-31-78-97:~/arrow/go/parquet/internal/utils$ lscpu
   Architecture:           aarch64
     CPU op-mode(s):       32-bit, 64-bit
     Byte Order:           Little Endian
   CPU(s):                 1
     On-line CPU(s) list:  0
   Vendor ID:              ARM
     Model name:           Neoverse-N1
       Model:              1
       Thread(s) per core: 1
       Core(s) per socket: 1
       Socket(s):          1
       Stepping:           r3p1
       BogoMIPS:           243.75
       Flags:              fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
   Caches (sum of all):
     L1d:                  64 KiB (1 instance)
     L1i:                  64 KiB (1 instance)
     L2:                   1 MiB (1 instance)
     L3:                   32 MiB (1 instance)
   NUMA:
     NUMA node(s):         1
     NUMA node0 CPU(s):    0
   Vulnerabilities:
     Gather data sampling: Not affected
     Itlb multihit:        Not affected
     L1tf:                 Not affected
     Mds:                  Not affected
     Meltdown:             Not affected
     Mmio stale data:      Not affected
     Retbleed:             Not affected
     Spec rstack overflow: Not affected
     Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl
     Spectre v1:           Mitigation; __user pointer sanitization
     Spectre v2:           Mitigation; CSV2, BHB
     Srbds:                Not affected
     Tsx async abort:      Not affected
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits [arrow]

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade merged PR #37785:
URL: https://github.com/apache/arrow/pull/37785


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on a diff in pull request #37785: GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on code in PR #37785:
URL: https://github.com/apache/arrow/pull/37785#discussion_r1334615697


##########
go/parquet/internal/bmi/bitmap_neon_arm64.s:
##########
@@ -27,8 +27,8 @@ LBB0_2:
     MOVD R0, res+16(FP)
     RET
 LBB0_4:
-    WORD $0xaa1f03e0 // mov    x0, xzr
     WORD $0xa8c17bfd // ldp    x29, x30, [sp], #16
+    MOVD ZR, res+16(FP)

Review Comment:
   removed the ARM asm version so that it just uses the pure go lookup-table impl



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on pull request #37785: GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on PR #37785:
URL: https://github.com/apache/arrow/pull/37785#issuecomment-1731731844

   @cyb70289 @pitrou this is ready for another look over and review now! thanks. Given the complexity i'm going to punt creating actual automation for the ARM assembly generation to follow-up PR instead


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on a diff in pull request #37785: GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on code in PR #37785:
URL: https://github.com/apache/arrow/pull/37785#discussion_r1334487135


##########
go/parquet/internal/bmi/bitmap_neon_arm64.s:
##########
@@ -27,8 +27,8 @@ LBB0_2:
     MOVD R0, res+16(FP)
     RET
 LBB0_4:
-    WORD $0xaa1f03e0 // mov    x0, xzr
     WORD $0xa8c17bfd // ldp    x29, x30, [sp], #16
+    MOVD ZR, res+16(FP)

Review Comment:
   well that settles that :) i'll update this and remove the asm version for arm :smile: Thanks!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] cyb70289 commented on a diff in pull request #37785: GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits

Posted by "cyb70289 (via GitHub)" <gi...@apache.org>.
cyb70289 commented on code in PR #37785:
URL: https://github.com/apache/arrow/pull/37785#discussion_r1335476636


##########
go/parquet/internal/bmi/bmi_arm64.go:
##########
@@ -14,44 +14,51 @@
 // See the License for the specific language governing permissions and
 // limitations under the License.
 
+//go:build !noasm
 // +build !noasm
 
 package bmi
 
 import (
+	"fmt"
 	"os"
 	"strings"
-)
-import (
-	"golang.org/x/sys/cpu"

Review Comment:
   `cpu.ARM64.HasASIMD` is false on some arm64 platform?
   It does work correctly on my test machine (ubunt22.05, aarch64, golang1.21).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #37785: GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #37785:
URL: https://github.com/apache/arrow/pull/37785#discussion_r1331776747


##########
go/parquet/internal/bmi/bmi_arm64.go:
##########
@@ -14,40 +14,38 @@
 // See the License for the specific language governing permissions and
 // limitations under the License.
 
+//go:build !noasm
 // +build !noasm
 
 package bmi
 
 import (
 	"os"
 	"strings"
-)
-import (
-	"golang.org/x/sys/cpu"
+
+	"github.com/klauspost/cpuid/v2"
 )
 
 func init() {
-    // Added ability to enable extension via environment:
+	// Added ability to enable extension via environment:
 	// ARM_ENABLE_EXT=NEON go test
 	if ext, ok := os.LookupEnv("ARM_ENABLE_EXT"); ok {
 		exts := strings.Split(ext, ",")
 
 		for _, x := range exts {
 			switch x {
 			case "NEON":
-				cpu.ARM64.HasASIMD = true
+				cpuid.CPU.Enable(cpuid.ASIMD)
 			case "AES":
-				cpu.ARM64.HasAES = true
+				cpuid.CPU.Enable(cpuid.AESARM)
 			case "PMULL":
-				cpu.ARM64.HasPMULL = true
+				cpuid.CPU.Enable(cpuid.PMULL)
 			default:
-				cpu.ARM64.HasASIMD = false
-				cpu.ARM64.HasAES = false
-				cpu.ARM64.HasPMULL = false
+				cpuid.CPU.Disable(cpuid.ASIMD, cpuid.AESARM, cpuid.PMULL)

Review Comment:
   > technically we aren't actually _disabling_ the extensions on the CPU
   
   Yes, I figured that out :-)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] cyb70289 commented on a diff in pull request #37785: GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits

Posted by "cyb70289 (via GitHub)" <gi...@apache.org>.
cyb70289 commented on code in PR #37785:
URL: https://github.com/apache/arrow/pull/37785#discussion_r1332397446


##########
go/parquet/internal/bmi/bitmap_neon_arm64.s:
##########
@@ -27,8 +27,8 @@ LBB0_2:
     MOVD R0, res+16(FP)
     RET
 LBB0_4:
-    WORD $0xaa1f03e0 // mov    x0, xzr
     WORD $0xa8c17bfd // ldp    x29, x30, [sp], #16
+    MOVD ZR, res+16(FP)

Review Comment:
   > The assembly wasn't _generated_ by hand, but this fix was absolutely written by hand (me) by way of looking at the code, the generated assembly, and stepping through the instructions to figure out what was happening incorrectly.
   > 
   
   This is cool!
   
   Might be bug from asm2plan9s. Or errors from manually converting/formatting from normal arm assembly to plan9.
   Not sure the best approach to convert normal C/assembly to plan9 assembly. [asm2plan9s](https://github.com/minio/asm2plan9s) and [c2goasm](https://github.com/minio/c2goasm) are archived. Looks we should avoid them.
   
   For this pext like function, the go version (lookup table) should perform better than this one. Maybe we can just delete this assembly implementation?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on a diff in pull request #37785: GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on code in PR #37785:
URL: https://github.com/apache/arrow/pull/37785#discussion_r1333184383


##########
go/parquet/internal/bmi/bitmap_neon_arm64.s:
##########
@@ -27,8 +27,8 @@ LBB0_2:
     MOVD R0, res+16(FP)
     RET
 LBB0_4:
-    WORD $0xaa1f03e0 // mov    x0, xzr
     WORD $0xa8c17bfd // ldp    x29, x30, [sp], #16
+    MOVD ZR, res+16(FP)

Review Comment:
   @cyb70289 I'd be fine with removing the arm assembly implementation of the pext function. I haven't benchmarked the lookup table vs the ARM assembly version yet. but I agree the lookup table would likely be faster.
   
   The other issue i'm having is the bit packing side of things, though I think i may have figured out a solution to it... and i've learned more about ARM64 assembly in the last few days than i ever intended on learning LOL as I've needed to do a lot of handcoding assembly to get this to work. Looks like the issue boils down to needing anything with a label to be explicit in the assembly rather than using the `WORD`/`BYTE` syntax to pass them to the processor stream so that the labels get translated to the correct PC locations.
   
   > There should be a well-defined way to generate the assembly, so that other people can later participate in maintenance.
   
   I agree! I was a little dissappointed that @guyuqi didn't add explicit steps to how he generated the arm64 assembly but wanted to benefit from it, so that was my mistake to accept that PR so long ago. Currently I'm trying to explicitly define (and automate as much as possible) with the generation based on when i figure out what works and solves this issue.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits [arrow]

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on PR #37785:
URL: https://github.com/apache/arrow/pull/37785#issuecomment-1747267212

   Thanks @pitrou I'll hold off on merging till the end of the week to give @cyb70289 a chance to approve
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #37785: GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #37785:
URL: https://github.com/apache/arrow/pull/37785#discussion_r1335589226


##########
go/parquet/internal/utils/clib.go:
##########
@@ -0,0 +1,30 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+//go:build !noasm && !appengine && (amd64 || arm64)
+// +build !noasm
+// +build !appengine
+// +build amd64 arm64
+
+package utils
+
+import "unsafe"
+
+//go:noescape
+func _ClibMemcpy(dst, src unsafe.Pointer, n uint) unsafe.Pointer
+
+//go:noescape
+func _ClibMemset(dst unsafe.Pointer, c int, n uint) unsafe.Pointer

Review Comment:
   I'm curious, what is this and what is it good for? Doesn't Go provide memcpy/memset replacements?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] cyb70289 commented on a diff in pull request #37785: GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits

Posted by "cyb70289 (via GitHub)" <gi...@apache.org>.
cyb70289 commented on code in PR #37785:
URL: https://github.com/apache/arrow/pull/37785#discussion_r1339448654


##########
go/parquet/internal/bmi/bmi_arm64.go:
##########
@@ -14,44 +14,51 @@
 // See the License for the specific language governing permissions and
 // limitations under the License.
 
+//go:build !noasm
 // +build !noasm
 
 package bmi
 
 import (
+	"fmt"
 	"os"
 	"strings"
-)
-import (
-	"golang.org/x/sys/cpu"

Review Comment:
   This is strange. I tested earlier golang versions. Also tried x/sys/cpu v0.5.0 (used by arrow) and latest 0.12.0. All tests are okay.
   Curious what might be the reason. If possible, can your share info about the test env? E.g.. tested in arm64 baremetal or virtual machine, linux kernel version, output of "lscpu". Thanks.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on pull request #37785: GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on PR #37785:
URL: https://github.com/apache/arrow/pull/37785#issuecomment-1739448920

   I just read the README. This is horrifying :-o


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on a diff in pull request #37785: GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on code in PR #37785:
URL: https://github.com/apache/arrow/pull/37785#discussion_r1331880925


##########
go/parquet/internal/bmi/bitmap_neon_arm64.s:
##########
@@ -27,8 +27,8 @@ LBB0_2:
     MOVD R0, res+16(FP)
     RET
 LBB0_4:
-    WORD $0xaa1f03e0 // mov    x0, xzr
     WORD $0xa8c17bfd // ldp    x29, x30, [sp], #16
+    MOVD ZR, res+16(FP)

Review Comment:
   Without knowing how Yuqi originally generated the arm64 assembly, I'm not sure what the cause of this was. All i know is that this was never correct, but was hidden by the fact that we were unaware of a bug in the `golang.org/x/sys/cpu` package which was failing to detect ARM features correctly so the ARM builds were just using the pure go implementation.
   
   > Also, why are you compiling this code for Neon? I find it unlikely that Neon can be used to speed up your SW emulation of PEXT.
   
   From what I can tell, ARM doesn't have a `PEXT` instruction, again this C impl was provided by a contributor but doing a bit of research it looks like the implementation there is essentially the standard way to implement `PEXT` for ARM in C when I do some stack overflow searching. (There's also a couple variations I came across, but they roughly even out).
   
   > Also, we have a SW emulation in Parquet C++ that processes 5 bits at a time rather than one. You could perhaps try to compile that one (or rewrite it in Go) :-)
   
   The pure go impl already does it! if you look at `bmi.go` it's a direct port of the software emulation in `level_conversion_inc.h` from the C++ parquet impl. We only go to the assembly / non-pure-go impl if our CPU check says that it supports BMI2 so that we can call `pext_u64` directly at runtime. 
   
   The compiling for NEON is to get the benefit of vectorization for the `levels_to_bitmap` function (i.e. the `GreaterThanBitmap` SIMD method). 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] cyb70289 commented on a diff in pull request #37785: GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits

Posted by "cyb70289 (via GitHub)" <gi...@apache.org>.
cyb70289 commented on code in PR #37785:
URL: https://github.com/apache/arrow/pull/37785#discussion_r1333980467


##########
go/parquet/internal/bmi/bitmap_neon_arm64.s:
##########
@@ -27,8 +27,8 @@ LBB0_2:
     MOVD R0, res+16(FP)
     RET
 LBB0_4:
-    WORD $0xaa1f03e0 // mov    x0, xzr
     WORD $0xa8c17bfd // ldp    x29, x30, [sp], #16
+    MOVD ZR, res+16(FP)

Review Comment:
   Wrote a [benchmark](https://github.com/cyb70289/mytests/blob/master/go/bmi/bmi_test.go), the golang version is indeed faster than assembly. Tested on Neoverse-N1.
   ```
   $ go test -bench=.
   goos: linux
   goarch: arm64
   pkg: bmi
   BenchmarkGo-80               663           1759920 ns/op
   BenchmarkAsm-80              435           2720235 ns/op
   PASS
   ok      bmi     2.838s
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #37785: GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #37785:
URL: https://github.com/apache/arrow/pull/37785#discussion_r1331898274


##########
go/parquet/internal/bmi/bitmap_neon_arm64.s:
##########
@@ -27,8 +27,8 @@ LBB0_2:
     MOVD R0, res+16(FP)
     RET
 LBB0_4:
-    WORD $0xaa1f03e0 // mov    x0, xzr
     WORD $0xa8c17bfd // ldp    x29, x30, [sp], #16
+    MOVD ZR, res+16(FP)

Review Comment:
   > Without knowing how Yuqi originally generated the arm64 assembly, I'm not sure what the cause of this was. 
   
   There should be a well-defined way to generate the assembly, so that other people can later participate in maintenance.
   (also, is it possible that the original arm64 assembly was simply out of sync with the current C source?)
   
   > The pure go impl already does it!
   
   Nice, so you just need to port it to C now :-)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on a diff in pull request #37785: GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on code in PR #37785:
URL: https://github.com/apache/arrow/pull/37785#discussion_r1337357947


##########
go/parquet/internal/utils/clib.go:
##########
@@ -0,0 +1,30 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+//go:build !noasm && !appengine && (amd64 || arm64)
+// +build !noasm
+// +build !appengine
+// +build amd64 arm64
+
+package utils
+
+import "unsafe"
+
+//go:noescape
+func _ClibMemcpy(dst, src unsafe.Pointer, n uint) unsafe.Pointer
+
+//go:noescape
+func _ClibMemset(dst unsafe.Pointer, c int, n uint) unsafe.Pointer

Review Comment:
   this exists *purely* to test the assembly versions I provided. 
   
   The issue is that the C implementation I'm using to generate the assembly makes calls to memset and memcpy, but C's calling conventions are different than Go's. So in the assembly when it calls memset/memcpy I replaced those CALL instructions to call these replacement versions (because I can't call out to libc from inside of Go like that, no guarantee that the lib is loaded). Which are essentially just ported versions of the C memcpy/memset to ensure that the Calling conventions and argument handling are the same. 
   
   The underscore prefix there ensures that the methods are not exported from the package, so I'm only using these for `clib_test.go` in order to sanity check the assembly methods.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #37785: GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #37785:
URL: https://github.com/apache/arrow/pull/37785#discussion_r1340283858


##########
go/parquet/internal/utils/_lib/README.md:
##########
@@ -0,0 +1,148 @@
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# SIMD Bit Packing Implementation
+
+Go doesn't have any SIMD intrinsics so for some low-level optimizations we can 
+leverage auto-vectorization and the fact that Go lets you specify the body of a

Review Comment:
   ```suggestion
   leverage auto-vectorization by C++ compilers and the fact that Go lets you specify the body of a
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on a diff in pull request #37785: GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on code in PR #37785:
URL: https://github.com/apache/arrow/pull/37785#discussion_r1339051055


##########
go/parquet/internal/utils/clib.go:
##########
@@ -0,0 +1,30 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+//go:build !noasm && !appengine && (amd64 || arm64)
+// +build !noasm
+// +build !appengine
+// +build amd64 arm64
+
+package utils
+
+import "unsafe"
+
+//go:noescape
+func _ClibMemcpy(dst, src unsafe.Pointer, n uint) unsafe.Pointer
+
+//go:noescape
+func _ClibMemset(dst unsafe.Pointer, c int, n uint) unsafe.Pointer

Review Comment:
   Even the original C++ implementations (bpacking_simd128_generated.h) use memset and memcpy. There might be ways to avoid those calls but they'd likely be much less efficient. We also already use amd64 versions of these anyways, so it's good to have arm64 implementations.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on pull request #37785: GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on PR #37785:
URL: https://github.com/apache/arrow/pull/37785#issuecomment-1724516601

   Interestingly, it's only failing on linux/arm64 not macOS ARM64.... which is, well, odd. *sigh* Why can't things be easy. I'm guessing there's probably a feature flag that i should be checking for that I'm not


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on a diff in pull request #37785: GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on code in PR #37785:
URL: https://github.com/apache/arrow/pull/37785#discussion_r1334615313


##########
go/parquet/internal/bmi/bmi_arm64.go:
##########
@@ -14,40 +14,38 @@
 // See the License for the specific language governing permissions and
 // limitations under the License.
 
+//go:build !noasm
 // +build !noasm
 
 package bmi
 
 import (
 	"os"
 	"strings"
-)
-import (
-	"golang.org/x/sys/cpu"
+
+	"github.com/klauspost/cpuid/v2"
 )
 
 func init() {
-    // Added ability to enable extension via environment:
+	// Added ability to enable extension via environment:
 	// ARM_ENABLE_EXT=NEON go test
 	if ext, ok := os.LookupEnv("ARM_ENABLE_EXT"); ok {
 		exts := strings.Split(ext, ",")
 
 		for _, x := range exts {
 			switch x {
 			case "NEON":
-				cpu.ARM64.HasASIMD = true
+				cpuid.CPU.Enable(cpuid.ASIMD)
 			case "AES":
-				cpu.ARM64.HasAES = true
+				cpuid.CPU.Enable(cpuid.AESARM)
 			case "PMULL":
-				cpu.ARM64.HasPMULL = true
+				cpuid.CPU.Enable(cpuid.PMULL)
 			default:
-				cpu.ARM64.HasASIMD = false
-				cpu.ARM64.HasAES = false
-				cpu.ARM64.HasPMULL = false
+				cpuid.CPU.Disable(cpuid.ASIMD, cpuid.AESARM, cpuid.PMULL)

Review Comment:
   updated so that if the value of the env var is "DISABLE" then we disable things, otherwise we just print any unrecognized options to stderr



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits [arrow]

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on code in PR #37785:
URL: https://github.com/apache/arrow/pull/37785#discussion_r1342924432


##########
go/parquet/internal/utils/_lib/README.md:
##########
@@ -0,0 +1,148 @@
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# SIMD Bit Packing Implementation
+
+Go doesn't have any SIMD intrinsics so for some low-level optimizations we can 
+leverage auto-vectorization and the fact that Go lets you specify the body of a
+function in assembly to benefit from SIMD.
+
+In here we have implementations using SIMD intrinsics for AVX (amd64) and NEON (arm64).
+
+## Generating the Go assembly
+
+c2goasm and asm2plan9s are two projects which can be used in conjunction to generate
+compatible Go assembly from C assembly.
+
+First the tools need to be installed:
+
+```bash
+go install github.com/klauspost/asmfmt/cmd/asmfmt@latest
+go install github.com/minio/asm2plan9s@latest
+go install github.com/minio/c2goasm@latest
+```
+
+### Generating for amd64
+
+The Makefile in the directory above will work for amd64. `make assembly` will compile
+the c sources and then call `c2goasm` to generate the Go assembly for amd64 
+architectures.
+
+### Generating for arm64
+
+Unfortunately there are some caveats for arm64. c2goasm / asm2plan9s doesn't fully
+support arm64 correctly. However, proper assembly can be created with some slight
+manipulation of the result.
+
+The Makefile has the NEON flags for compiling the assembly by using 
+`make _lib/bit_packing_neon.s` and `make _lib/unpack_bool_neon.s` to generate the
+raw assembly sources. 
+
+Before calling `c2goasm` there's a few things that need to be modified in the assembly:

Review Comment:
   I did it with a series of regex's but until i write some automation, yes they are done by hand.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits [arrow]

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #37785:
URL: https://github.com/apache/arrow/pull/37785#discussion_r1346094141


##########
go/parquet/internal/utils/clib_amd64.s:
##########
@@ -44,6 +44,15 @@ MEMCPY_BYTE_DONE:
 	POPQ R8
 	RET
 
+// func _ClibMemcpy(dst, src unsafe.Pointer, n uint) unsafe.Pointer
+TEXT ·_ClibMemcpy(SB), NOSPLIT|NOFRAME, $16-24

Review Comment:
   Ah, I see. Too bad, perhaps you can create a separate GH issue for it?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #37785: GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #37785:
URL: https://github.com/apache/arrow/pull/37785#discussion_r1330425214


##########
go/parquet/internal/bmi/bmi_arm64.go:
##########
@@ -14,40 +14,38 @@
 // See the License for the specific language governing permissions and
 // limitations under the License.
 
+//go:build !noasm
 // +build !noasm
 
 package bmi
 
 import (
 	"os"
 	"strings"
-)
-import (
-	"golang.org/x/sys/cpu"
+
+	"github.com/klauspost/cpuid/v2"
 )
 
 func init() {
-    // Added ability to enable extension via environment:
+	// Added ability to enable extension via environment:
 	// ARM_ENABLE_EXT=NEON go test
 	if ext, ok := os.LookupEnv("ARM_ENABLE_EXT"); ok {
 		exts := strings.Split(ext, ",")
 
 		for _, x := range exts {
 			switch x {
 			case "NEON":
-				cpu.ARM64.HasASIMD = true
+				cpuid.CPU.Enable(cpuid.ASIMD)
 			case "AES":
-				cpu.ARM64.HasAES = true
+				cpuid.CPU.Enable(cpuid.AESARM)
 			case "PMULL":
-				cpu.ARM64.HasPMULL = true
+				cpuid.CPU.Enable(cpuid.PMULL)
 			default:
-				cpu.ARM64.HasASIMD = false
-				cpu.ARM64.HasAES = false
-				cpu.ARM64.HasPMULL = false
+				cpuid.CPU.Disable(cpuid.ASIMD, cpuid.AESARM, cpuid.PMULL)

Review Comment:
   Hmm, so if an invalid value is passed in `ARM_ENABLE_EXT`, all extensions are silently disabled? It would be more user-friendly to print a warning.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #37785: GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #37785:
URL: https://github.com/apache/arrow/pull/37785#discussion_r1330424605


##########
go/parquet/internal/bmi/bmi_arm64.go:
##########
@@ -14,40 +14,38 @@
 // See the License for the specific language governing permissions and
 // limitations under the License.
 
+//go:build !noasm
 // +build !noasm
 
 package bmi
 
 import (
 	"os"
 	"strings"
-)
-import (
-	"golang.org/x/sys/cpu"
+
+	"github.com/klauspost/cpuid/v2"
 )
 
 func init() {

Review Comment:
   This code looks duplicated in two different source files, is it me?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #37785: GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #37785:
URL: https://github.com/apache/arrow/pull/37785#discussion_r1330426112


##########
go/parquet/internal/bmi/bitmap_neon_arm64.s:
##########
@@ -27,8 +27,8 @@ LBB0_2:
     MOVD R0, res+16(FP)
     RET
 LBB0_4:
-    WORD $0xaa1f03e0 // mov    x0, xzr
     WORD $0xa8c17bfd // ldp    x29, x30, [sp], #16
+    MOVD ZR, res+16(FP)

Review Comment:
   Hmm, how did you come with this fix? I thought this assembly code wasn't written by hand?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on a diff in pull request #37785: GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on code in PR #37785:
URL: https://github.com/apache/arrow/pull/37785#discussion_r1334614858


##########
go/parquet/internal/bmi/bmi_arm64.go:
##########
@@ -14,40 +14,38 @@
 // See the License for the specific language governing permissions and
 // limitations under the License.
 
+//go:build !noasm
 // +build !noasm
 
 package bmi
 
 import (
 	"os"
 	"strings"
-)
-import (
-	"golang.org/x/sys/cpu"
+
+	"github.com/klauspost/cpuid/v2"
 )
 
 func init() {

Review Comment:
   I switched this around so that it doens't need to be duplicated anymore. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on a diff in pull request #37785: GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on code in PR #37785:
URL: https://github.com/apache/arrow/pull/37785#discussion_r1331764852


##########
go/parquet/internal/bmi/bmi_arm64.go:
##########
@@ -14,40 +14,38 @@
 // See the License for the specific language governing permissions and
 // limitations under the License.
 
+//go:build !noasm
 // +build !noasm
 
 package bmi
 
 import (
 	"os"
 	"strings"
-)
-import (
-	"golang.org/x/sys/cpu"
+
+	"github.com/klauspost/cpuid/v2"
 )
 
 func init() {

Review Comment:
   the reason why the code is duplicated is because this runs at image startup (via the `init` function) which means if this package isn't imported, then this function never runs. The bottom of the function sets the appropriate function pointer based on whether or not the processor actually supports things or not, and this package is a low level utility that doesn't import anything else so there isn't a shared place I could put this initialization between the two places.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on a diff in pull request #37785: GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on code in PR #37785:
URL: https://github.com/apache/arrow/pull/37785#discussion_r1333184383


##########
go/parquet/internal/bmi/bitmap_neon_arm64.s:
##########
@@ -27,8 +27,8 @@ LBB0_2:
     MOVD R0, res+16(FP)
     RET
 LBB0_4:
-    WORD $0xaa1f03e0 // mov    x0, xzr
     WORD $0xa8c17bfd // ldp    x29, x30, [sp], #16
+    MOVD ZR, res+16(FP)

Review Comment:
   @cyb70289 I'd be fine with removing the arm assembly implementation of the pext function. I haven't benchmarked the lookup table vs the ARM assembly version yet. but I agree the lookup table would likely be faster.
   
   The other issue i'm having is the bit packing side of things, though I think i may have figured out a solution to it... and i've learned more about ARM64 assembly in the last few days than i ever intended on learning LOL as I've needed to do a lot of handcoding assembly to get this to work. Looks like the issue boils down to needing anything with a label to be explicit in the assembly rather than using the `WORD`/`BYTE` syntax to pass them to the processor stream so that the labels get translated to the correct PC locations.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on pull request #37785: GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on PR #37785:
URL: https://github.com/apache/arrow/pull/37785#issuecomment-1733209960

   Regardless of automation, there should be READMEs explaining the rationale and motivations, and also how to generate those files anew.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #37785: GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #37785:
URL: https://github.com/apache/arrow/pull/37785#discussion_r1340285870


##########
go/parquet/internal/utils/_lib/README.md:
##########
@@ -0,0 +1,148 @@
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# SIMD Bit Packing Implementation
+
+Go doesn't have any SIMD intrinsics so for some low-level optimizations we can 
+leverage auto-vectorization and the fact that Go lets you specify the body of a
+function in assembly to benefit from SIMD.
+
+In here we have implementations using SIMD intrinsics for AVX (amd64) and NEON (arm64).
+
+## Generating the Go assembly
+
+c2goasm and asm2plan9s are two projects which can be used in conjunction to generate
+compatible Go assembly from C assembly.
+
+First the tools need to be installed:
+
+```bash
+go install github.com/klauspost/asmfmt/cmd/asmfmt@latest
+go install github.com/minio/asm2plan9s@latest
+go install github.com/minio/c2goasm@latest
+```
+
+### Generating for amd64
+
+The Makefile in the directory above will work for amd64. `make assembly` will compile
+the c sources and then call `c2goasm` to generate the Go assembly for amd64 
+architectures.
+
+### Generating for arm64
+
+Unfortunately there are some caveats for arm64. c2goasm / asm2plan9s doesn't fully
+support arm64 correctly. However, proper assembly can be created with some slight
+manipulation of the result.
+
+The Makefile has the NEON flags for compiling the assembly by using 
+`make _lib/bit_packing_neon.s` and `make _lib/unpack_bool_neon.s` to generate the
+raw assembly sources. 
+
+Before calling `c2goasm` there's a few things that need to be modified in the assembly:
+
+* x86-64 assembly uses `#` for comments while arm64 assembly uses `//` for comments.
+  `c2goasm` assumes `#` for comments and splits lines based on them. For most lines
+  this isn't an issue, but for any constants this is important and will need to have
+  the comment character converted from `//` to `#`.
+* A `word` for x86-64 is 16 bits, a `double` word is 32 bits, and a `quad` is 64 bits.
+  For arm64, a `word` is 32 bits. This means that constants in the assembly need to be
+  modified. `c2goasm` and `asm2plan9s` expect the x86-64 meaning for the sizes, so
+  usage of `.word ######` needs to be converted to `.long #####` before running
+  `c2goasm`
+  * Because of this change in bits, `MOVQ` instructions will also be converted to 

Review Comment:
   ```suggestion
   * Because of this change in bits, `MOVQ` instructions will also be converted to 
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on a diff in pull request #37785: GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on code in PR #37785:
URL: https://github.com/apache/arrow/pull/37785#discussion_r1340501129


##########
go/parquet/internal/bmi/bmi_arm64.go:
##########
@@ -14,44 +14,51 @@
 // See the License for the specific language governing permissions and
 // limitations under the License.
 
+//go:build !noasm
 // +build !noasm
 
 package bmi
 
 import (
+	"fmt"
 	"os"
 	"strings"
-)
-import (
-	"golang.org/x/sys/cpu"

Review Comment:
   It was a virtual machine launched via EC2 on AWS as a Debian 12 arm64 machine. That said, the original issue that launched this investigation was a mac M1 which appeared to show the same situation that using `go1.20` acted as per `x/sys/cpu` returning false for `HasASIMD` while using `go1.21` got true and then hit this issue and failed horribly.
   
    I'll launch a new one when i get a chance and paste the output of `lscpu` here later on



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zeroshade commented on a diff in pull request #37785: GH-37712: [Go][Parquet] Fix ARM64 assembly for bitmap extract bits

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.
zeroshade commented on code in PR #37785:
URL: https://github.com/apache/arrow/pull/37785#discussion_r1340501129


##########
go/parquet/internal/bmi/bmi_arm64.go:
##########
@@ -14,44 +14,51 @@
 // See the License for the specific language governing permissions and
 // limitations under the License.
 
+//go:build !noasm
 // +build !noasm
 
 package bmi
 
 import (
+	"fmt"
 	"os"
 	"strings"
-)
-import (
-	"golang.org/x/sys/cpu"

Review Comment:
   It was a virtual machine launched via EC2 on AWS as a Debian 12 arm64 machine. I'll launch a new one when i get a chance and paste the output of lscpu here later on



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org