You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/18 20:09:38 UTC

[GitHub] [arrow] kou commented on a diff in pull request #14674: ARROW-18362: [C++][Parquet] Accelerate Parquet bit-packing decoding with ICX AVX-512 instructions

kou commented on code in PR #14674:
URL: https://github.com/apache/arrow/pull/14674#discussion_r1026822634


##########
cpp/src/arrow/util/bpacking_avx512_icx_inline.h:
##########
@@ -0,0 +1,1139 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include "bpacking_avx512_icx.h"
+#include "arrow/util/logging.h"
+
+#include <algorithm>
+#include <type_traits>
+#include <limits.h>
+#include <immintrin.h>
+
+#include <boost/preprocessor/repetition/repeat_from_to.hpp>
+
+#define ALWAYS_INLINE __attribute__((always_inline))

Review Comment:
   `__attribute__((always_inline))` doesn't work with MSVC.
   `__forceinline` is the syntax for it in MSVC: https://learn.microsoft.com/en-us/cpp/cpp/inline-functions-cpp?view=msvc-170
   
   We don't want to define a macro without prefix in public headers.
   
   BTW, why do we need this?



##########
cpp/src/arrow/util/bpacking_avx512_icx.h:
##########
@@ -0,0 +1,227 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include <cstddef>
+#include <cstdint>
+#include <utility>
+#include <immintrin.h>
+
+namespace arrow {
+namespace internal {
+
+/// Utilities for manipulating bit-packed values. Bit-packing is a technique for
+/// compressing integer values that do not use the full range of the integer type.
+/// E.g. an array of uint32_t values with range [0, 31] only uses the lower 5 bits
+/// of every uint32_t value, or an array of 0/1 booleans only uses the lowest bit
+/// of each integer.
+///
+/// Bit-packing always has a "bit width" parameter that determines the range of
+/// representable unsigned values: [0, 2^bit_width - 1]. The packed representation
+/// is logically the concatenatation of the lower bits of the input values (in
+/// little-endian order). E.g. the values 1, 2, 3, 4 packed with bit width 4 results
+/// in the two output bytes: [ 0 0 1 0 | 0 0 0 1 ] [ 0 1 0 0 | 0 0 1 1 ]
+///                               2         1           4         3
+///
+/// Packed values can be split across words, e.g. packing 1, 17 with bit_width 5 results
+/// in the two output bytes: [ 0 0 1 | 0 0 0 0 1 ] [ x x x x x x | 1 0 ]
+///            lower bits of 17--^         1         next value     ^--upper bits of 17
+///
+/// Bit widths from 0 to 64 are supported (0 bit width means that every value is 0).
+/// The batched unpacking functions operate on batches of 32 values. This batch size
+/// is convenient because for every supported bit width, the end of a 32 value batch
+/// falls on a byte boundary. It is also large enough to amortise loop overheads.
+class BitPacking {

Review Comment:
   It seems that all methods are defined as `static`. (We don't instantiate this.)
   How about using `namespace` instead?



##########
cpp/src/arrow/util/bpacking_avx512_icx_inline.h:
##########
@@ -0,0 +1,1139 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include "bpacking_avx512_icx.h"
+#include "arrow/util/logging.h"
+
+#include <algorithm>
+#include <type_traits>
+#include <limits.h>
+#include <immintrin.h>
+
+#include <boost/preprocessor/repetition/repeat_from_to.hpp>

Review Comment:
   We don't want to use Boost for public header to avoid build-time Boost dependency.



##########
cpp/src/arrow/util/bit_stream_utils.h:
##########
@@ -340,17 +344,32 @@ inline int BitReader::GetBatch(int num_bits, T* v, int batch_size) {
     }
   }
 
+  int num_unpacked = 0;
   if (sizeof(T) == 4) {
-    int num_unpacked =

Review Comment:
   Could you revert this scope change to keep narrower variable scope?



##########
cpp/src/arrow/util/bit_stream_utils.h:
##########
@@ -25,6 +25,10 @@
 #include <cstdint>
 
 #include "arrow/util/bit_util.h"
+#if defined(ARROW_HAVE_RUNTIME_AVX512_ICX)
+#include "arrow/util/bpacking_avx512_icx.h"
+#include "arrow/util/bpacking_avx512_icx_inline.h"

Review Comment:
   Why should we split these header files?



##########
cpp/src/arrow/util/bpacking_avx512_icx_inline.h:
##########
@@ -0,0 +1,1139 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include "bpacking_avx512_icx.h"
+#include "arrow/util/logging.h"
+
+#include <algorithm>
+#include <type_traits>
+#include <limits.h>
+#include <immintrin.h>
+
+#include <boost/preprocessor/repetition/repeat_from_to.hpp>
+
+#define ALWAYS_INLINE __attribute__((always_inline))
+#define UNLIKELY(expr) __builtin_expect(!!(expr), 0)
+#define LIKELY(expr) __builtin_expect(!!(expr), 1)

Review Comment:
   Can we use existing `ARROW_PREDICT_FALSE()`/`ARROW_PREDICT_TRUE()` instead?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org