You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "felipecrv (via GitHub)" <gi...@apache.org> on 2023/05/30 13:47:13 UTC

[GitHub] [arrow] felipecrv commented on a diff in pull request #35814: GH-35360: Take offset into account in ScalarHashImpl::ArrayHash()

felipecrv commented on code in PR #35814:
URL: https://github.com/apache/arrow/pull/35814#discussion_r1210302073


##########
cpp/src/arrow/util/hashing.cc:
##########
@@ -0,0 +1,167 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "arrow/util/hashing.h"
+#include "arrow/util/macros.h"
+
+namespace arrow {
+namespace internal {
+
+namespace {
+
+/// \brief Rotate-right, 64-bit.
+///
+/// This compiles to a single instruction on CPUs that have rotation instructions.
+///
+/// \pre n must be in the range [1, 64].
+#define ROTR64(v, n) (((v) >> (n)) | ((v) << (64 - (n))))
+
+/// \brief A hash function for bitmaps that can handle offsets and lengths in
+/// terms of number of bits. The hash only depends on the bits actually hashed.
+///
+/// This implementation is based on 64-bit versions of MurmurHash2 by Austin Appleby.
+///
+/// The key (a bitmap) is read as a sequence of 64-bit words before the trailing
+/// bytes. So if the input is 64-bit aligned, all memory accesses are aligned.
+///
+/// It's the caller's responsibility to ensure that bits_offset + num_bits are
+/// readable from the bitmap.
+///
+/// \param key The pointer to the bitmap.
+/// \param seed The seed for the hash function (useful when chaining hash functions).
+/// \param bits_offset The offset in bits relative to the start of the bitmap.
+/// \param num_bits The number of bits after the offset to be hashed.
+uint64_t MurmurHashBitmap64A(const uint8_t* key, uint64_t seed, uint64_t bits_offset,

Review Comment:
   This is MurmurHash modified to take bits starting on non-byte boundaries.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org