You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by "aishikbh (via GitHub)" <gi...@apache.org> on 2023/11/27 08:17:57 UTC

[PR] Add support for murmur3 as partition function. [pinot]

aishikbh opened a new pull request, #12049:
URL: https://github.com/apache/pinot/pull/12049

   Currently pinot supports murmur2 as a partition function. This PR adds support for murmur3 as a partition function.
   
   # Release Notes
   * Added support for Murmur3 as a partition function.
   * Added support for adding optional field `seed` for the hash in `functionConfig` field of `columnPartitionMap`.
   * If no seed is provided, default seed value is taken as `0`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add support for murmur3 as a partition function [pinot]

Posted by "walterddr (via GitHub)" <gi...@apache.org>.
walterddr commented on code in PR #12049:
URL: https://github.com/apache/pinot/pull/12049#discussion_r1410057271


##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/partition/Murmur3PartitionFunction.java:
##########
@@ -0,0 +1,69 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.spi.partition;
+
+import com.google.common.base.Preconditions;
+import com.google.common.hash.Hashing;
+import java.util.Map;
+
+import static java.nio.charset.StandardCharsets.UTF_8;
+
+
+/**
+ * Implementation of {@link PartitionFunction} which partitions based on 32 bit murmur3 hash
+ */
+public class Murmur3PartitionFunction implements PartitionFunction {
+  private static final String NAME = "Murmur3";
+  private final int _numPartitions;
+  private final int _hashSeed;
+
+  /**
+   * Constructor for the class.
+   * @param numPartitions Number of partitions.
+   * @param functionConfig to extract configurations for the partition function.
+   */
+  public Murmur3PartitionFunction(int numPartitions, Map<String, String> functionConfig) {
+    Preconditions.checkArgument(numPartitions > 0, "Number of partitions must be > 0");
+    _numPartitions = numPartitions;
+    _hashSeed = (functionConfig == null || functionConfig.get("seed") == null) ? 0

Review Comment:
   also put the config keys as member var:
   ```
   private static final String SEED_KEY = "seed";
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add support for murmur3 as a partition function [pinot]

Posted by "snleee (via GitHub)" <gi...@apache.org>.
snleee commented on PR #12049:
URL: https://github.com/apache/pinot/pull/12049#issuecomment-1835782616

   For release note, can you add some example configuration for hash function (including the seed)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add support for murmur3 as a partition function [pinot]

Posted by "walterddr (via GitHub)" <gi...@apache.org>.
walterddr commented on code in PR #12049:
URL: https://github.com/apache/pinot/pull/12049#discussion_r1410054731


##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/partition/Murmur3PartitionFunction.java:
##########
@@ -0,0 +1,69 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.spi.partition;
+
+import com.google.common.base.Preconditions;
+import com.google.common.hash.Hashing;
+import java.util.Map;
+
+import static java.nio.charset.StandardCharsets.UTF_8;
+
+
+/**
+ * Implementation of {@link PartitionFunction} which partitions based on 32 bit murmur3 hash
+ */
+public class Murmur3PartitionFunction implements PartitionFunction {
+  private static final String NAME = "Murmur3";

Review Comment:
   interestingly we already have MURMUR3 in https://github.com/apache/pinot/blob/b9ed378355acc5313ebc1031510671f7045f29a9/pinot-spi/src/main/java/org/apache/pinot/spi/config/table/HashFunction.java#L4 
   
   i wonder where the impl is and if we can reuse :-/



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add support for murmur3 as a partition function [pinot]

Posted by "aishikbh (via GitHub)" <gi...@apache.org>.
aishikbh commented on PR #12049:
URL: https://github.com/apache/pinot/pull/12049#issuecomment-1835787608

   > @aishikbh For release note, can you add some example configuration for hash function (including the seed)
   
   sure I will do that. I plan to add to the documentation as well.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add support for murmur3 as a partition function [pinot]

Posted by "snleee (via GitHub)" <gi...@apache.org>.
snleee commented on code in PR #12049:
URL: https://github.com/apache/pinot/pull/12049#discussion_r1414581968


##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/partition/Murmur3PartitionFunction.java:
##########
@@ -0,0 +1,69 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.spi.partition;
+
+import com.google.common.base.Preconditions;
+import com.google.common.hash.Hashing;
+import java.util.Map;
+
+import static java.nio.charset.StandardCharsets.UTF_8;
+
+
+/**
+ * Implementation of {@link PartitionFunction} which partitions based on 32 bit murmur3 hash
+ */
+public class Murmur3PartitionFunction implements PartitionFunction {
+  private static final String NAME = "Murmur3";

Review Comment:
   We want to add the support murmur3 that Debezium uses as default. Unfortunately, Debezium is using a special variant x64_32 that is not available for many library that we looked into (guava/apache). I think that @aishikbh had to copy for `x64_32` variant implementation from `infinispan` that Debezium is directly depend upon
   https://github.com/infinispan/infinispan/blob/main/commons/all/src/main/java/org/infinispan/commons/hash/MurmurHash3.java



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add support for murmur3 as a partition function [pinot]

Posted by "walterddr (via GitHub)" <gi...@apache.org>.
walterddr commented on code in PR #12049:
URL: https://github.com/apache/pinot/pull/12049#discussion_r1410057271


##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/partition/Murmur3PartitionFunction.java:
##########
@@ -0,0 +1,69 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.spi.partition;
+
+import com.google.common.base.Preconditions;
+import com.google.common.hash.Hashing;
+import java.util.Map;
+
+import static java.nio.charset.StandardCharsets.UTF_8;
+
+
+/**
+ * Implementation of {@link PartitionFunction} which partitions based on 32 bit murmur3 hash
+ */
+public class Murmur3PartitionFunction implements PartitionFunction {
+  private static final String NAME = "Murmur3";
+  private final int _numPartitions;
+  private final int _hashSeed;
+
+  /**
+   * Constructor for the class.
+   * @param numPartitions Number of partitions.
+   * @param functionConfig to extract configurations for the partition function.
+   */
+  public Murmur3PartitionFunction(int numPartitions, Map<String, String> functionConfig) {
+    Preconditions.checkArgument(numPartitions > 0, "Number of partitions must be > 0");
+    _numPartitions = numPartitions;
+    _hashSeed = (functionConfig == null || functionConfig.get("seed") == null) ? 0

Review Comment:
   also put the config keys as member var:
   ```
   private static final String SEED_KEY = "seed";
   ```
   this was we keep track on all the possible config keys in the class (see BoundedColumnValuePartitionFunction)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add support for murmur3 as a partition function [pinot]

Posted by "snleee (via GitHub)" <gi...@apache.org>.
snleee commented on code in PR #12049:
URL: https://github.com/apache/pinot/pull/12049#discussion_r1414581968


##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/partition/Murmur3PartitionFunction.java:
##########
@@ -0,0 +1,69 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.spi.partition;
+
+import com.google.common.base.Preconditions;
+import com.google.common.hash.Hashing;
+import java.util.Map;
+
+import static java.nio.charset.StandardCharsets.UTF_8;
+
+
+/**
+ * Implementation of {@link PartitionFunction} which partitions based on 32 bit murmur3 hash
+ */
+public class Murmur3PartitionFunction implements PartitionFunction {
+  private static final String NAME = "Murmur3";

Review Comment:
   We want to add the support murmur3 funciton that Debezium uses as default. Unfortunately, Debezium is using a special variant x64_32 that is not available for many library that we looked into (guava/apache). I think that @aishikbh had to copy for `x64_32` variant. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add support for murmur3 as a partition function [pinot]

Posted by "snleee (via GitHub)" <gi...@apache.org>.
snleee commented on code in PR #12049:
URL: https://github.com/apache/pinot/pull/12049#discussion_r1414581968


##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/partition/Murmur3PartitionFunction.java:
##########
@@ -0,0 +1,69 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.spi.partition;
+
+import com.google.common.base.Preconditions;
+import com.google.common.hash.Hashing;
+import java.util.Map;
+
+import static java.nio.charset.StandardCharsets.UTF_8;
+
+
+/**
+ * Implementation of {@link PartitionFunction} which partitions based on 32 bit murmur3 hash
+ */
+public class Murmur3PartitionFunction implements PartitionFunction {
+  private static final String NAME = "Murmur3";

Review Comment:
   We want to add the support murmur3 that Debezium uses as default. Unfortunately, Debezium is using a special variant x64_32 that is not available for many libraries including Guava and Apache. I think that @aishikbh had to copy `x64_32` variant implementation from `infinispan` that Debezium is directly depend upon
   https://github.com/infinispan/infinispan/blob/main/commons/all/src/main/java/org/infinispan/commons/hash/MurmurHash3.java



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add support for murmur3 as a partition function [pinot]

Posted by "aishikbh (via GitHub)" <gi...@apache.org>.
aishikbh commented on code in PR #12049:
URL: https://github.com/apache/pinot/pull/12049#discussion_r1412170065


##########
pinot-segment-spi/src/test/java/org/apache/pinot/segment/spi/partition/PartitionFunctionTest.java:
##########
@@ -115,6 +115,82 @@ public void testMurmurPartitioner() {
     }
   }
 
+  /**
+   * Unit test for {@link Murmur3PartitionFunction}.
+   * <ul>
+   *   <li> Tests that partition values are in expected range. </li>
+   *   <li> Tests that toString returns expected string. </li>
+   *   <li> Tests that the partition numbers returned by partition function with null functionConfig and
+   *   functionConfig with empty seed value are equal</li>
+   * </ul>
+   */
+  @Test
+  public void testMurmur3Partitioner() {

Review Comment:
   added.



##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/partition/Murmur3PartitionFunction.java:
##########
@@ -0,0 +1,69 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.spi.partition;
+
+import com.google.common.base.Preconditions;
+import com.google.common.hash.Hashing;
+import java.util.Map;
+
+import static java.nio.charset.StandardCharsets.UTF_8;
+
+
+/**
+ * Implementation of {@link PartitionFunction} which partitions based on 32 bit murmur3 hash
+ */
+public class Murmur3PartitionFunction implements PartitionFunction {
+  private static final String NAME = "Murmur3";
+  private final int _numPartitions;
+  private final int _hashSeed;
+
+  /**
+   * Constructor for the class.
+   * @param numPartitions Number of partitions.
+   * @param functionConfig to extract configurations for the partition function.
+   */
+  public Murmur3PartitionFunction(int numPartitions, Map<String, String> functionConfig) {
+    Preconditions.checkArgument(numPartitions > 0, "Number of partitions must be > 0");
+    _numPartitions = numPartitions;
+    _hashSeed = (functionConfig == null || functionConfig.get("seed") == null) ? 0

Review Comment:
   Chose to keep the seed value as 0. If someone wants to add seed value they can add.
   
   I have added the config keys as mentioned!



##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/partition/Murmur3PartitionFunction.java:
##########
@@ -0,0 +1,69 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.spi.partition;
+
+import com.google.common.base.Preconditions;
+import com.google.common.hash.Hashing;
+import java.util.Map;
+
+import static java.nio.charset.StandardCharsets.UTF_8;
+
+
+/**
+ * Implementation of {@link PartitionFunction} which partitions based on 32 bit murmur3 hash
+ */
+public class Murmur3PartitionFunction implements PartitionFunction {
+  private static final String NAME = "Murmur3";

Review Comment:
   I can see that we used the same implementation but for 128 bits [here](https://github.com/apache/pinot/blob/fe072c63bd61a1fe092c68e15bc11fee03f15911/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/HashUtils.java#L31).  So chose to use the same one, but the 32_x86 variant.



##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/partition/Murmur3PartitionFunction.java:
##########
@@ -0,0 +1,453 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.spi.partition;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import com.google.common.hash.Hashing;
+import java.util.Map;
+
+import static java.nio.charset.StandardCharsets.UTF_8;
+
+
+/**
+ * Implementation of {@link PartitionFunction} which partitions based on 32 bit murmur3 hash
+ */
+public class Murmur3PartitionFunction implements PartitionFunction {
+  public static final byte INVALID_CHAR = (byte) '?';
+  private static final String NAME = "Murmur3";
+  private static final String SEED_KEY = "seed";
+  private static final String MURMUR3_VARIANT = "variant";
+  private final int _numPartitions;
+  private final int _hashSeed;
+  private final String _variant;
+
+  /**
+   * Constructor for the class.
+   * @param numPartitions Number of partitions.
+   * @param functionConfig to extract configurations for the partition function.
+   */
+  public Murmur3PartitionFunction(int numPartitions, Map<String, String> functionConfig) {
+    Preconditions.checkArgument(numPartitions > 0, "Number of partitions must be > 0");
+    Preconditions.checkArgument(
+        functionConfig == null || functionConfig.get(MURMUR3_VARIANT) == null || functionConfig.get(MURMUR3_VARIANT)
+            .isEmpty() || functionConfig.get(MURMUR3_VARIANT).equals("x86_32") || functionConfig.get(MURMUR3_VARIANT)
+            .equals("x64_32"), "Murmur3 variant must be either x86_32 or x64_32");
+    _numPartitions = numPartitions;
+
+    // default value of the hash seed is 0.
+    _hashSeed =
+        (functionConfig == null || functionConfig.get(SEED_KEY) == null || functionConfig.get(SEED_KEY).isEmpty()) ? 0
+            : Integer.parseInt(functionConfig.get(SEED_KEY));
+
+    // default value of the murmur3 variant is x86_32.
+    _variant =
+        (functionConfig == null || functionConfig.get(MURMUR3_VARIANT) == null || functionConfig.get(MURMUR3_VARIANT)
+            .isEmpty()) ? "x86_32" : functionConfig.get(MURMUR3_VARIANT);
+  }
+
+  @Override
+  public int getPartition(Object value) {
+    if (_variant.equals("x86_32")) {
+      return (murmurHash332BitsX86(value.toString().getBytes(UTF_8), _hashSeed) & Integer.MAX_VALUE) % _numPartitions;

Review Comment:
   makes sense. This name is a better name for readability.



##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/partition/Murmur3PartitionFunction.java:
##########
@@ -0,0 +1,69 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.spi.partition;
+
+import com.google.common.base.Preconditions;
+import com.google.common.hash.Hashing;
+import java.util.Map;
+
+import static java.nio.charset.StandardCharsets.UTF_8;
+
+
+/**
+ * Implementation of {@link PartitionFunction} which partitions based on 32 bit murmur3 hash
+ */
+public class Murmur3PartitionFunction implements PartitionFunction {
+  private static final String NAME = "Murmur3";

Review Comment:
    @snleee summarised it perfectly. 
    
    Just to add : `murmur3` has 3 variants as part of the original implementation : `x86 32 bits`, `x64 128 bits` and `x86 128 bits` and Guava supports these implementations. Since we are generating partition numbers `x86 32 bits` should cover most of the use cases.
    
    The issue comes from the users of Debezium. Debezium uses a variant of the `x64 128` version reduced to 32 bits. I wanted to add that as well, as some users might want to use `Debezium + Murmur3`. I have kept the `x86 32 bits` variant as the default as that is part of the original implementation and `x64 32 bits` as the configurable one.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add support for murmur3 as partition function [pinot]

Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.
codecov-commenter commented on PR #12049:
URL: https://github.com/apache/pinot/pull/12049#issuecomment-1827390876

   ## [Codecov](https://app.codecov.io/gh/apache/pinot/pull/12049?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report
   Attention: `2 lines` in your changes are missing coverage. Please review.
   > Comparison is base [(`f79b618`)](https://app.codecov.io/gh/apache/pinot/commit/f79b618141e07c140663791959d96956ad91d811?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) 61.62% compared to head [(`c59fa5f`)](https://app.codecov.io/gh/apache/pinot/pull/12049?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) 46.76%.
   
   | [Files](https://app.codecov.io/gh/apache/pinot/pull/12049?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Patch % | Lines |
   |---|---|---|
   | [...egment/spi/partition/Murmur3PartitionFunction.java](https://app.codecov.io/gh/apache/pinot/pull/12049?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL3BhcnRpdGlvbi9NdXJtdXIzUGFydGl0aW9uRnVuY3Rpb24uamF2YQ==) | 80.00% | [1 Missing and 1 partial :warning: ](https://app.codecov.io/gh/apache/pinot/pull/12049?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) |
   
   <details><summary>Additional details and impacted files</summary>
   
   
   ```diff
   @@              Coverage Diff              @@
   ##             master   #12049       +/-   ##
   =============================================
   - Coverage     61.62%   46.76%   -14.87%     
   + Complexity     1151      946      -205     
   =============================================
     Files          2386     1789      -597     
     Lines        129567    93964    -35603     
     Branches      20053    15194     -4859     
   =============================================
   - Hits          79847    43938    -35909     
   - Misses        43913    46903     +2990     
   + Partials       5807     3123     -2684     
   ```
   
   | [Flag](https://app.codecov.io/gh/apache/pinot/pull/12049/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
   |---|---|---|
   | [custom-integration1](https://app.codecov.io/gh/apache/pinot/pull/12049/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [integration](https://app.codecov.io/gh/apache/pinot/pull/12049/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [integration1](https://app.codecov.io/gh/apache/pinot/pull/12049/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [integration2](https://app.codecov.io/gh/apache/pinot/pull/12049/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [java-11](https://app.codecov.io/gh/apache/pinot/pull/12049/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [java-21](https://app.codecov.io/gh/apache/pinot/pull/12049/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `46.76% <83.33%> (-14.75%)` | :arrow_down: |
   | [skip-bytebuffers-false](https://app.codecov.io/gh/apache/pinot/pull/12049/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   | [skip-bytebuffers-true](https://app.codecov.io/gh/apache/pinot/pull/12049/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `46.76% <83.33%> (-14.73%)` | :arrow_down: |
   | [temurin](https://app.codecov.io/gh/apache/pinot/pull/12049/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `46.76% <83.33%> (-14.87%)` | :arrow_down: |
   | [unittests](https://app.codecov.io/gh/apache/pinot/pull/12049/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `46.76% <83.33%> (-14.87%)` | :arrow_down: |
   | [unittests1](https://app.codecov.io/gh/apache/pinot/pull/12049/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `46.76% <83.33%> (-0.18%)` | :arrow_down: |
   | [unittests2](https://app.codecov.io/gh/apache/pinot/pull/12049/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   
   </details>
   
   [:umbrella: View full report in Codecov by Sentry](https://app.codecov.io/gh/apache/pinot/pull/12049?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).   
   :loudspeaker: Have feedback on the report? [Share it here](https://about.codecov.io/codecov-pr-comment-feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add support for murmur3 as a partition function [pinot]

Posted by "snleee (via GitHub)" <gi...@apache.org>.
snleee commented on code in PR #12049:
URL: https://github.com/apache/pinot/pull/12049#discussion_r1414582383


##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/partition/Murmur3PartitionFunction.java:
##########
@@ -0,0 +1,453 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.spi.partition;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import com.google.common.hash.Hashing;
+import java.util.Map;
+
+import static java.nio.charset.StandardCharsets.UTF_8;
+
+
+/**
+ * Implementation of {@link PartitionFunction} which partitions based on 32 bit murmur3 hash
+ */
+public class Murmur3PartitionFunction implements PartitionFunction {
+  public static final byte INVALID_CHAR = (byte) '?';
+  private static final String NAME = "Murmur3";
+  private static final String SEED_KEY = "seed";
+  private static final String MURMUR3_VARIANT = "variant";
+  private final int _numPartitions;
+  private final int _hashSeed;
+  private final String _variant;
+
+  /**
+   * Constructor for the class.
+   * @param numPartitions Number of partitions.
+   * @param functionConfig to extract configurations for the partition function.
+   */
+  public Murmur3PartitionFunction(int numPartitions, Map<String, String> functionConfig) {
+    Preconditions.checkArgument(numPartitions > 0, "Number of partitions must be > 0");
+    Preconditions.checkArgument(
+        functionConfig == null || functionConfig.get(MURMUR3_VARIANT) == null || functionConfig.get(MURMUR3_VARIANT)
+            .isEmpty() || functionConfig.get(MURMUR3_VARIANT).equals("x86_32") || functionConfig.get(MURMUR3_VARIANT)
+            .equals("x64_32"), "Murmur3 variant must be either x86_32 or x64_32");
+    _numPartitions = numPartitions;
+
+    // default value of the hash seed is 0.
+    _hashSeed =
+        (functionConfig == null || functionConfig.get(SEED_KEY) == null || functionConfig.get(SEED_KEY).isEmpty()) ? 0
+            : Integer.parseInt(functionConfig.get(SEED_KEY));
+
+    // default value of the murmur3 variant is x86_32.
+    _variant =
+        (functionConfig == null || functionConfig.get(MURMUR3_VARIANT) == null || functionConfig.get(MURMUR3_VARIANT)
+            .isEmpty()) ? "x86_32" : functionConfig.get(MURMUR3_VARIANT);
+  }
+
+  @Override
+  public int getPartition(Object value) {
+    if (_variant.equals("x86_32")) {
+      return (murmurHash332BitsX86(value.toString().getBytes(UTF_8), _hashSeed) & Integer.MAX_VALUE) % _numPartitions;

Review Comment:
   murmur3Hash32BitsX64?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add support for murmur3 as a partition function [pinot]

Posted by "snleee (via GitHub)" <gi...@apache.org>.
snleee commented on code in PR #12049:
URL: https://github.com/apache/pinot/pull/12049#discussion_r1414581968


##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/partition/Murmur3PartitionFunction.java:
##########
@@ -0,0 +1,69 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.spi.partition;
+
+import com.google.common.base.Preconditions;
+import com.google.common.hash.Hashing;
+import java.util.Map;
+
+import static java.nio.charset.StandardCharsets.UTF_8;
+
+
+/**
+ * Implementation of {@link PartitionFunction} which partitions based on 32 bit murmur3 hash
+ */
+public class Murmur3PartitionFunction implements PartitionFunction {
+  private static final String NAME = "Murmur3";

Review Comment:
   We want to add the support murmur3 that Debezium uses as default. Unfortunately, Debezium is using a special variant x64_32 that is not available for many libraries including Guava and Apache. I think that @aishikbh had to copy for `x64_32` variant implementation from `infinispan` that Debezium is directly depend upon
   https://github.com/infinispan/infinispan/blob/main/commons/all/src/main/java/org/infinispan/commons/hash/MurmurHash3.java



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add support for murmur3 as a partition function [pinot]

Posted by "walterddr (via GitHub)" <gi...@apache.org>.
walterddr commented on code in PR #12049:
URL: https://github.com/apache/pinot/pull/12049#discussion_r1410055786


##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/partition/Murmur3PartitionFunction.java:
##########
@@ -0,0 +1,69 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.spi.partition;
+
+import com.google.common.base.Preconditions;
+import com.google.common.hash.Hashing;
+import java.util.Map;
+
+import static java.nio.charset.StandardCharsets.UTF_8;
+
+
+/**
+ * Implementation of {@link PartitionFunction} which partitions based on 32 bit murmur3 hash
+ */
+public class Murmur3PartitionFunction implements PartitionFunction {
+  private static final String NAME = "Murmur3";
+  private final int _numPartitions;
+  private final int _hashSeed;
+
+  /**
+   * Constructor for the class.
+   * @param numPartitions Number of partitions.
+   * @param functionConfig to extract configurations for the partition function.
+   */
+  public Murmur3PartitionFunction(int numPartitions, Map<String, String> functionConfig) {
+    Preconditions.checkArgument(numPartitions > 0, "Number of partitions must be > 0");
+    _numPartitions = numPartitions;
+    _hashSeed = (functionConfig == null || functionConfig.get("seed") == null) ? 0

Review Comment:
   given we are using Guava, should we follow guava standard?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add support for murmur3 as a partition function [pinot]

Posted by "snleee (via GitHub)" <gi...@apache.org>.
snleee commented on code in PR #12049:
URL: https://github.com/apache/pinot/pull/12049#discussion_r1414581968


##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/partition/Murmur3PartitionFunction.java:
##########
@@ -0,0 +1,69 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.spi.partition;
+
+import com.google.common.base.Preconditions;
+import com.google.common.hash.Hashing;
+import java.util.Map;
+
+import static java.nio.charset.StandardCharsets.UTF_8;
+
+
+/**
+ * Implementation of {@link PartitionFunction} which partitions based on 32 bit murmur3 hash
+ */
+public class Murmur3PartitionFunction implements PartitionFunction {
+  private static final String NAME = "Murmur3";

Review Comment:
   We want to add the support murmur3 funciton that Debezium uses as default. Unfortunately, Debezium is using a special variant x64_32 that is not available for many library that we looked into (guava/apache). I think that @aishikbh had to copy for `x64_32` variant implementation from `infinispan` that Debezium is directly depend upon
   https://github.com/infinispan/infinispan/blob/main/commons/all/src/main/java/org/infinispan/commons/hash/MurmurHash3.java



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add support for murmur3 as a partition function [pinot]

Posted by "snleee (via GitHub)" <gi...@apache.org>.
snleee commented on code in PR #12049:
URL: https://github.com/apache/pinot/pull/12049#discussion_r1414581968


##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/partition/Murmur3PartitionFunction.java:
##########
@@ -0,0 +1,69 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.spi.partition;
+
+import com.google.common.base.Preconditions;
+import com.google.common.hash.Hashing;
+import java.util.Map;
+
+import static java.nio.charset.StandardCharsets.UTF_8;
+
+
+/**
+ * Implementation of {@link PartitionFunction} which partitions based on 32 bit murmur3 hash
+ */
+public class Murmur3PartitionFunction implements PartitionFunction {
+  private static final String NAME = "Murmur3";

Review Comment:
   We want to add the support murmur3 for debezium. Unfortunately, Debezium is using a special variant x64_32 that is not available for many library that we looked into (the one `HashFunction` as based on as well as the one from apache commons). I think that @aishikbh had to copy for `x64_32` variant. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add support for murmur3 as a partition function [pinot]

Posted by "walterddr (via GitHub)" <gi...@apache.org>.
walterddr commented on code in PR #12049:
URL: https://github.com/apache/pinot/pull/12049#discussion_r1410054731


##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/partition/Murmur3PartitionFunction.java:
##########
@@ -0,0 +1,69 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.spi.partition;
+
+import com.google.common.base.Preconditions;
+import com.google.common.hash.Hashing;
+import java.util.Map;
+
+import static java.nio.charset.StandardCharsets.UTF_8;
+
+
+/**
+ * Implementation of {@link PartitionFunction} which partitions based on 32 bit murmur3 hash
+ */
+public class Murmur3PartitionFunction implements PartitionFunction {
+  private static final String NAME = "Murmur3";

Review Comment:
   interestingly we already have MURMUR3 in https://github.com/apache/pinot/blob/b9ed378355acc5313ebc1031510671f7045f29a9/pinot-spi/src/main/java/org/apache/pinot/spi/config/table/HashFunction.java#L4 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add support for murmur3 as a partition function [pinot]

Posted by "snleee (via GitHub)" <gi...@apache.org>.
snleee commented on code in PR #12049:
URL: https://github.com/apache/pinot/pull/12049#discussion_r1406592721


##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/partition/Murmur3PartitionFunction.java:
##########
@@ -0,0 +1,69 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.spi.partition;
+
+import com.google.common.base.Preconditions;
+import com.google.common.hash.Hashing;
+import java.util.Map;
+
+import static java.nio.charset.StandardCharsets.UTF_8;
+
+
+/**
+ * Implementation of {@link PartitionFunction} which partitions based on 32 bit murmur3 hash
+ */
+public class Murmur3PartitionFunction implements PartitionFunction {
+  private static final String NAME = "Murmur3";
+  private final int _numPartitions;
+  private final int _hashSeed;
+
+  /**
+   * Constructor for the class.
+   * @param numPartitions Number of partitions.
+   * @param functionConfig to extract configurations for the partition function.
+   */
+  public Murmur3PartitionFunction(int numPartitions, Map<String, String> functionConfig) {
+    Preconditions.checkArgument(numPartitions > 0, "Number of partitions must be > 0");
+    _numPartitions = numPartitions;
+    _hashSeed = (functionConfig == null || functionConfig.get("seed") == null) ? 0

Review Comment:
   seed = 0 is the default value for murmur3?
   
   Apache commons uses the default seed value of `104729`?
   
   https://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/digest/MurmurHash3.html#DEFAULT_SEED
   
   On the other hand, google's murmur3 hash function uses seed = 0 by default. Can we do a bit more research on this and make the informed decision? 



##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/partition/Murmur3PartitionFunction.java:
##########
@@ -0,0 +1,69 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.spi.partition;
+
+import com.google.common.base.Preconditions;
+import com.google.common.hash.Hashing;
+import java.util.Map;
+
+import static java.nio.charset.StandardCharsets.UTF_8;
+
+
+/**
+ * Implementation of {@link PartitionFunction} which partitions based on 32 bit murmur3 hash
+ */
+public class Murmur3PartitionFunction implements PartitionFunction {
+  private static final String NAME = "Murmur3";
+  private final int _numPartitions;
+  private final int _hashSeed;
+
+  /**
+   * Constructor for the class.
+   * @param numPartitions Number of partitions.
+   * @param functionConfig to extract configurations for the partition function.
+   */
+  public Murmur3PartitionFunction(int numPartitions, Map<String, String> functionConfig) {
+    Preconditions.checkArgument(numPartitions > 0, "Number of partitions must be > 0");
+    _numPartitions = numPartitions;
+    _hashSeed = (functionConfig == null || functionConfig.get("seed") == null) ? 0
+        : Integer.parseInt(functionConfig.get("seed"));
+  }
+
+  @Override
+  public int getPartition(Object value) {
+    return (Hashing.murmur3_32_fixed(_hashSeed).hashBytes(value.toString().getBytes(UTF_8)).asInt() & Integer.MAX_VALUE)

Review Comment:
   it looks that murmur3 has `murmur3_32 / murmur3_128`. Can we add `TODO` comment to add 128 bit support when needed in the future?



##########
pinot-segment-spi/src/test/java/org/apache/pinot/segment/spi/partition/PartitionFunctionTest.java:
##########
@@ -115,6 +115,82 @@ public void testMurmurPartitioner() {
     }
   }
 
+  /**
+   * Unit test for {@link Murmur3PartitionFunction}.
+   * <ul>
+   *   <li> Tests that partition values are in expected range. </li>
+   *   <li> Tests that toString returns expected string. </li>
+   *   <li> Tests that the partition numbers returned by partition function with null functionConfig and
+   *   functionConfig with empty seed value are equal</li>
+   * </ul>
+   */
+  @Test
+  public void testMurmur3Partitioner() {

Review Comment:
   Should we add the test comparing against some hand computed values?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add support for murmur3 as a partition function [pinot]

Posted by "snleee (via GitHub)" <gi...@apache.org>.
snleee commented on PR #12049:
URL: https://github.com/apache/pinot/pull/12049#issuecomment-1840288841

   @walterddr @Jackie-Jiang Are we good with this PR? @aishikbh answered on your question on why not reusing the function from guava lib. If there's no further question, I will merge this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add support for murmur3 as a partition function [pinot]

Posted by "aishikbh (via GitHub)" <gi...@apache.org>.
aishikbh commented on code in PR #12049:
URL: https://github.com/apache/pinot/pull/12049#discussion_r1414906892


##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/partition/Murmur3PartitionFunction.java:
##########
@@ -0,0 +1,453 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.spi.partition;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import com.google.common.hash.Hashing;
+import java.util.Map;
+
+import static java.nio.charset.StandardCharsets.UTF_8;
+
+
+/**
+ * Implementation of {@link PartitionFunction} which partitions based on 32 bit murmur3 hash
+ */
+public class Murmur3PartitionFunction implements PartitionFunction {
+  public static final byte INVALID_CHAR = (byte) '?';
+  private static final String NAME = "Murmur3";
+  private static final String SEED_KEY = "seed";
+  private static final String MURMUR3_VARIANT = "variant";
+  private final int _numPartitions;
+  private final int _hashSeed;
+  private final String _variant;
+
+  /**
+   * Constructor for the class.
+   * @param numPartitions Number of partitions.
+   * @param functionConfig to extract configurations for the partition function.
+   */
+  public Murmur3PartitionFunction(int numPartitions, Map<String, String> functionConfig) {
+    Preconditions.checkArgument(numPartitions > 0, "Number of partitions must be > 0");
+    Preconditions.checkArgument(
+        functionConfig == null || functionConfig.get(MURMUR3_VARIANT) == null || functionConfig.get(MURMUR3_VARIANT)
+            .isEmpty() || functionConfig.get(MURMUR3_VARIANT).equals("x86_32") || functionConfig.get(MURMUR3_VARIANT)
+            .equals("x64_32"), "Murmur3 variant must be either x86_32 or x64_32");
+    _numPartitions = numPartitions;
+
+    // default value of the hash seed is 0.
+    _hashSeed =
+        (functionConfig == null || functionConfig.get(SEED_KEY) == null || functionConfig.get(SEED_KEY).isEmpty()) ? 0
+            : Integer.parseInt(functionConfig.get(SEED_KEY));
+
+    // default value of the murmur3 variant is x86_32.
+    _variant =
+        (functionConfig == null || functionConfig.get(MURMUR3_VARIANT) == null || functionConfig.get(MURMUR3_VARIANT)
+            .isEmpty()) ? "x86_32" : functionConfig.get(MURMUR3_VARIANT);
+  }
+
+  @Override
+  public int getPartition(Object value) {
+    if (_variant.equals("x86_32")) {
+      return (murmurHash332BitsX86(value.toString().getBytes(UTF_8), _hashSeed) & Integer.MAX_VALUE) % _numPartitions;

Review Comment:
   makes sense. This name is better for readability.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add support for murmur3 as a partition function [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #12049:
URL: https://github.com/apache/pinot/pull/12049#discussion_r1414312554


##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/partition/Murmur3PartitionFunction.java:
##########
@@ -0,0 +1,69 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.spi.partition;
+
+import com.google.common.base.Preconditions;
+import com.google.common.hash.Hashing;
+import java.util.Map;
+
+import static java.nio.charset.StandardCharsets.UTF_8;
+
+
+/**
+ * Implementation of {@link PartitionFunction} which partitions based on 32 bit murmur3 hash
+ */
+public class Murmur3PartitionFunction implements PartitionFunction {
+  private static final String NAME = "Murmur3";

Review Comment:
   +1. Also Guava already provides the murmur3 implementation (used in `HashUtils`), do we need to copy it from another library?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add support for murmur3 as a partition function [pinot]

Posted by "walterddr (via GitHub)" <gi...@apache.org>.
walterddr commented on code in PR #12049:
URL: https://github.com/apache/pinot/pull/12049#discussion_r1415713250


##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/partition/Murmur3PartitionFunction.java:
##########
@@ -0,0 +1,69 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.spi.partition;
+
+import com.google.common.base.Preconditions;
+import com.google.common.hash.Hashing;
+import java.util.Map;
+
+import static java.nio.charset.StandardCharsets.UTF_8;
+
+
+/**
+ * Implementation of {@link PartitionFunction} which partitions based on 32 bit murmur3 hash
+ */
+public class Murmur3PartitionFunction implements PartitionFunction {
+  private static final String NAME = "Murmur3";

Review Comment:
   :+1:



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add support for murmur3 as a partition function [pinot]

Posted by "snleee (via GitHub)" <gi...@apache.org>.
snleee commented on code in PR #12049:
URL: https://github.com/apache/pinot/pull/12049#discussion_r1414581968


##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/partition/Murmur3PartitionFunction.java:
##########
@@ -0,0 +1,69 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.spi.partition;
+
+import com.google.common.base.Preconditions;
+import com.google.common.hash.Hashing;
+import java.util.Map;
+
+import static java.nio.charset.StandardCharsets.UTF_8;
+
+
+/**
+ * Implementation of {@link PartitionFunction} which partitions based on 32 bit murmur3 hash
+ */
+public class Murmur3PartitionFunction implements PartitionFunction {
+  private static final String NAME = "Murmur3";

Review Comment:
   We want to add the support murmur3 funciton that Debezium uses as default. Unfortunately, Debezium is using a special variant x64_32 that is not available for many library that we looked into (the one `HashFunction` as based on as well as the one from apache commons). I think that @aishikbh had to copy for `x64_32` variant. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [PR] Add support for murmur3 as a partition function [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang merged PR #12049:
URL: https://github.com/apache/pinot/pull/12049


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org