Posted to jira@kafka.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/10/03 13:06:00 UTC

[jira] [Commented] (KAFKA-6632) Very slow hashCode methods in Kafka Connect types

    [ https://issues.apache.org/jira/browse/KAFKA-6632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16636918#comment-16636918 ] 

ASF GitHub Bot commented on KAFKA-6632:
---------------------------------------

maver1ck closed pull request #4700: KAFKA-6632: Very slow hashCode methods in Kafka Connect types
URL: https://github.com/apache/kafka/pull/4700
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:


diff --git a/connect/api/src/main/java/org/apache/kafka/connect/data/ConnectSchema.java b/connect/api/src/main/java/org/apache/kafka/connect/data/ConnectSchema.java
index 85357fef3c2..f4a796177ef 100644
--- a/connect/api/src/main/java/org/apache/kafka/connect/data/ConnectSchema.java
+++ b/connect/api/src/main/java/org/apache/kafka/connect/data/ConnectSchema.java
@@ -300,8 +300,25 @@ public boolean equals(Object o) {
     @Override
     public int hashCode() {
         if (this.hash == null) {
-            this.hash = Objects.hash(type, optional, defaultValue, fields, keySchema, valueSchema, name, version, doc,
-                parameters);
+            int result = type != null ? type.hashCode() : 0;
+            result = 31 * result + (optional ? 1 : 0);
+            result = 31 * result + (defaultValue != null ? defaultValue.hashCode() : 0);
+            if (fields != null) {
+                for (Field f : fields) {
+                    result = 31 * result + f.hashCode();
+                }
+            }
+            result = 31 * result + (keySchema != null ? keySchema.hashCode() : 0);
+            result = 31 * result + (valueSchema != null ? valueSchema.hashCode() : 0);
+            result = 31 * result + (name != null ? name.hashCode() : 0);
+            result = 31 * result + (version != null ? version : 0);
+            result = 31 * result + (doc != null ? doc.hashCode() : 0);
+            if (parameters != null) {
+                for (Map.Entry<String, String> e : parameters.entrySet()) {
+                    result = 31 * result + e.getKey().hashCode() + e.getValue().hashCode();
+                }
+            }
+            this.hash = result;
         }
         return this.hash;
     }
diff --git a/connect/api/src/main/java/org/apache/kafka/connect/data/Field.java b/connect/api/src/main/java/org/apache/kafka/connect/data/Field.java
index b5d3f027968..64e744d7e39 100644
--- a/connect/api/src/main/java/org/apache/kafka/connect/data/Field.java
+++ b/connect/api/src/main/java/org/apache/kafka/connect/data/Field.java
@@ -71,7 +71,10 @@ public boolean equals(Object o) {
 
     @Override
     public int hashCode() {
-        return Objects.hash(name, index, schema);
+        int result = index;
+        result = 31 * result + (name != null ? name.hashCode() : 0);
+        result = 31 * result + (schema != null ? schema.hashCode() : 0);
+        return result;
     }
 
     @Override
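
A note on the parameters loop in the ConnectSchema hunk above: because the running result is multiplied by 31 between entries, the value depends on the order in which the map yields its entries. If equals() compares the parameters with Map.equals (an assumption here, since that method is not part of the diff), two equal schemas whose maps iterate in different orders could end up with different hash codes. The sketch below uses hypothetical helper names (ParamHashSketch, orderDependentHash, orderIndependentHash) that do not exist in Kafka; it only illustrates the effect and one possible order-independent alternative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ParamHashSketch {
    // Mirrors the loop in the diff: the result is multiplied by 31 between
    // entries, so the outcome depends on the map's iteration order.
    static int orderDependentHash(Map<String, String> parameters) {
        int result = 0;
        for (Map.Entry<String, String> e : parameters.entrySet()) {
            result = 31 * result + e.getKey().hashCode() + e.getValue().hashCode();
        }
        return result;
    }

    // Possible alternative: fold in the map's own hash as a single value.
    // Map.hashCode() is specified as the sum of the entries' hash codes,
    // so it does not depend on iteration order.
    static int orderIndependentHash(Map<String, String> parameters) {
        int result = 0;
        result = 31 * result + parameters.hashCode();
        return result;
    }

    public static void main(String[] args) {
        // Same entries, inserted in opposite order.
        Map<String, String> ab = new LinkedHashMap<>();
        ab.put("a", "1");
        ab.put("b", "2");
        Map<String, String> ba = new LinkedHashMap<>();
        ba.put("b", "2");
        ba.put("a", "1");

        System.out.println(ab.equals(ba));                                        // true
        System.out.println(orderDependentHash(ab) == orderDependentHash(ba));     // false
        System.out.println(orderIndependentHash(ab) == orderIndependentHash(ba)); // true
    }
}
```

Whether this matters in practice depends on which Map implementations are used for schema parameters; the example uses LinkedHashMap only because it makes the ordering easy to control.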


 



> Very slow hashCode methods in Kafka Connect types
> -------------------------------------------------
>
>                 Key: KAFKA-6632
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6632
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect
>    Affects Versions: 1.0.0
>            Reporter: Maciej Bryński
>            Priority: Major
>
> The hashCode methods of ConnectSchema and Field are called heavily in SMTs and in fromConnect.
> Example:
> [https://github.com/apache/kafka/blob/e5d6c9a79a4ca9b82502b8e7f503d86ddaddb7fb/connect/transforms/src/main/java/org/apache/kafka/connect/transforms/InsertField.java#L164]
> Unfortunately, they use Objects.hash, which is very slow.
> I rewrote them with a hand-written implementation and got a roughly 6x speedup.
> Microbenchmark results:
>  * Original ConnectSchema hashCode: 2995 ms
>  * My implementation: 517 ms
> (100,000,000 iterations of computing hashCode on new ConnectSchema(Schema.Type.STRING))
> {code:java}
> @Override
> public int hashCode() {
>     int result = 5;
>     result = 31 * result + type.hashCode();
>     result = 31 * result + (optional ? 1 : 0);
>     result = 31 * result + (defaultValue == null ? 0 : defaultValue.hashCode());
>     if (fields != null) {
>         for (Field f : fields) {
>             result = 31 * result + f.hashCode();
>         }
>     }
>     result = 31 * result + (keySchema == null ? 0 : keySchema.hashCode());
>     result = 31 * result + (valueSchema == null ? 0 : valueSchema.hashCode());
>     result = 31 * result + (name == null ? 0 : name.hashCode());
>     result = 31 * result + (version == null ? 0 : version);
>     result = 31 * result + (doc == null ? 0 : doc.hashCode());
>     if (parameters != null) {
>         for (Map.Entry<String, String> e : parameters.entrySet()) {
>             result = 31 * result + e.getKey().hashCode() + e.getValue().hashCode();
>         }
>     }
>     return result;
> }{code}
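
For readers who want to try the comparison themselves, here is a minimal sketch of the two hashing styles. It is not the reporter's benchmark. Sample is a hypothetical stand-in class, not a Kafka type, and the timing loop is deliberately crude. Objects.hash takes a varargs Object..., so each call builds an array and boxes primitive arguments before delegating to Arrays.hashCode; the hand-written version avoids both. The JIT may remove some of that overhead, so measured ratios will vary by JVM, and a harness such as JMH would give more trustworthy numbers.

```java
import java.util.Objects;

public class HashSketch {
    // Written to a static field so the JIT cannot discard the timed work.
    static volatile int sink;

    // Hypothetical stand-in for a small schema-like type; not the Kafka class.
    static final class Sample {
        final String name;
        final int index;
        final boolean optional;

        Sample(String name, int index, boolean optional) {
            this.name = name;
            this.index = index;
            this.optional = optional;
        }

        // Varargs version: builds an Object[] and boxes index and optional.
        int hashWithObjectsHash() {
            return Objects.hash(name, index, optional);
        }

        // Hand-written version: same 31-multiplier scheme, no array, no boxing.
        // The seed 1 and the 1231/1237 constants match Arrays.hashCode and
        // Boolean.hashCode, so both methods return the same value.
        int hashManual() {
            int result = 1;
            result = 31 * result + (name != null ? name.hashCode() : 0);
            result = 31 * result + index;
            result = 31 * result + (optional ? 1231 : 1237);
            return result;
        }
    }

    // Crude timing loop; returns elapsed nanoseconds.
    static long time(Sample s, boolean manual, int iterations) {
        int checksum = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            checksum += manual ? s.hashManual() : s.hashWithObjectsHash();
        }
        long elapsed = System.nanoTime() - start;
        sink = checksum;
        return elapsed;
    }

    public static void main(String[] args) {
        Sample s = new Sample("field", 3, true);
        int iterations = 10_000_000;
        // Warm up both paths before measuring.
        time(s, false, iterations);
        time(s, true, iterations);
        System.out.println("Objects.hash: " + time(s, false, iterations) / 1_000_000 + " ms");
        System.out.println("manual:       " + time(s, true, iterations) / 1_000_000 + " ms");
    }
}
```

The sketch keeps the two methods value-compatible so they can be checked against each other. The patch in this thread does not do that (it uses different seeds), which is fine as long as hash codes are never persisted or compared across versions.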


