You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@avro.apache.org by GitBox <gi...@apache.org> on 2022/06/13 08:10:01 UTC

[GitHub] [avro] clesaec opened a new pull request, #1719: AVRO-3531:datum reader thread safe

clesaec opened a new pull request, #1719:
URL: https://github.com/apache/avro/pull/1719

   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [x] My PR addresses the following [Avro Jira 3531](https://issues.apache.org/jira/browse/AVRO-3531) issues
   
   ### Tests
   
   - [x] My PR adds the following unit tests: TestGenericDatumReader
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] clesaec commented on a diff in pull request #1719: AVRO-3531:datum reader thread safe

Posted by GitBox <gi...@apache.org>.
clesaec commented on code in PR #1719:
URL: https://github.com/apache/avro/pull/1719#discussion_r904766232


##########
lang/java/avro/src/main/java/org/apache/avro/generic/GenericDatumReader.java:
##########
@@ -498,34 +498,81 @@ protected Class findStringClass(Schema schema) {
     }
   }
 
-  private Map<Schema, Class> stringClassCache = new IdentityHashMap<>();
+  /**
+   * This class is used to reproduce part of IdentityHashMap in ConcurrentHashMap
+   * code.
+   */
+  private static final class IdentitySchemaKey {
+    private final Schema schema;
+
+    private final int hashcode;
+
+    public IdentitySchemaKey(Schema schema) {
+      this.schema = schema;
+      this.hashcode = System.identityHashCode(schema);
+    }
+
+    @Override
+    public int hashCode() {
+      return this.hashcode;
+    }
 
-  private Class getStringClass(Schema s) {
-    Class c = stringClassCache.get(s);
-    if (c == null) {
-      c = findStringClass(s);
-      stringClassCache.put(s, c);
+    @Override
+    public boolean equals(Object obj) {
+      if (obj == null || !(obj instanceof GenericDatumReader.IdentitySchemaKey)) {
+        return false;
+      }
+      IdentitySchemaKey key = (IdentitySchemaKey) obj;
+      return this == key || this.schema == key.schema;
     }
-    return c;
   }
 
-  private final Map<Class, Constructor> stringCtorCache = new HashMap<>();
+  protected static class ReaderCache {
+    private final Map<IdentitySchemaKey, Class> stringClassCache = new ConcurrentHashMap<>();
 
-  @SuppressWarnings("unchecked")
-  protected Object newInstanceFromString(Class c, String s) {

Review Comment:
   ok, done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] RyanSkraba commented on a diff in pull request #1719: AVRO-3531:datum reader thread safe

Posted by GitBox <gi...@apache.org>.
RyanSkraba commented on code in PR #1719:
URL: https://github.com/apache/avro/pull/1719#discussion_r902911632


##########
lang/java/avro/src/test/java/org/apache/avro/generic/TestGenericDatumReader.java:
##########
@@ -0,0 +1,120 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     https://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.avro.generic;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Random;
+import java.util.stream.Collectors;
+import java.util.stream.IntStream;
+
+import org.apache.avro.Schema;
+import org.junit.Test;
+
+import static org.junit.Assert.assertEquals;
+
+public class TestGenericDatumReader {
+
+  private static final Random r = new Random(System.currentTimeMillis());
+
+  @Test
+  public void testReaderCache() {

Review Comment:
   Is this meant to fail when the locks are removed around the maps?  I can't seem to get this to happen either.
   
   This is incredibly difficult to unit-test, of course -- I really appreciate the work you did here!



##########
lang/java/avro/src/main/java/org/apache/avro/generic/GenericDatumReader.java:
##########
@@ -498,32 +500,72 @@ protected Class findStringClass(Schema schema) {
     }
   }
 
-  private Map<Schema, Class> stringClassCache = new IdentityHashMap<>();
+  protected static class ReaderCache {
+    private final Map<Schema, Class> stringClassCache = new IdentityHashMap<>();
 
-  private Class getStringClass(Schema s) {
-    Class c = stringClassCache.get(s);
-    if (c == null) {
-      c = findStringClass(s);
-      stringClassCache.put(s, c);
+    private final Map<Class, Function<String, Object>> stringCtorCache = new ConcurrentHashMap<>();

Review Comment:
   Hello!  This is something that is confusing me -- wouldn't a `ConcurrentHashMap<>` also solve the problem for `stringClassCache`?  If I remember correctly, they're really well optimized for `computeIfAbsent` scenarios where most multi-threaded accesses are successful reads!
   
   I can't seem to think of any reason why the stringClassCache *requires* an identity hash... maybe for performance?  If that's the case we could always wrap the schema in something that reports its identity hash and uses `==` for `equals`. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] RyanSkraba commented on pull request #1719: AVRO-3531:datum reader thread safe

Posted by GitBox <gi...@apache.org>.
RyanSkraba commented on PR #1719:
URL: https://github.com/apache/avro/pull/1719#issuecomment-1172558496

   I merged my suggestions myself -- which isn't GREAT form, but I wanted github actions to check it for me.  I hope that's OK!  Let me know if it's alright with you and we'll merge!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] clesaec commented on pull request #1719: AVRO-3531:datum reader thread safe

Posted by GitBox <gi...@apache.org>.
clesaec commented on PR #1719:
URL: https://github.com/apache/avro/pull/1719#issuecomment-1173378018

   Agree with changes for class & method visibility (_not have privilege to officially approve it, but i approve :-)_).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] clesaec commented on a diff in pull request #1719: AVRO-3531:datum reader thread safe

Posted by GitBox <gi...@apache.org>.
clesaec commented on code in PR #1719:
URL: https://github.com/apache/avro/pull/1719#discussion_r903314428


##########
lang/java/avro/src/main/java/org/apache/avro/generic/GenericDatumReader.java:
##########
@@ -498,32 +500,72 @@ protected Class findStringClass(Schema schema) {
     }
   }
 
-  private Map<Schema, Class> stringClassCache = new IdentityHashMap<>();
+  protected static class ReaderCache {
+    private final Map<Schema, Class> stringClassCache = new IdentityHashMap<>();
 
-  private Class getStringClass(Schema s) {
-    Class c = stringClassCache.get(s);
-    if (c == null) {
-      c = findStringClass(s);
-      stringClassCache.put(s, c);
+    private final Map<Class, Function<String, Object>> stringCtorCache = new ConcurrentHashMap<>();

Review Comment:
   I will try that, but identityHashMap is also optimized by the fact it stores key & object in a single array.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] RyanSkraba commented on pull request #1719: AVRO-3531:datum reader thread safe

Posted by GitBox <gi...@apache.org>.
RyanSkraba commented on PR #1719:
URL: https://github.com/apache/avro/pull/1719#issuecomment-1163486652

   I think this is really good!  I'd like to take a closer look at it before merging, especially with respect to exposing `getReaderCache()` instead of `newInstanceFromString(..)`
   
   If we were using Guava annotations (and I'm glad we're NOT), I'd suggest making this method and class package-protected with `@VisibleForTesting`.  
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] RyanSkraba commented on a diff in pull request #1719: AVRO-3531:datum reader thread safe

Posted by GitBox <gi...@apache.org>.
RyanSkraba commented on code in PR #1719:
URL: https://github.com/apache/avro/pull/1719#discussion_r912121582


##########
lang/java/avro/src/main/java/org/apache/avro/generic/GenericDatumReader.java:
##########
@@ -498,34 +498,86 @@ protected Class findStringClass(Schema schema) {
     }
   }
 
-  private Map<Schema, Class> stringClassCache = new IdentityHashMap<>();
+  /**
+   * This class is used to reproduce part of IdentityHashMap in ConcurrentHashMap
+   * code.
+   */
+  private static final class IdentitySchemaKey {
+    private final Schema schema;
+
+    private final int hashcode;
+
+    public IdentitySchemaKey(Schema schema) {
+      this.schema = schema;
+      this.hashcode = System.identityHashCode(schema);
+    }
 
-  private Class getStringClass(Schema s) {
-    Class c = stringClassCache.get(s);
-    if (c == null) {
-      c = findStringClass(s);
-      stringClassCache.put(s, c);
+    @Override
+    public int hashCode() {
+      return this.hashcode;
+    }
+
+    @Override
+    public boolean equals(Object obj) {
+      if (obj == null || !(obj instanceof GenericDatumReader.IdentitySchemaKey)) {
+        return false;
+      }
+      IdentitySchemaKey key = (IdentitySchemaKey) obj;
+      return this == key || this.schema == key.schema;
     }
-    return c;
   }
 
-  private final Map<Class, Constructor> stringCtorCache = new HashMap<>();
+  protected static class ReaderCache {
+    private final Map<IdentitySchemaKey, Class> stringClassCache = new ConcurrentHashMap<>();
 
-  @SuppressWarnings("unchecked")
-  protected Object newInstanceFromString(Class c, String s) {
-    try {
-      Constructor ctor = stringCtorCache.get(c);
-      if (ctor == null) {
+    private final Map<Class, Function<String, Object>> stringCtorCache = new ConcurrentHashMap<>();
+
+    private final Function<Schema, Class> findStringClass;
+
+    public ReaderCache(Function<Schema, Class> findStringClass) {
+      this.findStringClass = findStringClass;
+    }
+
+    public Object newInstanceFromString(Class c, String s) {
+      final Function<String, Object> ctor = stringCtorCache.computeIfAbsent(c, this::buildFunction);
+      return ctor.apply(s);
+    }
+
+    private Function<String, Object> buildFunction(Class c) {
+      final Constructor ctor;
+      try {
         ctor = c.getDeclaredConstructor(String.class);
-        ctor.setAccessible(true);
-        stringCtorCache.put(c, ctor);
+      } catch (NoSuchMethodException e) {
+        throw new AvroRuntimeException(e);
       }
-      return ctor.newInstance(s);
-    } catch (NoSuchMethodException | InvocationTargetException | IllegalAccessException | InstantiationException e) {
-      throw new AvroRuntimeException(e);
+      ctor.setAccessible(true);
+
+      return (String s) -> {
+        try {
+          return ctor.newInstance(s);
+        } catch (ReflectiveOperationException e) {
+          throw new AvroRuntimeException(e);
+        }
+      };
+    }
+
+    public Class getStringClass(final Schema s) {
+      final IdentitySchemaKey key = new IdentitySchemaKey(s);
+      return this.stringClassCache.computeIfAbsent(key, (IdentitySchemaKey k) -> this.findStringClass.apply(k.schema));
     }
   }
 
+  private final ReaderCache readerCache = new ReaderCache(this::findStringClass);
+
+  protected ReaderCache getReaderCache() {

Review Comment:
   ```suggestion
     // VisibleForTesting
     ReaderCache getReaderCache() {
   ```



##########
lang/java/avro/src/main/java/org/apache/avro/reflect/ReflectDatumReader.java:
##########
@@ -267,8 +267,8 @@ protected void readField(Object record, Field field, Object oldDatum, ResolvingD
         if (accessor.isStringable()) {
           try {
             String asString = (String) read(null, field.schema(), in);
-            accessor.set(record,
-                asString == null ? null : newInstanceFromString(accessor.getField().getType(), asString));
+            accessor.set(record, asString == null ? null

Review Comment:
   ```suggestion
               accessor.set(record,
                   asString == null ? null : newInstanceFromString(accessor.getField().getType(), asString));
   ```



##########
lang/java/avro/src/main/java/org/apache/avro/generic/GenericDatumReader.java:
##########
@@ -498,34 +498,86 @@ protected Class findStringClass(Schema schema) {
     }
   }
 
-  private Map<Schema, Class> stringClassCache = new IdentityHashMap<>();
+  /**
+   * This class is used to reproduce part of IdentityHashMap in ConcurrentHashMap
+   * code.
+   */
+  private static final class IdentitySchemaKey {
+    private final Schema schema;
+
+    private final int hashcode;
+
+    public IdentitySchemaKey(Schema schema) {
+      this.schema = schema;
+      this.hashcode = System.identityHashCode(schema);
+    }
 
-  private Class getStringClass(Schema s) {
-    Class c = stringClassCache.get(s);
-    if (c == null) {
-      c = findStringClass(s);
-      stringClassCache.put(s, c);
+    @Override
+    public int hashCode() {
+      return this.hashcode;
+    }
+
+    @Override
+    public boolean equals(Object obj) {
+      if (obj == null || !(obj instanceof GenericDatumReader.IdentitySchemaKey)) {
+        return false;
+      }
+      IdentitySchemaKey key = (IdentitySchemaKey) obj;
+      return this == key || this.schema == key.schema;
     }
-    return c;
   }
 
-  private final Map<Class, Constructor> stringCtorCache = new HashMap<>();
+  protected static class ReaderCache {

Review Comment:
   ```suggestion
     // VisibleForTesting
     static class ReaderCache {
   ```



##########
lang/java/avro/src/main/java/org/apache/avro/reflect/ReflectDatumReader.java:
##########
@@ -267,8 +267,8 @@ protected void readField(Object record, Field field, Object oldDatum, ResolvingD
         if (accessor.isStringable()) {
           try {
             String asString = (String) read(null, field.schema(), in);
-            accessor.set(record,
-                asString == null ? null : newInstanceFromString(accessor.getField().getType(), asString));
+            accessor.set(record, asString == null ? null

Review Comment:
   This change is no longer necessary, thanks to restoring the original method.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] clesaec commented on a diff in pull request #1719: AVRO-3531:datum reader thread safe

Posted by GitBox <gi...@apache.org>.
clesaec commented on code in PR #1719:
URL: https://github.com/apache/avro/pull/1719#discussion_r903438864


##########
lang/java/avro/src/main/java/org/apache/avro/generic/GenericDatumReader.java:
##########
@@ -498,32 +500,72 @@ protected Class findStringClass(Schema schema) {
     }
   }
 
-  private Map<Schema, Class> stringClassCache = new IdentityHashMap<>();
+  protected static class ReaderCache {
+    private final Map<Schema, Class> stringClassCache = new IdentityHashMap<>();
 
-  private Class getStringClass(Schema s) {
-    Class c = stringClassCache.get(s);
-    if (c == null) {
-      c = findStringClass(s);
-      stringClassCache.put(s, c);
+    private final Map<Class, Function<String, Object>> stringCtorCache = new ConcurrentHashMap<>();

Review Comment:
   Change done, this was indeed much more efficient :+1: 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] clesaec commented on a diff in pull request #1719: AVRO-3531:datum reader thread safe

Posted by GitBox <gi...@apache.org>.
clesaec commented on code in PR #1719:
URL: https://github.com/apache/avro/pull/1719#discussion_r903441246


##########
lang/java/avro/src/test/java/org/apache/avro/generic/TestGenericDatumReader.java:
##########
@@ -0,0 +1,120 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     https://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.avro.generic;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Random;
+import java.util.stream.Collectors;
+import java.util.stream.IntStream;
+
+import org.apache.avro.Schema;
+import org.junit.Test;
+
+import static org.junit.Assert.assertEquals;
+
+public class TestGenericDatumReader {
+
+  private static final Random r = new Random(System.currentTimeMillis());
+
+  @Test
+  public void testReaderCache() {

Review Comment:
   It can, but indeed, very difficult to reproduce. At least, this unit test help to ensure of the well behavior of this class.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] RyanSkraba commented on a diff in pull request #1719: AVRO-3531:datum reader thread safe

Posted by GitBox <gi...@apache.org>.
RyanSkraba commented on code in PR #1719:
URL: https://github.com/apache/avro/pull/1719#discussion_r904123721


##########
lang/java/avro/src/main/java/org/apache/avro/generic/GenericDatumReader.java:
##########
@@ -498,34 +498,81 @@ protected Class findStringClass(Schema schema) {
     }
   }
 
-  private Map<Schema, Class> stringClassCache = new IdentityHashMap<>();
+  /**
+   * This class is used to reproduce part of IdentityHashMap in ConcurrentHashMap
+   * code.
+   */
+  private static final class IdentitySchemaKey {
+    private final Schema schema;
+
+    private final int hashcode;
+
+    public IdentitySchemaKey(Schema schema) {
+      this.schema = schema;
+      this.hashcode = System.identityHashCode(schema);
+    }
+
+    @Override
+    public int hashCode() {
+      return this.hashcode;
+    }
 
-  private Class getStringClass(Schema s) {
-    Class c = stringClassCache.get(s);
-    if (c == null) {
-      c = findStringClass(s);
-      stringClassCache.put(s, c);
+    @Override
+    public boolean equals(Object obj) {
+      if (obj == null || !(obj instanceof GenericDatumReader.IdentitySchemaKey)) {
+        return false;
+      }
+      IdentitySchemaKey key = (IdentitySchemaKey) obj;
+      return this == key || this.schema == key.schema;
     }
-    return c;
   }
 
-  private final Map<Class, Constructor> stringCtorCache = new HashMap<>();
+  protected static class ReaderCache {
+    private final Map<IdentitySchemaKey, Class> stringClassCache = new ConcurrentHashMap<>();
 
-  @SuppressWarnings("unchecked")
-  protected Object newInstanceFromString(Class c, String s) {

Review Comment:
   We might have to keep this method (and forward it to `getReaderCache().newInstanceFromString(...)`) for backwards compatibility.  I don't think it's a bit deal.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] RyanSkraba merged pull request #1719: AVRO-3531:datum reader thread safe

Posted by GitBox <gi...@apache.org>.
RyanSkraba merged PR #1719:
URL: https://github.com/apache/avro/pull/1719


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org