You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@avro.apache.org by "opwvhk (via GitHub)" <gi...@apache.org> on 2023/12/19 11:55:17 UTC

[PR] AVRO-3666 [Java] Use the new schema parser [avro]

opwvhk opened a new pull request, #2642:
URL: https://github.com/apache/avro/pull/2642

<!--

*Thank you very much for contributing to Apache Avro - we are happy that you want to help us improve Avro. To help the community review your contribution in the best possible way, please go through the checklist below, which will get the contribution into a shape in which it can be best reviewed.*

*Please understand that we do not do this to make contributions to Avro a hassle. In order to uphold a high standard of quality for code contributions, while at the same time managing a large number of contributions, we need contributors to prepare the contributions well, and give reviewers enough contextual information for the review. Please also understand that contributions that do not follow this guide will take longer to review and thus typically be picked up with lower priority by the community.*

## Contribution Checklist

- Make sure that the pull request corresponds to a [JIRA issue](https://issues.apache.org/jira/projects/AVRO/issues). Exceptions are made for typos in JavaDoc or documentation files, which need no JIRA issue.

- Name the pull request in the form "AVRO-XXXX: [component] Title of the pull request", where *AVRO-XXXX* should be replaced by the actual issue number.
The *component* is optional, but can help identify the correct reviewers faster: either the language ("java", "python") or subsystem such as "build" or "doc" are good candidates.

- Fill out the template below to describe the changes contributed by the pull request. That will give reviewers the context they need to do the review.

- Make sure that the change passes the automated tests. You can [build the entire project](https://github.com/apache/avro/blob/main/BUILD.md) or just the [language-specific SDK](https://avro.apache.org/project/how-to-contribute/#unit-tests).

- Each pull request should address only one issue, not mix up code from multiple issues.

- Each commit in the pull request has a meaningful commit message (including the JIRA id)

- Every commit message references Jira issues in their subject lines. In addition, commits follow the guidelines from [How to write a good git commit message](https://chris.beams.io/posts/git-commit/)
1. Subject is separated from body by a blank line
1. Subject is limited to 50 characters (not including Jira issue reference)
1. Subject does not end with a period
1. Subject uses the imperative mood ("add", not "adding")
1. Body wraps at 72 characters
1. Body explains "what" and "why", not "how"

-->

## What is the purpose of the change

This change updates schema parsing so forward references are handles in a uniform way, regardless of the parser used (JSON, IDL, ...).

This change is the missing bit for AVRO-3666, re-enabling some disabled tests (and more changes to debug). This also improves the schema resolving visitor copied from the original IDL parser (it didn't handle circular references in logical types), and aliases pointing to the null namespace (this was lost when querying aliases).

## Verifying this change

This change updates a lot of tests to use the new schema parser (or the internal JSON parser), but does not alter tests. As such, the change is covered by existing schema parsing tests.

## Documentation

- Does this pull request introduce a new feature? (~yes~ / **no**)
- If yes, how is the feature documented? (**not applicable** / ~docs / JavaDocs / not documented~)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

Re: [PR] AVRO-3666 [Java] Use the new schema parser [avro]

Posted by "opwvhk (via GitHub)" <gi...@apache.org>.

opwvhk commented on code in PR #2642:
URL: https://github.com/apache/avro/pull/2642#discussion_r1459282555


##########
lang/java/avro/src/main/java/org/apache/avro/Schema.java:
##########
@@ -1867,47 +1781,70 @@ private static boolean isValidValue(Schema schema, JsonNode value) {
     }
   }
 
-  /**
-   * Parse named schema in order to fill names map. This method does not parse
-   * field of record/error schema.
-   *
-   * @param schema           : json schema representation.
-   * @param names            : map of named schema.
-   * @param currentNameSpace : current working name space.
-   * @return schema.
-   */
-  static Schema parseNamesDeclared(JsonNode schema, Names names, String currentNameSpace) {
+  /** @see #parse(String) */
+  static Schema parse(JsonNode schema, ParseContext context, String currentNameSpace) {
     if (schema == null) {
-      return null;
+      throw new SchemaParseException("Cannot parse <null> schema");
     }
-    if (schema.isObject()) {
-
-      String type = Schema.getOptionalText(schema, "type");
+    if (schema.isTextual()) { // name
+      return context.find(schema.textValue(), currentNameSpace);
+    } else if (schema.isObject()) {
+      Schema result;
+      String type = getRequiredText(schema, "type", "No type");
       Name name = null;
-
+      String space = null;
       String doc = null;
-      Schema result = null;
       final boolean isTypeError = "error".equals(type);
       final boolean isTypeRecord = "record".equals(type);
       final boolean isTypeEnum = "enum".equals(type);
       final boolean isTypeFixed = "fixed".equals(type);
-
       if (isTypeRecord || isTypeError || isTypeEnum || isTypeFixed) {
-        String space = getOptionalText(schema, "namespace");
+        space = getOptionalText(schema, "namespace");
         doc = getOptionalText(schema, "doc");
         if (space == null)
           space = currentNameSpace;
         name = new Name(getRequiredText(schema, "name", "No name in schema"), space);
       }
-      if (isTypeRecord || isTypeError) { // record
+      if (PRIMITIVES.containsKey(type)) { // primitive
+        result = create(PRIMITIVES.get(type));
+      } else if (isTypeRecord || isTypeError) { // record
+        List<Field> fields = new ArrayList<>();
         result = new RecordSchema(name, doc, isTypeError);
-        names.add(result);
+        context.put(result);
         JsonNode fieldsNode = schema.get("fields");
-
         if (fieldsNode == null || !fieldsNode.isArray())
           throw new SchemaParseException("Record has no fields: " + schema);
-        exploreFields(fieldsNode, names, name != null ? name.space : null);
-
+        for (JsonNode field : fieldsNode) {

Review Comment:
   Because the (semantic) complexity is in navigating the schema and fields twice, not in the technical method complexity. I'd be happy to split it in multiple methods to parse specific schema types though.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

Re: [PR] AVRO-3666 [Java] Use the new schema parser [avro]

Posted by "opwvhk (via GitHub)" <gi...@apache.org>.

opwvhk commented on code in PR #2642:
URL: https://github.com/apache/avro/pull/2642#discussion_r1459280310


##########
lang/java/avro/src/main/java/org/apache/avro/Schema.java:
##########
@@ -125,20 +125,20 @@ private Object readResolve() {
     FACTORY.setCodec(MAPPER);
   }
 
-  /** The type of a schema. */
+  /** The type of schema. */
   public enum Type {
     RECORD, ENUM, ARRAY, MAP, UNION, FIXED, STRING, BYTES, INT, LONG, FLOAT, DOUBLE, BOOLEAN, NULL;
 
     private final String name;
 
-    private Type() {
+    Type() {

Review Comment:
   Actually, it cannot be anything but private. Removing the keyword has no effect.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

Re: [PR] AVRO-3666 [Java] Use the new schema parser [avro]

Posted by "opwvhk (via GitHub)" <gi...@apache.org>.

opwvhk commented on code in PR #2642:
URL: https://github.com/apache/avro/pull/2642#discussion_r1459290267


##########
lang/java/avro/src/main/java/org/apache/avro/Schema.java:
##########
@@ -1920,214 +1857,54 @@ static Schema parseNamesDeclared(JsonNode schema, Names names, String currentNam
         if (enumDefault != null)
           defaultSymbol = enumDefault.textValue();
         result = new EnumSchema(name, doc, symbols, defaultSymbol);
-        names.add(result);
+        context.put(result);
       } else if (type.equals("array")) { // array
         JsonNode itemsNode = schema.get("items");
         if (itemsNode == null)
           throw new SchemaParseException("Array has no items type: " + schema);
-        final Schema items = Schema.parseNamesDeclared(itemsNode, names, currentNameSpace);
-        result = Schema.createArray(items);
+        result = new ArraySchema(parse(itemsNode, context, currentNameSpace));
       } else if (type.equals("map")) { // map
         JsonNode valuesNode = schema.get("values");
         if (valuesNode == null)
           throw new SchemaParseException("Map has no values type: " + schema);
-        final Schema values = Schema.parseNamesDeclared(valuesNode, names, currentNameSpace);
-        result = Schema.createMap(values);
+        result = new MapSchema(parse(valuesNode, context, currentNameSpace));
       } else if (isTypeFixed) { // fixed
         JsonNode sizeNode = schema.get("size");
         if (sizeNode == null || !sizeNode.isInt())
           throw new SchemaParseException("Invalid or no size: " + schema);
         result = new FixedSchema(name, doc, sizeNode.intValue());
-        if (name != null)
-          names.add(result);
-      } else if (PRIMITIVES.containsKey(type)) {
-        result = Schema.create(PRIMITIVES.get(type));
-      }
-      if (result != null) {
-        Set<String> reserved = SCHEMA_RESERVED;
-        if (isTypeEnum) {
-          reserved = ENUM_RESERVED;
-        }
-        Schema.addProperties(schema, reserved, result);
-      }
-      return result;
-    } else if (schema.isArray()) {
-      List<Schema> subs = new ArrayList<>(schema.size());
-      schema.forEach((JsonNode item) -> {
-        Schema sub = Schema.parseNamesDeclared(item, names, currentNameSpace);
-        if (sub != null) {
-          subs.add(sub);
-        }
-      });
-      return Schema.createUnion(subs);
-    } else if (schema.isTextual()) {
-      String value = schema.asText();
-      return names.get(value);
-    }
-    return null;
-  }
-
-  private static void addProperties(JsonNode schema, Set<String> reserved, Schema avroSchema) {
-    Iterator<String> i = schema.fieldNames();
-    while (i.hasNext()) { // add properties
-      String prop = i.next();
-      if (!reserved.contains(prop)) // ignore reserved
-        avroSchema.addProp(prop, schema.get(prop));
-    }
-    // parse logical type if present
-    avroSchema.logicalType = LogicalTypes.fromSchemaIgnoreInvalid(avroSchema);
-    // names.space(savedSpace); // restore space
-    if (avroSchema instanceof NamedSchema) {
-      Set<String> aliases = parseAliases(schema);
-      if (aliases != null) // add aliases
-        for (String alias : aliases)
-          avroSchema.addAlias(alias);
-    }
-  }
-
-  /**
-   * Explore record fields in order to fill names map with inner defined named
-   * types.
-   *
-   * @param fieldsNode : json node for field.
-   * @param names      : names map.
-   * @param nameSpace  : current working namespace.
-   */
-  private static void exploreFields(JsonNode fieldsNode, Names names, String nameSpace) {
-    for (JsonNode field : fieldsNode) {
-      final JsonNode fieldType = field.get("type");
-      if (fieldType != null) {
-        if (fieldType.isObject()) {
-          parseNamesDeclared(fieldType, names, nameSpace);
-        } else if (fieldType.isArray()) {
-          exploreFields(fieldType, names, nameSpace);
-        } else if (fieldType.isTextual() && field.isObject()) {
-          parseNamesDeclared(field, names, nameSpace);
-        }
-      }
-    }
-  }
-
-  /**
-   * in complement of parseNamesDeclared, this method parse schema in details.
-   *
-   * @param schema       : json schema.
-   * @param names        : names map.
-   * @param currentSpace : current working name space.
-   * @return complete schema.
-   */
-  static Schema parseCompleteSchema(JsonNode schema, Names names, String currentSpace) {
-    if (schema == null) {
-      throw new SchemaParseException("Cannot parse <null> schema");
-    }
-    if (schema.isTextual()) {
-      String type = schema.asText();
-      Schema avroSchema = names.get(type);
-      if (avroSchema == null) {
-        avroSchema = names.get(currentSpace + "." + type);
-      }
-      return avroSchema;
-    }
-    if (schema.isArray()) {
-      List<Schema> schemas = StreamSupport.stream(schema.spliterator(), false)
-          .map((JsonNode sub) -> parseCompleteSchema(sub, names, currentSpace)).collect(Collectors.toList());
-      return Schema.createUnion(schemas);
-    }
-    if (schema.isObject()) {
-      Schema result = null;
-      String type = getRequiredText(schema, "type", "No type");
-      Name name = null;
-
-      final boolean isTypeError = "error".equals(type);
-      final boolean isTypeRecord = "record".equals(type);
-      final boolean isTypeArray = "array".equals(type);
-
-      if (isTypeRecord || isTypeError || "enum".equals(type) || "fixed".equals(type)) {
-        // named schema
-        String space = getOptionalText(schema, "namespace");
-
-        if (space == null)
-          space = currentSpace;
-        name = new Name(getRequiredText(schema, "name", "No name in schema"), space);
-
-        result = names.get(name);
-        if (result == null) {
-          throw new SchemaParseException("Unparsed field type " + name);
-        }
-      }
-      if (isTypeRecord || isTypeError) {
-        if (result != null && !result.hasFields()) {
-          final List<Field> fields = new ArrayList<>();
-          JsonNode fieldsNode = schema.get("fields");
-          if (fieldsNode == null || !fieldsNode.isArray())
-            throw new SchemaParseException("Record has no fields: " + schema);
-
-          for (JsonNode field : fieldsNode) {
-            Field f = Field.parse(field, names, name.space);
-
-            fields.add(f);
-            if (f.schema.getLogicalType() == null && getOptionalText(field, LOGICAL_TYPE_PROP) != null)
-              LOG.warn(
-                  "Ignored the {}.{}.logicalType property (\"{}\"). It should probably be nested inside the \"type\" for the field.",
-                  name, f.name, getOptionalText(field, "logicalType"));
-          }
-          result.setFields(fields);
-        }
-      } else if (isTypeArray) {
-        JsonNode items = schema.get("items");
-        Schema schemaItems = parseCompleteSchema(items, names, currentSpace);
-        result = Schema.createArray(schemaItems);
-      } else if ("map".equals(type)) {
-        JsonNode values = schema.get("values");
-        Schema mapItems = parseCompleteSchema(values, names, currentSpace);
-        result = Schema.createMap(mapItems);
-      } else if (result == null) {
-        result = names.get(currentSpace + "." + type);
-        if (result == null) {
-          result = names.get(type);
-        }
+        context.put(result);
+      } else { // For unions with self reference
+        return context.find(type, currentNameSpace);
       }
+      Iterator<String> i = schema.fieldNames();
 
       Set<String> reserved = SCHEMA_RESERVED;
-      if ("enum".equals(type)) {
+      if (isTypeEnum) {
         reserved = ENUM_RESERVED;
       }
-      Schema.addProperties(schema, reserved, result);
-      return result;
-    }
-    return null;
-  }
-
-  static Schema parse(JsonNode schema, Names names) {
-    if (schema == null) {
-      throw new SchemaParseException("Cannot parse <null> schema");
-    }
-
-    Schema result = Schema.parseNamesDeclared(schema, names, names.space);
-    Schema.parseCompleteSchema(schema, names, names.space);
-
-    return result;
-  }
-
-  static Schema resolveSchema(JsonNode schema, Names names, String currentNameSpace) {
-    String np = currentNameSpace;
-    String nodeName = getOptionalText(schema, "name");
-    if (nodeName != null) {
-      final JsonNode nameSpace = schema.get("namespace");
-      StringBuilder fullName = new StringBuilder();
-      if (nameSpace != null && nameSpace.isTextual()) {
-        fullName.append(nameSpace.asText()).append(".");
-        np = nameSpace.asText();
+      while (i.hasNext()) { // add properties

Review Comment:
   Yes. I just hadn't changed this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

Re: [PR] AVRO-3666 [Java] Use the new schema parser [avro]

Posted by "Fokko (via GitHub)" <gi...@apache.org>.

Fokko commented on PR #2642:
URL: https://github.com/apache/avro/pull/2642#issuecomment-2034587224

   @opwvhk Could you resolve the merge conflicts?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

Re: [PR] AVRO-3666 [Java] Use the new schema parser [avro]

Posted by "clesaec (via GitHub)" <gi...@apache.org>.

clesaec commented on code in PR #2642:
URL: https://github.com/apache/avro/pull/2642#discussion_r1432418183


##########
lang/java/avro/src/main/java/org/apache/avro/ParseContext.java:
##########
@@ -220,6 +232,77 @@ public void rollback() {
     newSchemas.clear();
   }
 
+  /**
+   * Resolve all (named) schemas that were parsed. This resolves all forward
+   * references, even if parsed from different files.
+   *
+   * @return all parsed schemas, in the order they were parsed
+   * @throws AvroTypeException if a reference cannot be resolved
+   */
+  public List<Schema> resolveAllTypes() {
+    if (hasNewSchemas()) {
+      throw new IllegalStateException("Types cannot be resolved unless the ParseContext is committed.");
+    }
+
+    if (!isResolved) {
+      NameValidator saved = Schema.getNameValidator();
+      try {
+        Schema.setNameValidator(nameValidator); // Ensure we use the same validation.
+        HashMap<String, Schema> result = new LinkedHashMap<>(oldSchemas);

Review Comment:
   Why LinkedHashMap here; where it needs to keep order ?
   `Map<String, Schema> result = new HashMap<>(oldSchemas);`
   is not enough ?
   



##########
lang/java/avro/src/main/java/org/apache/avro/Schema.java:
##########
@@ -1867,47 +1781,70 @@ private static boolean isValidValue(Schema schema, JsonNode value) {
     }
   }
 
-  /**
-   * Parse named schema in order to fill names map. This method does not parse
-   * field of record/error schema.
-   *
-   * @param schema           : json schema representation.
-   * @param names            : map of named schema.
-   * @param currentNameSpace : current working name space.
-   * @return schema.
-   */
-  static Schema parseNamesDeclared(JsonNode schema, Names names, String currentNameSpace) {
+  /** @see #parse(String) */
+  static Schema parse(JsonNode schema, ParseContext context, String currentNameSpace) {
     if (schema == null) {
-      return null;
+      throw new SchemaParseException("Cannot parse <null> schema");
     }
-    if (schema.isObject()) {
-
-      String type = Schema.getOptionalText(schema, "type");
+    if (schema.isTextual()) { // name
+      return context.find(schema.textValue(), currentNameSpace);
+    } else if (schema.isObject()) {
+      Schema result;
+      String type = getRequiredText(schema, "type", "No type");
       Name name = null;
-
+      String space = null;
       String doc = null;
-      Schema result = null;
       final boolean isTypeError = "error".equals(type);
       final boolean isTypeRecord = "record".equals(type);
       final boolean isTypeEnum = "enum".equals(type);
       final boolean isTypeFixed = "fixed".equals(type);
-
       if (isTypeRecord || isTypeError || isTypeEnum || isTypeFixed) {
-        String space = getOptionalText(schema, "namespace");
+        space = getOptionalText(schema, "namespace");
         doc = getOptionalText(schema, "doc");
         if (space == null)
           space = currentNameSpace;
         name = new Name(getRequiredText(schema, "name", "No name in schema"), space);
       }
-      if (isTypeRecord || isTypeError) { // record
+      if (PRIMITIVES.containsKey(type)) { // primitive
+        result = create(PRIMITIVES.get(type));
+      } else if (isTypeRecord || isTypeError) { // record
+        List<Field> fields = new ArrayList<>();
         result = new RecordSchema(name, doc, isTypeError);
-        names.add(result);
+        context.put(result);
         JsonNode fieldsNode = schema.get("fields");
-
         if (fieldsNode == null || !fieldsNode.isArray())
           throw new SchemaParseException("Record has no fields: " + schema);
-        exploreFields(fieldsNode, names, name != null ? name.space : null);
-
+        for (JsonNode field : fieldsNode) {

Review Comment:
   why not keep here a separate method for fields , and then, reduce complexity of this method ?



##########
lang/java/avro/src/main/java/org/apache/avro/Schema.java:
##########
@@ -1920,214 +1857,54 @@ static Schema parseNamesDeclared(JsonNode schema, Names names, String currentNam
         if (enumDefault != null)
           defaultSymbol = enumDefault.textValue();
         result = new EnumSchema(name, doc, symbols, defaultSymbol);
-        names.add(result);
+        context.put(result);
       } else if (type.equals("array")) { // array
         JsonNode itemsNode = schema.get("items");
         if (itemsNode == null)
           throw new SchemaParseException("Array has no items type: " + schema);
-        final Schema items = Schema.parseNamesDeclared(itemsNode, names, currentNameSpace);
-        result = Schema.createArray(items);
+        result = new ArraySchema(parse(itemsNode, context, currentNameSpace));
       } else if (type.equals("map")) { // map
         JsonNode valuesNode = schema.get("values");
         if (valuesNode == null)
           throw new SchemaParseException("Map has no values type: " + schema);
-        final Schema values = Schema.parseNamesDeclared(valuesNode, names, currentNameSpace);
-        result = Schema.createMap(values);
+        result = new MapSchema(parse(valuesNode, context, currentNameSpace));
       } else if (isTypeFixed) { // fixed
         JsonNode sizeNode = schema.get("size");
         if (sizeNode == null || !sizeNode.isInt())
           throw new SchemaParseException("Invalid or no size: " + schema);
         result = new FixedSchema(name, doc, sizeNode.intValue());
-        if (name != null)
-          names.add(result);
-      } else if (PRIMITIVES.containsKey(type)) {
-        result = Schema.create(PRIMITIVES.get(type));
-      }
-      if (result != null) {
-        Set<String> reserved = SCHEMA_RESERVED;
-        if (isTypeEnum) {
-          reserved = ENUM_RESERVED;
-        }
-        Schema.addProperties(schema, reserved, result);
-      }
-      return result;
-    } else if (schema.isArray()) {
-      List<Schema> subs = new ArrayList<>(schema.size());
-      schema.forEach((JsonNode item) -> {
-        Schema sub = Schema.parseNamesDeclared(item, names, currentNameSpace);
-        if (sub != null) {
-          subs.add(sub);
-        }
-      });
-      return Schema.createUnion(subs);
-    } else if (schema.isTextual()) {
-      String value = schema.asText();
-      return names.get(value);
-    }
-    return null;
-  }
-
-  private static void addProperties(JsonNode schema, Set<String> reserved, Schema avroSchema) {
-    Iterator<String> i = schema.fieldNames();
-    while (i.hasNext()) { // add properties
-      String prop = i.next();
-      if (!reserved.contains(prop)) // ignore reserved
-        avroSchema.addProp(prop, schema.get(prop));
-    }
-    // parse logical type if present
-    avroSchema.logicalType = LogicalTypes.fromSchemaIgnoreInvalid(avroSchema);
-    // names.space(savedSpace); // restore space
-    if (avroSchema instanceof NamedSchema) {
-      Set<String> aliases = parseAliases(schema);
-      if (aliases != null) // add aliases
-        for (String alias : aliases)
-          avroSchema.addAlias(alias);
-    }
-  }
-
-  /**
-   * Explore record fields in order to fill names map with inner defined named
-   * types.
-   *
-   * @param fieldsNode : json node for field.
-   * @param names      : names map.
-   * @param nameSpace  : current working namespace.
-   */
-  private static void exploreFields(JsonNode fieldsNode, Names names, String nameSpace) {
-    for (JsonNode field : fieldsNode) {
-      final JsonNode fieldType = field.get("type");
-      if (fieldType != null) {
-        if (fieldType.isObject()) {
-          parseNamesDeclared(fieldType, names, nameSpace);
-        } else if (fieldType.isArray()) {
-          exploreFields(fieldType, names, nameSpace);
-        } else if (fieldType.isTextual() && field.isObject()) {
-          parseNamesDeclared(field, names, nameSpace);
-        }
-      }
-    }
-  }
-
-  /**
-   * in complement of parseNamesDeclared, this method parse schema in details.
-   *
-   * @param schema       : json schema.
-   * @param names        : names map.
-   * @param currentSpace : current working name space.
-   * @return complete schema.
-   */
-  static Schema parseCompleteSchema(JsonNode schema, Names names, String currentSpace) {
-    if (schema == null) {
-      throw new SchemaParseException("Cannot parse <null> schema");
-    }
-    if (schema.isTextual()) {
-      String type = schema.asText();
-      Schema avroSchema = names.get(type);
-      if (avroSchema == null) {
-        avroSchema = names.get(currentSpace + "." + type);
-      }
-      return avroSchema;
-    }
-    if (schema.isArray()) {
-      List<Schema> schemas = StreamSupport.stream(schema.spliterator(), false)
-          .map((JsonNode sub) -> parseCompleteSchema(sub, names, currentSpace)).collect(Collectors.toList());
-      return Schema.createUnion(schemas);
-    }
-    if (schema.isObject()) {
-      Schema result = null;
-      String type = getRequiredText(schema, "type", "No type");
-      Name name = null;
-
-      final boolean isTypeError = "error".equals(type);
-      final boolean isTypeRecord = "record".equals(type);
-      final boolean isTypeArray = "array".equals(type);
-
-      if (isTypeRecord || isTypeError || "enum".equals(type) || "fixed".equals(type)) {
-        // named schema
-        String space = getOptionalText(schema, "namespace");
-
-        if (space == null)
-          space = currentSpace;
-        name = new Name(getRequiredText(schema, "name", "No name in schema"), space);
-
-        result = names.get(name);
-        if (result == null) {
-          throw new SchemaParseException("Unparsed field type " + name);
-        }
-      }
-      if (isTypeRecord || isTypeError) {
-        if (result != null && !result.hasFields()) {
-          final List<Field> fields = new ArrayList<>();
-          JsonNode fieldsNode = schema.get("fields");
-          if (fieldsNode == null || !fieldsNode.isArray())
-            throw new SchemaParseException("Record has no fields: " + schema);
-
-          for (JsonNode field : fieldsNode) {
-            Field f = Field.parse(field, names, name.space);
-
-            fields.add(f);
-            if (f.schema.getLogicalType() == null && getOptionalText(field, LOGICAL_TYPE_PROP) != null)
-              LOG.warn(
-                  "Ignored the {}.{}.logicalType property (\"{}\"). It should probably be nested inside the \"type\" for the field.",
-                  name, f.name, getOptionalText(field, "logicalType"));
-          }
-          result.setFields(fields);
-        }
-      } else if (isTypeArray) {
-        JsonNode items = schema.get("items");
-        Schema schemaItems = parseCompleteSchema(items, names, currentSpace);
-        result = Schema.createArray(schemaItems);
-      } else if ("map".equals(type)) {
-        JsonNode values = schema.get("values");
-        Schema mapItems = parseCompleteSchema(values, names, currentSpace);
-        result = Schema.createMap(mapItems);
-      } else if (result == null) {
-        result = names.get(currentSpace + "." + type);
-        if (result == null) {
-          result = names.get(type);
-        }
+        context.put(result);
+      } else { // For unions with self reference
+        return context.find(type, currentNameSpace);
       }
+      Iterator<String> i = schema.fieldNames();
 
       Set<String> reserved = SCHEMA_RESERVED;
-      if ("enum".equals(type)) {
+      if (isTypeEnum) {
         reserved = ENUM_RESERVED;
       }
-      Schema.addProperties(schema, reserved, result);
-      return result;
-    }
-    return null;
-  }
-
-  static Schema parse(JsonNode schema, Names names) {
-    if (schema == null) {
-      throw new SchemaParseException("Cannot parse <null> schema");
-    }
-
-    Schema result = Schema.parseNamesDeclared(schema, names, names.space);
-    Schema.parseCompleteSchema(schema, names, names.space);
-
-    return result;
-  }
-
-  static Schema resolveSchema(JsonNode schema, Names names, String currentNameSpace) {
-    String np = currentNameSpace;
-    String nodeName = getOptionalText(schema, "name");
-    if (nodeName != null) {
-      final JsonNode nameSpace = schema.get("namespace");
-      StringBuilder fullName = new StringBuilder();
-      if (nameSpace != null && nameSpace.isTextual()) {
-        fullName.append(nameSpace.asText()).append(".");
-        np = nameSpace.asText();
+      while (i.hasNext()) { // add properties

Review Comment:
   does iterator forEachRemaining method  would be simpler or not ?



##########
lang/java/avro/src/main/java/org/apache/avro/Schema.java:
##########
@@ -125,20 +125,20 @@ private Object readResolve() {
     FACTORY.setCodec(MAPPER);
   }
 
-  /** The type of a schema. */
+  /** The type of schema. */
   public enum Type {
     RECORD, ENUM, ARRAY, MAP, UNION, FIXED, STRING, BYTES, INT, LONG, FLOAT, DOUBLE, BOOLEAN, NULL;
 
     private final String name;
 
-    private Type() {
+    Type() {

Review Comment:
   why not keep it private



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

Re: [PR] AVRO-3666 [Java] Use the new schema parser [avro]

Posted by "Fokko (via GitHub)" <gi...@apache.org>.

Fokko commented on code in PR #2642:
URL: https://github.com/apache/avro/pull/2642#discussion_r1549641360


##########
lang/java/avro/src/main/java/org/apache/avro/ParseContext.java:
##########
@@ -18,25 +18,36 @@
 package org.apache.avro;

Review Comment:
   This is a new file, so it is okay to break the APIs here



##########
lang/java/avro/src/main/java/org/apache/avro/Schema.java:
##########
@@ -80,7 +78,7 @@
  * <li><i>null</i>.
  * </ul>
  *
- * A schema can be constructed using one of its static <tt>createXXX</tt>
+ * You can construct a schema using one of its static <tt>createXXX</tt>

Review Comment:
   ```suggestion
    * Construct a schema using one of its static <tt>createXXX</tt>
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

Re: [PR] AVRO-3666 [Java] Use the new schema parser [avro]

Posted by "opwvhk (via GitHub)" <gi...@apache.org>.

opwvhk commented on code in PR #2642:
URL: https://github.com/apache/avro/pull/2642#discussion_r1504415743


##########
lang/java/avro/src/main/java/org/apache/avro/util/SchemaResolver.java:
##########
@@ -111,67 +105,6 @@ public static boolean isFullyResolvedSchema(final Schema schema) {
     }
   }
 
-  /**
-   * Clone the provided schema while resolving all unreferenced schemas.
-   *
-   * @param parseContext the parse context with known names
-   * @param schema       the schema to resolve
-   * @return a copy of the schema with all schemas resolved
-   */
-  public static Schema resolve(final ParseContext parseContext, Schema schema) {

Review Comment:
   And... gone.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

Re: [PR] AVRO-3666 [Java] Use the new schema parser [avro]

Posted by "jbonofre (via GitHub)" <gi...@apache.org>.

jbonofre commented on PR #2642:
URL: https://github.com/apache/avro/pull/2642#issuecomment-2098843291

   @g1rjeevan I'm working on both 1.11.4 and 1.12.0 releases preparation. I need to merge one pending change. I hope to submit the release to vote next week.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

Re: [PR] AVRO-3666 [Java] Use the new schema parser [avro]

Posted by "github-advanced-security[bot] (via GitHub)" <gi...@apache.org>.

github-advanced-security[bot] commented on code in PR #2642:
URL: https://github.com/apache/avro/pull/2642#discussion_r1431314277


##########
lang/java/avro/src/main/java/org/apache/avro/JsonSchemaParser.java:
##########
@@ -57,26 +57,34 @@
     for (String fragment : fragments) {
       buffer.append(fragment);
     }
-    return new JsonSchemaParser().parse(new ParseContext(NameValidator.NO_VALIDATION), buffer, null);
+
+    boolean saved = Schema.getValidateDefaults();
+    try {
+      ParseContext context = new ParseContext(NameValidator.NO_VALIDATION);
+
+      Schema schema = new JsonSchemaParser().parse(context, buffer, true);
+      context.commit();
+      Schema.setValidateDefaults(false);

Review Comment:
   ## Deprecated method or constructor invocation
   
   Invoking [Schema.setValidateDefaults](1) should be avoided because it has been deprecated.
   
   [Show more details](https://github.com/apache/avro/security/code-scanning/3148)



##########
lang/java/avro/src/main/java/org/apache/avro/Schema.java:
##########
@@ -1867,47 +1782,70 @@
     }
   }
 
-  /**
-   * Parse named schema in order to fill names map. This method does not parse
-   * field of record/error schema.
-   *
-   * @param schema           : json schema representation.
-   * @param names            : map of named schema.
-   * @param currentNameSpace : current working name space.
-   * @return schema.
-   */
-  static Schema parseNamesDeclared(JsonNode schema, Names names, String currentNameSpace) {
+  /** @see #parse(String) */
+  static Schema parse(JsonNode schema, ParseContext context, String currentNameSpace) {
     if (schema == null) {
-      return null;
+      throw new SchemaParseException("Cannot parse <null> schema");
     }
-    if (schema.isObject()) {
-
-      String type = Schema.getOptionalText(schema, "type");
+    if (schema.isTextual()) { // name
+      return context.find(schema.textValue(), currentNameSpace);
+    } else if (schema.isObject()) {
+      Schema result;
+      String type = getRequiredText(schema, "type", "No type");
       Name name = null;
-
+      String space = null;
       String doc = null;
-      Schema result = null;
       final boolean isTypeError = "error".equals(type);
       final boolean isTypeRecord = "record".equals(type);
       final boolean isTypeEnum = "enum".equals(type);
       final boolean isTypeFixed = "fixed".equals(type);
-
       if (isTypeRecord || isTypeError || isTypeEnum || isTypeFixed) {
-        String space = getOptionalText(schema, "namespace");
+        space = getOptionalText(schema, "namespace");
         doc = getOptionalText(schema, "doc");
         if (space == null)
           space = currentNameSpace;
         name = new Name(getRequiredText(schema, "name", "No name in schema"), space);
       }
-      if (isTypeRecord || isTypeError) { // record
+      if (PRIMITIVES.containsKey(type)) { // primitive
+        result = create(PRIMITIVES.get(type));
+      } else if (isTypeRecord || isTypeError) { // record
+        List<Field> fields = new ArrayList<>();
         result = new RecordSchema(name, doc, isTypeError);
-        names.add(result);
+        context.put(result);
         JsonNode fieldsNode = schema.get("fields");
-
         if (fieldsNode == null || !fieldsNode.isArray())
           throw new SchemaParseException("Record has no fields: " + schema);
-        exploreFields(fieldsNode, names, name != null ? name.space : null);
-
+        for (JsonNode field : fieldsNode) {
+          String fieldName = getRequiredText(field, "name", "No field name");
+          String fieldDoc = getOptionalText(field, "doc");
+          JsonNode fieldTypeNode = field.get("type");
+          if (fieldTypeNode == null)
+            throw new SchemaParseException("No field type: " + field);
+          Schema fieldSchema = parse(fieldTypeNode, context, name.space);
+          Field.Order order = Field.Order.ASCENDING;
+          JsonNode orderNode = field.get("order");
+          if (orderNode != null)
+            order = Field.Order.valueOf(orderNode.textValue().toUpperCase(Locale.ENGLISH));
+          JsonNode defaultValue = field.get("default");
+          if (defaultValue != null
+              && (Type.FLOAT.equals(fieldSchema.getType()) || Type.DOUBLE.equals(fieldSchema.getType()))
+              && defaultValue.isTextual())
+            defaultValue = new DoubleNode(Double.parseDouble(defaultValue.textValue()));

Review Comment:
   ## Missing catch of NumberFormatException
   
   Potential uncaught 'java.lang.NumberFormatException'.
   
   [Show more details](https://github.com/apache/avro/security/code-scanning/2965)



##########
lang/java/avro/src/main/java/org/apache/avro/ParseContext.java:
##########
@@ -220,6 +232,77 @@
     newSchemas.clear();
   }
 
+  /**
+   * Resolve all (named) schemas that were parsed. This resolves all forward
+   * references, even if parsed from different files.
+   *
+   * @return all parsed schemas, in the order they were parsed
+   * @throws AvroTypeException if a reference cannot be resolved
+   */
+  public List<Schema> resolveAllTypes() {
+    if (hasNewSchemas()) {
+      throw new IllegalStateException("Types cannot be resolved unless the ParseContext is committed.");
+    }
+
+    if (!isResolved) {
+      NameValidator saved = Schema.getNameValidator();
+      try {
+        Schema.setNameValidator(nameValidator); // Ensure we use the same validation.
+        HashMap<String, Schema> result = new LinkedHashMap<>(oldSchemas);
+        SchemaResolver.ResolvingVisitor visitor = new SchemaResolver.ResolvingVisitor(null, result::get, false);
+        Function<Schema, Schema> resolver = schema -> Schemas.visit(schema, visitor.withRoot(schema));
+        for (Map.Entry<String, Schema> entry : result.entrySet()) {
+          entry.setValue(resolver.apply(entry.getValue()));
+        }
+        oldSchemas.putAll(result);
+        isResolved = true;
+      } finally {
+        Schema.setNameValidator(saved);
+      }
+    }
+
+    return new ArrayList<>(oldSchemas.values());
+  }
+
+  /**
+   * Try to resolve unresolved references in a schema using the types known to
+   * this context. It is advisable to call {@link #resolveAllTypes()} first if you
+   * want the returned types to be stable.
+   *
+   * @param schema the schema resolve
+   * @return the fully resolved schema if possible, {@code null} otherwise
+   */
+  public Schema tryResolve(Schema schema) {
+    if (schema == null) {
+      return null;
+    }
+    return resolve(schema, true);
+  }
+
+  /**
+   * Resolve unresolved references in a schema using the types known to this
+   * context. It is advisable to call {@link #resolveAllTypes()} first if you want
+   * the returned types to be stable.
+   *
+   * @param schema the schema resolve
+   * @return the fully resolved schema
+   * @throws AvroTypeException if the schema cannot be resolved
+   */
+  public Schema resolve(Schema schema) {
+    return resolve(schema, false);
+  }
+
+  public Schema resolve(Schema schema, boolean returnNullUponFailure) {
+    NameValidator saved = Schema.getNameValidator();
+    try {
+      Schema.setNameValidator(nameValidator); // Ensure we use the same validation.

Review Comment:
   ## Deprecated method or constructor invocation
   
   Invoking [Schema.setNameValidator](1) should be avoided because it has been deprecated.
   
   [Show more details](https://github.com/apache/avro/security/code-scanning/3154)



##########
lang/java/avro/src/main/java/org/apache/avro/Schema.java:
##########
@@ -1867,47 +1782,70 @@
     }
   }
 
-  /**
-   * Parse named schema in order to fill names map. This method does not parse
-   * field of record/error schema.
-   *
-   * @param schema           : json schema representation.
-   * @param names            : map of named schema.
-   * @param currentNameSpace : current working name space.
-   * @return schema.
-   */
-  static Schema parseNamesDeclared(JsonNode schema, Names names, String currentNameSpace) {
+  /** @see #parse(String) */
+  static Schema parse(JsonNode schema, ParseContext context, String currentNameSpace) {
     if (schema == null) {
-      return null;
+      throw new SchemaParseException("Cannot parse <null> schema");
     }
-    if (schema.isObject()) {
-
-      String type = Schema.getOptionalText(schema, "type");
+    if (schema.isTextual()) { // name
+      return context.find(schema.textValue(), currentNameSpace);
+    } else if (schema.isObject()) {
+      Schema result;
+      String type = getRequiredText(schema, "type", "No type");
       Name name = null;
-
+      String space = null;
       String doc = null;
-      Schema result = null;
       final boolean isTypeError = "error".equals(type);
       final boolean isTypeRecord = "record".equals(type);
       final boolean isTypeEnum = "enum".equals(type);
       final boolean isTypeFixed = "fixed".equals(type);
-
       if (isTypeRecord || isTypeError || isTypeEnum || isTypeFixed) {
-        String space = getOptionalText(schema, "namespace");
+        space = getOptionalText(schema, "namespace");
         doc = getOptionalText(schema, "doc");
         if (space == null)
           space = currentNameSpace;
         name = new Name(getRequiredText(schema, "name", "No name in schema"), space);
       }
-      if (isTypeRecord || isTypeError) { // record
+      if (PRIMITIVES.containsKey(type)) { // primitive
+        result = create(PRIMITIVES.get(type));
+      } else if (isTypeRecord || isTypeError) { // record
+        List<Field> fields = new ArrayList<>();
         result = new RecordSchema(name, doc, isTypeError);
-        names.add(result);
+        context.put(result);
         JsonNode fieldsNode = schema.get("fields");
-
         if (fieldsNode == null || !fieldsNode.isArray())
           throw new SchemaParseException("Record has no fields: " + schema);
-        exploreFields(fieldsNode, names, name != null ? name.space : null);
-
+        for (JsonNode field : fieldsNode) {
+          String fieldName = getRequiredText(field, "name", "No field name");
+          String fieldDoc = getOptionalText(field, "doc");
+          JsonNode fieldTypeNode = field.get("type");
+          if (fieldTypeNode == null)
+            throw new SchemaParseException("No field type: " + field);
+          Schema fieldSchema = parse(fieldTypeNode, context, name.space);

Review Comment:
   ## Dereferenced variable may be null
   
   Variable [name](1) may be null at this access because of [this](2) assignment.
   
   [Show more details](https://github.com/apache/avro/security/code-scanning/3157)



##########
lang/java/avro/src/main/java/org/apache/avro/JsonSchemaParser.java:
##########
@@ -57,26 +57,34 @@
     for (String fragment : fragments) {
       buffer.append(fragment);
     }
-    return new JsonSchemaParser().parse(new ParseContext(NameValidator.NO_VALIDATION), buffer, null);
+
+    boolean saved = Schema.getValidateDefaults();

Review Comment:
   ## Deprecated method or constructor invocation
   
   Invoking [Schema.getValidateDefaults](1) should be avoided because it has been deprecated.
   
   [Show more details](https://github.com/apache/avro/security/code-scanning/3147)



##########
lang/java/avro/src/main/java/org/apache/avro/ParseContext.java:
##########
@@ -220,6 +232,77 @@
     newSchemas.clear();
   }
 
+  /**
+   * Resolve all (named) schemas that were parsed. This resolves all forward
+   * references, even if parsed from different files.
+   *
+   * @return all parsed schemas, in the order they were parsed
+   * @throws AvroTypeException if a reference cannot be resolved
+   */
+  public List<Schema> resolveAllTypes() {
+    if (hasNewSchemas()) {
+      throw new IllegalStateException("Types cannot be resolved unless the ParseContext is committed.");
+    }
+
+    if (!isResolved) {
+      NameValidator saved = Schema.getNameValidator();

Review Comment:
   ## Deprecated method or constructor invocation
   
   Invoking [Schema.getNameValidator](1) should be avoided because it has been deprecated.
   
   [Show more details](https://github.com/apache/avro/security/code-scanning/3150)



##########
lang/java/avro/src/main/java/org/apache/avro/ParseContext.java:
##########
@@ -220,6 +232,77 @@
     newSchemas.clear();
   }
 
+  /**
+   * Resolve all (named) schemas that were parsed. This resolves all forward
+   * references, even if parsed from different files.
+   *
+   * @return all parsed schemas, in the order they were parsed
+   * @throws AvroTypeException if a reference cannot be resolved
+   */
+  public List<Schema> resolveAllTypes() {
+    if (hasNewSchemas()) {
+      throw new IllegalStateException("Types cannot be resolved unless the ParseContext is committed.");
+    }
+
+    if (!isResolved) {
+      NameValidator saved = Schema.getNameValidator();
+      try {
+        Schema.setNameValidator(nameValidator); // Ensure we use the same validation.
+        HashMap<String, Schema> result = new LinkedHashMap<>(oldSchemas);
+        SchemaResolver.ResolvingVisitor visitor = new SchemaResolver.ResolvingVisitor(null, result::get, false);
+        Function<Schema, Schema> resolver = schema -> Schemas.visit(schema, visitor.withRoot(schema));
+        for (Map.Entry<String, Schema> entry : result.entrySet()) {
+          entry.setValue(resolver.apply(entry.getValue()));
+        }
+        oldSchemas.putAll(result);
+        isResolved = true;
+      } finally {
+        Schema.setNameValidator(saved);

Review Comment:
   ## Deprecated method or constructor invocation
   
   Invoking [Schema.setNameValidator](1) should be avoided because it has been deprecated.
   
   [Show more details](https://github.com/apache/avro/security/code-scanning/3152)



##########
lang/java/avro/src/main/java/org/apache/avro/JsonSchemaParser.java:
##########
@@ -57,26 +57,34 @@
     for (String fragment : fragments) {
       buffer.append(fragment);
     }
-    return new JsonSchemaParser().parse(new ParseContext(NameValidator.NO_VALIDATION), buffer, null);
+
+    boolean saved = Schema.getValidateDefaults();
+    try {
+      ParseContext context = new ParseContext(NameValidator.NO_VALIDATION);
+
+      Schema schema = new JsonSchemaParser().parse(context, buffer, true);
+      context.commit();
+      Schema.setValidateDefaults(false);
+      context.resolveAllTypes();
+      return context.resolve(schema);
+    } finally {
+      // Unless explicitly disabled when needed, defaults should always be validated.
+      Schema.setValidateDefaults(saved);

Review Comment:
   ## Deprecated method or constructor invocation
   
   Invoking [Schema.setValidateDefaults](1) should be avoided because it has been deprecated.
   
   [Show more details](https://github.com/apache/avro/security/code-scanning/3149)



##########
lang/java/avro/src/main/java/org/apache/avro/ParseContext.java:
##########
@@ -220,6 +232,77 @@
     newSchemas.clear();
   }
 
+  /**
+   * Resolve all (named) schemas that were parsed. This resolves all forward
+   * references, even if parsed from different files.
+   *
+   * @return all parsed schemas, in the order they were parsed
+   * @throws AvroTypeException if a reference cannot be resolved
+   */
+  public List<Schema> resolveAllTypes() {
+    if (hasNewSchemas()) {
+      throw new IllegalStateException("Types cannot be resolved unless the ParseContext is committed.");
+    }
+
+    if (!isResolved) {
+      NameValidator saved = Schema.getNameValidator();
+      try {
+        Schema.setNameValidator(nameValidator); // Ensure we use the same validation.

Review Comment:
   ## Deprecated method or constructor invocation
   
   Invoking [Schema.setNameValidator](1) should be avoided because it has been deprecated.
   
   [Show more details](https://github.com/apache/avro/security/code-scanning/3151)



##########
lang/java/avro/src/main/java/org/apache/avro/ParseContext.java:
##########
@@ -220,6 +232,77 @@
     newSchemas.clear();
   }
 
+  /**
+   * Resolve all (named) schemas that were parsed. This resolves all forward
+   * references, even if parsed from different files.
+   *
+   * @return all parsed schemas, in the order they were parsed
+   * @throws AvroTypeException if a reference cannot be resolved
+   */
+  public List<Schema> resolveAllTypes() {
+    if (hasNewSchemas()) {
+      throw new IllegalStateException("Types cannot be resolved unless the ParseContext is committed.");
+    }
+
+    if (!isResolved) {
+      NameValidator saved = Schema.getNameValidator();
+      try {
+        Schema.setNameValidator(nameValidator); // Ensure we use the same validation.
+        HashMap<String, Schema> result = new LinkedHashMap<>(oldSchemas);
+        SchemaResolver.ResolvingVisitor visitor = new SchemaResolver.ResolvingVisitor(null, result::get, false);
+        Function<Schema, Schema> resolver = schema -> Schemas.visit(schema, visitor.withRoot(schema));
+        for (Map.Entry<String, Schema> entry : result.entrySet()) {
+          entry.setValue(resolver.apply(entry.getValue()));
+        }
+        oldSchemas.putAll(result);
+        isResolved = true;
+      } finally {
+        Schema.setNameValidator(saved);
+      }
+    }
+
+    return new ArrayList<>(oldSchemas.values());
+  }
+
+  /**
+   * Try to resolve unresolved references in a schema using the types known to
+   * this context. It is advisable to call {@link #resolveAllTypes()} first if you
+   * want the returned types to be stable.
+   *
+   * @param schema the schema resolve
+   * @return the fully resolved schema if possible, {@code null} otherwise
+   */
+  public Schema tryResolve(Schema schema) {
+    if (schema == null) {
+      return null;
+    }
+    return resolve(schema, true);
+  }
+
+  /**
+   * Resolve unresolved references in a schema using the types known to this
+   * context. It is advisable to call {@link #resolveAllTypes()} first if you want
+   * the returned types to be stable.
+   *
+   * @param schema the schema resolve
+   * @return the fully resolved schema
+   * @throws AvroTypeException if the schema cannot be resolved
+   */
+  public Schema resolve(Schema schema) {
+    return resolve(schema, false);
+  }
+
+  public Schema resolve(Schema schema, boolean returnNullUponFailure) {
+    NameValidator saved = Schema.getNameValidator();

Review Comment:
   ## Deprecated method or constructor invocation
   
   Invoking [Schema.getNameValidator](1) should be avoided because it has been deprecated.
   
   [Show more details](https://github.com/apache/avro/security/code-scanning/3153)



##########
lang/java/avro/src/main/java/org/apache/avro/Schema.java:
##########
@@ -1488,7 +1399,8 @@
    * may refer to it by name.
    */
   public static class Parser {
-    private Names names = new Names();
+    private final Names names = new Names();

Review Comment:
   ## Container contents are never accessed
   
   The contents of this container are never accessed.
   
   [Show more details](https://github.com/apache/avro/security/code-scanning/3158)



##########
lang/java/avro/src/main/java/org/apache/avro/ParseContext.java:
##########
@@ -220,6 +232,77 @@
     newSchemas.clear();
   }
 
+  /**
+   * Resolve all (named) schemas that were parsed. This resolves all forward
+   * references, even if parsed from different files.
+   *
+   * @return all parsed schemas, in the order they were parsed
+   * @throws AvroTypeException if a reference cannot be resolved
+   */
+  public List<Schema> resolveAllTypes() {
+    if (hasNewSchemas()) {
+      throw new IllegalStateException("Types cannot be resolved unless the ParseContext is committed.");
+    }
+
+    if (!isResolved) {
+      NameValidator saved = Schema.getNameValidator();
+      try {
+        Schema.setNameValidator(nameValidator); // Ensure we use the same validation.
+        HashMap<String, Schema> result = new LinkedHashMap<>(oldSchemas);
+        SchemaResolver.ResolvingVisitor visitor = new SchemaResolver.ResolvingVisitor(null, result::get, false);
+        Function<Schema, Schema> resolver = schema -> Schemas.visit(schema, visitor.withRoot(schema));
+        for (Map.Entry<String, Schema> entry : result.entrySet()) {
+          entry.setValue(resolver.apply(entry.getValue()));
+        }
+        oldSchemas.putAll(result);
+        isResolved = true;
+      } finally {
+        Schema.setNameValidator(saved);
+      }
+    }
+
+    return new ArrayList<>(oldSchemas.values());
+  }
+
+  /**
+   * Try to resolve unresolved references in a schema using the types known to
+   * this context. It is advisable to call {@link #resolveAllTypes()} first if you
+   * want the returned types to be stable.
+   *
+   * @param schema the schema resolve
+   * @return the fully resolved schema if possible, {@code null} otherwise
+   */
+  public Schema tryResolve(Schema schema) {
+    if (schema == null) {
+      return null;
+    }
+    return resolve(schema, true);
+  }
+
+  /**
+   * Resolve unresolved references in a schema using the types known to this
+   * context. It is advisable to call {@link #resolveAllTypes()} first if you want
+   * the returned types to be stable.
+   *
+   * @param schema the schema resolve
+   * @return the fully resolved schema
+   * @throws AvroTypeException if the schema cannot be resolved
+   */
+  public Schema resolve(Schema schema) {
+    return resolve(schema, false);
+  }
+
+  public Schema resolve(Schema schema, boolean returnNullUponFailure) {
+    NameValidator saved = Schema.getNameValidator();
+    try {
+      Schema.setNameValidator(nameValidator); // Ensure we use the same validation.
+      return Schemas.visit(schema,
+          new SchemaResolver.ResolvingVisitor(schema, this::getNamedSchema, returnNullUponFailure));
+    } finally {
+      Schema.setNameValidator(saved);

Review Comment:
   ## Deprecated method or constructor invocation
   
   Invoking [Schema.setNameValidator](1) should be avoided because it has been deprecated.
   
   [Show more details](https://github.com/apache/avro/security/code-scanning/3155)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

Re: [PR] AVRO-3666 [Java] Use the new schema parser [avro]

Posted by "Fokko (via GitHub)" <gi...@apache.org>.

Fokko commented on code in PR #2642:
URL: https://github.com/apache/avro/pull/2642#discussion_r1480403168


##########
lang/java/avro/src/main/java/org/apache/avro/util/SchemaResolver.java:
##########
@@ -111,67 +105,6 @@ public static boolean isFullyResolvedSchema(final Schema schema) {
     }
   }
 
-  /**
-   * Clone the provided schema while resolving all unreferenced schemas.
-   *
-   * @param parseContext the parse context with known names
-   * @param schema       the schema to resolve
-   * @return a copy of the schema with all schemas resolved
-   */
-  public static Schema resolve(final ParseContext parseContext, Schema schema) {

Review Comment:
   I think we should leverage jmp to check the API changes. I'm happy to add this as well.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

Re: [PR] AVRO-3666 [Java] Use the new schema parser [avro]

Posted by "g1rjeevan (via GitHub)" <gi...@apache.org>.

g1rjeevan commented on PR #2642:
URL: https://github.com/apache/avro/pull/2642#issuecomment-2047471513

   Hi Team, 
   
   Did 1.12.0 or 1.11.4 got released ? I see CVEs fixes related commons-compress 1.26.0 are already in place. Seems to be release is pending. If not released, what would be the ETA ? cc: @Fokko 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

Re: [PR] AVRO-3666 [Java] Use the new schema parser [avro]

Posted by "opwvhk (via GitHub)" <gi...@apache.org>.

opwvhk commented on code in PR #2642:
URL: https://github.com/apache/avro/pull/2642#discussion_r1459258370


##########
lang/java/avro/src/main/java/org/apache/avro/ParseContext.java:
##########
@@ -220,6 +232,77 @@ public void rollback() {
     newSchemas.clear();
   }
 
+  /**
+   * Resolve all (named) schemas that were parsed. This resolves all forward
+   * references, even if parsed from different files.
+   *
+   * @return all parsed schemas, in the order they were parsed
+   * @throws AvroTypeException if a reference cannot be resolved
+   */
+  public List<Schema> resolveAllTypes() {
+    if (hasNewSchemas()) {
+      throw new IllegalStateException("Types cannot be resolved unless the ParseContext is committed.");
+    }
+
+    if (!isResolved) {
+      NameValidator saved = Schema.getNameValidator();
+      try {
+        Schema.setNameValidator(nameValidator); // Ensure we use the same validation.
+        HashMap<String, Schema> result = new LinkedHashMap<>(oldSchemas);

Review Comment:
   This is one of the things I'm not sure about: how to provide the parsed schemas. Ideally, a this would be a `Set<Schema>` (meaning a `HashMap` would suffice here).
   However, current code uses a `List<Schema>`, and this is generally the more prevalent collection type.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

Re: [PR] AVRO-3666 [Java] Use the new schema parser [avro]

Posted by "github-advanced-security[bot] (via GitHub)" <gi...@apache.org>.

github-advanced-security[bot] commented on code in PR #2642:
URL: https://github.com/apache/avro/pull/2642#discussion_r1463492814


##########
lang/java/avro/src/main/java/org/apache/avro/Schema.java:
##########
@@ -1867,267 +1780,187 @@
     }
   }
 
-  /**
-   * Parse named schema in order to fill names map. This method does not parse
-   * field of record/error schema.
-   *
-   * @param schema           : json schema representation.
-   * @param names            : map of named schema.
-   * @param currentNameSpace : current working name space.
-   * @return schema.
-   */
-  static Schema parseNamesDeclared(JsonNode schema, Names names, String currentNameSpace) {
+  /** @see #parse(String) */
+  static Schema parse(JsonNode schema, ParseContext context, String currentNameSpace) {
     if (schema == null) {
-      return null;
-    }
-    if (schema.isObject()) {
-
-      String type = Schema.getOptionalText(schema, "type");
-      Name name = null;
-
-      String doc = null;
-      Schema result = null;
+      throw new SchemaParseException("Cannot parse <null> schema");
+    } else if (schema.isTextual()) { // name
+      return context.find(schema.textValue(), currentNameSpace);
+    } else if (schema.isObject()) {
+      String type = getRequiredText(schema, "type", "No type");
       final boolean isTypeError = "error".equals(type);
-      final boolean isTypeRecord = "record".equals(type);
-      final boolean isTypeEnum = "enum".equals(type);
-      final boolean isTypeFixed = "fixed".equals(type);
-
-      if (isTypeRecord || isTypeError || isTypeEnum || isTypeFixed) {
-        String space = getOptionalText(schema, "namespace");
-        doc = getOptionalText(schema, "doc");
-        if (space == null)
-          space = currentNameSpace;
-        name = new Name(getRequiredText(schema, "name", "No name in schema"), space);
-      }
-      if (isTypeRecord || isTypeError) { // record
-        result = new RecordSchema(name, doc, isTypeError);
-        names.add(result);
-        JsonNode fieldsNode = schema.get("fields");
-
-        if (fieldsNode == null || !fieldsNode.isArray())
-          throw new SchemaParseException("Record has no fields: " + schema);
-        exploreFields(fieldsNode, names, name != null ? name.space : null);
-
-      } else if (isTypeEnum) { // enum
-        JsonNode symbolsNode = schema.get("symbols");
-        if (symbolsNode == null || !symbolsNode.isArray())
-          throw new SchemaParseException("Enum has no symbols: " + schema);
-        LockableArrayList<String> symbols = new LockableArrayList<>(symbolsNode.size());
-        for (JsonNode n : symbolsNode)
-          symbols.add(n.textValue());
-        JsonNode enumDefault = schema.get("default");
-        String defaultSymbol = null;
-        if (enumDefault != null)
-          defaultSymbol = enumDefault.textValue();
-        result = new EnumSchema(name, doc, symbols, defaultSymbol);
-        names.add(result);
+      if (PRIMITIVES.containsKey(type)) { // primitive
+        return parsePrimitive(schema, type);
+      } else if ("record".equals(type) || isTypeError) { // record
+        return parseRecord(schema, context, currentNameSpace, isTypeError);
+      } else if ("enum".equals(type)) { // enum
+        return parseEnum(schema, context, currentNameSpace);
       } else if (type.equals("array")) { // array
-        JsonNode itemsNode = schema.get("items");
-        if (itemsNode == null)
-          throw new SchemaParseException("Array has no items type: " + schema);
-        final Schema items = Schema.parseNamesDeclared(itemsNode, names, currentNameSpace);
-        result = Schema.createArray(items);
+        return parseArray(schema, context, currentNameSpace);
       } else if (type.equals("map")) { // map
-        JsonNode valuesNode = schema.get("values");
-        if (valuesNode == null)
-          throw new SchemaParseException("Map has no values type: " + schema);
-        final Schema values = Schema.parseNamesDeclared(valuesNode, names, currentNameSpace);
-        result = Schema.createMap(values);
-      } else if (isTypeFixed) { // fixed
-        JsonNode sizeNode = schema.get("size");
-        if (sizeNode == null || !sizeNode.isInt())
-          throw new SchemaParseException("Invalid or no size: " + schema);
-        result = new FixedSchema(name, doc, sizeNode.intValue());
-        if (name != null)
-          names.add(result);
-      } else if (PRIMITIVES.containsKey(type)) {
-        result = Schema.create(PRIMITIVES.get(type));
+        return parseMap(schema, context, currentNameSpace);
+      } else if ("fixed".equals(type)) { // fixed
+        return parseFixed(schema, context, currentNameSpace);
+      } else { // For unions with self reference
+        return context.find(type, currentNameSpace);
       }
-      if (result != null) {
-        Set<String> reserved = SCHEMA_RESERVED;
-        if (isTypeEnum) {
-          reserved = ENUM_RESERVED;
-        }
-        Schema.addProperties(schema, reserved, result);
-      }
-      return result;
-    } else if (schema.isArray()) {
-      List<Schema> subs = new ArrayList<>(schema.size());
-      schema.forEach((JsonNode item) -> {
-        Schema sub = Schema.parseNamesDeclared(item, names, currentNameSpace);
-        if (sub != null) {
-          subs.add(sub);
-        }
-      });
-      return Schema.createUnion(subs);
-    } else if (schema.isTextual()) {
-      String value = schema.asText();
-      return names.get(value);
+    } else if (schema.isArray()) { // union
+      return parseUnion(schema, context, currentNameSpace);
+    } else {
+      throw new SchemaParseException("Schema not yet supported: " + schema);
     }
-    return null;
   }
 
-  private static void addProperties(JsonNode schema, Set<String> reserved, Schema avroSchema) {
-    Iterator<String> i = schema.fieldNames();
-    while (i.hasNext()) { // add properties
-      String prop = i.next();
-      if (!reserved.contains(prop)) // ignore reserved
-        avroSchema.addProp(prop, schema.get(prop));
-    }
-    // parse logical type if present
-    avroSchema.logicalType = LogicalTypes.fromSchemaIgnoreInvalid(avroSchema);
-    // names.space(savedSpace); // restore space
-    if (avroSchema instanceof NamedSchema) {
-      Set<String> aliases = parseAliases(schema);
-      if (aliases != null) // add aliases
-        for (String alias : aliases)
-          avroSchema.addAlias(alias);
-    }
+  private static Schema parsePrimitive(JsonNode schema, String type) {
+    Schema result = create(PRIMITIVES.get(type));
+    parseProperties(schema, result, SCHEMA_RESERVED);
+    return result;
   }
 
-  /**
-   * Explore record fields in order to fill names map with inner defined named
-   * types.
-   *
-   * @param fieldsNode : json node for field.
-   * @param names      : names map.
-   * @param nameSpace  : current working namespace.
-   */
-  private static void exploreFields(JsonNode fieldsNode, Names names, String nameSpace) {
+  private static Schema parseRecord(JsonNode schema, ParseContext context, String currentNameSpace,
+      boolean isTypeError) {
+    Name name = parseName(schema, currentNameSpace);
+    String doc = parseDoc(schema);
+    Schema result = new RecordSchema(name, doc, isTypeError);
+    context.put(result);
+
+    JsonNode fieldsNode = schema.get("fields");
+    if (fieldsNode == null || !fieldsNode.isArray())
+      throw new SchemaParseException("Record has no fields: " + schema);
+    List<Field> fields = new ArrayList<>();
     for (JsonNode field : fieldsNode) {
-      final JsonNode fieldType = field.get("type");
-      if (fieldType != null) {
-        if (fieldType.isObject()) {
-          parseNamesDeclared(fieldType, names, nameSpace);
-        } else if (fieldType.isArray()) {
-          exploreFields(fieldType, names, nameSpace);
-        } else if (fieldType.isTextual() && field.isObject()) {
-          parseNamesDeclared(field, names, nameSpace);
-        }
-      }
+      Field f = parseField(field, context, name.space);
+      fields.add(f);
+      if (f.schema().getLogicalType() == null && getOptionalText(field, LOGICAL_TYPE_PROP) != null)
+        LOG.warn(
+            "Ignored the {}.{}.logicalType property (\"{}\"). It should probably be nested inside the \"type\" for the field.",
+            name, f.name(), getOptionalText(field, "logicalType"));
     }
+    result.setFields(fields);
+    parseProperties(schema, result, SCHEMA_RESERVED);
+    parseAliases(schema, result);
+    return result;
   }
 
-  /**
-   * in complement of parseNamesDeclared, this method parse schema in details.
-   *
-   * @param schema       : json schema.
-   * @param names        : names map.
-   * @param currentSpace : current working name space.
-   * @return complete schema.
-   */
-  static Schema parseCompleteSchema(JsonNode schema, Names names, String currentSpace) {
-    if (schema == null) {
-      throw new SchemaParseException("Cannot parse <null> schema");
-    }
-    if (schema.isTextual()) {
-      String type = schema.asText();
-      Schema avroSchema = names.get(type);
-      if (avroSchema == null) {
-        avroSchema = names.get(currentSpace + "." + type);
-      }
-      return avroSchema;
-    }
-    if (schema.isArray()) {
-      List<Schema> schemas = StreamSupport.stream(schema.spliterator(), false)
-          .map((JsonNode sub) -> parseCompleteSchema(sub, names, currentSpace)).collect(Collectors.toList());
-      return Schema.createUnion(schemas);
-    }
-    if (schema.isObject()) {
-      Schema result = null;
-      String type = getRequiredText(schema, "type", "No type");
-      Name name = null;
+  private static Field parseField(JsonNode field, ParseContext context, String namespace) {
+    String fieldName = getRequiredText(field, "name", "No field name");
+    String fieldDoc = parseDoc(field);
+    JsonNode fieldTypeNode = field.get("type");
+    if (fieldTypeNode == null)
+      throw new SchemaParseException("No field type: " + field);
+    Schema fieldSchema = parse(fieldTypeNode, context, namespace);
 
-      final boolean isTypeError = "error".equals(type);
-      final boolean isTypeRecord = "record".equals(type);
-      final boolean isTypeArray = "array".equals(type);
+    Field.Order order = Field.Order.ASCENDING;
+    JsonNode orderNode = field.get("order");
+    if (orderNode != null)
+      order = Field.Order.valueOf(orderNode.textValue().toUpperCase(Locale.ENGLISH));
 
-      if (isTypeRecord || isTypeError || "enum".equals(type) || "fixed".equals(type)) {
-        // named schema
-        String space = getOptionalText(schema, "namespace");
+    JsonNode defaultValue = field.get("default");
+    if (defaultValue != null && (Type.FLOAT.equals(fieldSchema.getType()) || Type.DOUBLE.equals(fieldSchema.getType()))
+        && defaultValue.isTextual())
+      defaultValue = new DoubleNode(Double.parseDouble(defaultValue.textValue()));

Review Comment:
   ## Missing catch of NumberFormatException
   
   Potential uncaught 'java.lang.NumberFormatException'.
   
   [Show more details](https://github.com/apache/avro/security/code-scanning/3176)



##########
lang/java/avro/src/main/java/org/apache/avro/Schema.java:
##########
@@ -1867,267 +1780,187 @@
     }
   }
 
-  /**
-   * Parse named schema in order to fill names map. This method does not parse
-   * field of record/error schema.
-   *
-   * @param schema           : json schema representation.
-   * @param names            : map of named schema.
-   * @param currentNameSpace : current working name space.
-   * @return schema.
-   */
-  static Schema parseNamesDeclared(JsonNode schema, Names names, String currentNameSpace) {
+  /** @see #parse(String) */
+  static Schema parse(JsonNode schema, ParseContext context, String currentNameSpace) {
     if (schema == null) {
-      return null;
-    }
-    if (schema.isObject()) {
-
-      String type = Schema.getOptionalText(schema, "type");
-      Name name = null;
-
-      String doc = null;
-      Schema result = null;
+      throw new SchemaParseException("Cannot parse <null> schema");
+    } else if (schema.isTextual()) { // name
+      return context.find(schema.textValue(), currentNameSpace);
+    } else if (schema.isObject()) {
+      String type = getRequiredText(schema, "type", "No type");
       final boolean isTypeError = "error".equals(type);
-      final boolean isTypeRecord = "record".equals(type);
-      final boolean isTypeEnum = "enum".equals(type);
-      final boolean isTypeFixed = "fixed".equals(type);
-
-      if (isTypeRecord || isTypeError || isTypeEnum || isTypeFixed) {
-        String space = getOptionalText(schema, "namespace");
-        doc = getOptionalText(schema, "doc");
-        if (space == null)
-          space = currentNameSpace;
-        name = new Name(getRequiredText(schema, "name", "No name in schema"), space);
-      }
-      if (isTypeRecord || isTypeError) { // record
-        result = new RecordSchema(name, doc, isTypeError);
-        names.add(result);
-        JsonNode fieldsNode = schema.get("fields");
-
-        if (fieldsNode == null || !fieldsNode.isArray())
-          throw new SchemaParseException("Record has no fields: " + schema);
-        exploreFields(fieldsNode, names, name != null ? name.space : null);
-
-      } else if (isTypeEnum) { // enum
-        JsonNode symbolsNode = schema.get("symbols");
-        if (symbolsNode == null || !symbolsNode.isArray())
-          throw new SchemaParseException("Enum has no symbols: " + schema);
-        LockableArrayList<String> symbols = new LockableArrayList<>(symbolsNode.size());
-        for (JsonNode n : symbolsNode)
-          symbols.add(n.textValue());
-        JsonNode enumDefault = schema.get("default");
-        String defaultSymbol = null;
-        if (enumDefault != null)
-          defaultSymbol = enumDefault.textValue();
-        result = new EnumSchema(name, doc, symbols, defaultSymbol);
-        names.add(result);
+      if (PRIMITIVES.containsKey(type)) { // primitive
+        return parsePrimitive(schema, type);
+      } else if ("record".equals(type) || isTypeError) { // record
+        return parseRecord(schema, context, currentNameSpace, isTypeError);
+      } else if ("enum".equals(type)) { // enum
+        return parseEnum(schema, context, currentNameSpace);
       } else if (type.equals("array")) { // array
-        JsonNode itemsNode = schema.get("items");
-        if (itemsNode == null)
-          throw new SchemaParseException("Array has no items type: " + schema);
-        final Schema items = Schema.parseNamesDeclared(itemsNode, names, currentNameSpace);
-        result = Schema.createArray(items);
+        return parseArray(schema, context, currentNameSpace);
       } else if (type.equals("map")) { // map
-        JsonNode valuesNode = schema.get("values");
-        if (valuesNode == null)
-          throw new SchemaParseException("Map has no values type: " + schema);
-        final Schema values = Schema.parseNamesDeclared(valuesNode, names, currentNameSpace);
-        result = Schema.createMap(values);
-      } else if (isTypeFixed) { // fixed
-        JsonNode sizeNode = schema.get("size");
-        if (sizeNode == null || !sizeNode.isInt())
-          throw new SchemaParseException("Invalid or no size: " + schema);
-        result = new FixedSchema(name, doc, sizeNode.intValue());
-        if (name != null)
-          names.add(result);
-      } else if (PRIMITIVES.containsKey(type)) {
-        result = Schema.create(PRIMITIVES.get(type));
+        return parseMap(schema, context, currentNameSpace);
+      } else if ("fixed".equals(type)) { // fixed
+        return parseFixed(schema, context, currentNameSpace);
+      } else { // For unions with self reference
+        return context.find(type, currentNameSpace);
       }
-      if (result != null) {
-        Set<String> reserved = SCHEMA_RESERVED;
-        if (isTypeEnum) {
-          reserved = ENUM_RESERVED;
-        }
-        Schema.addProperties(schema, reserved, result);
-      }
-      return result;
-    } else if (schema.isArray()) {
-      List<Schema> subs = new ArrayList<>(schema.size());
-      schema.forEach((JsonNode item) -> {
-        Schema sub = Schema.parseNamesDeclared(item, names, currentNameSpace);
-        if (sub != null) {
-          subs.add(sub);
-        }
-      });
-      return Schema.createUnion(subs);
-    } else if (schema.isTextual()) {
-      String value = schema.asText();
-      return names.get(value);
+    } else if (schema.isArray()) { // union
+      return parseUnion(schema, context, currentNameSpace);
+    } else {
+      throw new SchemaParseException("Schema not yet supported: " + schema);
     }
-    return null;
   }
 
-  private static void addProperties(JsonNode schema, Set<String> reserved, Schema avroSchema) {
-    Iterator<String> i = schema.fieldNames();
-    while (i.hasNext()) { // add properties
-      String prop = i.next();
-      if (!reserved.contains(prop)) // ignore reserved
-        avroSchema.addProp(prop, schema.get(prop));
-    }
-    // parse logical type if present
-    avroSchema.logicalType = LogicalTypes.fromSchemaIgnoreInvalid(avroSchema);
-    // names.space(savedSpace); // restore space
-    if (avroSchema instanceof NamedSchema) {
-      Set<String> aliases = parseAliases(schema);
-      if (aliases != null) // add aliases
-        for (String alias : aliases)
-          avroSchema.addAlias(alias);
-    }
+  private static Schema parsePrimitive(JsonNode schema, String type) {
+    Schema result = create(PRIMITIVES.get(type));
+    parseProperties(schema, result, SCHEMA_RESERVED);
+    return result;
   }
 
-  /**
-   * Explore record fields in order to fill names map with inner defined named
-   * types.
-   *
-   * @param fieldsNode : json node for field.
-   * @param names      : names map.
-   * @param nameSpace  : current working namespace.
-   */
-  private static void exploreFields(JsonNode fieldsNode, Names names, String nameSpace) {
+  private static Schema parseRecord(JsonNode schema, ParseContext context, String currentNameSpace,
+      boolean isTypeError) {
+    Name name = parseName(schema, currentNameSpace);
+    String doc = parseDoc(schema);
+    Schema result = new RecordSchema(name, doc, isTypeError);
+    context.put(result);
+
+    JsonNode fieldsNode = schema.get("fields");
+    if (fieldsNode == null || !fieldsNode.isArray())
+      throw new SchemaParseException("Record has no fields: " + schema);
+    List<Field> fields = new ArrayList<>();
     for (JsonNode field : fieldsNode) {
-      final JsonNode fieldType = field.get("type");
-      if (fieldType != null) {
-        if (fieldType.isObject()) {
-          parseNamesDeclared(fieldType, names, nameSpace);
-        } else if (fieldType.isArray()) {
-          exploreFields(fieldType, names, nameSpace);
-        } else if (fieldType.isTextual() && field.isObject()) {
-          parseNamesDeclared(field, names, nameSpace);
-        }
-      }
+      Field f = parseField(field, context, name.space);
+      fields.add(f);
+      if (f.schema().getLogicalType() == null && getOptionalText(field, LOGICAL_TYPE_PROP) != null)
+        LOG.warn(
+            "Ignored the {}.{}.logicalType property (\"{}\"). It should probably be nested inside the \"type\" for the field.",
+            name, f.name(), getOptionalText(field, "logicalType"));
     }
+    result.setFields(fields);
+    parseProperties(schema, result, SCHEMA_RESERVED);
+    parseAliases(schema, result);
+    return result;
   }
 
-  /**
-   * in complement of parseNamesDeclared, this method parse schema in details.
-   *
-   * @param schema       : json schema.
-   * @param names        : names map.
-   * @param currentSpace : current working name space.
-   * @return complete schema.
-   */
-  static Schema parseCompleteSchema(JsonNode schema, Names names, String currentSpace) {
-    if (schema == null) {
-      throw new SchemaParseException("Cannot parse <null> schema");
-    }
-    if (schema.isTextual()) {
-      String type = schema.asText();
-      Schema avroSchema = names.get(type);
-      if (avroSchema == null) {
-        avroSchema = names.get(currentSpace + "." + type);
-      }
-      return avroSchema;
-    }
-    if (schema.isArray()) {
-      List<Schema> schemas = StreamSupport.stream(schema.spliterator(), false)
-          .map((JsonNode sub) -> parseCompleteSchema(sub, names, currentSpace)).collect(Collectors.toList());
-      return Schema.createUnion(schemas);
-    }
-    if (schema.isObject()) {
-      Schema result = null;
-      String type = getRequiredText(schema, "type", "No type");
-      Name name = null;
+  private static Field parseField(JsonNode field, ParseContext context, String namespace) {
+    String fieldName = getRequiredText(field, "name", "No field name");
+    String fieldDoc = parseDoc(field);
+    JsonNode fieldTypeNode = field.get("type");
+    if (fieldTypeNode == null)
+      throw new SchemaParseException("No field type: " + field);
+    Schema fieldSchema = parse(fieldTypeNode, context, namespace);
 
-      final boolean isTypeError = "error".equals(type);
-      final boolean isTypeRecord = "record".equals(type);
-      final boolean isTypeArray = "array".equals(type);
+    Field.Order order = Field.Order.ASCENDING;
+    JsonNode orderNode = field.get("order");
+    if (orderNode != null)
+      order = Field.Order.valueOf(orderNode.textValue().toUpperCase(Locale.ENGLISH));
 
-      if (isTypeRecord || isTypeError || "enum".equals(type) || "fixed".equals(type)) {
-        // named schema
-        String space = getOptionalText(schema, "namespace");
+    JsonNode defaultValue = field.get("default");
+    if (defaultValue != null && (Type.FLOAT.equals(fieldSchema.getType()) || Type.DOUBLE.equals(fieldSchema.getType()))
+        && defaultValue.isTextual())
+      defaultValue = new DoubleNode(Double.parseDouble(defaultValue.textValue()));
 
-        if (space == null)
-          space = currentSpace;
-        name = new Name(getRequiredText(schema, "name", "No name in schema"), space);
+    Field f = new Field(fieldName, fieldSchema, fieldDoc, defaultValue, true, order);
+    parseProperties(field, f, FIELD_RESERVED);
+    f.aliases = parseAliases(field);
+    return f;
+  }
 
-        result = names.get(name);
-        if (result == null) {
-          throw new SchemaParseException("Unparsed field type " + name);
-        }
-      }
-      if (isTypeRecord || isTypeError) {
-        if (result != null && !result.hasFields()) {
-          final List<Field> fields = new ArrayList<>();
-          JsonNode fieldsNode = schema.get("fields");
-          if (fieldsNode == null || !fieldsNode.isArray())
-            throw new SchemaParseException("Record has no fields: " + schema);
-
-          for (JsonNode field : fieldsNode) {
-            Field f = Field.parse(field, names, name.space);
-
-            fields.add(f);
-            if (f.schema.getLogicalType() == null && getOptionalText(field, LOGICAL_TYPE_PROP) != null)
-              LOG.warn(
-                  "Ignored the {}.{}.logicalType property (\"{}\"). It should probably be nested inside the \"type\" for the field.",
-                  name, f.name, getOptionalText(field, "logicalType"));
-          }
-          result.setFields(fields);
-        }
-      } else if (isTypeArray) {
-        JsonNode items = schema.get("items");
-        Schema schemaItems = parseCompleteSchema(items, names, currentSpace);
-        result = Schema.createArray(schemaItems);
-      } else if ("map".equals(type)) {
-        JsonNode values = schema.get("values");
-        Schema mapItems = parseCompleteSchema(values, names, currentSpace);
-        result = Schema.createMap(mapItems);
-      } else if (result == null) {
-        result = names.get(currentSpace + "." + type);
-        if (result == null) {
-          result = names.get(type);
-        }
-      }
+  private static Schema parseEnum(JsonNode schema, ParseContext context, String currentNameSpace) {
+    Name name = parseName(schema, currentNameSpace);
+    String doc = parseDoc(schema);
 
-      Set<String> reserved = SCHEMA_RESERVED;
-      if ("enum".equals(type)) {
-        reserved = ENUM_RESERVED;
-      }
-      Schema.addProperties(schema, reserved, result);
-      return result;
+    JsonNode symbolsNode = schema.get("symbols");
+    if (symbolsNode == null || !symbolsNode.isArray()) {
+      throw new SchemaParseException("Enum has no symbols: " + schema);
     }
-    return null;
+    LockableArrayList<String> symbols = new LockableArrayList<>(symbolsNode.size());
+    for (JsonNode n : symbolsNode)
+      symbols.add(n.textValue());
+    JsonNode enumDefault = schema.get("default");
+    String defaultSymbol = null;
+    if (enumDefault != null) {
+      defaultSymbol = enumDefault.textValue();
+    }
+
+    Schema result = new EnumSchema(name, doc, symbols, defaultSymbol);
+    context.put(result);
+    parseProperties(schema, result, ENUM_RESERVED);
+    parseAliases(schema, result);
+    return result;
   }
 
-  static Schema parse(JsonNode schema, Names names) {
-    if (schema == null) {
-      throw new SchemaParseException("Cannot parse <null> schema");
-    }
+  private static Schema parseArray(JsonNode schema, ParseContext context, String currentNameSpace) {
+    Schema result;
+    JsonNode itemsNode = schema.get("items");
+    if (itemsNode == null)
+      throw new SchemaParseException("Array has no items type: " + schema);
+    result = new ArraySchema(parse(itemsNode, context, currentNameSpace));
+    parseProperties(schema, result, SCHEMA_RESERVED);
+    return result;
+  }
+
+  private static Schema parseMap(JsonNode schema, ParseContext context, String currentNameSpace) {
+    Schema result;
+    JsonNode valuesNode = schema.get("values");
+    if (valuesNode == null)
+      throw new SchemaParseException("Map has no values type: " + schema);
+    result = new MapSchema(parse(valuesNode, context, currentNameSpace));
+    parseProperties(schema, result, SCHEMA_RESERVED);
+    return result;
+  }
+
+  private static Schema parseFixed(JsonNode schema, ParseContext context, String currentNameSpace) {
+    Name name = parseName(schema, currentNameSpace);
+    String doc = parseDoc(schema);
 
-    Schema result = Schema.parseNamesDeclared(schema, names, names.space);
-    Schema.parseCompleteSchema(schema, names, names.space);
+    JsonNode sizeNode = schema.get("size");
+    if (sizeNode == null || !sizeNode.isInt())
+      throw new SchemaParseException("Invalid or no size: " + schema);
 
+    Schema result = new FixedSchema(name, doc, sizeNode.intValue());
+    context.put(result);
+    parseProperties(schema, result, SCHEMA_RESERVED);
+    parseAliases(schema, result);
     return result;
   }
 
-  static Schema resolveSchema(JsonNode schema, Names names, String currentNameSpace) {
-    String np = currentNameSpace;
-    String nodeName = getOptionalText(schema, "name");
-    if (nodeName != null) {
-      final JsonNode nameSpace = schema.get("namespace");
-      StringBuilder fullName = new StringBuilder();
-      if (nameSpace != null && nameSpace.isTextual()) {
-        fullName.append(nameSpace.asText()).append(".");
-        np = nameSpace.asText();
-      }
-      fullName.append(nodeName);
-      Schema schema1 = names.get(fullName.toString());
+  private static UnionSchema parseUnion(JsonNode schema, ParseContext context, String currentNameSpace) {
+    LockableArrayList<Schema> types = new LockableArrayList<>(schema.size());
+    for (JsonNode typeNode : schema)
+      types.add(parse(typeNode, context, currentNameSpace));
+    return new UnionSchema(types);
+  }
 
-      if (schema1 != null && schema1.getType() == Type.RECORD && !schema1.hasFields()) {
-        Schema.parseCompleteSchema(schema, names, np);
-      }
-      return schema1;
-    }
-    return null;
+  private static void parseProperties(JsonNode jsonNode, Schema result, Set<String> propertiesToSkip) {
+    parseProperties(jsonNode, (JsonProperties) result, propertiesToSkip);
+    // parse logical type if present
+    result.logicalType = LogicalTypes.fromSchemaIgnoreInvalid(result);
+  }
+
+  private static void parseProperties(JsonNode schema, JsonProperties result, Set<String> propertiesToSkip) {

Review Comment:
   ## Confusing overloading of methods
   
   Method Schema.parseProperties(..) could be confused with overloaded method [parseProperties](1), since dispatch depends on static types.
   
   [Show more details](https://github.com/apache/avro/security/code-scanning/3175)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

Re: [PR] AVRO-3666 [Java] Use the new schema parser [avro]

Posted by "github-advanced-security[bot] (via GitHub)" <gi...@apache.org>.

github-advanced-security[bot] commented on code in PR #2642:
URL: https://github.com/apache/avro/pull/2642#discussion_r1504196997


##########
lang/java/compiler/src/main/java/org/apache/avro/compiler/specific/SchemaTask.java:
##########
@@ -17,20 +17,21 @@
  */
 package org.apache.avro.compiler.specific;
 
+import org.apache.avro.Schema;
+import org.apache.avro.SchemaParser;
+
 import java.io.File;
 import java.io.IOException;
 
-import org.apache.avro.Schema;
-
 /** Ant task to generate Java interface and classes for a protocol. */
 public class SchemaTask extends ProtocolTask {
   @Override
   protected void doCompile(File src, File dest) throws IOException {
-    final Schema.Parser parser = new Schema.Parser();
-    final Schema schema = parser.parse(src);
+    final SchemaParser parser = new SchemaParser();
+    final Schema schema = parser.resolve(parser.parse(src));

Review Comment:
   ## Deprecated method or constructor invocation
   
   Invoking [SchemaParser.resolve](1) should be avoided because it has been deprecated.
   
   [Show more details](https://github.com/apache/avro/security/code-scanning/3195)



##########
lang/java/compiler/src/test/java/org/apache/avro/specific/TestSpecificData.java:
##########
@@ -45,26 +42,29 @@
 import javax.tools.JavaFileObject;
 import javax.tools.StandardJavaFileManager;
 import javax.tools.ToolProvider;
-
-import org.apache.avro.Schema;
-import org.apache.avro.compiler.specific.SpecificCompiler;
-import org.apache.avro.generic.GenericData;
-import org.apache.avro.generic.GenericDatumWriter;
-import org.apache.avro.generic.GenericRecord;
-import org.apache.avro.io.DatumWriter;
-import org.apache.avro.io.DecoderFactory;
-import org.apache.avro.io.Encoder;
-import org.apache.avro.io.EncoderFactory;
-
-import org.junit.jupiter.api.Test;
+import java.io.ByteArrayOutputStream;
+import java.io.File;
+import java.lang.reflect.Method;
+import java.lang.reflect.Modifier;
+import java.net.URL;
+import java.net.URLClassLoader;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.List;
+import java.util.Objects;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
 
 import static org.junit.jupiter.api.Assertions.fail;
 
 public class TestSpecificData {
 
   @Test
   void separateThreadContextClassLoader() throws Exception {
-    Schema schema = new Schema.Parser().parse(new File("src/test/resources/foo.Bar.avsc"));
+    SchemaParser parser = new SchemaParser();
+    Schema schema = parser.resolve(parser.parse(new File("src/test/resources/foo.Bar.avsc")));

Review Comment:
   ## Deprecated method or constructor invocation
   
   Invoking [SchemaParser.resolve](1) should be avoided because it has been deprecated.
   
   [Show more details](https://github.com/apache/avro/security/code-scanning/3196)



##########
lang/java/maven-plugin/src/test/java/org/apache/avro/mojo/TestInduceMojo.java:
##########
@@ -61,7 +61,8 @@
     assertTrue(outputDir.listFiles().length != 0);
     File personSchemaFile = Arrays.stream(outputDir.listFiles()).filter(file -> file.getName().endsWith("Person.avsc"))
         .findFirst().orElseThrow(AssertionError::new);
-    assertEquals(ReflectData.get().getSchema(Person.class), new Schema.Parser().parse(personSchemaFile));
+    SchemaParser parser = new SchemaParser();
+    assertEquals(ReflectData.get().getSchema(Person.class), parser.resolve(parser.parse(personSchemaFile)));

Review Comment:
   ## Deprecated method or constructor invocation
   
   Invoking [SchemaParser.resolve](1) should be avoided because it has been deprecated.
   
   [Show more details](https://github.com/apache/avro/security/code-scanning/3199)



##########
lang/java/compiler/src/test/java/org/apache/avro/compiler/specific/TestSpecificCompiler.java:
##########
@@ -731,24 +736,28 @@
 
   @Test
   void logicalTypesWithMultipleFields() throws Exception {
-    Schema logicalTypesWithMultipleFields = new Schema.Parser()
-        .parse(new File("src/test/resources/logical_types_with_multiple_fields.avsc"));
+    Schema logicalTypesWithMultipleFields = parseFile(
+        new File("src/test/resources/logical_types_with_multiple_fields.avsc"));
     assertCompilesWithJavaCompiler(new File(OUTPUT_DIR, "testLogicalTypesWithMultipleFields"),
         new SpecificCompiler(logicalTypesWithMultipleFields).compile(), true);
   }
 
+  private static Schema parseFile(File file) throws IOException {
+    SchemaParser schemaParser = new SchemaParser();
+    return schemaParser.resolve(schemaParser.parse(file));

Review Comment:
   ## Deprecated method or constructor invocation
   
   Invoking [SchemaParser.resolve](1) should be avoided because it has been deprecated.
   
   [Show more details](https://github.com/apache/avro/security/code-scanning/3198)



##########
lang/java/compiler/src/test/java/org/apache/avro/compiler/specific/TestSpecificCompiler.java:
##########
@@ -147,8 +149,8 @@
   }
 
   private SpecificCompiler createCompiler() throws IOException {
-    Schema.Parser parser = new Schema.Parser();
-    Schema schema = parser.parse(this.src);
+    SchemaParser parser = new SchemaParser();
+    Schema schema = parser.resolve(parser.parse(this.src));

Review Comment:
   ## Deprecated method or constructor invocation
   
   Invoking [SchemaParser.resolve](1) should be avoided because it has been deprecated.
   
   [Show more details](https://github.com/apache/avro/security/code-scanning/3197)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

Re: [PR] AVRO-3666 [Java] Use the new schema parser [avro]

Posted by "opwvhk (via GitHub)" <gi...@apache.org>.

opwvhk commented on code in PR #2642:
URL: https://github.com/apache/avro/pull/2642#discussion_r1489075983


##########
lang/java/avro/src/main/java/org/apache/avro/util/SchemaResolver.java:
##########
@@ -111,67 +105,6 @@ public static boolean isFullyResolvedSchema(final Schema schema) {
     }
   }
 
-  /**
-   * Clone the provided schema while resolving all unreferenced schemas.
-   *
-   * @param parseContext the parse context with known names
-   * @param schema       the schema to resolve
-   * @return a copy of the schema with all schemas resolved
-   */
-  public static Schema resolve(final ParseContext parseContext, Schema schema) {

Review Comment:
   These resolve methods were added after the last release, and are a fix for what I think is a bug: the SchemaParser can return schemas that are not fully resolved.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

Re: [PR] AVRO-3666 [Java] Use the new schema parser [avro]

Posted by "opwvhk (via GitHub)" <gi...@apache.org>.

opwvhk commented on code in PR #2642:
URL: https://github.com/apache/avro/pull/2642#discussion_r1489083861


##########
lang/java/avro/src/main/java/org/apache/avro/Schema.java:
##########
@@ -1546,26 +1461,13 @@ public Schema parse(File file) throws IOException {
       return parse(FACTORY.createParser(file), false);
     }
 
-    public List<Schema> parse(Iterable<File> sources) throws IOException {

Review Comment:
   This bit of public API was added after the last release (#2375).
   
   Other than that, your concern is valid: we should not break a public API before at least having it deprecated, with a documented alternative, for sufficient time to migrate (probably at least a year).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

Re: [PR] AVRO-3666 [Java] Use the new schema parser [avro]

Posted by "Fokko (via GitHub)" <gi...@apache.org>.

Fokko commented on PR #2642:
URL: https://github.com/apache/avro/pull/2642#issuecomment-2036592931

   Thanks for working on this @opwvhk 🙌 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

Re: [PR] AVRO-3666 [Java] Use the new schema parser [avro]

Posted by "Fokko (via GitHub)" <gi...@apache.org>.

Fokko merged PR #2642:
URL: https://github.com/apache/avro/pull/2642


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

Re: [PR] AVRO-3666 [Java] Use the new schema parser [avro]

Posted by "Fokko (via GitHub)" <gi...@apache.org>.

Fokko commented on PR #2642:
URL: https://github.com/apache/avro/pull/2642#issuecomment-2047574517

   A release is near, see the public mailing list: https://lists.apache.org/thread/qg70g7d9j5odkf8fxnnm342mm2kj997l I would say this month since the release process always takes some time in open source.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

Re: [PR] AVRO-3666 [Java] Use the new schema parser [avro]

Posted by "Fokko (via GitHub)" <gi...@apache.org>.

Fokko commented on code in PR #2642:
URL: https://github.com/apache/avro/pull/2642#discussion_r1480394485


##########
lang/java/avro/src/main/java/org/apache/avro/Schema.java:
##########
@@ -1546,26 +1461,13 @@ public Schema parse(File file) throws IOException {
       return parse(FACTORY.createParser(file), false);
     }
 
-    public List<Schema> parse(Iterable<File> sources) throws IOException {

Review Comment:
   I'm very concerned about breaking public API's



##########
lang/java/avro/src/main/java/org/apache/avro/util/SchemaResolver.java:
##########
@@ -111,67 +105,6 @@ public static boolean isFullyResolvedSchema(final Schema schema) {
     }
   }
 
-  /**
-   * Clone the provided schema while resolving all unreferenced schemas.
-   *
-   * @param parseContext the parse context with known names
-   * @param schema       the schema to resolve
-   * @return a copy of the schema with all schemas resolved
-   */
-  public static Schema resolve(final ParseContext parseContext, Schema schema) {

Review Comment:
   Can we just remove those methods? I think other libraries might be relying on these. Should we first deprecate and then remove them?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

Re: [PR] AVRO-3666 [Java] Use the new schema parser [avro]

Posted by "opwvhk (via GitHub)" <gi...@apache.org>.

opwvhk commented on PR #2642:
URL: https://github.com/apache/avro/pull/2642#issuecomment-1943282582

   > I tried to review this, but don't know where to start. The schema parser is very fundamental and there are 150+ files to review. My biggest concern is about breaking public API's. We had this in the Avro 1.8 to 1.9 version because we had to remove codehaus Jackson from the public API, and it was very hard to get this downstream into other projects.
   
   These are valid concerns, so I'll address them each.
   
   Breaking API's:
   * Are not intended: all existing methods should be deprecated instead.
   * There's only one exception that I know of: `Parser.parse(Iterable<File>)` was added after the last release (#2375)
   
   PR size:
   * Largely unavoidable, given how much we use the parser
   * Starting point is separating the parser and its tests from the rest
   * The parser requires extra scrutiny, the rest is of the category "it works, so it's good enough if it's readable"


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

Re: [PR] AVRO-3666 [Java] Use the new schema parser [avro]

Posted by "github-advanced-security[bot] (via GitHub)" <gi...@apache.org>.

github-advanced-security[bot] commented on code in PR #2642:
URL: https://github.com/apache/avro/pull/2642#discussion_r1459347034


##########
lang/java/compiler/src/test/java/org/apache/avro/compiler/schema/TestSchemas.java:
##########
@@ -265,20 +267,21 @@
         + "{\"name\": \"f1\", \"type\": {\"type\": \"record\", \"name\": \"c2\", \"fields\": "
         + "[{\"name\": \"f11\", \"type\": \"int\"},{\"name\": \"f12\", \"type\": \"double\"}" + "]}},"
         + "{\"name\": \"f2\", \"type\": \"long\"}" + "]}";
-    assertEquals("c1.c2.\"int\".!\"long\".!", Schemas.visit(new Schema.Parser().parse(s11), new TestVisitor() {
-      public SchemaVisitorAction visitTerminal(Schema terminal) {
-        sb.append(terminal).append('.');
-        return SchemaVisitorAction.SKIP_SIBLINGS;
-      }
-    }));
+    assertEquals("c1.c2.\"int\".!\"long\".!",
+        Schemas.visit(new SchemaParser().parse(s11).mainSchema(), new TestVisitor() {
+          public SchemaVisitorAction visitTerminal(Schema terminal) {

Review Comment:
   ## Missing Override annotation
   
   This method overrides [TestVisitor.visitTerminal](1); it is advisable to add an Override annotation.
   
   [Show more details](https://github.com/apache/avro/security/code-scanning/3164)



##########
lang/java/avro/src/main/java/org/apache/avro/ParseContext.java:
##########
@@ -216,12 +252,94 @@
     newSchemas.clear();
   }
 
+  public SchemaParser.ParseResult commit(Schema mainSchema) {
+    Collection<Schema> parsedNamedSchemas = newSchemas.values();
+    SchemaParser.ParseResult parseResult = new SchemaParser.ParseResult() {
+      @Override
+      public Schema mainSchema() {
+        return mainSchema == null ? null : resolve(mainSchema);
+      }
+
+      @Override
+      public List<Schema> parsedNamedSchemas() {
+        return parsedNamedSchemas.stream().map(ParseContext.this::resolve).collect(Collectors.toList());
+      }
+    };
+    commit();
+    return parseResult;
+  }
+
   public void rollback() {
     newSchemas.clear();
   }
 
   /**
-   * Return all known types by their fullname.
+   * Resolve all (named) schemas that were parsed. This resolves all forward
+   * references, even if parsed from different files. Note: the context must be
+   * committed for this method to work.
+   *
+   * @return all parsed schemas, in the order they were parsed
+   * @throws AvroTypeException if a schema reference cannot be resolved
+   */
+  public List<Schema> resolveAllSchemas() {
+    ensureSchemasAreResolved();
+
+    return new ArrayList<>(oldSchemas.values());
+  }
+
+  private void ensureSchemasAreResolved() {
+    if (hasNewSchemas()) {
+      throw new IllegalStateException("Schemas cannot be resolved unless the ParseContext is committed.");
+    }
+    if (resolvingVisitor == null) {
+      NameValidator saved = Schema.getNameValidator();
+      try {
+        // Ensure we use the same validation when copying schemas as when they were
+        // defined.
+        Schema.setNameValidator(nameValidator);

Review Comment:
   ## Deprecated method or constructor invocation
   
   Invoking [Schema.setNameValidator](1) should be avoided because it has been deprecated.
   
   [Show more details](https://github.com/apache/avro/security/code-scanning/3168)



##########
lang/java/avro/src/main/java/org/apache/avro/JsonSchemaParser.java:
##########
@@ -57,26 +57,32 @@
     for (String fragment : fragments) {
       buffer.append(fragment);
     }
-    return new JsonSchemaParser().parse(new ParseContext(NameValidator.NO_VALIDATION), buffer, null);
+
+    boolean saved = Schema.getValidateDefaults();

Review Comment:
   ## Deprecated method or constructor invocation
   
   Invoking [Schema.getValidateDefaults](1) should be avoided because it has been deprecated.
   
   [Show more details](https://github.com/apache/avro/security/code-scanning/3165)



##########
lang/java/idl/src/main/java/org/apache/avro/idl/IdlReader.java:
##########
@@ -404,6 +390,14 @@
     @Override
     public void exitMainSchemaDeclaration(IdlParser.MainSchemaDeclarationContext ctx) {
       mainSchema = typeStack.pop();
+      switch (mainSchema.getType()) {

Review Comment:
   ## Missing enum case in switch
   
   Switch statement does not have a case for [NULL](1).
   Switch statement does not have a case for [BOOLEAN](2).
   Switch statement does not have a case for [DOUBLE](3).
   Switch statement does not have a case for [FLOAT](4).
   Switch statement does not have a case for [LONG](5).
   Switch statement does not have a case for [INT](6).
   Switch statement does not have a case for [BYTES](7).
   Switch statement does not have a case for [STRING](8).
   Switch statement does not have a case for [UNION](9).
   Switch statement does not have a case for [MAP](10).
   Switch statement does not have a case for [ARRAY](11).
   
   [Show more details](https://github.com/apache/avro/security/code-scanning/3171)



##########
lang/java/avro/src/main/java/org/apache/avro/ParseContext.java:
##########
@@ -216,12 +252,94 @@
     newSchemas.clear();
   }
 
+  public SchemaParser.ParseResult commit(Schema mainSchema) {
+    Collection<Schema> parsedNamedSchemas = newSchemas.values();
+    SchemaParser.ParseResult parseResult = new SchemaParser.ParseResult() {
+      @Override
+      public Schema mainSchema() {
+        return mainSchema == null ? null : resolve(mainSchema);
+      }
+
+      @Override
+      public List<Schema> parsedNamedSchemas() {
+        return parsedNamedSchemas.stream().map(ParseContext.this::resolve).collect(Collectors.toList());
+      }
+    };
+    commit();
+    return parseResult;
+  }
+
   public void rollback() {
     newSchemas.clear();
   }
 
   /**
-   * Return all known types by their fullname.
+   * Resolve all (named) schemas that were parsed. This resolves all forward
+   * references, even if parsed from different files. Note: the context must be
+   * committed for this method to work.
+   *
+   * @return all parsed schemas, in the order they were parsed
+   * @throws AvroTypeException if a schema reference cannot be resolved
+   */
+  public List<Schema> resolveAllSchemas() {
+    ensureSchemasAreResolved();
+
+    return new ArrayList<>(oldSchemas.values());
+  }
+
+  private void ensureSchemasAreResolved() {
+    if (hasNewSchemas()) {
+      throw new IllegalStateException("Schemas cannot be resolved unless the ParseContext is committed.");
+    }
+    if (resolvingVisitor == null) {
+      NameValidator saved = Schema.getNameValidator();
+      try {
+        // Ensure we use the same validation when copying schemas as when they were
+        // defined.
+        Schema.setNameValidator(nameValidator);
+        SchemaResolver.ResolvingVisitor visitor = new SchemaResolver.ResolvingVisitor(oldSchemas::get);
+        oldSchemas.values().forEach(schema -> Schemas.visit(schema, visitor));
+        // Before this point is where we can get exceptions due to resolving failures.
+        for (Map.Entry<String, Schema> entry : oldSchemas.entrySet()) {
+          entry.setValue(visitor.getResolved(entry.getValue()));
+        }
+        resolvingVisitor = visitor;
+      } finally {
+        Schema.setNameValidator(saved);

Review Comment:
   ## Deprecated method or constructor invocation
   
   Invoking [Schema.setNameValidator](1) should be avoided because it has been deprecated.
   
   [Show more details](https://github.com/apache/avro/security/code-scanning/3169)



##########
lang/java/compiler/src/main/java/org/apache/avro/compiler/idl/SchemaResolver.java:
##########
@@ -108,7 +108,7 @@
    */
   static Protocol resolve(final Protocol protocol) {
     Protocol result = new Protocol(protocol.getName(), protocol.getDoc(), protocol.getNamespace());
-    final Collection<Schema> types = protocol.getTypes();
+    final Collection<Schema> types = protocol.getUnresolvedTypes();

Review Comment:
   ## Deprecated method or constructor invocation
   
   Invoking [Protocol.getUnresolvedTypes](1) should be avoided because it has been deprecated.
   
   [Show more details](https://github.com/apache/avro/security/code-scanning/3170)



##########
lang/java/avro/src/main/java/org/apache/avro/JsonSchemaParser.java:
##########
@@ -57,26 +57,32 @@
     for (String fragment : fragments) {
       buffer.append(fragment);
     }
-    return new JsonSchemaParser().parse(new ParseContext(NameValidator.NO_VALIDATION), buffer, null);
+
+    boolean saved = Schema.getValidateDefaults();
+    try {
+      Schema.setValidateDefaults(false);

Review Comment:
   ## Deprecated method or constructor invocation
   
   Invoking [Schema.setValidateDefaults](1) should be avoided because it has been deprecated.
   
   [Show more details](https://github.com/apache/avro/security/code-scanning/3166)



##########
lang/java/avro/src/main/java/org/apache/avro/ParseContext.java:
##########
@@ -216,12 +252,94 @@
     newSchemas.clear();
   }
 
+  public SchemaParser.ParseResult commit(Schema mainSchema) {
+    Collection<Schema> parsedNamedSchemas = newSchemas.values();
+    SchemaParser.ParseResult parseResult = new SchemaParser.ParseResult() {
+      @Override
+      public Schema mainSchema() {
+        return mainSchema == null ? null : resolve(mainSchema);
+      }
+
+      @Override
+      public List<Schema> parsedNamedSchemas() {
+        return parsedNamedSchemas.stream().map(ParseContext.this::resolve).collect(Collectors.toList());
+      }
+    };
+    commit();
+    return parseResult;
+  }
+
   public void rollback() {
     newSchemas.clear();
   }
 
   /**
-   * Return all known types by their fullname.
+   * Resolve all (named) schemas that were parsed. This resolves all forward
+   * references, even if parsed from different files. Note: the context must be
+   * committed for this method to work.
+   *
+   * @return all parsed schemas, in the order they were parsed
+   * @throws AvroTypeException if a schema reference cannot be resolved
+   */
+  public List<Schema> resolveAllSchemas() {
+    ensureSchemasAreResolved();
+
+    return new ArrayList<>(oldSchemas.values());
+  }
+
+  private void ensureSchemasAreResolved() {
+    if (hasNewSchemas()) {
+      throw new IllegalStateException("Schemas cannot be resolved unless the ParseContext is committed.");
+    }
+    if (resolvingVisitor == null) {
+      NameValidator saved = Schema.getNameValidator();

Review Comment:
   ## Deprecated method or constructor invocation
   
   Invoking [Schema.getNameValidator](1) should be avoided because it has been deprecated.
   
   [Show more details](https://github.com/apache/avro/security/code-scanning/3167)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

Re: [PR] AVRO-3666 [Java] Use the new schema parser [avro]

Posted by "opwvhk (via GitHub)" <gi...@apache.org>.

opwvhk commented on code in PR #2642:
URL: https://github.com/apache/avro/pull/2642#discussion_r1463526713


##########
lang/java/avro/src/main/java/org/apache/avro/Schema.java:
##########
@@ -1867,47 +1781,70 @@ private static boolean isValidValue(Schema schema, JsonNode value) {
     }
   }
 
-  /**
-   * Parse named schema in order to fill names map. This method does not parse
-   * field of record/error schema.
-   *
-   * @param schema           : json schema representation.
-   * @param names            : map of named schema.
-   * @param currentNameSpace : current working name space.
-   * @return schema.
-   */
-  static Schema parseNamesDeclared(JsonNode schema, Names names, String currentNameSpace) {
+  /** @see #parse(String) */
+  static Schema parse(JsonNode schema, ParseContext context, String currentNameSpace) {
     if (schema == null) {
-      return null;
+      throw new SchemaParseException("Cannot parse <null> schema");
     }
-    if (schema.isObject()) {
-
-      String type = Schema.getOptionalText(schema, "type");
+    if (schema.isTextual()) { // name
+      return context.find(schema.textValue(), currentNameSpace);
+    } else if (schema.isObject()) {
+      Schema result;
+      String type = getRequiredText(schema, "type", "No type");
       Name name = null;
-
+      String space = null;
       String doc = null;
-      Schema result = null;
       final boolean isTypeError = "error".equals(type);
       final boolean isTypeRecord = "record".equals(type);
       final boolean isTypeEnum = "enum".equals(type);
       final boolean isTypeFixed = "fixed".equals(type);
-
       if (isTypeRecord || isTypeError || isTypeEnum || isTypeFixed) {
-        String space = getOptionalText(schema, "namespace");
+        space = getOptionalText(schema, "namespace");
         doc = getOptionalText(schema, "doc");
         if (space == null)
           space = currentNameSpace;
         name = new Name(getRequiredText(schema, "name", "No name in schema"), space);
       }
-      if (isTypeRecord || isTypeError) { // record
+      if (PRIMITIVES.containsKey(type)) { // primitive
+        result = create(PRIMITIVES.get(type));
+      } else if (isTypeRecord || isTypeError) { // record
+        List<Field> fields = new ArrayList<>();
         result = new RecordSchema(name, doc, isTypeError);
-        names.add(result);
+        context.put(result);
         JsonNode fieldsNode = schema.get("fields");
-
         if (fieldsNode == null || !fieldsNode.isArray())
           throw new SchemaParseException("Record has no fields: " + schema);
-        exploreFields(fieldsNode, names, name != null ? name.space : null);
-
+        for (JsonNode field : fieldsNode) {

Review Comment:
   I;'ve split the method to reduce complexity / increase readability.



##########
lang/java/avro/src/main/java/org/apache/avro/Schema.java:
##########
@@ -1867,47 +1781,70 @@ private static boolean isValidValue(Schema schema, JsonNode value) {
     }
   }
 
-  /**
-   * Parse named schema in order to fill names map. This method does not parse
-   * field of record/error schema.
-   *
-   * @param schema           : json schema representation.
-   * @param names            : map of named schema.
-   * @param currentNameSpace : current working name space.
-   * @return schema.
-   */
-  static Schema parseNamesDeclared(JsonNode schema, Names names, String currentNameSpace) {
+  /** @see #parse(String) */
+  static Schema parse(JsonNode schema, ParseContext context, String currentNameSpace) {
     if (schema == null) {
-      return null;
+      throw new SchemaParseException("Cannot parse <null> schema");
     }
-    if (schema.isObject()) {
-
-      String type = Schema.getOptionalText(schema, "type");
+    if (schema.isTextual()) { // name
+      return context.find(schema.textValue(), currentNameSpace);
+    } else if (schema.isObject()) {
+      Schema result;
+      String type = getRequiredText(schema, "type", "No type");
       Name name = null;
-
+      String space = null;
       String doc = null;
-      Schema result = null;
       final boolean isTypeError = "error".equals(type);
       final boolean isTypeRecord = "record".equals(type);
       final boolean isTypeEnum = "enum".equals(type);
       final boolean isTypeFixed = "fixed".equals(type);
-
       if (isTypeRecord || isTypeError || isTypeEnum || isTypeFixed) {
-        String space = getOptionalText(schema, "namespace");
+        space = getOptionalText(schema, "namespace");
         doc = getOptionalText(schema, "doc");
         if (space == null)
           space = currentNameSpace;
         name = new Name(getRequiredText(schema, "name", "No name in schema"), space);
       }
-      if (isTypeRecord || isTypeError) { // record
+      if (PRIMITIVES.containsKey(type)) { // primitive
+        result = create(PRIMITIVES.get(type));
+      } else if (isTypeRecord || isTypeError) { // record
+        List<Field> fields = new ArrayList<>();
         result = new RecordSchema(name, doc, isTypeError);
-        names.add(result);
+        context.put(result);
         JsonNode fieldsNode = schema.get("fields");
-
         if (fieldsNode == null || !fieldsNode.isArray())
           throw new SchemaParseException("Record has no fields: " + schema);
-        exploreFields(fieldsNode, names, name != null ? name.space : null);
-
+        for (JsonNode field : fieldsNode) {

Review Comment:
   I've split the method to reduce complexity / increase readability.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

Re: [PR] AVRO-3666 [Java] Use the new schema parser [avro]

Posted by "g1rjeevan (via GitHub)" <gi...@apache.org>.

g1rjeevan commented on PR #2642:
URL: https://github.com/apache/avro/pull/2642#issuecomment-2097428004

   @Fokko Any update on the release ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org