Posted to dev@avro.apache.org by "Rumeshkrishnan (JIRA)" <ji...@apache.org> on 2019/02/04 21:21:00 UTC

[jira] [Commented] (AVRO-2299) Get Plain Schema

    [ https://issues.apache.org/jira/browse/AVRO-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16760229#comment-16760229 ] 

Rumeshkrishnan commented on AVRO-2299:
--------------------------------------

I went through all the Avro types and found that many changes are required for the respective types. I have made the code changes in a new file; it would be helpful if we can find a way to reuse the functionality and combine it with SchemaNormalization.java. I will try to come up with test cases for this SchemaCanonicalizer. The rules are as follows:
 * The canonical normaliser should filter and order both the reserved and the user-given properties.
 * Users can normalise a schema with additional user-defined logical types.
 * name and namespace are separate keys in the canonical normaliser; they should not be collapsed into a single `name` property for RECORD, FIXED and ENUM types.
 * Reserved Avro properties are ordered as below, followed by the user-given properties.

{code:java}
"name", "namespace", "type", "fields", "symbols", "items", "values", "logicalType", "size", "order", "doc", "aliases", "default"{code}
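
The filter-and-order rule above can be sketched on a plain map, independent of Avro. This is only an illustration under stated assumptions: the class `PropertyOrderSketch` and the method `filterAndOrder` are hypothetical names, not part of Avro or of the implementation below, and it assumes user-given properties keep the order in which the caller lists them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Avro: illustrates only the filter-and-order rule.
class PropertyOrderSketch {

  // Reserved keys, in the order listed above.
  static final List<String> RESERVED = Arrays.asList(
      "name", "namespace", "type", "fields", "symbols", "items", "values",
      "logicalType", "size", "order", "doc", "aliases", "default");

  // Keeps only reserved keys plus the user-given keys, reserved keys first.
  // Assumption: user-given keys follow in the order the caller supplies them.
  static Map<String, Object> filterAndOrder(Map<String, Object> props, List<String> userKeys) {
    List<String> keep = new ArrayList<>(RESERVED);
    for (String k : userKeys) {
      if (!keep.contains(k)) keep.add(k);
    }
    Map<String, Object> out = new LinkedHashMap<>();
    for (String k : keep) {
      if (props.containsKey(k)) out.put(k, props.get(k));
    }
    return out;
  }

  public static void main(String[] args) {
    Map<String, Object> props = new LinkedHashMap<>();
    props.put("user_schema_prop", "xxxxxx");
    props.put("doc", "email id");
    props.put("type", "string");
    props.put("name", "email");
    props.put("other", 1); // neither reserved nor user-given, so it is dropped
    System.out.println(filterAndOrder(props, Arrays.asList("user_schema_prop")).keySet());
    // prints [name, type, doc, user_schema_prop]
  }
}
```

Passing an empty list of user keys gives the "plain" result that contains reserved properties only.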

*Current SchemaCanonicalizer.java implementation:*
{code:java}
import org.apache.avro.util.internal.JacksonUtils;

import java.io.IOException;
import java.util.*;

/**
 * Collection of static methods for generating the canonical form of
 * schemas with reserved properties (see {@link #toCanonicalForm}).
 */
public class SchemaCanonicalizer {

  private static final LinkedHashSet<String> RESERVED_PROPERTIES = new LinkedHashSet<>();

  private static final LinkedHashSet<LogicalTypes> ADDITIONAL_LOGICAL_TYPES = new LinkedHashSet<>();

  private SchemaCanonicalizer() {
  }

  private SchemaCanonicalizer(LogicalTypes... lts) {
    ADDITIONAL_LOGICAL_TYPES.addAll(Arrays.asList(lts));
  }

  static {
    Collections.addAll(RESERVED_PROPERTIES,
      "name", "namespace", "type", "fields", "symbols", "items", "values",
      "logicalType", "size", "order", "doc", "aliases", "default");
  }

  public static String toCanonicalForm(Schema s) {
    try {
      return build(s, new StringBuilder()).toString();
    } catch (IOException e) {
      // Shouldn't happen, b/c StringBuilder can't throw IOException
      throw new RuntimeException(e);
    }
  }

  public static String toCanonicalForm(Schema s, LinkedHashSet<String> properties) {
    try {
      // NOTE: this adds to the shared static set, so the extra properties persist for later calls
      RESERVED_PROPERTIES.addAll(properties);
      return build(s, new StringBuilder()).toString();
    } catch (IOException e) {
      // Shouldn't happen, b/c StringBuilder can't throw IOException
      throw new RuntimeException(e);
    }
  }

  private static Appendable build(Schema s, Appendable o) throws IOException {
    Schema.Type st = s.getType();
    LogicalType lt = null;
    if (ADDITIONAL_LOGICAL_TYPES.isEmpty()) {
      lt = s.getLogicalType();
    } else {
      lt = getLogicalType(s);
    }

    if (lt == null) {
      switch (st) {
        default: // boolean, bytes, double, float, int, long, null, string
          return o.append('"').append(st.getName()).append('"');
        // each case returns, so control never falls through into the next writer
        case UNION:
          return writeUnionType(s, o);
        case ARRAY:
          return writeArrayType(s, o);
        case MAP:
          return writeMapType(s, o);
        case ENUM:
          return writeEnumType(s, o);
        case FIXED:
          return writeFixedType(s, o);
        case RECORD:
          return writeRecordType(s, o);
      }
    } else {
      writeLogicalType(s, lt, o);
    }

    return o;
  }

  private static LogicalType getLogicalType(Schema s) {
    for (LogicalTypes lts : ADDITIONAL_LOGICAL_TYPES) {
      LogicalType lt = LogicalTypes.fromSchema(s);
      if (lt != null) return lt;
    }
    return null;
  }

  private static Appendable writeLogicalType(Schema s, LogicalType lt, Appendable o) throws IOException {
    o.append("{\"type\":\"").append(s.getType().getName()).append("\"");
    // the leading comma separates this key from the preceding "type" entry
    o.append(",\"").append(LogicalType.LOGICAL_TYPE_PROP).append("\":\"").append(lt.getName()).append("\"");
    // adding the reserved property
    writeProps(o, s.getObjectProps());
    return o.append("}");
  }

  private static Appendable writeUnionType(Schema s, Appendable o) throws IOException {
    boolean firstTime = true;
    o.append('[');
    for (Schema b : s.getTypes()) {
      if (!firstTime) o.append(',');
      else firstTime = false;
      build(b, o);
    }
    return o.append(']');
  }

  private static Appendable writeArrayType(Schema s, Appendable o) throws IOException {
    o.append("{\"type\":\"").append(s.getType().getName()).append("\"");
    build(s.getElementType(), o.append(",\"items\":"));
    // adding the reserved property
    writeProps(o, s.getObjectProps());
    return o.append("}");
  }

  private static Appendable writeMapType(Schema s, Appendable o) throws IOException {
    o.append("{\"type\":\"").append(s.getType().getName()).append("\"");
    build(s.getValueType(), o.append(",\"values\":"));
    // adding the reserved property
    writeProps(o, s.getObjectProps());
    return o.append("}");
  }

  private static Appendable writeFixedType(Schema s, Appendable o) throws IOException {
    o.append("{\"name\":\"").append(s.getName()).append("\"");
    writeNamespace(o, s.getNamespace());
    o.append(",\"type\":\"").append(s.getType().getName()).append("\"");
    o.append(",\"size\":").append(Integer.toString(s.getFixedSize()));
    writeAliases(o, s.getAliases());
    // adding the reserved property
    writeProps(o, s.getObjectProps());

    // close the JSON object opened at the start of this method
    return o.append("}");
  }

  private static Appendable writeEnumType(Schema s, Appendable o) throws IOException {
    o.append("{\"name\":\"").append(s.getName()).append("\"");
    writeNamespace(o, s.getNamespace());
    o.append(",\"type\":\"").append(s.getType().getName()).append("\"");
    writeDoc(o, s.getDoc());
    writeAliases(o, s.getAliases());
    boolean firstTime = true;

    o.append(",\"symbols\":[");
    for (String enumSymbol : s.getEnumSymbols()) {
      if (!firstTime) o.append(',');
      else firstTime = false;
      o.append('"').append(enumSymbol).append('"');
    }
    o.append("]");
    // adding the reserved property
    writeProps(o, s.getObjectProps());

    // close the JSON object opened at the start of this method
    return o.append("}");
  }

  private static Appendable writeRecordType(Schema s, Appendable o) throws IOException {
    o.append("{\"name\":\"").append(s.getName()).append("\"");
    writeNamespace(o, s.getNamespace());
    o.append(",\"type\":\"").append(s.getType().getName()).append("\"");
    writeDoc(o, s.getDoc());
    writeAliases(o, s.getAliases());
    boolean firstTime = true;

    o.append(",\"fields\":[");
    for (Schema.Field f : s.getFields()) {
      if (!firstTime) o.append(',');
      else firstTime = false;
      o.append("{\"name\":\"").append(f.name()).append("\"");
      build(f.schema(), o.append(",\"type\":"));
      // order
      writeOrder(o, f.order());
      // doc
      writeDoc(o, f.doc());
      // aliases
      writeAliases(o, f.aliases());
      // default
      writeDefault(o, f.defaultVal());
      o.append("}");
    }
    o.append("]");
    // adding the reserved property
    writeProps(o, s.getObjectProps());

    // close the JSON object opened at the start of this method
    return o.append("}");
  }

  private static Appendable writeProps(Appendable o, Map<String, Object> schemaProps) throws IOException {
    for (String propKey : RESERVED_PROPERTIES) {
      if (schemaProps.containsKey(propKey)) {
        String propValue = JacksonUtils.toJsonNode(schemaProps.get(propKey)).toString();
        o.append(",\"").append(propKey).append("\":").append(propValue);
      }
    }
    return o;
  }

  private static Appendable writeNamespace(Appendable o, String namespace) throws IOException {
    if (namespace != null) {
      o.append(",\"namespace\":\"").append(namespace).append("\"");
    }
    return o;
  }

  private static Appendable writeOrder(Appendable o, Schema.Field.Order order) throws IOException {
    if (order != null) {
      o.append(",\"order\":\"").append(order.toString()).append("\"");
    }
    return o;
  }

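  private static Appendable writeDoc(Appendable o, String doc) throws IOException {
    if (doc != null) {
      // serialize through Jackson so quotes and backslashes in the doc text are escaped
      String propValue = JacksonUtils.toJsonNode(doc).toString();
      o.append(",\"doc\":").append(propValue);
    }
    return o;
  }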

  private static Appendable writeAliases(Appendable o, Set<String> aliases) throws IOException {
    if (!aliases.isEmpty()) {
      String propValue = JacksonUtils.toJsonNode(aliases).toString();
      o.append(",\"aliases\":").append(propValue);
    }
    return o;
  }

  private static Appendable writeDefault(Appendable o, Object object) throws IOException {
    if (object != null) {
      String propValue = JacksonUtils.toJsonNode(object).toString();
      o.append(",\"default\":").append(propValue);
    }
    return o;
  }

}
{code}
[~cutting] kindly review the code; meanwhile I will create the test cases. If the new canonical normaliser rules need any additions or modifications, kindly let me know.

> Get Plain Schema
> ----------------
>
>                 Key: AVRO-2299
>                 URL: https://issues.apache.org/jira/browse/AVRO-2299
>             Project: Apache Avro
>          Issue Type: New Feature
>          Components: java
>    Affects Versions: 1.8.2
>            Reporter: Rumeshkrishnan
>            Priority: Minor
>              Labels: features
>             Fix For: 1.9.0, 1.8.2, 1.8.3, 1.8.4
>
>
> {panel:title=Avro Schema Reserved Keys:}
> "doc", "fields", "items", "name", "namespace",
>  "size", "symbols", "values", "type", "aliases", "default"
> {panel}
> Avro also supports user-defined properties for both Schema and Field.
> Is there a way to get the schema with only the reserved properties (key, value)?
> Input Schema:
> {code:java}
> {
>   "name": "testSchema",
>   "namespace": "com.avro",
>   "type": "record",
>   "fields": [
>     {
>       "name": "email",
>       "type": "string",
>       "doc": "email id",
>       "user_field_prop": "xxxxx"
>     }
>   ],
>   "user_schema_prop": "xxxxxx"
> }{code}
> Expected Plain Schema:
> {code:java}
> {
>   "name": "testSchema",
>   "namespace": "com.avro",
>   "type": "record",
>   "fields": [
>     {
>       "name": "email",
>       "type": "string",
>       "doc": "email id"
>     }
>   ]
> }
> {code}


