You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@nifi.apache.org by bbende <gi...@git.apache.org> on 2018/03/08 17:11:54 UTC

[GitHub] nifi pull request #2523: NIFI-4935 Support Schema Branches when using HWX Sc...

GitHub user bbende opened a pull request:

    https://github.com/apache/nifi/pull/2523

    NIFI-4935 Support Schema Branches when using HWX Schema Registry

    This PR adds optional "Schema Branch" and "Schema Version" properties to the readers/writers which can be used when using the "Schema Name" access strategy. 
    
    You can specify schema name + branch, or schema name + version, and this will be passed down to the schema registry implementations. Currently only the Hortonworks Schema Registry makes use of the branch concept.
    
    This PR also includes some refactoring to clean up the SchemaRegistry interface and move towards a single method to retrieve a schema that takes a schema identifier. Also added unit tests for all of the access strategies and access writers.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/bbende/nifi schema-reg-branch

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nifi/pull/2523.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2523
    
----
commit 26b01b6a384cafa50ae262dc9f50e05f41bdb818
Author: Bryan Bende <bb...@...>
Date:   2018-03-05T22:02:20Z

    NIFI-4935 Refactoring to support specifying schema branch or schema version when using schema by name strategy

----


---

[GitHub] nifi pull request #2523: NIFI-4935 Support Schema Branches when using HWX Sc...

Posted by markap14 <gi...@git.apache.org>.
Github user markap14 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2523#discussion_r174821838
  
    --- Diff: nifi-nar-bundles/nifi-confluent-platform-bundle/nifi-confluent-schema-registry-service/src/main/java/org/apache/nifi/confluent/schemaregistry/ConfluentSchemaRegistry.java ---
    @@ -176,28 +178,33 @@ public void onEnabled(final ConfigurationContext context) {
             return baseUrls;
         }
     
    -    @Override
    -    public String retrieveSchemaText(final String schemaName) throws IOException, SchemaNotFoundException {
    -        final RecordSchema schema = retrieveSchema(schemaName);
    -        return schema.getSchemaText().get();
    -    }
    +    private RecordSchema retrieveSchemaByName(final SchemaIdentifier schemaIdentifier) throws IOException, SchemaNotFoundException {
    +        final Optional<String> schemaName = schemaIdentifier.getName();
    +        if (!schemaName.isPresent()) {
    +            throw new org.apache.nifi.schema.access.SchemaNotFoundException("Cannot retrieve schema because Schema Name is not present");
    +        }
     
    -    @Override
    -    public String retrieveSchemaText(final long schemaId, final int version) throws IOException, SchemaNotFoundException {
    -        final RecordSchema schema = retrieveSchema(schemaId, version);
    -        return schema.getSchemaText().get();
    +        final RecordSchema schema = client.getSchema(schemaName.get());
    +        return schema;
         }
     
    -    @Override
    -    public RecordSchema retrieveSchema(final String schemaName) throws IOException, SchemaNotFoundException {
    -        final RecordSchema schema = client.getSchema(schemaName);
    +    private RecordSchema retrieveSchemaByIdAndVersion(final SchemaIdentifier schemaIdentifier) throws IOException, SchemaNotFoundException {
    --- End diff --
    
    This doesn't appear to take into account a version at all - which I think is correct because the Confluent Schema Registry I don't supports ID + Version, as each version would get a different ID? But in this case perhaps the name of the method should just be `retrieveSchemaById` and not `retrieveSchemaByIdAndVersion`?


---

[GitHub] nifi issue #2523: NIFI-4935 Support Schema Branches when using HWX Schema Re...

Posted by markap14 <gi...@git.apache.org>.
Github user markap14 commented on the issue:

    https://github.com/apache/nifi/pull/2523
  
    Great thanks @bbende all looks good to me. +1 merged to master.


---

[GitHub] nifi pull request #2523: NIFI-4935 Support Schema Branches when using HWX Sc...

Posted by markap14 <gi...@git.apache.org>.
Github user markap14 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2523#discussion_r174826270
  
    --- Diff: nifi-nar-bundles/nifi-extension-utils/nifi-record-utils/nifi-standard-record-utils/src/main/java/org/apache/nifi/schema/access/HortonworksAttributeSchemaReferenceWriter.java ---
    @@ -40,8 +40,8 @@ public void writeHeader(RecordSchema schema, OutputStream out) throws IOExceptio
             final Map<String, String> attributes = new HashMap<>(4);
             final SchemaIdentifier id = schema.getIdentifier();
     
    -        final long schemaId = id.getIdentifier().getAsLong();
    -        final int schemaVersion = id.getVersion().getAsInt();
    +        final Long schemaId = id.getIdentifier().getAsLong();
    +        final Integer schemaVersion = id.getVersion().getAsInt();
     
             attributes.put(HortonworksAttributeSchemaReferenceStrategy.SCHEMA_ID_ATTRIBUTE, String.valueOf(schemaId));
             attributes.put(HortonworksAttributeSchemaReferenceStrategy.SCHEMA_VERSION_ATTRIBUTE, String.valueOf(schemaVersion));
    --- End diff --
    
    Should we also be adding an attribute now for the Branch Name, if specified? While I realize that it's not explicitly necessary, since the version would disambiguate anything, it may make it more clear when looking at FlowFiles in a queue or looking at provenance data if we have that info.


---

[GitHub] nifi pull request #2523: NIFI-4935 Support Schema Branches when using HWX Sc...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/nifi/pull/2523


---

[GitHub] nifi pull request #2523: NIFI-4935 Support Schema Branches when using HWX Sc...

Posted by markap14 <gi...@git.apache.org>.
Github user markap14 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2523#discussion_r174820659
  
    --- Diff: nifi-commons/nifi-record/src/main/java/org/apache/nifi/serialization/record/StandardSchemaIdentifier.java ---
    @@ -64,6 +75,49 @@ public boolean equals(final Object obj) {
                 return false;
             }
             final SchemaIdentifier other = (SchemaIdentifier) obj;
    -        return getName().equals(other.getName()) && getIdentifier().equals(other.getIdentifier()) && getVersion().equals(other.getVersion());
    +        return getName().equals(other.getName())
    +                && getIdentifier().equals(other.getIdentifier())
    +                && getVersion().equals(other.getVersion())
    +                && getBranch().equals(other.getBranch());
    +    }
    +
    +    /**
    +     * Builder to create instances of Schema
    --- End diff --
    
    Think that's a typo - should be "Builder to create instances of Schema Identifier" - it doesn't build a Schema, just the identifier.


---

[GitHub] nifi pull request #2523: NIFI-4935 Support Schema Branches when using HWX Sc...

Posted by bbende <gi...@git.apache.org>.
Github user bbende commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2523#discussion_r174888711
  
    --- Diff: nifi-commons/nifi-record/src/main/java/org/apache/nifi/serialization/record/StandardSchemaIdentifier.java ---
    @@ -64,6 +75,49 @@ public boolean equals(final Object obj) {
                 return false;
             }
             final SchemaIdentifier other = (SchemaIdentifier) obj;
    -        return getName().equals(other.getName()) && getIdentifier().equals(other.getIdentifier()) && getVersion().equals(other.getVersion());
    +        return getName().equals(other.getName())
    +                && getIdentifier().equals(other.getIdentifier())
    +                && getVersion().equals(other.getVersion())
    +                && getBranch().equals(other.getBranch());
    +    }
    +
    +    /**
    +     * Builder to create instances of Schema
    --- End diff --
    
    Will update


---

[GitHub] nifi pull request #2523: NIFI-4935 Support Schema Branches when using HWX Sc...

Posted by bbende <gi...@git.apache.org>.
Github user bbende commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2523#discussion_r174888759
  
    --- Diff: nifi-nar-bundles/nifi-confluent-platform-bundle/nifi-confluent-schema-registry-service/src/main/java/org/apache/nifi/confluent/schemaregistry/ConfluentSchemaRegistry.java ---
    @@ -176,28 +178,33 @@ public void onEnabled(final ConfigurationContext context) {
             return baseUrls;
         }
     
    -    @Override
    -    public String retrieveSchemaText(final String schemaName) throws IOException, SchemaNotFoundException {
    -        final RecordSchema schema = retrieveSchema(schemaName);
    -        return schema.getSchemaText().get();
    -    }
    +    private RecordSchema retrieveSchemaByName(final SchemaIdentifier schemaIdentifier) throws IOException, SchemaNotFoundException {
    +        final Optional<String> schemaName = schemaIdentifier.getName();
    +        if (!schemaName.isPresent()) {
    +            throw new org.apache.nifi.schema.access.SchemaNotFoundException("Cannot retrieve schema because Schema Name is not present");
    +        }
     
    -    @Override
    -    public String retrieveSchemaText(final long schemaId, final int version) throws IOException, SchemaNotFoundException {
    -        final RecordSchema schema = retrieveSchema(schemaId, version);
    -        return schema.getSchemaText().get();
    +        final RecordSchema schema = client.getSchema(schemaName.get());
    +        return schema;
         }
     
    -    @Override
    -    public RecordSchema retrieveSchema(final String schemaName) throws IOException, SchemaNotFoundException {
    -        final RecordSchema schema = client.getSchema(schemaName);
    +    private RecordSchema retrieveSchemaByIdAndVersion(final SchemaIdentifier schemaIdentifier) throws IOException, SchemaNotFoundException {
    --- End diff --
    
    Good point, I'll change that


---

[GitHub] nifi pull request #2523: NIFI-4935 Support Schema Branches when using HWX Sc...

Posted by bbende <gi...@git.apache.org>.
Github user bbende commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2523#discussion_r174895925
  
    --- Diff: nifi-nar-bundles/nifi-extension-utils/nifi-record-utils/nifi-standard-record-utils/src/main/java/org/apache/nifi/schema/access/HortonworksAttributeSchemaReferenceWriter.java ---
    @@ -40,8 +40,8 @@ public void writeHeader(RecordSchema schema, OutputStream out) throws IOExceptio
             final Map<String, String> attributes = new HashMap<>(4);
             final SchemaIdentifier id = schema.getIdentifier();
     
    -        final long schemaId = id.getIdentifier().getAsLong();
    -        final int schemaVersion = id.getVersion().getAsInt();
    +        final Long schemaId = id.getIdentifier().getAsLong();
    +        final Integer schemaVersion = id.getVersion().getAsInt();
     
             attributes.put(HortonworksAttributeSchemaReferenceStrategy.SCHEMA_ID_ATTRIBUTE, String.valueOf(schemaId));
             attributes.put(HortonworksAttributeSchemaReferenceStrategy.SCHEMA_VERSION_ATTRIBUTE, String.valueOf(schemaVersion));
    --- End diff --
    
    We can do that... one unfortunate thing though is that we will only have the branch name if we also did the lookup with the branch name. If you do the lookup by id+version then nothing I can see on the client API tells you the branch name (i.e. SchemaVersionInfo does not have getBranchName())


---

[GitHub] nifi pull request #2523: NIFI-4935 Support Schema Branches when using HWX Sc...

Posted by markap14 <gi...@git.apache.org>.
Github user markap14 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2523#discussion_r174825845
  
    --- Diff: nifi-nar-bundles/nifi-extension-utils/nifi-record-utils/nifi-standard-record-utils/src/main/java/org/apache/nifi/schema/access/HortonworksAttributeSchemaReferenceStrategy.java ---
    @@ -84,7 +85,8 @@ public RecordSchema getSchema(Map<String, String> variables, final InputStream c
             final long schemaId = Long.parseLong(schemaIdentifier);
             final int version = Integer.parseInt(schemaVersion);
    --- End diff --
    
    Shouldn't we be supporting the use of a Branch Name instead of a Version here? I.e., should we support using either version OR branch name?


---

[GitHub] nifi pull request #2523: NIFI-4935 Support Schema Branches when using HWX Sc...

Posted by bbende <gi...@git.apache.org>.
Github user bbende commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2523#discussion_r174895242
  
    --- Diff: nifi-nar-bundles/nifi-extension-utils/nifi-record-utils/nifi-standard-record-utils/src/main/java/org/apache/nifi/schema/access/HortonworksAttributeSchemaReferenceStrategy.java ---
    @@ -84,7 +85,8 @@ public RecordSchema getSchema(Map<String, String> variables, final InputStream c
             final long schemaId = Long.parseLong(schemaIdentifier);
             final int version = Integer.parseInt(schemaVersion);
    --- End diff --
    
    I think I decided to leave the attribute strategy alone to avoid creating extra complexity..
    
    Currently the attributes approach provides a direct lookup by id+version (which would disambiguate the branch) and users are accustomed to using this strategy when they want a specific version.
    
    If we then allow the option to use branch instead of version, the combo of id+branch would then get the latest version on the given branch, instead of a direct look up.
    
    Also, since we now allow specifying a version with access-by-name strategy, I wonder how often the attributes approach would even be used.
    
    If you feel strongly about it we can definitely make it work though.



---

[GitHub] nifi issue #2523: NIFI-4935 Support Schema Branches when using HWX Schema Re...

Posted by bbende <gi...@git.apache.org>.
Github user bbende commented on the issue:

    https://github.com/apache/nifi/pull/2523
  
    @markap14 pushed another commit that addresses 3 of the 4 comments, I didn't add the branch option to the attributes access strategy, let me know if you think we need to do that.


---