Posted to users@nifi.apache.org by Sönke Liebau <so...@opencore.com> on 2017/10/23 10:42:20 UTC

Dynamically adding avro schema to AvroSchemaRegistry at runtime

Hi everybody,

I am developing a custom ingest processor that writes data out as a binary
avro stream. Currently I simply store the schema in an attribute of the
flowfile and that works quite nicely with an AvroRecordReader deserializing
it.

To make things "nicer" I thought I'd have a look at the AvroSchemaRegistry
and use that to store and look up schemas. However, I can only find a way
for my processor to retrieve schemas from the registry [1], not to register
one. I understand that I can manually add a schema as a dynamic property,
but what I want is for the processor to add evolving schemas to the
registry automatically at runtime.

Am I missing something obvious here, or is the registry simply not supposed
to work like that?

Kind regards,
Sönke

[1]
https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-registry-bundle/nifi-registry-service/src/main/java/org/apache/nifi/schemaregistry/services/AvroSchemaRegistry.java#L83

Re: Dynamically adding avro schema to AvroSchemaRegistry at runtime

Posted by Sönke Liebau <so...@opencore.com>.
I see your point, and once we start sending updates throughout the cluster
we probably need to start worrying about race conditions where data may
arrive before a schema at a node or something similar.
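
To make the race concrete, here is a minimal sketch. It assumes each node
keeps its own local copy of the schemas and that a lookup retries a bounded
number of times before giving up. The class and method names
(SchemaLookupWithRetry, onSchemaUpdate, lookup) are hypothetical and not
part of NiFi.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-node schema cache; not a NiFi class.
final class SchemaLookupWithRetry {
    // Schemas this node currently knows about, keyed by schema name.
    private final Map<String, String> localSchemas = new ConcurrentHashMap<>();

    // Called when a schema update finally reaches this node.
    void onSchemaUpdate(String name, String schemaText) {
        localSchemas.put(name, schemaText);
    }

    // Tries up to maxAttempts times, sleeping waitMillis between attempts.
    // Returns empty if the schema never shows up or the thread is interrupted.
    Optional<String> lookup(String name, int maxAttempts, long waitMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String schemaText = localSchemas.get(name);
            if (schemaText != null) {
                return Optional.of(schemaText);
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(waitMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return Optional.empty();
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        SchemaLookupWithRetry node = new SchemaLookupWithRetry();
        // Data arrives first: the schema is not known on this node yet.
        System.out.println(node.lookup("user", 2, 10).isPresent());
        // The update arrives afterwards; later lookups succeed.
        node.onSchemaUpdate("user", "{\"type\":\"string\"}");
        System.out.println(node.lookup("user", 2, 10).isPresent());
    }
}
```

A bounded retry only narrows the window; data that outlives the retries
would still need to be routed somewhere, for example to a failure path.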

I'll drop the idea and look at using an existing schema registry if it
becomes relevant for me at some point.

Thanks!

On Mon, Oct 23, 2017 at 5:39 PM, Joe Witt <jo...@gmail.com> wrote:

> Sonke
>
> The built-in one was basically meant to provide read-only access for
> schema lookups by name. The idea you bring up definitely makes sense,
> but to Andrew's point, once we're talking about supporting an automated,
> complete lifecycle, it probably makes sense for the schema registry to
> be its own application, one that we can interact with at runtime via
> HTTP.
>
> For the internal built-in registry it is of course easy enough to
> update the repository in the single-node case, but in the cluster case
> we'd need the processor to send the update back through the REST API to
> ensure all nodes in a NiFi cluster see the change. Not sure how much
> more involved that would be.
>
> Thanks



-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany

Re: Dynamically adding avro schema to AvroSchemaRegistry at runtime

Posted by Joe Witt <jo...@gmail.com>.
Sonke

The built-in one was basically meant to provide read-only access for
schema lookups by name. The idea you bring up definitely makes sense,
but to Andrew's point, once we're talking about supporting an automated,
complete lifecycle, it probably makes sense for the schema registry to
be its own application, one that we can interact with at runtime via
HTTP.
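
As a rough illustration of talking to a standalone registry over HTTP,
here is a minimal sketch using the JDK's java.net.http API (Java 11+). The
host registry.example, the /schemas/{name} path, the use of PUT, and the
class name SchemaRegistrationRequest are all assumptions made for the
example; they do not describe the API of any particular schema registry.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the endpoint layout is an assumption, not a real API.
final class SchemaRegistrationRequest {

    // Builds (but does not send) a PUT request that carries the schema text.
    static HttpRequest build(String registryBaseUrl, String schemaName, String schemaText) {
        // Restrict names to characters that are safe inside a URL path segment.
        if (!schemaName.matches("[A-Za-z0-9._-]+")) {
            throw new IllegalArgumentException("Unsupported schema name: " + schemaName);
        }
        // Drop a single trailing slash so the path is not doubled up.
        String base = registryBaseUrl.endsWith("/")
                ? registryBaseUrl.substring(0, registryBaseUrl.length() - 1)
                : registryBaseUrl;
        URI uri = URI.create(base + "/schemas/" + schemaName);
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(schemaText, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("http://registry.example:8080", "user", "{\"type\":\"string\"}");
        // Sending it would look like:
        //   HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        // which needs a running registry, so this sketch only prints the request line.
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Inside a flow, the same request could be issued by a processor rather than
custom code; the sketch only shows the shape of the call.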

For the internal built-in registry it is of course easy enough to
update the repository in the single-node case, but in the cluster case
we'd need the processor to send the update back through the REST API to
ensure all nodes in a NiFi cluster see the change. Not sure how much
more involved that would be.

Thanks

On Mon, Oct 23, 2017 at 11:18 AM, Sönke Liebau
<so...@opencore.com> wrote:
> Hey Andrew,
>
> that helps, yes. Thank you very much. I'll probably stick with putting the
> schema in the flowfile for now and revisit this issue once the service
> has been up and running for a while. I don't expect the schema to change
> often enough to make running a schema registry alongside NiFi worthwhile.
>
> This limitation was a conscious choice then, I gather? I was looking through
> the AvroSchemaRegistry code and considered extending it a bit to allow
> storing and versioning schemas, but if that is not something that would
> be considered useful, I'll stop those musings.
>
> Kind regards,
> Sönke

Re: Dynamically adding avro schema to AvroSchemaRegistry at runtime

Posted by Sönke Liebau <so...@opencore.com>.
Hey Andrew,

that helps, yes. Thank you very much. I'll probably stick with putting the
schema in the flowfile for now and revisit this issue once the service
has been up and running for a while. I don't expect the schema to change
often enough to make running a schema registry alongside NiFi worthwhile.

This limitation was a conscious choice then, I gather? I was looking through
the AvroSchemaRegistry code and considered extending it a bit to allow
storing and versioning schemas, but if that is not something that would
be considered useful, I'll stop those musings.
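
For illustration, the kind of extension meant here could look roughly like
the sketch below: an in-memory store that accepts new schemas at runtime
and keeps every version. It is a standalone toy, assuming schemas are
handled as plain text; VersionedSchemaStore and its methods are
hypothetical names and not part of AvroSchemaRegistry.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory store; not part of NiFi's AvroSchemaRegistry.
final class VersionedSchemaStore {
    // Schema name -> schema texts; index i holds version i + 1.
    private final Map<String, List<String>> versions = new HashMap<>();

    // Stores schemaText under name and returns its 1-based version.
    // Re-registering the text of the latest version returns that version.
    synchronized int register(String name, String schemaText) {
        List<String> history = versions.computeIfAbsent(name, k -> new ArrayList<>());
        if (!history.isEmpty() && history.get(history.size() - 1).equals(schemaText)) {
            return history.size();
        }
        history.add(schemaText);
        return history.size();
    }

    // Returns the newest schema text registered under name, if any.
    synchronized Optional<String> latest(String name) {
        List<String> history = versions.get(name);
        if (history == null || history.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(history.get(history.size() - 1));
    }

    // Returns a specific 1-based version, if it exists.
    synchronized Optional<String> get(String name, int version) {
        List<String> history = versions.get(name);
        if (history == null || version < 1 || version > history.size()) {
            return Optional.empty();
        }
        return Optional.of(history.get(version - 1));
    }

    public static void main(String[] args) {
        VersionedSchemaStore store = new VersionedSchemaStore();
        int first = store.register("user", "{\"type\":\"string\"}");
        int second = store.register("user", "{\"type\":\"long\"}");
        System.out.println("versions: " + first + ", " + second);
        System.out.println("latest: " + store.latest("user").orElse("none"));
    }
}
```

This only covers a single JVM; keeping such a store consistent across the
nodes of a cluster is a separate problem.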

Kind regards,
Sönke

On Mon, Oct 23, 2017 at 4:35 PM, Andrew Grande <ap...@gmail.com> wrote:

> Hi,
>
> Using an external schema registry is the way to go. The embedded one is
> meant for ease of use, but once you grow beyond an initial phase, the best
> practice is to run a full service and potentially use the standard
> InvokeHTTP processor to perform operations beyond just lookups.
>
> Does it help?
> Andrew



Re: Dynamically adding avro schema to AvroSchemaRegistry at runtime

Posted by Andrew Grande <ap...@gmail.com>.
Hi,

Using an external schema registry is the way to go. The embedded one is
meant for ease of use, but once you grow beyond an initial phase, the best
practice is to run a full service and potentially use the standard
InvokeHTTP processor to perform operations beyond just lookups.

Does it help?
Andrew

On Mon, Oct 23, 2017, 6:43 AM Sönke Liebau <so...@opencore.com>
wrote:

> Hi everybody,
>
> I am developing a custom ingest processor that writes data out as a binary
> avro stream. Currently I simply store the schema in an attribute of the
> flowfile and that works quite nicely with an AvroRecordReader deserializing
> it.
>
> To make things "nicer" I thought I'd have a look at the AvroSchemaRegistry
> and use that to store and look up schemas. However I cannot find a way for
> my processor to register a schema with the registry, but only to retrieve
> schemas [1]. I understand that I can manually add the schema as a dynamic
> property, but what I want to accomplish is that the processor can
> automatically add evolving schemas to the registry at runtime.
>
> Am I missing something obvious here, or is the registry simply not
> supposed to work like that?
>
> Kind regards,
> Sönke
>
> [1]
> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-registry-bundle/nifi-registry-service/src/main/java/org/apache/nifi/schemaregistry/services/AvroSchemaRegistry.java#L83
>