Posted to issues@nifi.apache.org by "Sivaprasanna Sethuraman (JIRA)" <ji...@apache.org> on 2018/08/16 06:25:00 UTC

[jira] [Commented] (NIFI-5524) Allow UpdateRecord to modify the schema where prudent

    [ https://issues.apache.org/jira/browse/NIFI-5524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16582021#comment-16582021 ] 

Sivaprasanna Sethuraman commented on NIFI-5524:
-----------------------------------------------

I think it is a fair proposal. Often, when we are working with files whose field structures vary, we can't provide a single schema that works for all the records without over-engineering it. My thought is that we can add a new property that takes a boolean. This property could be named "Update Schema", with a description along the lines of: "If set to 'true', any new fields that are not defined in the associated schema in the Schema Registry will be added to the schema." Or something better.
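To make the proposed "Update Schema" flag concrete, here is a minimal, framework-free sketch of the behavior it would switch on. It deliberately does not use NiFi's record API. `SchemaUpdateSketch`, `applyUpdates`, and modelling a schema as an ordered map from field name to type name are all hypothetical stand-ins. It also assumes that the caller already knows the type of each written value, because the proposal does not say how an added field's type would be chosen.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SchemaUpdateSketch {

    /**
     * Returns the output schema for a record after the given field updates.
     * Hypothetical model: a schema is an ordered map of field name -> type name.
     *
     * @param inputSchema  fields declared before the update (never modified)
     * @param updates      field name -> type name of each value being written
     * @param updateSchema the proposed (hypothetical) "Update Schema" flag
     */
    public static Map<String, String> applyUpdates(Map<String, String> inputSchema,
                                                   Map<String, String> updates,
                                                   boolean updateSchema) {
        // Work on a copy so the input schema itself is never mutated.
        Map<String, String> output = new LinkedHashMap<>(inputSchema);
        if (!updateSchema) {
            // Flag off: the schema is returned unchanged; undeclared fields are not added.
            return output;
        }
        for (Map.Entry<String, String> e : updates.entrySet()) {
            // Flag on: append only fields the schema doesn't declare yet;
            // fields that already exist keep their declared type.
            output.putIfAbsent(e.getKey(), e.getValue());
        }
        return output;
    }

    public static void main(String[] args) {
        Map<String, String> input = new LinkedHashMap<>();
        input.put("id", "long");
        input.put("name", "string");

        Map<String, String> updates = new LinkedHashMap<>();
        updates.put("name", "string");     // already declared
        updates.put("ingestTime", "long"); // not in the input schema

        System.out.println(applyUpdates(input, updates, false));
        System.out.println(applyUpdates(input, updates, true));
    }
}
```

With the flag off, `main` prints `{id=long, name=string}`. With the flag on, it prints `{id=long, name=string, ingestTime=long}`. New fields are appended after the existing ones, so the original field order is preserved.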

> Allow UpdateRecord to modify the schema where prudent
> -----------------------------------------------------
>
>                 Key: NIFI-5524
>                 URL: https://issues.apache.org/jira/browse/NIFI-5524
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>            Reporter: Matt Burgess
>            Priority: Major
>
> Currently UpdateRecord requires the user to know the input schema in order to specify any fields to be added, either by adding them as optional/nullable fields on the input schema, or by knowing they will exist in the output schema and thus adding them there (as an extension of the input schema).
> A common use case is to infer the schema (or otherwise not easily ascertain the input schema) and add field(s). It would be of great benefit if UpdateRecord could also modify the output schema (especially if inherited from the input schema, but of course open for discussion) to include any added fields. This would remove the requirement for users to know the input/output schemas ahead of time; rather the output fields would simply be added to the output schema (if they don't already exist).
> With great power comes great responsibility; clearly if a field is to be added/updated using a RecordPath expression that perhaps evaluates to multiple nodes, the behavior may well be undefined (although hopefully documented). This Jira proposes to cover the most common use cases of adding/updating fields by also adding/updating schema elements where prudent.
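One possible reading of "where prudent" is to extend the schema only when the RecordPath names exactly one top-level field. Under this reading, the schema is left alone for any path that could match zero or many nodes. The sketch below encodes that conservative rule with a regular expression. This is an assumption made for illustration and is not NiFi's RecordPath parser. `PrudentPathSketch` and `addableField` are hypothetical names. The rule treats nested steps, descendant (`//`) steps, wildcards, array indexes, and predicates as not prudent.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PrudentPathSketch {

    // Matches a single child step from the root, e.g. "/ingestTime".
    // Anything else ("/a/b", "//x", "/*", "/items[0]", predicates) fails to match.
    private static final Pattern SINGLE_TOP_LEVEL_FIELD =
            Pattern.compile("^/([A-Za-z_][A-Za-z0-9_]*)$");

    /**
     * Returns the field name to add to the schema if the path is a plain
     * top-level field reference, or empty if a schema change would be ambiguous.
     */
    public static Optional<String> addableField(String recordPath) {
        Matcher m = SINGLE_TOP_LEVEL_FIELD.matcher(recordPath.trim());
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String[] paths = {"/ingestTime", "/address/zip", "//zip", "/*", "/items[0]"};
        for (String p : paths) {
            System.out.println(p + " -> " + addableField(p).orElse("(leave schema unchanged)"));
        }
    }
}
```

Only `/ingestTime` yields a field to add in this example. A less conservative design could also handle nested paths into existing record fields, at the cost of more schema bookkeeping.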



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)