Posted to commits@nifi.apache.org by "Matt Gilman (JIRA)" <ji...@apache.org> on 2015/12/19 18:06:46 UTC

[jira] [Comment Edited] (NIFI-826) Export templates in a deterministic way

    [ https://issues.apache.org/jira/browse/NIFI-826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065432#comment-15065432 ] 

Matt Gilman edited comment on NIFI-826 at 12/19/15 5:06 PM:
------------------------------------------------------------

Sorry, it has been a while since I've had this on my mind. In order to ensure templates are exported in a deterministic way, we need to address the three bullets identified above. Pruning excess details from the templates (the last bullet) is straightforward. That leaves ensuring that component IDs are the same and that the components are in a consistent order in the template. Since component IDs are generated when they are added to a flow, the component IDs are only present in templates to associate the source and destination of connections.

My basic approach was to sort the components to ensure they were always ordered the same regardless of the NiFi instance. More specifically, I tried to accomplish this without introducing any new concepts into the flow.xml.

The sorting strategy could consider any number of configuration details. First we'd try the name. If the component didn't have a name, we could fall back to the position. Unfortunately, this started to break down and had the side effect you mentioned: moving a component changes its position in the template. The same would hold true for any other configuration detail.

You're correct that just introducing a new ID (template ID) would give us collisions. Part of that idea that I forgot to mention was also adding a timestamp that represents when the component was first added to a flow. The template ID and timestamp travel with the component in the template and are used when the component is imported. If the component being imported doesn't have a template ID or timestamp (templates from earlier versions), or if there is a template ID collision, we would generate a new timestamp.
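
The import rule described here could be sketched roughly as follows. This is only an illustration in Python, not NiFi code: the `Component` type, the `import_component` function, and the `created` field are hypothetical names, and regenerating the template ID alongside the timestamp on a collision is an assumption (the comment only states that a new timestamp would be generated).

```python
import uuid
from dataclasses import dataclass, replace
from typing import Optional, Set


@dataclass(frozen=True)
class Component:
    """Hypothetical stand-in for a flow component carried inside a template."""
    name: str
    template_id: Optional[str] = None
    created: Optional[int] = None  # timestamp of when it was first added to a flow


def import_component(incoming: Component, ids_in_flow: Set[str], now: int) -> Component:
    """Keep the template ID and timestamp that travel with the component,
    unless they are missing (older templates) or the template ID collides."""
    missing = incoming.template_id is None or incoming.created is None
    collision = incoming.template_id in ids_in_flow
    if missing or collision:
        # Assumption: a fresh template ID is issued together with the new
        # timestamp; the original comment only mentions the timestamp.
        return replace(incoming, template_id=str(uuid.uuid4()), created=now)
    return incoming
```

A component imported without a collision keeps its original timestamp, so it sorts where it always did; anything that collides or comes from an older template is treated as newly added.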

With this approach, ordering could be based solely on the timestamp. Since the timestamp travels with the component, we can ensure consistent ordering regardless of when or where the component was first added to a flow. Any new components, whether newly added or reintroduced via a subsequent copy/paste or import, would get a new, current timestamp. Because of this, they would always end up at the end of the listing. With this ordering ensured, we should be able to generate IDs using a simple one-up number.
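
As a rough sketch of the export side (illustrative Python, not NiFi code): components are sorted by their first-added timestamp, given one-up IDs in that order, and connections are rewritten to use the new IDs. The `export_template` function and the dictionary keys `id`, `name`, and `created` are hypothetical, and the sketch assumes timestamps are unique within a template.

```python
from typing import Dict, List, Tuple


def export_template(
    components: List[Dict], connections: List[Tuple[str, str]]
) -> Tuple[List[Dict], List[Tuple[str, str]]]:
    """Order components by the timestamp that travels with them, then
    replace instance-specific IDs with one-up numbers."""
    # Assumption: "created" values are unique, so the order is fully determined.
    ordered = sorted(components, key=lambda c: c["created"])

    # Map each instance-specific ID to its one-up position in the ordering.
    new_ids = {c["id"]: str(i) for i, c in enumerate(ordered, start=1)}

    exported_components = [
        {"id": new_ids[c["id"]], "name": c["name"], "created": c["created"]}
        for c in ordered
    ]
    # Connections only reference components, so remapping their endpoints
    # (and sorting them) keeps the output independent of the instance.
    exported_connections = sorted((new_ids[s], new_ids[d]) for s, d in connections)
    return exported_components, exported_connections
```

Two NiFi instances holding the same components under different instance IDs, listed in a different order, would then produce identical output that can be diffed directly.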


> Export templates in a deterministic way
> ---------------------------------------
>
>                 Key: NIFI-826
>                 URL: https://issues.apache.org/jira/browse/NIFI-826
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework
>            Reporter: Matt Gilman
>            Assignee: Matt Gilman
>
> Templates should be exported in a deterministic way so that they can be compared or diffed with one another. Items to consider...
> - The ordering of components
> - The IDs used to identify the components
> - Consider excluding irrelevant items. When components are imported, some settings (such as run state) are ignored.


