Posted to issues@nifi.apache.org by "Mr TheSegfault (JIRA)" <ji...@apache.org> on 2019/06/11 13:17:00 UTC

[jira] [Comment Edited] (MINIFICPP-550) Create RocksDB Controller Service

    [ https://issues.apache.org/jira/browse/MINIFICPP-550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861025#comment-16861025 ] 

Mr TheSegfault edited comment on MINIFICPP-550 at 6/11/19 1:16 PM:
-------------------------------------------------------------------

Thanks [~bakaid] for that thorough comment.

 

Controller services can be shared with any component. The impetus for some of my comments was that these aren't necessarily component states, but shared states. While we could have per-processor state, we may also want to facilitate communication between processors, or between processors and controller services. Variable registry is a trivial example of this. Another example may be storing a list of blocked IPs that originates in one processor but is shared with subsequent processors in the graph.
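
To make the blocked-IP example concrete, here is a minimal sketch of two processors sharing state through one service instance. Everything in it is hypothetical and only illustrates the idea: SharedStateService, DetectIntrusion, FilterTraffic and the "blocked.ips" key are made-up names, not the MiNiFi C++ API, and an in-memory map stands in for whatever backing store (RocksDB or otherwise) would be used.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <set>
#include <string>
#include <utility>

// Hypothetical shared-state service (an in-memory stand-in, not the MiNiFi API).
class SharedStateService {
 public:
  // Add a value to the named set; safe to call from several processor threads.
  void addToSet(const std::string& key, const std::string& value) {
    std::lock_guard<std::mutex> lock(mutex_);
    sets_[key].insert(value);
  }

  // Check whether the named set currently contains the value.
  bool setContains(const std::string& key, const std::string& value) const {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = sets_.find(key);
    return it != sets_.end() && it->second.count(value) > 0;
  }

 private:
  mutable std::mutex mutex_;
  std::map<std::string, std::set<std::string>> sets_;
};

// Hypothetical upstream processor: records offending addresses.
class DetectIntrusion {
 public:
  explicit DetectIntrusion(std::shared_ptr<SharedStateService> state)
      : state_(std::move(state)) {}
  void onTrigger(const std::string& offending_ip) {
    state_->addToSet("blocked.ips", offending_ip);
  }

 private:
  std::shared_ptr<SharedStateService> state_;
};

// Hypothetical downstream processor: consults the same shared set.
class FilterTraffic {
 public:
  explicit FilterTraffic(std::shared_ptr<SharedStateService> state)
      : state_(std::move(state)) {}
  bool shouldDrop(const std::string& source_ip) const {
    return state_->setContains("blocked.ips", source_ip);
  }

 private:
  std::shared_ptr<SharedStateService> state_;
};

// Usage: state written by one processor is visible to the next one.
bool blockedIpIsVisibleDownstream() {
  auto state = std::make_shared<SharedStateService>();
  DetectIntrusion detector(state);
  FilterTraffic filter(state);
  detector.onTrigger("203.0.113.7");
  return filter.shouldDrop("203.0.113.7") && !filter.shouldDrop("198.51.100.1");
}
```

The point of the sketch is only that neither processor owns the state; both hold a reference to the same service, which is what a controller service would give us.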

The paradigm from NiFi isn't one that I'm against. The "singleton cache" was meant to reflect a similar idea, but one in which we don't need to augment ProcessContext, since that coupling didn't reach all aspects of what I was hoping we could achieve with having reporting tasks and other controller services update the stored state. If I recall the history correctly, I believe StateManager originated with this idea, but the coupling of StateManager to ProcessContext was put in place since no other sensible method of retrieval existed. This is where the singleton idea originated.

Controller services cover both aspects, since we can have "known controller service names." For example, we can have a property in the minifi.properties file that defines a controller service name that specifies a state manager. This can be used by virtually any component. Other benefits of this include being able to easily turn this on/off via command and control (by updating the flow). It also allows us to more easily change the type. The negative here is that the config yaml file defines the state manager implementation. I rationalized this via the concept that the config yaml file already defines not only the graph but also the state of the agent. Retrieval can be made via process context on getting the provider, and we can still have an option to store for only the processor or shared amongst many. Using linked services it would also be possible to intersect results from a volatile and a persistent repo. While RocksDB can be configured so that the WAL is off or highly delayed, this may allow us to do interesting things if using controller services. Finally, there is also precedent that we can inject (maybe through properties) a default controller service if none is specified. This will allow us to have it configurable via command while still having one defined if not specified in the update.
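
As a rough illustration of the "known controller service name" lookup with an injected default, consider the sketch below. It is only a sketch under assumptions: the property key "state.provider.service.name", the StateProvider interface, and both provider classes are hypothetical names invented for this example, and the persistent class merely stands in for a RocksDB-backed service.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical provider interface; not the MiNiFi C++ API.
class StateProvider {
 public:
  virtual ~StateProvider() = default;
  virtual std::string type() const = 0;
};

// Hypothetical in-memory provider, used here as the injected default.
class VolatileStateProvider : public StateProvider {
 public:
  std::string type() const override { return "volatile"; }
};

// Hypothetical stand-in for a RocksDB-backed controller service.
class PersistentStateProvider : public StateProvider {
 public:
  std::string type() const override { return "persistent"; }
};

using Properties = std::map<std::string, std::string>;
using ServiceRegistry = std::map<std::string, std::shared_ptr<StateProvider>>;

// Resolve the state provider by the service name given in the properties.
// If no name is configured, or the flow does not define that service,
// fall back to a default provider so one is always available.
std::shared_ptr<StateProvider> resolveStateProvider(const Properties& props,
                                                    const ServiceRegistry& services) {
  static const std::string kProperty = "state.provider.service.name";  // assumed key
  auto name = props.find(kProperty);
  if (name != props.end()) {
    auto service = services.find(name->second);
    if (service != services.end()) {
      return service->second;
    }
  }
  return std::make_shared<VolatileStateProvider>();
}
```

With this shape, swapping the implementation is a matter of changing which service the flow defines under the known name, and an update that omits it still leaves the agent with a working default.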

 

There are a lot of ideas here, and I'm excited to see what you come up with. My comment about configuring RocksDB to be mostly in memory doesn't ignore the fact that, yes, there may certainly be portions of state that aren't ever going to be persisted. In these cases those extension points could/should allow for different state manager types, if we can use that same nomenclature. I don't have an overly strong preference in any of these decision points, except that I've never felt that coupling ProcessContext with StateManager was the right answer, since state can often be more than local, and how does one resolve that coupling if you want state that is both processor local and processor shareable? Perhaps even that could be made more graceful. Ultimately the process context is defined as a 'bridge between the processor and the nifi-framework' [1]. I would take that a step further and say that its scope is limited to the lifetime of the thread, which in my opinion has implications for how we handle state, but that is just my opinion. If you ultimately feel ProcessContext is the best place, I think that's a reasonable way forward.
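
One possible way to have both processor-local and shareable state behind a single store is to qualify keys by scope. The sketch below is purely illustrative: Scope, ScopedStateStore and the "local/" and "shared/" key prefixes are hypothetical, and a std::map stands in for the real backing store.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical scope selector: Local is private to one component,
// Shared is visible to every component using the store.
enum class Scope { Local, Shared };

// Hypothetical store that keeps both kinds of state in one key space.
class ScopedStateStore {
 public:
  void put(Scope scope, const std::string& component_id,
           const std::string& key, std::string value) {
    values_[qualify(scope, component_id, key)] = std::move(value);
  }

  // Returns true and fills 'out' when the key exists in the given scope.
  bool get(Scope scope, const std::string& component_id,
           const std::string& key, std::string& out) const {
    auto it = values_.find(qualify(scope, component_id, key));
    if (it == values_.end()) {
      return false;
    }
    out = it->second;
    return true;
  }

 private:
  // Local keys embed the component id; shared keys ignore it.
  static std::string qualify(Scope scope, const std::string& component_id,
                             const std::string& key) {
    return scope == Scope::Shared ? "shared/" + key
                                  : "local/" + component_id + "/" + key;
  }

  std::map<std::string, std::string> values_;
};
```

A caller would pass its own component id on every call, so a value stored with Scope::Local by one processor is not found by another, while a value stored with Scope::Shared is found by all of them.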

 

Hopefully that all made sense. Thanks

[1] https://github.com/apache/nifi/blob/master/nifi-api/src/main/java/org/apache/nifi/processor/ProcessContext.java#L30


> Create RocksDB Controller Service
> ---------------------------------
>
>                 Key: MINIFICPP-550
>                 URL: https://issues.apache.org/jira/browse/MINIFICPP-550
>             Project: Apache NiFi MiNiFi C++
>          Issue Type: Bug
>            Reporter: Mr TheSegfault
>            Assignee: Daniel Bakai
>            Priority: Major
>             Fix For: 0.7.0
>
>
> A RocksDB Controller service will give us the ability to store arbitrary information into controller services that can later be sent via SiteToSite. This will support many of my monitoring and test use cases. Using RocksDB as a key/value store, we can serialize and send this information periodically.


