Posted to commits@nifi.apache.org by pv...@apache.org on 2021/10/22 07:50:49 UTC

[nifi] branch main updated: NIFI-9319 Make edits and corrections to latest additions to User Guide

This is an automated email from the ASF dual-hosted git repository.

pvillard pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/nifi.git


The following commit(s) were added to refs/heads/main by this push:
     new 77c6f0a  NIFI-9319 Make edits and corrections to latest additions to User Guide
77c6f0a is described below

commit 77c6f0a819d2b52d986042ab7dd5bed6ca500ae5
Author: Andrew Lim <an...@gmail.com>
AuthorDate: Thu Oct 21 12:37:39 2021 -0400

    NIFI-9319 Make edits and corrections to latest additions to User Guide
    
    Signed-off-by: Pierre Villard <pi...@gmail.com>
    
    This closes #5474.
---
 .../asciidoc/images/configure-process-group.png    | Bin 65302 -> 0 bytes
 .../images/process-group-configuration-window.png  | Bin 102300 -> 118585 bytes
 nifi-docs/src/main/asciidoc/user-guide.adoc        |  52 ++++++++++-----------
 3 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/nifi-docs/src/main/asciidoc/images/configure-process-group.png b/nifi-docs/src/main/asciidoc/images/configure-process-group.png
deleted file mode 100644
index 2b1076d..0000000
Binary files a/nifi-docs/src/main/asciidoc/images/configure-process-group.png and /dev/null differ
diff --git a/nifi-docs/src/main/asciidoc/images/process-group-configuration-window.png b/nifi-docs/src/main/asciidoc/images/process-group-configuration-window.png
index 8921129..58b9dd6 100644
Binary files a/nifi-docs/src/main/asciidoc/images/process-group-configuration-window.png and b/nifi-docs/src/main/asciidoc/images/process-group-configuration-window.png differ
diff --git a/nifi-docs/src/main/asciidoc/user-guide.adoc b/nifi-docs/src/main/asciidoc/user-guide.adoc
index b8ff7ca..7583759 100644
--- a/nifi-docs/src/main/asciidoc/user-guide.adoc
+++ b/nifi-docs/src/main/asciidoc/user-guide.adoc
@@ -210,8 +210,8 @@ The available component-level access policies are:
 |view the component   |Allows users to view component configuration details
 |modify the component  |Allows users to modify component configuration details
 |view provenance   |Allows users to view provenance events generated by this component
-|view the data     |Allows users to view metadata and content for this component in flowfile queues in outbound connections and through provenance events
-|modify the data   |Allows users to empty flowfile queues in outbound connections and submit replays through provenance events
+|view the data     |Allows users to view metadata and content for this component in FlowFile queues in outbound connections and through provenance events
+|modify the data   |Allows users to empty FlowFile queues in outbound connections and submit replays through provenance events
 |view the policies |Allows users to view the list of users who can view and modify a component
 |modify the policies  |Allows users to modify the list of users who can view and modify a component
 |retrieve data via site-to-site  |Allows a port to receive data from NiFi instances
@@ -301,7 +301,7 @@ While the options available from the context menu vary, the following options ar
 NOTE: For Processors, Ports, Remote Process Groups, Connections and Labels, it is possible to open the configuration dialog by double-clicking on the desired component.
 
 - *Start* or *Stop*: This option allows the user to start or stop a Processor; the option will be either Start or Stop, depending on the current state of the Processor.
-- *Run Once*: This option allows the user to run a selected Processor exactly once. If the Processor is prevented from executing (e.g. there are no incoming FlowFiles or the outgoing connection has back pressure applied) the Processor won't get triggered. *Execution* settings apply - i.e. *Primary Node* and *All Nodes* setting will result in running the Processor only once on the Primary Node or one time on each of the nodes, respectively. Works only with *Timer Driven* and *CRON driven* [...]
+- *Run Once*: This option allows the user to run a selected Processor exactly once. If the Processor is prevented from executing (e.g., there are no incoming FlowFiles or the outgoing connection has back pressure applied) the Processor won't get triggered. *Execution* settings apply (i.e., *Primary Node* and *All Nodes* settings will result in running the Processor only once on the Primary Node or one time on each of the nodes, respectively). Works only with *Timer driven* and *CRON driv [...]
 - *Enable* or *Disable*: This option allows the user to enable or disable a Processor; the option will be either Enable or Disable, depending on the current state of the Processor.
 - *View data provenance*: This option displays the NiFi Data Provenance table, with information about data provenance events for the FlowFiles routed through that Processor (see <<data_provenance>>).
 - *View status history*: This option opens a graphical representation of the Processor's statistical information over time.
@@ -653,7 +653,7 @@ The 'Run Schedule' dictates how often the Processor should be scheduled to run.
 Scheduling Strategy (see above). If using the Event driven Scheduling Strategy, this field is not available. When using the Timer driven
 Scheduling Strategy, this value is a time duration specified by a number followed by a time unit. For example, `1 second` or `5 mins`.
 The default value of `0 sec` means that the Processor should run as often as possible as long as it has data to process. This is true
-for any time duration of 0, regardless of the time unit (i.e., `0 sec`, `0 mins`, `0 days`). For an explanation of values that are
+for any time duration of 0, regardless of the time unit (e.g., `0 sec`, `0 mins`, `0 days`). For an explanation of values that are
 applicable for the CRON driven Scheduling Strategy, see the description of the CRON driven Scheduling Strategy itself.
 
 ===== Execution
@@ -731,7 +731,7 @@ You can access additional documentation about each Processor's usage by right-cl
 === Configuring a Process Group
 To configure a Process Group, right-click on the Process Group and select the `Configure` option from the context menu. The configuration dialog is opened with two tabs: General and Controller Services.
 
-image::configure-process-group.png["Configure Process Group"]
+image::process-group-configuration-window.png["Configure Process Group"]
 
 
 [[General_tab_ProcessGroup]]
@@ -740,7 +740,7 @@ This tab contains several different configuration items. First is the Process Gr
 
 The next configuration element is the Process Group Parameter Context, which is used to provide parameters to components of the flow. From this drop-down, the user is able to choose which Parameter Context should be bound to this Process Group and can optionally create a new one to bind to the Process Group. For more information refer to <<Parameters>> and <<parameter-contexts,Parameter Contexts>>.
 
-The third element in the configuration dialog is the Process Group Comments. This provides a mechanism for providing any useful information or context about the Process Group.
+The third element in the configuration dialog is the Process Group Comments. This provides a mechanism to add any useful information about the Process Group.
 
 The next two elements, Process Group FlowFile Concurrency and Process Group Outbound Policy, are covered in the following sections.
 
@@ -784,14 +784,14 @@ data that arrives at an Output Port is immediately transferred out of the Proces
 When the Outbound Policy is configured to "Batch Output", the Output Ports will not transfer data out of the Process Group until
 all data that is in the Process Group is queued up at an Output Port (i.e., no data leaves the Process Group until all of the data has finished processing).
 It doesn't matter whether the data is all queued up for the same Output Port, or if some data is queued up for Output Port A while other data is queued up
-for Output Port B. These conditions are both considered the same in terms of the completion of the FlowFile Processing.
+for Output Port B. These conditions are both considered the same in terms of the completion of the FlowFile processing.
 
 Using an Outbound Policy of "Batch Output" along with a FlowFile Concurrency of "Single FlowFile Per Node" allows a user to easily ingest a single FlowFile
 (which in and of itself may represent a batch of data) and then wait until all processing of that FlowFile has completed before continuing on to the next step
 in the dataflow (i.e., the next component outside of the Process Group). Additionally, when using this mode, each FlowFile that is transferred out of the Process Group
 will be given a series of attributes named "batch.output.<Port Name>" for each Output Port in the Process Group. The value will be equal to the number of FlowFiles
-that were routed to that Output Port for this batch of data. For example, consider a case where a single FlowFile is split into 5 FlowFiles, and two FlowFiles go to Output Port A, one goes
-to Output Port B, and two go to Output Port C, and no FlowFiles go to Output Port D. In this case, each FlowFile will have attributes `batch.output.A = 2`,
+that were routed to that Output Port for this batch of data. For example, consider a case where a single FlowFile is split into 5 FlowFiles: two FlowFiles go to Output Port A, one goes
+to Output Port B, two go to Output Port C, and no FlowFiles go to Output Port D. In this case, each FlowFile will have attributes `batch.output.A = 2`,
 `batch.output.B = 1`, `batch.output.C = 2`, `batch.output.D = 0`.
 
 The Outbound Policy of "Batch Output" doesn't provide any benefits when used in conjunction with a FlowFile Concurrency of "Unbounded".
@@ -801,7 +801,7 @@ As a result, the Outbound Policy is ignored if the FlowFile Concurrency is set t
 [[Connecting_Batch_Oriented_Groups]]
 ===== Connecting Batch-Oriented Process Groups
 
-A common use case in NiFi is to perform some batch-oriented process and only after that process completes perform another process on that same batch of data.
+A common use case in NiFi is to perform some batch-oriented process and only after that process completes, perform another process on that same batch of data.
 
 NiFi makes this possible by encapsulating each of these processes in its own Process Group. The Outbound Policy of the first Process Group should be configured as "Batch Output"
 while the FlowFile Concurrency should be either "Single FlowFile Per Node" or "Single Batch Per Node". With this configuration, the first Process Group
@@ -809,7 +809,7 @@ will process an entire batch of data (which will either be a single FlowFile or
 When processing has completed for that batch of data, the data will be held until all FlowFiles are finished processing and ready to leave the Process Group. At that point, the data can be transferred out of the Process Group as a batch. This configuration - when a Process Group is configured with an Outbound Policy of "Batch Output"
 and an Output Port is connected directly to the Input Port of a Process Group with a FlowFile Concurrency of "Single Batch Per Node" - is treated as a slightly special case.
 The receiving Process Group will ingest data not only until its input queues are empty but until they are empty AND the source Process Group has transferred all of the data from that
-batch out of the Process Group. This allows a collection of FlowFiles to be transferred as a single batch of data between Process Groups - even if those FlowFiles
+batch out of the Process Group. This allows a collection of FlowFiles to be transferred as a single batch of data between Process Groups, even if those FlowFiles
 are spread across multiple ports.
 
 
@@ -837,10 +837,10 @@ See <<Backpressure>> for more information.
 ===== Default Settings for Connections
 The final three elements in the Process Group configuration dialog are for Default FlowFile Expiration, Default Back Pressure Object Threshold, and
 Default Back Pressure Data Size Threshold. These settings configure the default values when creating a new Connection. Each Connection represents a queue,
-and every queue has settings for flowfile expiration, back pressure object count, and back pressure data size. The settings specified here will effect the
-default values for all new Connections created within the Process Group; it will not effect existing Connections. Child Process Groups created within the
-configured Process Group will inherit the default settings. Again, existing Process Groups will not be effected. If not overridden with these options, the
-root Process Group obtains its default back pressure settings from nifi.properties, and has a default FlowFile expiration of "0 sec", i.e. do not expire.
+and every queue has settings for FlowFile expiration, back pressure object count, and back pressure data size. The settings specified here will affect the
+default values for all new Connections created within the Process Group; it will not affect existing Connections. Child Process Groups created within the
+configured Process Group will inherit the default settings. Again, existing Process Groups will not be affected. If not overridden with these options, the
+root Process Group obtains its default back pressure settings from `nifi.properties`, and has a default FlowFile expiration of "0 sec" (i.e., do not expire).
 
 NOTE: Setting the Default FlowFile Expiration to a non-zero value may lead to data loss due to a FlowFile expiring as its time limit is reached.
 
@@ -918,7 +918,7 @@ The Referencing Components section now lists an aggregation of all the component
 ==== Parameters and Expression Language
 
 When adding a Parameter that makes use of the Expression Language, it is important to understand the context in which the Expression Language will be evaluated. The expression is always evaluated
-in the context of the Process or Controller Service that references the Parameter. Take, for example, a scenario where Parameter with the name `Time` is added with a value of `${now()}`. The
+in the context of the Processor or Controller Service that references the Parameter. Take, for example, a scenario where a Parameter with the name `Time` is added with a value of `${now()}`. The
 Expression Language results in a call to determine the system time when it is evaluated. When added as a Parameter, the system time is not evaluated when the Parameter is added, but rather when a
 Processor or Controller Service evaluates the Expression. That is, if a Processor has a Property whose value is set to `#{Time}` it will function in exactly the same manner as if the Property's
 value were set to `${now()}`. Each time that the property is referenced, it will produce a different timestamp.
@@ -1138,7 +1138,7 @@ image::variable-putfile-property.png["Processor Property Using Variable"]
 
 ===== Variable Scope
 
-Variables are scoped by the Process Group they are defined in and are available to any Processor defined at that level and below (i.e. any descendant Processors).
+Variables are scoped by the Process Group they are defined in and are available to any Processor defined at that level and below (i.e., any descendant Processors).
 
 Variables in a descendant group override the value in a parent group.  More specifically, if a variable `x` is declared at the root group and also declared inside a process group, components inside the process group will use the value of `x` defined in the process group.
 
@@ -1456,7 +1456,7 @@ The following prioritizers are available:
 ** Note that an UpdateAttribute processor should be used to add the "priority" attribute to the FlowFiles before they reach a connection that has this prioritizer set.
 ** If only one has that attribute it will go first.
 ** Values for the "priority" attribute can be alphanumeric, where "a" will come before "z" and "1" before "9"
-** If "priority" attribute cannot be parsed as a long, unicode string ordering will be used. For example: "99" and "100" will be ordered so the flowfile with "99" comes first, but "A-99" and "A-100" will sort so the flowfile with "A-100" comes first.
+** If "priority" attribute cannot be parsed as a long, unicode string ordering will be used. For example: "99" and "100" will be ordered so the FlowFile with "99" comes first, but "A-99" and "A-100" will sort so the FlowFile with "A-100" comes first.
 
 NOTE: With a <<load_balance_strategy>> configured, the connection has a queue per node in addition to the local queue. The prioritizer will sort the data in each queue independently.
 
@@ -1694,17 +1694,17 @@ be performed. The number of active tasks is shown in the top-right corner of the
 for more information). See <<terminating_tasks>> for how to terminate the running tasks.
 
 [[terminating_tasks]]
-=== Terminating a Component's tasks
+=== Terminating a Component's Tasks
 
 When a component is stopped, it does not interrupt the currently running tasks. This allows for the current execution to complete while no new
-tasks are scheduled, which is the desired behaviour in many cases. In some cases, it is desirable to terminate the running tasks, particularly
+tasks are scheduled, which is the desired behavior in many cases. In some cases, it is desirable to terminate the running tasks, particularly
 in cases where a task has hung and is no longer responsive, or while developing new flows.
 
 To be able to terminate the running task(s), the component must first be stopped (see <<stopping_components>>). Once the component is in the
-Stopped state, the Terminate option will become available only if there are tasks still running (See <<processor_anatomy>>). The Terminate option
-(image:iconTerminate.png["Terminate"]) can be accessed either via the context menu or the Operations Palette while the component is selected.
+Stopped state, the Terminate option will become available only if there are tasks still running (see <<processor_anatomy>>). The Terminate option
+(image:iconTerminate.png["Terminate"]) can be accessed via the context menu or the Operate Palette while the component is selected.
 
-The number of tasks that are actively being terminated will be displayed in parentheses next to the number of active tasks e.g. image:terminated-thread.png["Terminated-Threads"]. For example, if there is one active task at the time that Terminate is selected, this will display "0 (1)" - meaning
+The number of tasks that are actively being terminated will be displayed in parentheses next to the number of active tasks (image:terminated-thread.png["Terminated-Threads"]). For example, if there is one active task at the time that Terminate is selected, this will display "0 (1)" - meaning
 0 active tasks and 1 task being terminated.
 
 A task may not terminate immediately, as different components may respond to the Terminate command differently. However, the components can be
@@ -2160,7 +2160,7 @@ The FlowFiles enqueued in a Connection can be viewed when necessary. The Queue l
 a Connection's context menu. The listing will return the top 100 FlowFiles in the active queue according to the
 configured priority. The listing can be performed even if the source and destination are actively running.
 
-Additionally, details for a Flowfile in the listing can be viewed by clicking the "Details" button (image:iconDetails.png["Details"]) in the left most column. From here, the FlowFile details and attributes are available as well as buttons for
+Additionally, details for a FlowFile in the listing can be viewed by clicking the "Details" button (image:iconDetails.png["Details"]) in the left most column. From here, the FlowFile details and attributes are available as well as buttons for
 downloading or viewing the content. Viewing the content is only available if the `nifi.content.viewer.url` has been configured.
 If the source or destination of the Connection are actively running, there is a chance that the desired FlowFile will
 no longer be in the active queue.
@@ -2761,7 +2761,7 @@ The provenance event types are:
 |FORK                    |Indicates that one or more FlowFiles were derived from a parent FlowFile
 |JOIN                    |Indicates that a single FlowFile is derived from joining together multiple parent FlowFiles
 |RECEIVE                 |Indicates a provenance event for receiving data from an external process
-|REMOTE_INVOCATION       |Indicates that a remote invocation was requested to an external endpoint (e.g. deleting a remote resource)
+|REMOTE_INVOCATION       |Indicates that a remote invocation was requested to an external endpoint (e.g., deleting a remote resource)
 |REPLAY                  |Indicates a provenance event for replaying a FlowFile
 |ROUTE                   |Indicates that a FlowFile was routed to a specified relationship and provides information about why the FlowFile was routed to this relationship
 |SEND                    |Indicates a provenance event for sending data to an external process
@@ -2868,7 +2868,7 @@ java.arg.13=-XX:+UseG1GC
 Many of the same system properties are supported by both the Persistent and Write Ahead configurations, however the default values have been chosen for a Persistent Provenance configuration. The following exceptions and recommendations should be noted when changing to a Write Ahead configuration:
 
 * `nifi.provenance.repository.journal.count` is not relevant to a Write Ahead configuration
-* `nifi.provenance.repository.concurrent.merge.threads` and `nifi.provenance.repository.warm.cache.frequency` are new properties.  The default values of `2` for threads and blank for frequency (i.e. disabled) should remain for most installations.
+* `nifi.provenance.repository.concurrent.merge.threads` and `nifi.provenance.repository.warm.cache.frequency` are new properties.  The default values of `2` for threads and blank for frequency (i.e., disabled) should remain for most installations.
 * Change the settings for `nifi.provenance.repository.max.storage.time` (default value of `24 hours`) and `nifi.provenance.repository.max.storage.size` (default value of `1 GB`) to values more suitable for your production environment
 * Change `nifi.provenance.repository.index.shard.size` from the default value of `500 MB` to `4 GB`
 * Change `nifi.provenance.repository.index.threads` from the default value of `2` to either `4` or `8` as the Write Ahead repository enables this to scale better
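
For reference, the Write Ahead recommendations in the final hunk above could be sketched as a `nifi.properties` fragment like the following. This is an illustrative sketch only, assembled from the property names and values quoted in the diff; consult the NiFi System Administrator's Guide for the authoritative property list and defaults.

```properties
# Write Ahead provenance repository -- illustrative values based on the
# recommendations above; tune for your production environment.
nifi.provenance.repository.implementation=org.apache.nifi.provenance.WriteAheadProvenanceRepository
# nifi.provenance.repository.journal.count is not relevant to Write Ahead, so it is omitted.
nifi.provenance.repository.concurrent.merge.threads=2
# Leave warm.cache.frequency blank (i.e., disabled) for most installations.
nifi.provenance.repository.warm.cache.frequency=
nifi.provenance.repository.max.storage.time=24 hours
nifi.provenance.repository.max.storage.size=1 GB
# Raised from the 500 MB default, per the recommendation above.
nifi.provenance.repository.index.shard.size=4 GB
# Raised from the default of 2; Write Ahead scales better with more index threads.
nifi.provenance.repository.index.threads=4
```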