Posted to commits@nifi.apache.org by mc...@apache.org on 2015/01/21 16:50:18 UTC

svn commit: r1653568 - /incubator/nifi/site/trunk/content/docs/nifi-docs/overview.html

Author: mcgilman
Date: Wed Jan 21 15:50:18 2015
New Revision: 1653568

URL: http://svn.apache.org/r1653568
Log:
Updating overview to match 0.0.1 release

Modified:
    incubator/nifi/site/trunk/content/docs/nifi-docs/overview.html

Modified: incubator/nifi/site/trunk/content/docs/nifi-docs/overview.html
URL: http://svn.apache.org/viewvc/incubator/nifi/site/trunk/content/docs/nifi-docs/overview.html?rev=1653568&r1=1653567&r2=1653568&view=diff
==============================================================================
--- incubator/nifi/site/trunk/content/docs/nifi-docs/overview.html (original)
+++ incubator/nifi/site/trunk/content/docs/nifi-docs/overview.html Wed Jan 21 15:50:18 2015
@@ -426,16 +426,6 @@ body.book #toc,body.book #preamble,body.
 .show-for-print{display:inherit!important}}
 </style>
 <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.2.0/css/font-awesome.min.css">
-<script>
-  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
-  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
-  m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
-  })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
-
-  ga('create', 'UA-57264262-1', 'auto');
-  ga('send', 'pageview');
-
-</script>
 </head>
 <body class="article">
 <div id="header">
@@ -462,9 +452,9 @@ body.book #toc,body.book #preamble,body.
 <div class="sectionbody">
 <div class="paragraph">
 <p>Put simply NiFi was built to automate the flow of data between systems.  While
-the term <em>dataflow</em> is used in a variety of contexts we&#8217;ll use it here
+the term <em>dataflow</em> is used in a variety of contexts, we&#8217;ll use it here
 to mean the automated and managed flow of information between systems.  This
-problem space has been around ever since enterprises had more than one system
+problem space has been around ever since enterprises had more than one system,
 where some of the systems created data and some of the systems consumed data.
 The problems and solution patterns that emerged have been discussed and
 articulated extensively.  A comprehensive and readily consumed form is found in
@@ -493,7 +483,7 @@ the <em>Enterprise Integration Patterns<
 </dd>
 <dt class="hdlist1">Systems evolve at different rates</dt>
 <dd>
-<p>The protocols and formats used by a given system can change anytime and often irrespective of the systems around them.  Dataflow exists to connect what is essentially a massively distributed system of components loosely or not-at-all designed to work together.</p>
+<p>The protocols and formats used by a given system can change anytime and often irrespective of the systems around them.  Dataflow exists to connect what is essentially a massively distributed system of components that are loosely or not-at-all designed to work together.</p>
 </dd>
 <dt class="hdlist1">Compliance and security</dt>
 <dd>
@@ -513,7 +503,7 @@ success of a given enterprise.  These in
 Architecture <a href="#soa">[soa]</a>, the rise of the API <a href="#api">[api]</a><a href="#api2">[api2]</a>, Internet of Things <a href="#iot">[iot]</a>,
 and Big Data <a href="#bigdata">[bigdata]</a>.  In addition, the level of rigor necessary for
 compliance, privacy, and security is constantly on the rise.  Even still with
-all of these new concepts coming about the patterns and needs of dataflow is
+all of these new concepts coming about, the patterns and needs of dataflow are
 still largely the same.  The primary differences then are the scope of
 complexity, the rate of change necessary to adapt, and that at scale
 the edge case becomes common occurrence.  NiFi is built to help tackle these
@@ -546,16 +536,16 @@ the main NiFi concepts and how they map
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock">FlowFile</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">Information Packet</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">A FlowFile represents the objects moving through the system and for each one NiFi
-keeps track of a Map of key/value pair attribute strings and its associated
+<td class="tableblock halign-left valign-top"><p class="tableblock">A FlowFile represents each object moving through the system and for each one, NiFi
+keeps track of a map of key/value pair attribute strings and its associated
 content of zero or more bytes.</p></td>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock">FlowFile Processor</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">Black Box</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock">Processors are what actually performs work.  In <a href="#eip">[eip]</a> terms a processor is
-doing some combination of data Routing, Transformation, or mediation between
-systems.  Processors have access to attributes of a given flow file and its
+<td class="tableblock halign-left valign-top"><p class="tableblock">Processors actually perform the work.  In <a href="#eip">[eip]</a> terms a processor is
+doing some combination of data Routing, Transformation, or Mediation between
+systems.  Processors have access to attributes of a given FlowFile and its
 content stream.  Processors can operate on zero or more FlowFiles in a given unit of work
 and either commit that work or rollback.</p></td>
 </tr>
@@ -564,7 +554,7 @@ and either commit that work or rollback.
 <td class="tableblock halign-left valign-top"><p class="tableblock">Bounded Buffer</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">Connections provide the actual linkage between processors.  These act as queues
 and allow various processes to interact at differing rates.  These queues then
-can be prioritized dynamically and can have upper bounds on load which enables
+can be prioritized dynamically and can have upper bounds on load which enable
 back pressure.</p></td>
 </tr>
 <tr>
@@ -579,7 +569,7 @@ between processors.</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">Process Group</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">subnet</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">A Process Group is a specific set of processes and their connections which can
-receive data via input ports and which can send data out via output ports.  In
+receive data via input ports and send data out via output ports.  In
 this manner process groups allow creation of entirely new components simply by
 composition of other components.</p></td>
 </tr>
@@ -669,10 +659,10 @@ components of NiFi then living within th
 <p>A NiFi cluster is comprised of one or more <em>NiFi Nodes</em> (Node) controlled
 by a single NiFi Cluster Manager (NCM).  The design of clustering is a simple
 master/slave model where the NCM is the master and the Nodes are the slaves.
-The NCM&#8217;s reason for existence is to keep track of which Nodes are in the flow,
+The NCM&#8217;s reason for existence is to keep track of which Nodes are in the cluster,
 their status, and to replicate requests to modify or observe the
 flow.  Fundamentally then the NCM keeps the state of the cluster consistent.
-While the model is that of master and slave if the master dies the Nodes are all
+While the model is that of master and slave, if the master dies the Nodes are all
 instructed to continue operating as they were to ensure the data flow remains live.
 The absence of the NCM simply means new nodes cannot come on-line and flow changes
 cannot occur until the NCM is restored.</p>
@@ -684,7 +674,7 @@ cannot occur until the NCM is restored.<
 <div class="sectionbody">
 <div class="paragraph">
 <p>NiFi is designed to fully leverage the capabilities of the underlying host system
-its is operating on.  This maximization of resources is particularly strong with
+it is operating on.  This maximization of resources is particularly strong with
 regard to CPU and disk.  Many more details will
 be provided on best practices and configuration tips in the Administration Guide.</p>
 </div>
@@ -696,23 +686,23 @@ be provided on best practices and config
 one can expect to see will vary greatly on how the system is configured.  Given
 that there are pluggable approaches to most of the major NiFi subsystems the
 performance will depend on the implementation.  But, for something concrete and broadly
-applicable lets consider the out of the box default implementations that are used.
+applicable, let&#8217;s consider the out-of-the-box default implementations that are used.
 These are all persistent with guaranteed delivery and do so using local disk.  So
-being conservative assume roughly 50 MB/s read/write rate on modest disks or RAID volumes
-within a typical server.  NiFi for a large class of data flows then should be able to
-efficiently reach one hundred or more MB/s of throughput.  That is because linear growth
-is expected for each physical parition and content repository added to NiFi.  This will
+being conservative, assume roughly 50 MB/s read/write rate on modest disks or RAID volumes
+within a typical server.  NiFi for a large class of dataflows then should be able to
+efficiently reach 100 or more MB/s of throughput.  That is because linear growth
+is expected for each physical partition and content repository added to NiFi.  This will
 bottleneck at some point on the FlowFile repository and provenance repository.
 We plan to provide a benchmarking/performance test template to
 include in the build which will allow users to easily test their system and
 to identify where bottlenecks are and at which point they might become a factor.  It
-should also make it easy for system administrators to make changes and to verity the impact.</p>
+should also make it easy for system administrators to make changes and to verify the impact.</p>
 </dd>
 <dt class="hdlist1">For CPU</dt>
 <dd>
-<p>The FlowController acts as the engine dictating when a given processor will be
+<p>The Flow Controller acts as the engine dictating when a particular processor will be
 given a thread to execute.  Processors should be written to return the thread
-as soon as they&#8217;re done executing their task.  The FlowController can be given a
+as soon as they&#8217;re done executing their task.  The Flow Controller can be given a
 configuration value indicating how many threads there should be for the various
 thread pools it maintains.  The ideal number of threads to use will depend on the
 resources of the host system in terms of numbers of cores, whether that system is
@@ -738,7 +728,7 @@ how well the application will run over t
 <dl>
 <dt class="hdlist1">Guaranteed Delivery</dt>
 <dd>
-<p>A core philosophy of NiFi has been that even at very high scale guaranteed delivery
+<p>A core philosophy of NiFi has been that even at very high scale, guaranteed delivery
 is a must.  This is achieved through effective use of a purpose-built persistent
 write-ahead log and content repository.  Together they are designed in such a way
 as to allow for very high transaction rates, effective load-spreading, copy-on-write,
@@ -753,13 +743,13 @@ as it reaches a specified age (its value
 <dt class="hdlist1">Prioritized Queuing</dt>
 <dd>
 <p>NiFi allows the setting of one or more prioritization schemes for how data is
-retrieved from a queue.  The default is oldest first but there are times when
+retrieved from a queue.  The default is oldest first, but there are times when
 data should be pulled newest first, largest first, or some other custom scheme.</p>
 </dd>
-<dt class="hdlist1">Flow Specific QoS (latency v throughput, loss tolerance, etc..)</dt>
+<dt class="hdlist1">Flow Specific QoS (latency v throughput, loss tolerance, etc.)</dt>
 <dd>
 <p>There are points of a dataflow where the data is absolutely critical and it is
-loss intolerant.  There are times when it must be processed and delivered within
+loss intolerant.  There are also times when it must be processed and delivered within
 seconds to be of any value.  NiFi enables the fine-grained flow specific configuration
 of these concerns.</p>
 </dd>
@@ -775,23 +765,23 @@ troubleshooting, optimization, and other
 <p>NiFi&#8217;s content repository is designed to act as a rolling buffer of history.  Data
 is removed only as it ages off the content repository or as space is needed.  This
 combined with the data provenance capability makes for an incredibly useful basis
-to enable click-to-content, download of content, replay, and all at a specific
-point in and objects lifecycle which can even span generations.</p>
+to enable click-to-content, download of content, and replay, all at a specific
+point in an object&#8217;s lifecycle which can even span generations.</p>
 </dd>
 <dt class="hdlist1">Visual Command and Control</dt>
 <dd>
 <p>Dataflows can become quite complex.  Being able to visualize those flows and express
-them visually can help greatly to reduce that complexity and to identify areas which
+them visually can help greatly to reduce that complexity and to identify areas that
 need to be simplified.  NiFi enables not only the visual establishment of dataflows but
 it does so in real-time.  Rather than being <em>design and deploy</em> it is much more like
-molding clay.  If you make a change to the dataflow that change is taking effect.  Changes
+molding clay.  If you make a change to the dataflow that change immediately takes effect.  Changes
 are fine-grained and isolated to the affected components.  You don&#8217;t need to stop an entire
 flow or set of flows just to make some specific modification.</p>
 </dd>
 <dt class="hdlist1">Flow Templates</dt>
 <dd>
 <p>Dataflows tend to be highly pattern oriented and while there are often many different
-ways to solve a problem it helps greatly to be able to share those best practices.  Templates
+ways to solve a problem, it helps greatly to be able to share those best practices.  Templates
 allow subject matter experts to build and publish their flow designs and for others to benefit
 and collaborate on them.</p>
 </dd>
@@ -809,8 +799,8 @@ either side of the sender/recipient equa
 <dt class="hdlist1">User to system</dt>
 <dd>
 <p>NiFi enables 2-Way SSL authentication and provides pluggable authorization so that it can properly control
-a users access and at particular levels (read-only, dataflow manager, admin).  If a user enters a
-sensitive property like a password into the flow it is immediately encrypted server side and never again exposed
+a user&#8217;s access and at particular levels (read-only, dataflow manager, admin).  If a user enters a
+sensitive property like a password into the flow, it is immediately encrypted server side and never again exposed
 on the client side even in its encrypted form.</p>
 </dd>
 </dl>
@@ -830,7 +820,7 @@ on the client side even in its encrypted
 <p>For any component based system one problem that can quickly occur is dependency nightmares.  NiFi addresses this by providing a custom class loader model
 ensuring that each extension bundle is exposed to a very limited set of dependencies.  As a result extensions can be built with little concern for whether
 they might conflict with another extension.  The concept of these extension bundles is called <em>NiFi Archives</em> and will be discussed in greater detail
-in the developers guide.</p>
+in the developer&#8217;s guide.</p>
 </dd>
 </dl>
 </div>
@@ -839,15 +829,17 @@ in the developers guide.</p>
 <dd>
 <p>NiFi is designed to scale-out through the use of clustering many nodes together as described above.  If a single node is provisioned and configured
 to handle hundreds of MB/s then a modest cluster could be configured to handle GB/s.  This then brings about interesting challenges of load balancing
-and fail-over between NiFi and the systems from which it gets data.  Use of asynchronous queuing based protocols like messaging services, Kafka, etc.. can
-help.  Use of NiFi&#8217;s <em>site-to-site</em> feature is also very effective as it is a protocol that allows NiFi and a client (could be another NiFi cluster) to talk to eachother, share information
+and fail-over between NiFi and the systems from which it gets data.  Use of asynchronous queuing based protocols like messaging services, Kafka, etc., can
+help.  Use of NiFi&#8217;s <em>site-to-site</em> feature is also very effective as it is a protocol that allows NiFi and a client (could be another NiFi cluster) to talk to each other, share information
 about loading, and to exchange data on specific authorized ports.</p>
 </dd>
 </dl>
 </div>
 </div>
 </div>
-<h1 id="references" class="sect0"><a class="anchor" href="#references"></a>References</h1>
+<div class="sect1">
+<h2 id="references"><a class="anchor" href="#references"></a>References</h2>
+<div class="sectionbody">
 <div class="ulist bibliography">
 <ul class="bibliography">
 <li>
@@ -877,10 +869,12 @@ about loading, and to exchange data on s
 </ul>
 </div>
 </div>
+</div>
+</div>
 <div id="footer">
 <div id="footer-text">
-Last updated 2014-12-31 12:06:24 EST
+Last updated 2015-01-21 10:20:49 EST
 </div>
 </div>
 </body>
-</html>
+</html>
\ No newline at end of file
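
The overview text in this diff describes a Connection as a bounded buffer whose queue "can be prioritized dynamically and can have upper bounds on load which enable back pressure". Under Prioritized Queuing it says the default is oldest first, with newest first, largest first, or a custom scheme as alternatives. The Python below is a minimal sketch of those ideas under stated assumptions. It is not NiFi's implementation or API.

The names FlowFile, BoundedConnection, offer, poll, set_prioritizer and the prioritizer functions are all hypothetical. The upper bound is also simplified: it is assumed to be a plain count of queued objects.

```python
import heapq
import itertools
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class FlowFile:
    """Hypothetical FlowFile: string attributes plus zero or more bytes of content."""
    attributes: Dict[str, str]
    content: bytes = b""
    entry_order: int = 0  # assigned by the queue; lower means it arrived earlier


# Hypothetical prioritizer type: maps a FlowFile to a sort key; smallest key is pulled first.
Prioritizer = Callable[[FlowFile], Tuple[int, ...]]


def oldest_first(ff: FlowFile) -> Tuple[int, ...]:
    return (ff.entry_order,)


def newest_first(ff: FlowFile) -> Tuple[int, ...]:
    return (-ff.entry_order,)


def largest_first(ff: FlowFile) -> Tuple[int, ...]:
    # Ties on content size fall back to arrival order.
    return (-len(ff.content), ff.entry_order)


class BoundedConnection:
    """Hypothetical connection: a prioritized queue with an upper bound on object count."""

    def __init__(self, max_objects: int, prioritizer: Prioritizer = oldest_first) -> None:
        self.max_objects = max_objects
        self.prioritizer = prioritizer
        self._heap: List[Tuple[Tuple[int, ...], int, FlowFile]] = []
        self._seq = itertools.count()  # unique tie-breaker, so FlowFiles are never compared

    def __len__(self) -> int:
        return len(self._heap)

    def is_full(self) -> bool:
        # Back pressure signal: while True, the upstream side should not be scheduled.
        return len(self._heap) >= self.max_objects

    def offer(self, ff: FlowFile) -> bool:
        """Enqueue ff, or return False without enqueuing if the bound is reached."""
        if self.is_full():
            return False
        seq = next(self._seq)
        ff.entry_order = seq
        heapq.heappush(self._heap, (self.prioritizer(ff), seq, ff))
        return True

    def poll(self) -> Optional[FlowFile]:
        """Remove and return the highest-priority FlowFile, or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

    def set_prioritizer(self, prioritizer: Prioritizer) -> None:
        """Re-key everything already queued so the new order applies immediately."""
        self.prioritizer = prioritizer
        self._heap = [(prioritizer(ff), seq, ff) for _, seq, ff in self._heap]
        heapq.heapify(self._heap)


if __name__ == "__main__":
    conn = BoundedConnection(max_objects=3)
    for name, body in [("a", b"x"), ("b", b"xxxx"), ("c", b"xx")]:
        conn.offer(FlowFile({"filename": name}, body))
    print(conn.offer(FlowFile({"filename": "d"})))  # False: queue is full
    conn.set_prioritizer(largest_first)
    print(conn.poll().attributes["filename"])  # b
```

In this sketch, set_prioritizer rebuilds the heap. That is one simple way to model changing the order of a queue that already holds data. The overview does not say how NiFi itself handles this.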