Posted to commits@nifi.apache.org by "Mark Payne (JIRA)" <ji...@apache.org> on 2015/02/09 14:47:34 UTC

[jira] [Comment Edited] (NIFI-101) Create experimental processors to aid in learning

    [ https://issues.apache.org/jira/browse/NIFI-101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303330#comment-14303330 ] 

Mark Payne edited comment on NIFI-101 at 2/9/15 1:47 PM:
---------------------------------------------------------

I would first recommend that we develop a "Hello World" Processor. Sticking with the design that is typically used in streaming frameworks and even in the Hadoop World, I'd recommend a CountWords Processor. Suggested design:

Two Relationships: "Many Words", "Few Words"
Single Property: "Count Threshold" -> Positive integer that indicates how many words the content of a FlowFile must contain in order to be routed to "Many Words"

Processor would count how many words there are in an input FlowFile and route the FlowFile to either "Many Words" or "Few Words," depending on the number of words in the content of the FlowFile.

Adds a "word.count" attribute.
Emits a Provenance ROUTE event, indicating which Relationship the FlowFile was sent to.
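
As a rough illustration of the routing rule and the "word.count" attribute described above, here is a minimal sketch in plain Java. It deliberately does not use the NiFi API; the class name WordCountRouting and the methods chooseRelationship and attributesFor are hypothetical names made up for this example. It also assumes that a count equal to the threshold goes to "Many Words", which is one reading of "how many words must be contained".

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of NiFi: models only the routing decision.
final class WordCountRouting {
    static final String MANY_WORDS = "Many Words";
    static final String FEW_WORDS = "Few Words";

    // Returns the name of the Relationship a FlowFile with the given
    // word count would be routed to. Assumption: count >= threshold
    // means "Many Words".
    static String chooseRelationship(long wordCount, int countThreshold) {
        if (countThreshold <= 0) {
            throw new IllegalArgumentException("Count Threshold must be a positive integer");
        }
        return wordCount >= countThreshold ? MANY_WORDS : FEW_WORDS;
    }

    // Builds the attribute that the processor would add to the FlowFile.
    static Map<String, String> attributesFor(long wordCount) {
        Map<String, String> attributes = new LinkedHashMap<>();
        attributes.put("word.count", Long.toString(wordCount));
        return attributes;
    }
}
```

In a real Processor the chosen Relationship would be passed to the session when transferring the FlowFile, and the ROUTE provenance event would be emitted at the same point.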

Because we generally don't want to buffer the content in memory, I would recommend we use a BufferedInputStream and read one byte at a time. If the byte represents white space and the previous byte did not, increment the counter.
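
The byte-at-a-time counting described above could look roughly like the sketch below. It is plain Java with no NiFi dependency; the class name StreamingWordCounter and both countWords methods are hypothetical. Two assumptions go beyond the comment: "white space" is decided with Character.isWhitespace on the single byte value, and a last word with no trailing white space is counted when the end of the stream is reached, since the transition rule alone would miss it.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of NiFi: counts words without buffering
// the whole content in memory.
final class StreamingWordCounter {

    static long countWords(InputStream rawIn) throws IOException {
        long count = 0;
        // Treat the start of the stream as white space so that leading
        // white space does not produce a spurious word.
        boolean previousWasWhitespace = true;
        try (InputStream in = new BufferedInputStream(rawIn)) {
            int b;
            while ((b = in.read()) != -1) {
                boolean isWhitespace = Character.isWhitespace(b);
                // A word ends at a transition from non-white-space to white space.
                if (isWhitespace && !previousWasWhitespace) {
                    count++;
                }
                previousWasWhitespace = isWhitespace;
            }
        }
        // Assumption: count a final word that is not followed by white space.
        if (!previousWasWhitespace) {
            count++;
        }
        return count;
    }

    // Convenience overload for small experiments with in-memory text.
    static long countWords(String text) {
        try {
            return countWords(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, StreamingWordCounter.countWords("hello  world foo") sees two word-ending transitions plus one unterminated final word. Inside a Processor, the InputStream would instead come from reading the FlowFile content through the session.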


was (Author: markap14):
I would recommend that we first developer a "Hello World" Processor. Sticking with the design that is typically used in streaming frameworks and even in the Hadoop World, I'd recommend a CountWords Processor. Suggested design:

Two Relationships: "Many Words", "Few Words"
Single Property: "Count Threshold" -> Positive integer, that indicates how many words must be contained in the content of a FlowFile in order to be routed to "Many Words"

Processor would count how many words there are in an input FlowFile and route the FlowFile to either "Many Words" or "Few Words," depending on the number of words in the content of the FlowFile.

Adds a "word.count" attribute.
Emits a Provenance ROUTE event, indicating which Relationship the FlowFile was sent to.

Because we generally don't want to buffer the content in memory, would recommend we use a BufferedInputStream and read one byte at a time. If the byte represents white space and the previous byte did not, increment the counter.

> Create experimental processors to aid in learning
> -------------------------------------------------
>
>                 Key: NIFI-101
>                 URL: https://issues.apache.org/jira/browse/NIFI-101
>             Project: Apache NiFi
>          Issue Type: Task
>          Components: Examples
>            Reporter: Joseph Witt
>            Priority: Trivial
>
> It would be ideal to provide good 'use-anywhere' processors that let users easily set up dataflows anywhere on made-up data, both for learning how to command and control NiFi and for learning design/implementation patterns for building processors.
> Two potentially good examples here are:
> - Processor(s) to interact with IRC (sending/receiving)
> One approach: A single processor which both sends and receives data from a given IRC channel.  The user can configure the IRC host, username, password, channel, etc.  The connection is then held open and the processor will produce an output flow file for every message received in the IRC channel, which will have as attributes the message content, sender, time, etc.  That same processor can also read flow files from its queue which contain message text in an attribute.  In this manner the processor can support bidirectional interaction with IRC.
> It would then also be interesting to make it really easy for a user to generate a message via the UI as well as to easily consume a message via the UI.  These could be very generic processors/widgets for creation/consumption and good for these sorts of cases.
> There are active IRC channels which are great for demonstration of relatively active datastreams.  Weather updates, Wikipedia updates, etc...
> - Processor(s) to interact with RSS/Atom
> This would allow subscription to specific feeds of interest to generate test/variable data.  RSS/Atom requires keeping some state which is also an interesting problem to think through.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)