Posted to commits@nifi.apache.org by "Mark Payne (JIRA)" <ji...@apache.org> on 2015/10/14 15:04:05 UTC

[jira] [Commented] (NIFI-942) Create RouteText processor

    [ https://issues.apache.org/jira/browse/NIFI-942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14956867#comment-14956867 ] 

Mark Payne commented on NIFI-942:
---------------------------------

[~JPercivall] I was reviewing this code and got a bit confused. As I read it, we are trying to determine whether a FlowFile matches based on either all lines matching or any line matching, and then a "match" for a line depends in turn on either all user-defined properties matching or any user-defined property matching. Combining those two levels gets really confusing. I tried to reword the documentation on the Processor to make it clearer, but I couldn't come up with wording that I thought was sufficiently clear. Generally, I believe this means that the design needs work. Given that I wrote up the initial proposal and even I can't concisely describe it, users are certainly going to have a tough time with it!

I think the answer is to break this into two Processors: RouteLinesOfText and RouteTextFlowFile. The first would route each line of text individually. The second would route the entire FlowFile as a single unit. Previously, we were trying to fit both of these concepts into a single Processor, which is just too confusing. If we split it into two Processors, I believe the documentation becomes far clearer and more concise, and the code will as well. Are you okay with that?

Thanks
-Mark

> Create RouteText processor
> --------------------------
>
>                 Key: NIFI-942
>                 URL: https://issues.apache.org/jira/browse/NIFI-942
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Extensions
>            Reporter: Mark Payne
>            Assignee: Joseph Percivall
>             Fix For: 0.4.0
>
>         Attachments: NIFI-942.patch
>
>
> The idea is to route individual lines of a text file to different relationships. This allows for splitting lines based on some criteria or filtering out specific lines, and would be a much more convenient alternative to RouteOnContent for textual data.
> A discussion for this took place on the users mailing list (http://mail-archives.apache.org/mod_mbox/nifi-users/201509.mbox/%3CCAKpk5PxjszdX-NXMMf6Pcet4x7Y5GmrT7_jn9uyzS-h_a9TG3A%40mail.gmail.com%3E)
> The way that I could see this working is to have a few different properties:
> Routing Strategy:
> - Route each line to matching Property Name (default)
> - Route matching lines to 'matched' if all match
> - Route matching lines to 'matched' if any match
> - Route FlowFile to 'matched' if all lines match
> - Route FlowFile to 'matched' if any line matches
> A Match Strategy
> - Starts With
> - Ends With
> - Contains
> - Equals
> - Matches Regular Expression
> - Contains Regular Expression
> And then user-defined properties that indicate what to search each line of text for.
> So, to find lines that begin with the < character, you would simply add a property named "Begins with Less Than" and set its value to: <
> Then set the Match Strategy to Starts With
> And Routing Strategy to "Route each line to matching Property Name"
> Then, any line that begins with < will be routed to the Begins with Less Than relationship.
> This would be a simple way to pull out any particular lines of interest in a text file.
> I can see this being very useful for processing log files, CSV, etc.
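
To make the proposal above concrete, here is a minimal sketch in plain Java of the "Route each line to matching Property Name" strategy combined with the six match strategies. It is not the NiFi processor API and not the code in the attached patch. The class name RouteTextSketch, the method names, and the "unmatched" relationship for lines that match no property are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical illustration only; names are not taken from NiFi or the patch.
final class RouteTextSketch {

    // The six match strategies listed in the issue description.
    enum MatchStrategy {
        STARTS_WITH, ENDS_WITH, CONTAINS, EQUALS, MATCHES_REGEX, CONTAINS_REGEX
    }

    // Returns true if a single line satisfies the strategy for a property value.
    static boolean matches(MatchStrategy strategy, String line, String value) {
        switch (strategy) {
            case STARTS_WITH:    return line.startsWith(value);
            case ENDS_WITH:      return line.endsWith(value);
            case CONTAINS:       return line.contains(value);
            case EQUALS:         return line.equals(value);
            // The whole line must match the expression.
            case MATCHES_REGEX:  return Pattern.compile(value).matcher(line).matches();
            // Any part of the line may match the expression.
            case CONTAINS_REGEX: return Pattern.compile(value).matcher(line).find();
            default: throw new IllegalArgumentException("Unknown strategy: " + strategy);
        }
    }

    // "Route each line to matching Property Name": every line is added to the
    // relationship named after each property it matches. Sending lines that
    // match nothing to "unmatched" is an assumption of this sketch.
    static Map<String, List<String>> routeEachLine(
            String text, MatchStrategy strategy, Map<String, String> properties) {
        Map<String, List<String>> routed = new LinkedHashMap<>();
        for (String line : text.split("\\R")) {   // \R matches any line break
            boolean matchedAny = false;
            for (Map.Entry<String, String> property : properties.entrySet()) {
                if (matches(strategy, line, property.getValue())) {
                    routed.computeIfAbsent(property.getKey(), k -> new ArrayList<>()).add(line);
                    matchedAny = true;
                }
            }
            if (!matchedAny) {
                routed.computeIfAbsent("unmatched", k -> new ArrayList<>()).add(line);
            }
        }
        return routed;
    }

    public static void main(String[] args) {
        Map<String, String> properties = new LinkedHashMap<>();
        properties.put("Begins with Less Than", "<");
        String text = "<root>\nplain text\n</root>";
        System.out.println(routeEachLine(text, MatchStrategy.STARTS_WITH, properties));
    }
}
```

With the property from the example, the two lines starting with < end up under "Begins with Less Than" and the remaining line under the assumed "unmatched" relationship. The FlowFile-level strategies (all lines match, any line matches) would sit on top of the same per-line check.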


