Posted to users@nifi.apache.org by Jeroen Jacobs <je...@headincloud.be> on 2016/02/01 01:49:10 UTC

Struggling with ExtractText

Hi,


I'm looking at the documentation for the ExtractText processor, and I'm at a total loss as to how I should get started. I have a flowfile with the following content:


/opsbot url logstash


I want to end up with the following attributes:


message.destination: opsbot

message.action: url

message.parameters: logstash


(an action can have multiple parameters. If that's the case, they should all end up in message.parameters).


How do I get started on this? Do I need to create 3 different attributes, each with its own regex? This doesn't seem efficient. Or can I do this in one go? If so, can someone explain to me how, and what the correct regex should be?


A few examples on the documentation page could make this a lot more understandable to people who are new to NiFi.



Kind regards,


Jeroen





Re: Struggling with ExtractText

Posted by Matthew Burgess <ma...@gmail.com>.
Jeroen,

You can achieve this either with 3 different attributes (each with its own regex), as you mentioned, or with a single regex and the group indexing that ExtractText provides.

For the latter, add a property called “line” (for example) that uses something like the following regex:

/(\S+) (\S+) (\S+)

This splits the line into the three attributes you mention, except the attribute names are based on the group index so:

line.1 = opsbot
line.2 = url
line.3 = logstash 
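
To check the pattern outside NiFi, here is a minimal Python sketch of the group-to-attribute naming described above. It is an illustration only: ExtractText uses Java regular expressions, and the helper name extract_groups is made up for this example. For this simple pattern, Python's re module should behave the same way.

```python
import re

# Same pattern as in the message; compiled once for reuse.
PATTERN = re.compile(r"/(\S+) (\S+) (\S+)")


def extract_groups(content, prop="line"):
    """Return {'<prop>.<index>': captured text} for each group, or {} if no match.

    This only imitates the naming scheme from the message (line.1, line.2, ...);
    it is not NiFi code.
    """
    match = PATTERN.search(content)
    if match is None:
        return {}
    return {f"{prop}.{i}": value for i, value in enumerate(match.groups(), start=1)}


attributes = extract_groups("/opsbot url logstash")
print(attributes)
# {'line.1': 'opsbot', 'line.2': 'url', 'line.3': 'logstash'}
```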

If you need the attributes named as you mention, you can add an UpdateAttribute processor that loads the values above into the attributes you wish:

message.destination = ${line.1}
message.action = ${line.2}
message.parameters = ${line.3}

If the parameters should be “the rest of the line”, only the last group, (\S+), would need to change; the approach would remain the same.
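
One possible way to handle the “rest of the line” case is to swap the last group for (.+). That replacement is an assumption, not something stated in this thread, so it should be tested against real flowfile content first. The Python sketch below also applies the renaming from the UpdateAttribute step; the names REST_OF_LINE, NAMES and parse_command are hypothetical.

```python
import re

# Assumed variant: the third group captures everything after the action.
REST_OF_LINE = re.compile(r"/(\S+) (\S+) (.+)")

# Group index -> final attribute name, mirroring the UpdateAttribute properties.
NAMES = {
    1: "message.destination",
    2: "message.action",
    3: "message.parameters",
}


def parse_command(content):
    """Return the three message.* attributes, or {} if the content does not match."""
    match = REST_OF_LINE.search(content)
    if match is None:
        return {}
    return {name: match.group(index) for index, name in NAMES.items()}


print(parse_command("/opsbot url logstash kibana"))
# {'message.destination': 'opsbot', 'message.action': 'url', 'message.parameters': 'logstash kibana'}
```

Note that this variant still requires at least one parameter: content such as "/opsbot url" does not match at all.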

Hope this helps!  Cheers,
Matt

