Posted to users@nifi.apache.org by Pompilio Ramirez <po...@gmail.com> on 2017/04/06 14:58:58 UTC

nifi attributes logics

Hello,

I can't find a way to take 2 attributes and create an attribute with only
the difference between them.

Has anyone accomplished that? In general, I want to define routing to
processing groups that perform the work. I want to define endpoints ( say
customer1 / customer2 / customer3 ).
I get one flowfile and I define what each customer needs to normalize the
data, using the Advanced UI of UpdateAttribute.

When a flowfile condition is met, I build attributes under Actions, which
enriches the flowfile with an attribute:

customer1 would have an attribute "ACTIONS" that says ( convert / merge
/ enrich / convert )
customer2 has "ACTIONS" ( convert / merge / gzip )
customer3 has "ACTIONS" ( convert / merge )

I then want logic that recognizes that all customers need convert / merge,
so that I can take that flowfile and do the convert ( this is very
expensive ), or all of the common steps, once on a single flowfile before
it is cloned.

My steps are linear, so I just want to find a way to build logic that
identifies my common steps and creates attributes with the difference.

Thoughts?
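
For illustration only, the "common steps plus per-customer difference" idea
can be sketched in plain Python. This is a minimal sketch, not a NiFi
processor: the function names and the dictionary layout are hypothetical,
and it assumes the ACTIONS values are "/"-separated and ordered, as in the
example above.

```python
def split_actions(value, sep="/"):
    """Split an ACTIONS-style value into a list of trimmed step names."""
    return [part.strip() for part in value.split(sep) if part.strip()]


def common_prefix(action_lists):
    """Return the leading steps shared by every list, in order."""
    prefix = []
    for steps in zip(*action_lists):
        # Stop at the first position where the customers disagree.
        if len(set(steps)) != 1:
            break
        prefix.append(steps[0])
    return prefix


# ACTIONS values copied from the example in the post.
actions = {
    "customer1": "convert / merge / enrich / convert",
    "customer2": "convert / merge / gzip",
    "customer3": "convert / merge",
}

parsed = {name: split_actions(value) for name, value in actions.items()}

# Steps that can run once, before the flowfile is cloned.
shared = common_prefix(list(parsed.values()))

# What is left for each customer after the shared steps.
remaining = {name: steps[len(shared):] for name, steps in parsed.items()}
```

Because the steps are linear, a shared prefix (rather than an unordered set
intersection) is what identifies the work that can be done before cloning.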

Re: nifi attributes logics

Posted by Joseph Niemiec <jo...@gmail.com>.
I would chain attribute processors in a forward-facing manner. You have a set
of 4 known steps that may be labeled together in your ACTION attribute,
which says what type of processing needs to be completed, yes?

I would chain some RouteOnAttribute processors together as a bottom pipe and
use regex to look at the commands that need to run on the flowfile. My idea
isn't very dynamic in nature; it's just a standard pipe with some regex to
decide whether to send the flowfile up to the processor or not, and from
there it's passed to the next RouteOnAttribute-type gate.

I labeled my steps a and b, with 'convert' as a, so in my transform line a
always comes before b. On the first gate I checked whether the very first
flag was an a. On the second I ignored the first character and checked
whether the second flag was a b. I could see this turning into a more
efficient binary check.

${ACTION:matches('^a.*')}
${ACTION:matches('^.,b.*')}
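
As a rough way to try the gate patterns outside NiFi, here is a small Python
sketch. It assumes that the Expression Language `matches` function must
match the whole attribute value, which `re.fullmatch` imitates; for that
reason the second pattern here carries a trailing `.*`. The gate table and
the step names attached to each flag are hypothetical.

```python
import re

# Hypothetical gate table: (step name, pattern the whole ACTION value must match).
GATES = [
    ("convert", r"^a.*"),    # first flag is 'a'
    ("merge", r"^.,b.*"),    # second flag is 'b', whatever the first flag is
]


def steps_to_run(action):
    """Return the steps whose gate pattern matches the entire ACTION value."""
    return [step for step, pattern in GATES if re.fullmatch(pattern, action)]
```

Calling `steps_to_run("a,b")` would select both gates, while a value whose
second flag is not `b` would skip the second one.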



-- 
Joseph

Re: nifi attributes logics

Posted by Pompilio Ramirez <po...@gmail.com>.
Thank you Andy.

We have n data points that send data to us to normalize, and our endpoints
have different normalization requirements.
We are also juggling ease of dataflow configuration ( many people
implement dataflows ), so we are trying to keep things simple and have only
one place where a DFM needs to go to configure a dataflow.

So dataFromSiteX comes in to our system.

That data is in format X. We "normalize" it through IdentifyMimeType into
a standard format we have, and then we route based on who our end
customers are and their requirements ( zipped / merged to X size /
proprietary stuff ).

We are trying to find the most efficient method that balances dataflow
configuration simplicity ( CM / DFM ) versus technical efficiency.
To meet the requirement we could certainly use multiple update routes on
different pieces of the dataflow, but then a DFM would have to know that
they need to configure many UpdateAttribute processors.

We are going to do something similar to your suggestion and will probably
extend the RouteOnAttribute code. We are also going to create an external
"U/I or script or something" that will allow a DFM to configure a dataflow.
That way we can "validate" the initial dataflow configuration, since we
could have many attributes and settings that we want to pre-populate one
time and in one place. It feels weird to have a separate UI that populates
the "nifi UI", but validating a dataflow throughout the dataflow lifecycle
adds extra complexity for CM / Tier 1 troubleshooting.



Re: nifi attributes logics

Posted by Andy LoPresto <al...@apache.org>.
Hi,

I’m not sure I fully understand the question, as you say that “customers” have attributes, but I was under the impression that the customers were various processors/“endpoints”. Flowfiles are the atomic units of data passing through the flow, and that is where attributes are stored.

Regardless, to perform complex string comparisons, I believe your best option is an ExecuteScript processor. In any of the supported languages, you can quickly parse the attributes into collections (e.g., java.util.List) via String split on a delimiter, perform set arithmetic (A - B), and then route based on the results.
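
A minimal sketch of that split-and-subtract step, written as plain Python so it can be tried outside NiFi. The attribute names COMMON and ACTIONS in the usage lines, the comma delimiter, and the helper names are assumptions for illustration; a plain dictionary stands in for the flowfile's attributes.

```python
def parse_steps(value, sep=","):
    """Split a delimited attribute value into a list of trimmed entries."""
    return [entry.strip() for entry in value.split(sep) if entry.strip()]


def difference(a, b):
    """Entries of a that do not appear in b (A - B), keeping a's order."""
    exclude = set(b)
    return [entry for entry in a if entry not in exclude]


def attribute_difference(attributes, name_a, name_b, sep=","):
    """Compute A - B for two delimited attributes and re-join the result."""
    a = parse_steps(attributes.get(name_a, ""), sep)
    b = parse_steps(attributes.get(name_b, ""), sep)
    return sep.join(difference(a, b))


# A dictionary standing in for flowfile attributes (hypothetical values).
attrs = {"ACTIONS": "convert,merge,gzip", "COMMON": "convert,merge"}
leftover = attribute_difference(attrs, "ACTIONS", "COMMON")
```

Inside an ExecuteScript processor the dictionary lookups would be replaced by calls such as flowFile.getAttribute('ACTIONS'), and the result could be written back with session.putAttribute(flowFile, ...) before transferring the flowfile. A list comprehension is used instead of a bare set difference so that the order of the remaining steps is preserved.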

Andy LoPresto
alopresto@apache.org
alopresto.apache@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
