Posted to users@nifi.apache.org by Jeff Zemerick <jz...@apache.org> on 2017/04/25 14:39:22 UTC

How to identify MiNiFi source edge devices

When processing data in NiFi that was received via MiNiFi edge devices I
need to be able to identify the source of the data. All of the data on the
edge devices will be pulled from a database and will not contain any data
that self-identifies the source. My attempt to solve this was to write a
processor that reads a configuration file on the edge device to get its
device ID and put that ID as an attribute in the flowfile. This appears to
work, but I was wondering whether there is a more recommended approach?

Thanks,
Jeff

Re: How to identify MiNiFi source edge devices

Posted by Andy LoPresto <al...@apache.org>.
Jeff,

That’s a great explanation, and it matches a scenario we’ve used as a thought exercise when planning other MiNiFi features. I think what Andre suggested below is the easiest and most reliable way to accomplish what you are looking for. UpdateAttribute lets you be as specific as you want: you can pull the hostname, read a value from the Variable Registry, or even run a stream command on the host or read a file on the system to get a unique identifier. All of your downstream processors then have access to that attribute. You can also filter provenance data within NiFi using that discriminator.
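
To make the options above concrete, here is a minimal Python sketch of the lookup order they imply: an explicit variable first, then a file on the device, then the hostname. This is only an illustration of the idea, not how UpdateAttribute or the Variable Registry is implemented. The names `resolve_device_id`, `tag`, the `device.id` key, and the one-line ID file are all assumptions made for the example.

```python
import socket
from pathlib import Path
from typing import Dict, Mapping, Optional


def resolve_device_id(
    variables: Mapping[str, str],
    id_file: Optional[Path] = None,
    key: str = "device.id",  # hypothetical variable name
) -> str:
    """Return a device ID from the first source that provides one.

    Order: an explicit variable, then a one-line ID file, then the
    machine's fully qualified hostname.
    """
    value = variables.get(key, "").strip()
    if value:
        return value
    if id_file is not None and id_file.is_file():
        text = id_file.read_text(encoding="utf-8").strip()
        if text:
            return text
    return socket.getfqdn()


def tag(attributes: Mapping[str, str], device_id: str) -> Dict[str, str]:
    """Return a copy of the flowfile-style attributes with device.id added."""
    tagged = dict(attributes)
    tagged["device.id"] = device_id
    return tagged
```

Because the ID comes from outside the flow definition, the same flow can run unchanged on every device; only the variable, file, or hostname differs per device.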

Good luck.


Andy LoPresto
alopresto@apache.org
alopresto.apache@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69



Re: How to identify MiNiFi source edge devices

Posted by Jeff Zemerick <jz...@apache.org>.
Aldrin,

To simplify it, the situation is analogous to a deployment of temperature
sensors. Each sensor has a unique ID that is assigned by us at deployment
time and each sensor periodically adds a new row to a database table that
is stored on the sensor. Each sensor uses the same database schema so if
you combined all the rows you couldn't tell which rows originated from
which sensor. In NiFi, I need to do different things based on where the
data originated and I need to associate the sensor's ID with its data.
(Such as inserting the data into DynamoDB with the sensor ID as the Hash
key and a timestamp as the Range key.) The goal is to use the same MiNiFi
configuration for all devices.

I can easily use the ExecuteSQL processor to grab the new rows. But I need
some way to attach an attribute to the data that identifies where it
originated. That was what led to the initial question in this thread. The
Variable Registry along with the UpdateAttribute processor appears to
satisfy that need more cleanly than a custom processor.
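
As a rough illustration of why the attribute matters, the Python sketch below attaches a device ID to rows that are otherwise indistinguishable and exposes the two fields that would serve as the hash and range keys. It uses plain dictionaries and makes no DynamoDB calls. The function name `to_items` and the field names `device_id`, `recorded_at`, and `temp_c` are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, List


def to_items(device_id: str, rows: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Attach the device ID to each row and normalize the timestamp.

    Each row is assumed to carry a timezone-aware `recorded_at` datetime
    (a hypothetical column name chosen for this example).
    """
    items = []
    for row in rows:
        item = dict(row)
        # Hash (partition) key: identifies the originating sensor.
        item["device_id"] = device_id
        # Range (sort) key: the reading time as an ISO 8601 string in UTC.
        item["recorded_at"] = row["recorded_at"].astimezone(timezone.utc).isoformat()
        items.append(item)
    return items


# Two sensors with the same schema and the same reading.
reading = {
    "recorded_at": datetime(2017, 4, 25, 14, 39, 22, tzinfo=timezone.utc),
    "temp_c": 21.5,
}
from_a = to_items("sensor-a", [reading])
from_b = to_items("sensor-b", [reading])
```

Without the `device_id` field the two items would be identical; with it, each one has a distinct key pair.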

I hope that explains the situation a bit!

Thanks,
Jeff




Re: How to identify MiNiFi source edge devices

Posted by Aldrin Piri <al...@gmail.com>.
Jeff,

Could you expand upon what a device ID is in your case? Something
intrinsic to the device? The agent? Are these generated and assigned
during provisioning? How are you making use of them when the data
arrives at its desired destination?

What you are describing is certainly a common need. We would welcome any
perspective on what your deployment looks like, so that we can understand
the use cases people are rolling out and let them guide the assumptions
we make during our development and design processes.

Thanks for diving in and exploring!
--Aldrin



Re: How to identify MiNiFi source edge devices

Posted by Andre <an...@fucs.org>.
Jeff,

That would be my next suggestion. :-)

Cheers


Re: How to identify MiNiFi source edge devices

Posted by Jeff Zemerick <jz...@apache.org>.
It is possible. I will take a look to see if the hostname is sufficient for
the device ID.

I just learned about the Variable Registry. It seems that if I store the
device ID in the Variable Registry, it would be available to the
UpdateAttribute processor. Is that correct?

Thanks,
Jeff



Re: How to identify MiNiFi source edge devices

Posted by Andre <an...@fucs.org>.
Jeff,

Would it be feasible for you to use UpdateAttribute (which I believe is
part of the MiNiFi core processors) with the ${hostname(true)} Expression
Language function?

More about it can be found here:

https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html#hostname
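
For a feel of what the boolean argument changes, the short Python sketch below shows the analogous distinction between a short hostname and a fully qualified one. It uses Python's standard library rather than NiFi, so the exact values may differ from what the Expression Language returns on a given host. The function name `device_id_from_hostname` is made up for this example.

```python
import socket


def device_id_from_hostname(fully_qualified: bool = True) -> str:
    """Return the host's name, fully qualified if requested.

    Roughly analogous to ${hostname(true)} versus ${hostname()};
    this is an analogy, not NiFi's implementation.
    """
    return socket.getfqdn() if fully_qualified else socket.gethostname()
```

Whether the hostname is a suitable device ID depends on how the devices are provisioned: it has to be unique per device and stable over time.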

Cheers
