Posted to dev@nifi.apache.org by "Peddy, Sai" <Sa...@capitalone.com> on 2016/11/09 16:14:49 UTC

Getting the number of logs

Hi All,

I’m working on a use case that tracks the number of individual logs coming in and puts that count into Elasticsearch. Is there an easy way to do this, and does anyone have any good ideas?

The approach I am considering: route the incoming log files through SplitText and RouteText processors, which make sure no empty logs get through and give an individual log count when a file contains multiple logs. At the end of this, the total number of logs is visible in the UI on the queue, which displays the queueCount, but that information is not readily available to any processor. My current thinking is to use the ExecuteScript processor to update a local file that keeps the running count, and to insert a document into Elasticsearch hourly.
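As a rough illustration of the local-file idea above, here is a minimal Python sketch. It is not tied to the ExecuteScript API. The function names, the JSON tally file, and the document fields (`timestamp`, `log_count`) are all hypothetical, and the step that sends the document to Elasticsearch is left out.

```python
import json
import os
from datetime import datetime, timezone


def count_logs(text):
    """Count non-empty lines, treating each one as a single log entry."""
    return sum(1 for line in text.splitlines() if line.strip())


def add_to_tally(path, delta):
    """Add delta to the running count kept in a small JSON file."""
    total = 0
    if os.path.exists(path):
        with open(path) as fh:
            total = json.load(fh).get("count", 0)
    total += delta
    with open(path, "w") as fh:
        json.dump({"count": total}, fh)
    return total


def flush_tally(path, now=None):
    """Build the hourly document (hypothetical fields) and reset the count."""
    now = now or datetime.now(timezone.utc)
    total = add_to_tally(path, 0)  # read the current total
    with open(path, "w") as fh:
        json.dump({"count": 0}, fh)
    return {"timestamp": now.isoformat(), "log_count": total}
```

This sketch does no file locking, so it assumes a single writer. With concurrent tasks the read-modify-write in `add_to_tally` could lose updates.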

Any advice would be appreciated.

Thanks,
Sai Peddy

________________________________________________________

The information contained in this e-mail is confidential and/or proprietary to Capital One and/or its affiliates and may only be used solely in performance of work or services for Capital One. The information transmitted herewith is intended only for use by the individual or entity to which it is addressed. If the reader of this message is not the intended recipient, you are hereby notified that any review, retransmission, dissemination, distribution, copying or other use of, or taking of any action in reliance upon this information is strictly prohibited. If you have received this communication in error, please contact the sender and delete the material from your computer.

Re: Getting the number of logs

Posted by "Peddy, Sai" <Sa...@capitalone.com>.
Hi Joe & Andy,

Thanks for the great tips and the heads-up that counters are held only in memory. Yes, we think it shouldn’t be too big a problem if we upload the counter data frequently and reset it ourselves when we do. I have a custom processor that updates the counter, and I think it’s generic enough for others to use, so I’ll try to contribute it back soon.

I wanted to see if anyone had tips on getting the counter data within the NiFi flow. What would be the best approach? Should a user be able to request only a specific counter’s value, by name, or to get all counter metrics for use in the flow? Would you suggest writing this data to disk? If not, how else could I make it available in the flow? For my use case, I handled this with just an API call because we were in a hurry, but I would love to create this custom processor as well. I think having counter data available within the flow could be useful for a lot of use cases.
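To make the "one counter by name versus all counters" question concrete, here is a small Python sketch that indexes an already-parsed counters response. The payload shape (`counters` → `aggregateSnapshot` → `counters`, each entry with `context`, `name`, and `valueCount`) is an assumption modeled on the JSON returned by NiFi's REST counters resource, so check it against your NiFi version. The sample data and both function names are hypothetical, and the HTTP call itself is omitted.

```python
def index_counters(payload):
    """Map (context, name) -> valueCount for every counter in the payload."""
    snapshot = payload.get("counters", {}).get("aggregateSnapshot", {})
    return {
        (c["context"], c["name"]): c["valueCount"]
        for c in snapshot.get("counters", [])
    }


def counter_value(payload, context, name, default=None):
    """Look up a single counter; return default when it is absent."""
    return index_counters(payload).get((context, name), default)


# Hypothetical sample response, for illustration only.
SAMPLE_RESPONSE = {
    "counters": {
        "aggregateSnapshot": {
            "counters": [
                {"id": "a1", "context": "CountLogs (id=1234)",
                 "name": "logs.received", "valueCount": 120, "value": "120"},
                {"id": "b2", "context": "All CountLogs's",
                 "name": "logs.received", "valueCount": 120, "value": "120"},
            ]
        }
    }
}
```

Keying on both context and name avoids double counting when the same counter name shows up under more than one context, as in the sample.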

Thanks,
Sai Peddy
Sai.Peddy@capitalone.com


On 11/10/16, 11:00 AM, "Joe Percivall" <jo...@yahoo.com.INVALID> wrote:

    I hadn't received Andy's response before I sent mine to the User's list. For tracking, here is a link to my response: http://mail-archives.apache.org/mod_mbox/nifi-users/201611.mbox/raw/%3C215272108.1093060.1478740617664@mail.yahoo.com%3E/2
    Joe
    - - - - - -
    Joseph Percivall
    linkedin.com/in/Percivall
    e: joepercivall@yahoo.com
     
    

Re: Getting the number of logs

Posted by Joe Percivall <jo...@yahoo.com.INVALID>.
I hadn't received Andy's response before I sent mine to the User's list. For tracking, here is a link to my response: http://mail-archives.apache.org/mod_mbox/nifi-users/201611.mbox/raw/%3C215272108.1093060.1478740617664@mail.yahoo.com%3E/2
Joe
- - - - - -
Joseph Percivall
linkedin.com/in/Percivall
e: joepercivall@yahoo.com
 

    On Wednesday, November 9, 2016 7:53 PM, Andy LoPresto <al...@apache.org> wrote:
 

 Sai,
I’d suggest you look at using a ControllerStatusReportingTask [1], which monitors the processor and provides statistics from that component. If you need to use this data within NiFi, you can also use SiteToSiteProvenanceReportingTask [2], which can export provenance events as data that can be consumed by (the same or a different) instance of NiFi. Both of these may be overkill for your use case (the provenance reporting task will offload all of the provenance events from the application), and if so, you may be able to use counters [3] to do this quickly and easily (but be aware that the values are just held in memory, so if you’re writing to ES hourly, you should be ok, but they won’t persist across restart). Your initial thought to use ExecuteScript would also work. 
I believe Joe Percivall had done some work on SEP and window/aggregate calculations before. That may also help with what you are doing. 
Joe Percivall·4:51 PM
Here's the link to the processor: https://github.com/JPercivall/nifi/blob/newRollingState/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/RollingWindowAggregator.java
Here's the ticket: https://issues.apache.org/jira/browse/NIFI-1682?jql=project%20%3D%20NIFI%20AND%20text%20~%20%22rolling%20window%22

Keep in mind that this work is old and will need to be updated. I do have a pending PR for UpdateAttribute with State though: https://github.com/apache/nifi/pull/319

[1] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.controller.ControllerStatusReportingTask/index.html
[2] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.reporting.SiteToSiteProvenanceReportingTask/index.html
[3] https://community.hortonworks.com/questions/50622/apache-nifi-what-are-counters-in-nifi.html

Andy LoPresto
alopresto@apache.org
alopresto.apache@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69


Re: Getting the number of logs

Posted by Andy LoPresto <al...@apache.org>.
Sai,

I’d suggest you look at using a ControllerStatusReportingTask [1], which monitors processors and provides statistics from those components. If you need to use this data within NiFi, you can also use SiteToSiteProvenanceReportingTask [2], which exports provenance events as data that can be consumed by the same or a different instance of NiFi. Both of these may be overkill for your use case, since the provenance reporting task will offload all of the provenance events from the application. If so, you may be able to use counters [3] to do this quickly and easily. Be aware that counter values are held only in memory: if you’re writing to ES hourly you should be OK, but they won’t persist across a restart. Your initial thought to use ExecuteScript would also work.
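To illustrate the in-memory caveat, here is a minimal Python sketch of named counters that live only in process memory and are drained on each upload. This is not NiFi's counter implementation. The class and method names are hypothetical, and the sketch only shows why frequent uploads limit what a restart can lose.

```python
import threading
from collections import defaultdict


class InMemoryCounters:
    """Named counters kept only in process memory (lost on restart)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._values = defaultdict(int)

    def adjust(self, name, delta=1):
        """Add delta to the named counter and return its new value."""
        with self._lock:
            self._values[name] += delta
            return self._values[name]

    def drain(self):
        """Return a snapshot of all counters and reset them in one step."""
        with self._lock:
            snapshot = dict(self._values)
            self._values.clear()
            return snapshot
```

Calling `drain()` just before each hourly write means that, at worst, a restart loses the counts accumulated since the previous upload.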

I believe Joe Percivall had done some work on SEP and window/aggregate calculations before. That may also help with what you are doing.

Joe Percivall·4:51 PM
Here's the link to the processor: https://github.com/JPercivall/nifi/blob/newRollingState/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/RollingWindowAggregator.java
Here's the ticket: https://issues.apache.org/jira/browse/NIFI-1682?jql=project%20%3D%20NIFI%20AND%20text%20~%20%22rolling%20window%22

Keep in mind that this work is old and will need to be updated. I do have a pending PR for UpdateAttribute with State though: https://github.com/apache/nifi/pull/319


[1] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.controller.ControllerStatusReportingTask/index.html
[2] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.reporting.SiteToSiteProvenanceReportingTask/index.html
[3] https://community.hortonworks.com/questions/50622/apache-nifi-what-are-counters-in-nifi.html

Andy LoPresto
alopresto@apache.org
alopresto.apache@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
