Posted to commits@nifi.apache.org by "Joseph Witt (JIRA)" <ji...@apache.org> on 2015/01/16 04:03:34 UTC

[jira] [Updated] (NIFI-252) Provide ability to produce more complicated queries against the provenance data

     [ https://issues.apache.org/jira/browse/NIFI-252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joseph Witt updated NIFI-252:
-----------------------------
    Summary: Provide ability to produce more complicated queries against the provenance data  (was: Enhancement to the Core to support metrics )

> Provide ability to produce more complicated queries against the provenance data
> -------------------------------------------------------------------------------
>
>                 Key: NIFI-252
>                 URL: https://issues.apache.org/jira/browse/NIFI-252
>             Project: Apache NiFi
>          Issue Type: Wish
>          Components: Core Framework
>    Affects Versions: 0.0.1
>            Reporter: Teresa Jackson
>
> I'd like to propose an addition or enhancement to the Core to support volume management and trend analysis by storing attributes and content in a database so that they can be queried and displayed. This information would then be used for statistical roll-ups, metrics, trend analysis, etc.
> Ideally, we'd do it by receiving copies of local provenance events and capturing running totals from them.  This component would be like local provenance in that it would retain the data for some configurable period of time, based on the amount of disk space allocated for that process.  In addition, these roll-ups could be sent somewhere for even longer retention.
> The goal is to keep as many hooks as possible so that other programs/services can ingest both the local provenance logs and the rolled-up summaries.  There's a growing base of people who are comfortable with NiFi graphs and local provenance, so I think it makes sense to build on that.
> The issue I'm facing is that Provenance is fine for tracking one file if you have a starting point, but it is not designed for counting, summarization, and correlation of data, and it doesn't support advanced queries.
> Here are some of the most immediate and pressing use cases for this design:
> 1. How much traffic came in yesterday (or last week)?
> 2. Provide statistical counts on items of interest within a flow for a given flow/date range.
> 3. When was the last file sent to "System X"?
> 4. Did anything get sent to "System Y"?
> 5. How much data was marked with a certain tag?
> 6. How much data was scanned?
> 7. How much data was detected?
> 8. How much of a particular type of data was received in bytes?
> 9. How much data was processed by file count?
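
A few of the use cases above (daily traffic, counts by file and by bytes, last send to a given system) could be answered from running totals built out of copied provenance events. The following is only a minimal sketch under stated assumptions: the `Event` record, its fields, and the functions `roll_up` and `last_sent_to` are hypothetical names invented for illustration and are not part of NiFi's provenance API.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, datetime
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Event:
    """Hypothetical, simplified stand-in for a copied provenance event."""
    timestamp: datetime
    event_type: str    # assumed labels such as "RECEIVE" or "SEND"
    destination: str   # empty string when not applicable
    size_bytes: int


def roll_up(events: Iterable[Event]) -> Dict[Tuple[date, str], Tuple[int, int]]:
    """Summarize events as {(day, event_type): (file_count, total_bytes)}."""
    totals = defaultdict(lambda: [0, 0])
    for e in events:
        bucket = totals[(e.timestamp.date(), e.event_type)]
        bucket[0] += 1             # running file count
        bucket[1] += e.size_bytes  # running byte total
    return {key: (count, size) for key, (count, size) in totals.items()}


def last_sent_to(events: Iterable[Event], destination: str) -> Optional[datetime]:
    """Return the time of the most recent SEND to `destination`, or None."""
    times = [
        e.timestamp
        for e in events
        if e.event_type == "SEND" and e.destination == destination
    ]
    return max(times) if times else None


# Illustrative data only; these values are made up for the example.
SAMPLE_EVENTS = [
    Event(datetime(2015, 1, 15, 9, 0), "RECEIVE", "", 100),
    Event(datetime(2015, 1, 15, 10, 0), "RECEIVE", "", 250),
    Event(datetime(2015, 1, 15, 11, 0), "SEND", "System X", 100),
    Event(datetime(2015, 1, 16, 8, 0), "SEND", "System X", 250),
]

if __name__ == "__main__":
    for key, value in sorted(roll_up(SAMPLE_EVENTS).items()):
        print(key, value)
    print(last_sent_to(SAMPLE_EVENTS, "System X"))
    print(last_sent_to(SAMPLE_EVENTS, "System Y"))
```

Keeping only the per-day summaries, rather than every event, is one way the roll-ups could be retained longer than the raw provenance data within the same disk allocation; a real design would also need retention and export, which this sketch leaves out.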



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)