Posted to commits@nifi.apache.org by "Mark Payne (JIRA)" <ji...@apache.org> on 2015/11/09 21:27:10 UTC

[jira] [Created] (NIFI-1135) For Provenance Query, bring back Event Summaries instead of the Events themselves

Mark Payne created NIFI-1135:
--------------------------------

             Summary: For Provenance Query, bring back Event Summaries instead of the Events themselves
                 Key: NIFI-1135
                 URL: https://issues.apache.org/jira/browse/NIFI-1135
             Project: Apache NiFi
          Issue Type: Improvement
          Components: Core Framework, Core UI
            Reporter: Mark Payne
             Fix For: 1.0.0


Currently, when we query Provenance, we pull back up to 1000 events. These are full Provenance Events, including attributes, etc. If the query takes a long time, we will request the objects that have already matched the query many times over. This uses a great deal of heap and sends back very large JSON objects (10+ MB is not uncommon, and it could potentially be far worse).

We should instead use a ProvenanceEventSummary object. This object should contain just the info shown in the results table and a pointer to the actual event in the Provenance Store. This allows us to return query results much faster, store less data in the heap, and send less data back to the end user while providing virtually the same experience.

The one place where the UX would differ is when the user clicks the "info" button to view the entire provenance event: we would have to pull the event back from the server rather than already having it in memory.
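
As a rough illustration of the summary object and the on-demand lookup described above, here is a minimal sketch. Everything except the name ProvenanceEventSummary is an assumption made for illustration: FullEvent and InMemoryProvenanceStore are hypothetical stand-ins rather than NiFi classes, the chosen fields are only a guess at what the results table shows, and the event ID is used as the "pointer" to the full event.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a full provenance event (not NiFi's actual class). */
final class FullEvent {
    final long eventId;
    final long eventTime;
    final String eventType;
    final String flowFileUuid;
    final long fileSize;
    final Map<String, String> attributes; // the potentially large part

    FullEvent(long eventId, long eventTime, String eventType, String flowFileUuid,
              long fileSize, Map<String, String> attributes) {
        this.eventId = eventId;
        this.eventTime = eventTime;
        this.eventType = eventType;
        this.flowFileUuid = flowFileUuid;
        this.fileSize = fileSize;
        this.attributes = attributes;
    }
}

/** Holds only the assumed results-table fields plus the ID of the full event. */
final class ProvenanceEventSummary {
    final long eventId; // assumed "pointer" back to the event in the store
    final long eventTime;
    final String eventType;
    final String flowFileUuid;
    final long fileSize;

    ProvenanceEventSummary(FullEvent e) {
        this.eventId = e.eventId;
        this.eventTime = e.eventTime;
        this.eventType = e.eventType;
        this.flowFileUuid = e.flowFileUuid;
        this.fileSize = e.fileSize;
        // Attributes are deliberately not copied.
    }
}

/** Hypothetical in-memory store standing in for the Provenance Store. */
final class InMemoryProvenanceStore {
    private final Map<Long, FullEvent> events = new LinkedHashMap<>();

    void add(FullEvent e) {
        events.put(e.eventId, e);
    }

    /** Query path: returns summaries only, capped at maxResults. */
    List<ProvenanceEventSummary> querySummaries(String eventType, int maxResults) {
        List<ProvenanceEventSummary> results = new ArrayList<>();
        for (FullEvent e : events.values()) {
            if (results.size() >= maxResults) {
                break;
            }
            if (e.eventType.equals(eventType)) {
                results.add(new ProvenanceEventSummary(e));
            }
        }
        return results;
    }

    /** "info" button path: fetches the full event only when it is asked for. */
    FullEvent getEvent(long eventId) {
        return events.get(eventId);
    }
}
```

In this sketch the query response never carries attributes, so its size depends only on the number of matches; the full event is looked up by ID when the user asks for it.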

We should consider storing all of the fields in the results table in Lucene to provide faster results. Otherwise, we could still potentially get better results with the current approach if we just ensure that the first fields that we store are those in the results table. This allows us to read just a small portion of the event from file and deserialize just a small amount of data before moving on to the next event.
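
The second option, writing the results-table fields first, could look something like the sketch below. The record layout is invented for illustration and is not NiFi's actual serialization format: the assumed table fields come first, followed by a length-prefixed attribute section that a reader can skip without deserializing. SummaryFirstCodec and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class SummaryFirstCodec {

    /** Encodes one event: table fields first, then length-prefixed attributes. */
    static byte[] encode(long eventId, long eventTime, String eventType,
                         String flowFileUuid, Map<String, String> attributes) {
        try {
            // Serialize the attributes separately so their byte length is known.
            ByteArrayOutputStream attrBytes = new ByteArrayOutputStream();
            DataOutputStream attrOut = new DataOutputStream(attrBytes);
            attrOut.writeInt(attributes.size());
            for (Map.Entry<String, String> e : attributes.entrySet()) {
                attrOut.writeUTF(e.getKey());
                attrOut.writeUTF(e.getValue());
            }
            attrOut.flush();

            ByteArrayOutputStream record = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(record);
            out.writeLong(eventId);
            out.writeLong(eventTime);
            out.writeUTF(eventType);
            out.writeUTF(flowFileUuid);
            out.writeInt(attrBytes.size()); // how many bytes a reader may skip
            attrBytes.writeTo(out);
            out.flush();
            return record.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads only the leading table fields of each record and skips the rest. */
    static List<String> readSummaries(byte[] data) {
        List<String> rows = new ArrayList<>();
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            while (in.available() > 0) {
                long eventId = in.readLong();
                long eventTime = in.readLong();
                String eventType = in.readUTF();
                String flowFileUuid = in.readUTF();
                int toSkip = in.readInt();
                // skipBytes may skip fewer bytes than requested, so loop.
                while (toSkip > 0) {
                    int skipped = in.skipBytes(toSkip);
                    if (skipped <= 0) {
                        throw new EOFException("truncated attribute section");
                    }
                    toSkip -= skipped;
                }
                rows.add(eventId + "|" + eventTime + "|" + eventType + "|" + flowFileUuid);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }
}
```

The length prefix is the design choice that matters here: without it, the reader would have to parse every attribute just to find where the next event begins.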



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)