Posted to issues@nifi.apache.org by "Mark Payne (JIRA)" <ji...@apache.org> on 2017/01/27 14:58:24 UTC

[jira] [Commented] (NIFI-1847) improve provenance space utilization

    [ https://issues.apache.org/jira/browse/NIFI-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15842946#comment-15842946 ] 

Mark Payne commented on NIFI-1847:
----------------------------------

I have no problem with this proposal, except for the wording "recommend the max size be changed to a percentage". I would not want to *change* how it works, but rather give the user the option of choosing one or the other by introducing a new property (nifi.provenance.repository.max.storage.size would stay, and nifi.provenance.repository.max.storage.percentage would be added).
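
A minimal sketch of how the two options might coexist, for illustration only. The function name, the dict standing in for nifi.properties, and the rule that the percentage takes precedence when both are present are assumptions, not existing NiFi behavior; nifi.provenance.repository.max.storage.percentage is only the property proposed above.

```python
SIZE_KEY = "nifi.provenance.repository.max.storage.size"
# Proposed in the comment above; not an existing NiFi property.
PERCENT_KEY = "nifi.provenance.repository.max.storage.percentage"


def resolve_limit_mode(props):
    """Return ("percentage", fraction) or ("size", raw_value).

    Assumption: if the proposed percentage property is set, it takes
    precedence; otherwise the existing size property is used unchanged.
    """
    pct = props.get(PERCENT_KEY)
    if pct is not None:
        value = float(pct.strip().rstrip("%"))
        if not 0 < value <= 100:
            raise ValueError("percentage must be in (0, 100]")
        return ("percentage", value / 100.0)

    size = props.get(SIZE_KEY)
    if size is not None:
        # Left as the raw string (e.g. "900 GB"); parsing is out of scope here.
        return ("size", size)

    raise ValueError("no provenance storage limit configured")
```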

> improve provenance space utilization
> ------------------------------------
>
>                 Key: NIFI-1847
>                 URL: https://issues.apache.org/jira/browse/NIFI-1847
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework
>    Affects Versions: 0.5.1
>            Reporter: Ben Icore
>            Assignee: Joe Skora
>
> Currently the max storage size of the provenance repo is specified in bytes.  This is OK if there is a single provenance repo.  If multiple repos are specified, the space can be significantly underutilized.
> consider the following examples
> repo 1 has 500GB of space
> repo 2 has 500GB of space
> The max storage size would likely be set to 900GB, since the combined space is 1TB.  900GB seems like a "safe" value, because provenance information is generally striped evenly across the repos; however, this is not guaranteed.  Because the max size is considerably larger than the size of any given partition, any given partition could easily reach 100%.
> The only safe way to prevent a given partition in the above example from filling is to set the max size to, say, 450GB; however, this caps the entire provenance repo at 450GB, effectively rendering 550GB of disk space unusable.
> If the repos were of uneven size, say
> repo 1 has 700GB of space
> repo 2 has 300GB of space
> you would have the same 1TB of provenance space, but the individual repos are uneven, so 900GB of storage would definitely cause repo 2 to run out of disk space.  The only way to ensure that repo 2 did not run out of disk space would be to set the max size to 250GB, effectively losing 750GB of disk space.
> Recommend the max size be changed to a percentage and applied to the individual repos.  Provenance records should still be distributed as evenly as possible, but if one repo has exceeded its max, information would be written to the other.
> So in example 1
> repo 1 has 500GB of space
> repo 2 has 500GB of space
> max space is 90%
> effective and "usable" repo space would be 900GB
> So in example 2
> repo 1 has 700GB of space
> repo 2 has 300GB of space
> max space is 90%
> effective and "usable" repo space would be 900GB
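
The arithmetic in the two examples above, and the "write to the other repo once one is full" behavior, can be sketched roughly as follows. This is an illustration under stated assumptions, not NiFi code: the function names are hypothetical, sizes are plain numbers in GB, and "as evenly as possible" is approximated by picking the least-used repo that is still under its limit.

```python
def per_repo_limits(capacities_gb, max_percent):
    """Apply the percentage to each repo individually (the proposal)."""
    return [c * max_percent / 100 for c in capacities_gb]


def choose_repo(used_gb, limits_gb):
    """Return the index of the repo to write to next, or None if all are full.

    Assumption: even distribution is approximated by choosing the repo
    with the least data among those still below their own limit.
    """
    candidates = [i for i, (used, limit) in enumerate(zip(used_gb, limits_gb))
                  if used < limit]
    if not candidates:
        return None
    return min(candidates, key=lambda i: used_gb[i])


even = per_repo_limits([500, 500], 90)    # [450.0, 450.0]
uneven = per_repo_limits([700, 300], 90)  # [630.0, 270.0]
print(sum(even), sum(uneven))             # 900.0 900.0
```

In both layouts the usable total is 900GB, and no single partition is allowed past 90% of its own size.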



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)