Posted to users@nifi.apache.org by "Rosso, Roland" <Ro...@AdventHealth.com> on 2021/01/20 00:17:22 UTC

NiFi Cluster 1.9.2 Content Repository

Hello all,

We have a 3-node NiFi 1.9.2 cluster for which the content repository config is below. I don't have the entire history of this install; however, we are only able to retrieve the content from the flows that ran within the past couple of minutes.
For all others, trying to view NiFi Data Provenance -> Content shows:
Replay
Content is no longer available in Content Repository

Checking all 3 nodes at intervals, the content repository size on disk is:
Node 1:  1.1G, goes up to 2.1G, back down to 1.1G. This is currently the coordinator
Node 2:  5.9G, static
Node 3: 201M, static

Is there a default size for the content repository set around 4GB? Looking at the documentation, I can't seem to find the answer to that question.

# Content Repository
nifi.content.repository.implementation=org.apache.nifi.controller.repository.FileSystemRepository
nifi.content.claim.max.appendable.size=1 MB
nifi.content.claim.max.flow.files=100
nifi.content.repository.directory.default=/u12/nifi/data/content_repository
nifi.content.repository.archive.max.retention.period=12 hours
nifi.content.repository.archive.max.usage.percentage=50%
nifi.content.repository.archive.enabled=true
nifi.content.repository.always.sync=false
nifi.content.viewer.url=../nifi-content-viewer/

Thank you for your help,
Roland

This message (including any attachments) is intended only for the use of the individual or entity to which it is addressed and may contain information that is non-public, proprietary, privileged, confidential, and exempt from disclosure under applicable law or may constitute as attorney work product. If you are not the intended recipient, you are hereby notified that any use, dissemination, distribution, or copying of this communication is strictly prohibited. If you have received this communication in error, notify us immediately by telephone and (i) destroy this message if a facsimile or (ii) delete this message immediately if this is an electronic communication. Thank you.

RE: [EXTERNAL] Re: NiFi Cluster 1.9.2 Content Repository

Posted by "Rosso, Roland" <Ro...@AdventHealth.com>.
Joe,
Following up on this. I see what you mean concerning the 50% value: NiFi is monitoring the disk utilization, not the actual usage by NiFi within the content repository.
I’m hesitant to change nifi.content.repository.archive.max.usage.percentage=50%.
I could look into adding another content repository on all 3 nodes, but that will get cumbersome to some extent.
I’m also looking for a File System Content Repository property to fix the content repo directory to a pre-established size, say 1TB for instance. I can’t find it in the guide, but is there one?
Thanks again.

Roland Rosso
AdventHealth
Big Data Administrator | Corporate Analytics
O: 407-805-8532
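
The difference noted above, between utilization of the whole volume and the space the content repository itself occupies, can be checked on a node with a short script. This is a minimal Python sketch, not part of NiFi: the helper names `directory_size_bytes` and `compare_usage` are hypothetical, and the path in the usage comment is the `nifi.content.repository.directory.default` value from the posted configuration.

```python
import os
import shutil


def directory_size_bytes(path):
    """Sum the sizes of all files under path (hypothetical helper)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            try:
                total += os.lstat(file_path).st_size
            except OSError:
                # A file may be removed while the tree is being walked.
                pass
    return total


def compare_usage(repo_dir):
    """Return repository size and volume usage as fractions of the volume."""
    usage = shutil.disk_usage(repo_dir)  # total, used, free for the volume
    repo_bytes = directory_size_bytes(repo_dir)
    return {
        "repo_fraction_of_volume": repo_bytes / usage.total,
        "volume_used_fraction": (usage.total - usage.free) / usage.total,
    }


# Example call on a node (path taken from the posted nifi.properties):
# print(compare_usage("/u12/nifi/data/content_repository"))
```

Going by the figures given in the thread (9TB drives with about 3TB free, and a few GB of content repository per node), the first number would be very small while the second would be roughly two thirds.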


RE: [EXTERNAL] NiFi Cluster 1.9.2 Content Repository

Posted by "Rosso, Roland" <Ro...@AdventHealth.com>.
Thank you for your reply, Mark.
Somehow it went to junk… and it’s definitely not. 😊 I now better understand how NiFi looks at content storage.

Here are the 3 options I see, which I asked about yesterday as well:
- Change nifi.content.repository.archive.max.usage.percentage=50%, but I’m hesitant to do that due to the nature of the services using this storage; its usage fluctuates quite a bit.
- Add another content repository on all 3 nodes, but that will get cumbersome to some extent.
- Use a File System Content Repository property to fix the content repo directory to a pre-established size, say 1TB for instance. I can’t find one in the guide, but is there one? I think this would be the best option for our use case.
Thanks,
Roland Rosso
AdventHealth
Big Data Administrator | Corporate Analytics
O: 407-805-8532
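
For the first option, a rough way to size the percentage is to work backwards from the space the archive should be allowed to occupy. The sketch below is only arithmetic under stated assumptions: it takes the percentage to be compared against usage of the whole volume (as described in the replies), and the 6 TB of other usage and 1 TB archive target are illustrative figures based on the numbers mentioned in the thread, not measurements. The function name `min_usage_percentage` is hypothetical.

```python
def min_usage_percentage(total_bytes, other_used_bytes, desired_archive_bytes):
    """Smallest whole-volume usage percentage that still leaves room
    for the desired amount of archived content (hypothetical helper)."""
    return 100.0 * (other_used_bytes + desired_archive_bytes) / total_bytes


TB = 10**12

# Assumed figures: 9 TB drive, about 6 TB used by everything else,
# and a wish to keep up to 1 TB of archived content.
pct = min_usage_percentage(9 * TB, 6 * TB, 1 * TB)
print(f"{pct:.1f}%")  # 77.8%
```

Because usage by the other services fluctuates, any value picked this way would need headroom, which is the concern raised with this option.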


Re: [EXTERNAL] NiFi Cluster 1.9.2 Content Repository

Posted by Mark Payne <ma...@hotmail.com>.
Roland,

So based on the configuration below, that makes sense. The content repo will keep archived data as long as the disk usage is below 50%, and for up to 12 hours.

If you have 3 TB free on a 9 TB drive, that’s approximately 67% used, so data will be aged off of the archive very quickly. Note that the configuration is not saying “Use up to 50% of the drive for archive.” It’s saying “Keep archiving until 50% of the drive is used.”

Thanks
-Mark
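
The threshold behavior described above can be illustrated with a small Python sketch. It is not NiFi's implementation, only the rule as stated in this message: archived content is kept while usage of the whole volume stays below the configured percentage. The function names are hypothetical, and the 9 TB / 3 TB free figures come from the thread.

```python
import shutil


def used_fraction(total_bytes, free_bytes):
    """Fraction of the volume in use, counting everything on it, not just NiFi."""
    return (total_bytes - free_bytes) / total_bytes


def archive_is_kept(total_bytes, free_bytes, max_usage_fraction=0.50):
    """True while volume usage is below the configured threshold."""
    return used_fraction(total_bytes, free_bytes) < max_usage_fraction


def archive_is_kept_for_path(path, max_usage_fraction=0.50):
    """Same check against a live filesystem, using shutil.disk_usage."""
    usage = shutil.disk_usage(path)
    return archive_is_kept(usage.total, usage.free, max_usage_fraction)


TB = 10**12

# Figures from the thread: 9 TB drive with about 3 TB free.
example_used = used_fraction(9 * TB, 3 * TB)
print(f"{example_used:.0%} used")       # 67% used
print(archive_is_kept(9 * TB, 3 * TB))  # False: already above 50%
```

With the volume already about 67% full, the check fails no matter how little space the content repository itself takes, which matches the observation that content is only available for a few minutes.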




RE: [EXTERNAL] Re: NiFi Cluster 1.9.2 Content Repository

Posted by "Rosso, Roland" <Ro...@AdventHealth.com>.
Joe,
The disks are 9TB drives, with over 3TB available. This is true for each node.
Ideally, we would be able to replay up to 12 hours for any of the flows in order to troubleshoot/investigate data issues.

Thanks,
Roland Rosso
AdventHealth
Big Data Administrator | Corporate Analytics
O: 407-805-8532


Re: NiFi Cluster 1.9.2 Content Repository

Posted by Joe Witt <jo...@gmail.com>.
Hello

The key value to watch is the 50% value. That means we will work to remove
content no longer reachable in the flow until we are at 50% of the
available disk space for that volume.

So how big is the disk there for each node?

Can you share more about what you are hoping to see happen versus what is
happening?

Thanks
