Posted to users@nifi.apache.org by Jo...@swisscom.com on 2019/01/04 09:49:12 UTC

How to analyse the content_repository?

Hi guys

We have an issue with the content repo: it seems to fill up over time even though archiving is disabled. With an uptime of 179 hours, we are currently using about 7% of our disk space (picture below). The NiFi GUI (top left) tells me that we have 4.12 GB of flowfile data in NiFi. Can somebody explain how we can find out why the content_repo is growing? After a restart of NiFi, usage is back to 0%…

“nifi.content.repository.archive.enabled=false” is set, so we shouldn’t have any archived content.
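
Because each node in a cluster reads its own nifi.properties, it can be worth confirming that the archive flag quoted above is really in effect on every node. The following is a minimal Java sketch under stated assumptions: the file path is passed as an argument (the `conf/nifi.properties` default is only a guess), and the two retention keys listed besides `nifi.content.repository.archive.enabled` are the names used in NiFi 1.x, so check them against the Admin Guide for your version.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

// Prints the content-repository archive settings a NiFi node is configured with.
final class ArchiveSettings {

    // Archive-related keys as named in NiFi 1.x nifi.properties (assumption: verify for your version).
    static final String ENABLED_KEY = "nifi.content.repository.archive.enabled";
    static final String[] KEYS = {
        ENABLED_KEY,
        "nifi.content.repository.archive.max.retention.period",
        "nifi.content.repository.archive.max.usage.percentage"
    };

    // Parses key=value lines ('#' comments allowed) with java.util.Properties.
    static Properties load(Reader reader) throws IOException {
        Properties props = new Properties();
        props.load(reader);
        return props;
    }

    // True only if the archive flag is present and explicitly set to "false".
    static boolean archiveExplicitlyDisabled(Properties props) {
        String value = props.getProperty(ENABLED_KEY);
        return value != null && value.trim().equalsIgnoreCase("false");
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default location; pass the real path of each node's nifi.properties as args[0].
        String path = args.length > 0 ? args[0] : "conf/nifi.properties";
        try (Reader reader = Files.newBufferedReader(Paths.get(path), StandardCharsets.UTF_8)) {
            Properties props = load(reader);
            for (String key : KEYS) {
                System.out.println(key + " = " + props.getProperty(key, "(not set)"));
            }
            System.out.println("archive explicitly disabled: " + archiveExplicitlyDisabled(props));
        }
    }
}
```

Running it once per node and comparing the output rules out a single node that still has archiving switched on.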

Any help to troubleshoot would be appreciated.

Cheers Josef
[Two screenshots, not preserved in the plain-text archive]
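
One way to start answering "why is the content_repo growing" is to measure it per node and compare the result with the queued size the UI reports (4.12 GB here). The following Java sketch walks a content repository directory and totals file counts and bytes. It uses no NiFi API; it assumes that archived content lives in subdirectories named `archive`, and the default directory name is a placeholder for whatever `nifi.content.repository.directory.default` points to on your nodes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Iterator;
import java.util.stream.Stream;

// Counts files and bytes under a content_repository directory on one node.
final class ContentRepoScan {

    static final class Totals {
        long files;
        long bytes;
    }

    static final class Report {
        final Totals live = new Totals();
        final Totals archived = new Totals();
    }

    // Assumption: archived content sits in subdirectories literally named "archive".
    static boolean underArchiveDir(Path relative) {
        for (int i = 0; i < relative.getNameCount() - 1; i++) {
            if (relative.getName(i).toString().equals("archive")) {
                return true;
            }
        }
        return false;
    }

    // Walks the tree once, adding each regular file to the live or archived totals.
    static Report scan(Path root) throws IOException {
        Report report = new Report();
        try (Stream<Path> paths = Files.walk(root)) {
            Iterator<Path> it = paths.iterator();
            while (it.hasNext()) {
                Path p = it.next();
                if (!Files.isRegularFile(p)) {
                    continue;
                }
                Totals t = underArchiveDir(root.relativize(p)) ? report.archived : report.live;
                t.files++;
                t.bytes += Files.size(p);
            }
        }
        return report;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default; pass the node's actual content repository directory as args[0].
        Path root = Paths.get(args.length > 0 ? args[0] : "content_repository");
        Report r = scan(root);
        System.out.printf("live:     %d files, %d bytes%n", r.live.files, r.live.bytes);
        System.out.printf("archived: %d files, %d bytes%n", r.archived.files, r.archived.bytes);
    }
}
```

On a live node, claim files can be deleted while the walk is running, so the scan may fail (for example with a `NoSuchFileException` from `Files.size`); rerunning it usually gets past that. A "live" total far above the queued size shown in the UI is the symptom described in this thread.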

Re: How to analyse the content_repository?

Posted by Jo...@swisscom.com.
Yes, we are using several custom processors… Does anybody have an idea what code snippet needs to be present to prevent flowfiles from staying in the content repo after they have been processed?


From: Mike Thomsen <mi...@gmail.com>
Reply-To: "users@nifi.apache.org" <us...@nifi.apache.org>
Date: Friday, 4 January 2019 at 16:34
To: "users@nifi.apache.org" <us...@nifi.apache.org>
Subject: Re: How to analyse the content_repository?

Are you using any custom processors?

On Fri, Jan 4, 2019 at 6:20 AM <Jo...@swisscom.com> wrote:
So we are getting closer, but it is still not really what I would expect ;-). Please check my screenshot: I’ve counted the files in the content_repo on one of the 8 cluster nodes, and there are 12451 of them. In the NiFi GUI I see 1655 flowfiles. Going by your explanation, I would expect around 1655 active files distributed over the 8 nodes, but I see far more than that. It gets worse: when I restart NiFi, the content_repo is empty (apart from the few GB shown in the GUI, of course)… If those files were still active somewhere, the content_repo would not be empty after a NiFi restart…

Cheers Josef


[Screenshot, not preserved in the plain-text archive]




From: Pierre Villard <pi...@gmail.com>
Reply-To: "users@nifi.apache.org" <us...@nifi.apache.org>
Date: Friday, 4 January 2019 at 12:11
To: "users@nifi.apache.org" <us...@nifi.apache.org>
Subject: Re: How to analyse the content_repository?

A claim is active as long as a flow file belonging to this claim still exists in the workflow. If you have a claim made of 100 flow files and 1 flow file is still 'active' (it'll appear in the UI), then the whole claim is active. The 99 other flow files are not included in the UI counter, but their actual content is not removed as long as the claim is active. The claim will be removed only when none of its flow files are in the workflow anymore. It really depends on the nature of the workflow and on how flow files are processed: if you have large and tiny flow files processed at different rates, you could see huge differences between what the UI shows and what is used in the content repository.

Someone might chime in and provide more useful info, though ;)

Pierre

On Fri, 4 Jan 2019 at 11:59, <Jo...@swisscom.com> wrote:
Hi Pierre

I’m already familiar with this article, thank you anyway :-). We have archiving disabled, so I don’t get why we see more disk usage than the active flows (Active Content Claims?) currently shown in the GUI.

Cheers Josef


From: Pierre Villard <pi...@gmail.com>
Reply-To: "users@nifi.apache.org" <us...@nifi.apache.org>
Date: Friday, 4 January 2019 at 11:54
To: "users@nifi.apache.org" <us...@nifi.apache.org>
Subject: Re: How to analyse the content_repository?

Hi Josef,

You might be interested in this article:
https://community.hortonworks.com/articles/82308/understanding-how-nifis-content-repository-archivi.html

Pierre


Re: How to analyse the content_repository?

Posted by Mike Thomsen <mi...@gmail.com>.
Are you using any custom processors?


Re: How to analyse the content_repository?

Posted by Jo...@swisscom.com.
So we are getting closer, but it is still not really what I would expect ;-). Please check my screenshot: I’ve counted the files in the content_repo on one of the 8 cluster nodes, and there are 12451 of them. In the NiFi GUI I see 1655 flowfiles. Going by your explanation, I would expect around 1655 active files distributed over the 8 nodes, but I see far more than that. It gets worse: when I restart NiFi, the content_repo is empty (apart from the few GB shown in the GUI, of course)… If those files were still active somewhere, the content_repo would not be empty after a NiFi restart…

Cheers Josef
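
The counts above can be turned into a rough bound. Assuming each flowfile counted in the UI references at most one claim file, and ignoring flowfiles currently held inside processor sessions or any other short-lived references, the number of claim files that can still be active is at most the number of flowfiles, even if every one of them sat on the inspected node:

```latex
N_{\text{active claim files}} \;\le\; N_{\text{flowfiles in UI}} = 1655
\quad\Longrightarrow\quad
N_{\text{unreferenced files on the node}} \;\ge\; 12451 - 1655 = 10796
```

If the 1655 flowfiles are spread roughly evenly, that is only about 1655 / 8 ≈ 207 per node, so on that node the number of files not referenced by any queued flowfile is likely closer to 12451 - 207 = 12244. Either way, claim sharing alone does not appear to account for the file count.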


[Screenshot, not preserved in the plain-text archive]


Re: How to analyse the content_repository?

Posted by Pierre Villard <pi...@gmail.com>.
A claim is active as long as a flow file belonging to this claim still
exists in the workflow. If you have a claim made of 100 flow files and 1
flow file is still 'active' (it'll appear in the UI), then the whole claim
is active. The 99 other flow files are not included in the UI counter, but
their actual content is not removed as long as the claim is active. The
claim will be removed only when none of its flow files are in the workflow
anymore. It really depends on the nature of the workflow and on how flow
files are processed: if you have large and tiny flow files processed at
different rates, you could see huge differences between what the UI shows
and what is used in the content repository.

Someone might chime in and provide more useful info, though ;)

Pierre
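
The behaviour described above is essentially reference counting. The Java sketch below is a toy model, not NiFi's internal API (the class and method names are hypothetical): it tracks, per claim file, how many flowfiles in the flow still point at it, and only reports the file as deletable when that count reaches zero. It ignores claim sizes and the repository's background cleanup.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of claim reference counting; hypothetical class, not NiFi's internal API.
final class ClaimModel {

    // claim file id -> number of flowfiles in the flow whose content lives in that file
    private final Map<String, Integer> claimants = new HashMap<>();

    // A flowfile's content was written into the given claim file.
    void addFlowFile(String claimId) {
        claimants.merge(claimId, 1, Integer::sum);
    }

    // A flowfile left the flow; returns true if its claim file may now be deleted.
    boolean removeFlowFile(String claimId) {
        Integer count = claimants.get(claimId);
        if (count == null) {
            throw new IllegalStateException("unknown claim " + claimId);
        }
        if (count == 1) {
            claimants.remove(claimId);
            return true;
        }
        claimants.put(claimId, count - 1);
        return false;
    }

    // Whether the claim file still has to stay on disk.
    boolean isOnDisk(String claimId) {
        return claimants.containsKey(claimId);
    }

    // What a UI-style flowfile counter would show: flowfiles still in the flow.
    int activeFlowFileCount() {
        int total = 0;
        for (int c : claimants.values()) {
            total += c;
        }
        return total;
    }

    public static void main(String[] args) {
        ClaimModel repo = new ClaimModel();
        for (int i = 0; i < 100; i++) {
            repo.addFlowFile("claim-1"); // 100 flowfiles share one claim file
        }
        for (int i = 0; i < 99; i++) {
            repo.removeFlowFile("claim-1"); // 99 of them finish processing
        }
        System.out.println("flowfiles shown: " + repo.activeFlowFileCount()
                + ", claim file on disk: " + repo.isOnDisk("claim-1"));
        repo.removeFlowFile("claim-1"); // the last one finishes
        System.out.println("flowfiles shown: " + repo.activeFlowFileCount()
                + ", claim file on disk: " + repo.isOnDisk("claim-1"));
    }
}
```

Running `main` prints `flowfiles shown: 1, claim file on disk: true` after 99 of the 100 flowfiles are removed, and `flowfiles shown: 0, claim file on disk: false` after the last one: the counter drops to 1 while the whole file, holding the content of all 100 flowfiles, stays on disk.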


Re: How to analyse the content_repository?

Posted by Jo...@swisscom.com.
Hi Pierre

I’m already familiar with this article, thank you anyway :-). We have archiving disabled, so I don’t get why we see more disk usage than the active flows (Active Content Claims?) currently shown in the GUI.

Cheers Josef



Re: How to analyse the content_repository?

Posted by Pierre Villard <pi...@gmail.com>.
Hi Josef,

You might be interested in this article:
https://community.hortonworks.com/articles/82308/understanding-how-nifis-content-repository-archivi.html

Pierre
