Posted to dev@nifi.apache.org by Mike Thomsen <mi...@gmail.com> on 2020/02/01 12:56:22 UTC

Potential 1.11.X showstopper

https://stackoverflow.com/questions/59991035/nifi-1-11-opening-more-than-50k-files/60017064#60017064

No idea if this is valid or not. I asked for clarification to see if there
might be a specific processor or something that is triggering this.

Re: Potential 1.11.X showstopper

Posted by Mark Payne <ma...@hotmail.com>.
As Joe mentioned earlier in the thread, the way to track down a "too many open files" problem is to run "lsof -p <pid>".
That will show all open files, and it often makes it pretty obvious what is holding the file handles open.
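
For example, a rough way to summarize that output (a sketch, assuming lsof is on the PATH and you pass the NiFi PID as an argument) is to tally the descriptors by type and TCP state:

import subprocess
import sys
from collections import Counter

pid = sys.argv[1]  # NiFi's process id
lsof = subprocess.run(["lsof", "-p", pid], capture_output=True, text=True).stdout

by_type = Counter()     # REG, IPv4, IPv6, FIFO, ...
tcp_states = Counter()  # ESTABLISHED, CLOSE_WAIT, ...
for line in lsof.splitlines()[1:]:  # skip the header row
    fields = line.split()
    if len(fields) >= 5:
        by_type[fields[4]] += 1
    if "(" in line and line.endswith(")"):
        tcp_states[line.rsplit("(", 1)[1].rstrip(")")] += 1

print(by_type.most_common())
print(tcp_states.most_common())

A high count of REG entries under the content or provenance repositories points at repository handles; a pile of CLOSE_WAIT entries points at sockets the local side has not closed.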

Re: Potential 1.11.X showstopper

Posted by Pierre Villard <pi...@gmail.com>.
Hello,

I believe the problem reported here is similar to the one described in
https://issues.apache.org/jira/browse/NIFI-7114.

However, a few community members and I haven't been able to reproduce
the issue. Could anyone in a position to easily replicate the issue
clarify:
- the exhaustive list of components (processors, controller services,
reporting tasks) running in the NiFi instance
- details of the NiFi setup: OS, Java version, NiFi version,
standalone/cluster installation, secured/unsecured installation

The only thing that seems common across the occurrences is the Java
version: 8u242. However, I have not been able to reproduce the issue with
this Java version... If someone who is able to replicate the issue could try
downgrading the Java version and let the community know whether this changes
anything, that would be great.
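
For the OS and Java version part, something as small as this (a rough sketch, nothing NiFi-specific) captures what I am after; the component list and the NiFi/cluster details still have to come from the UI or nifi.properties:

import platform
import subprocess

print("OS:", platform.platform())
# 'java -version' writes its output to stderr rather than stdout
java = subprocess.run(["java", "-version"], capture_output=True, text=True)
print("Java:", (java.stderr or java.stdout).strip())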

Thanks,
Pierre


Re: Potential 1.11.X showstopper

Posted by Mike Thomsen <mi...@gmail.com>.
My setup was very similar, but didn't have the site to site reporting.


Re: Potential 1.11.X showstopper

Posted by Joe Witt <jo...@gmail.com>.
yeah will investigate

thanks


Re: Potential 1.11.X showstopper

Posted by Ryan Hendrickson <ry...@gmail.com>.
Joe,
   We're running:

   - OpenJDK Java 1.8.0_242
   - NiFi 1.11.0
   - CentOS Linux 7.7.1908


   We're seeing this across a dozen NiFis with the same setup.  To
reproduce the issue: GenerateFlowFile producing 100GB across a couple million
files -> Site to Site -> receive the data -> MergeContent.  We had no issues
with this stack:

   - OpenJDK Java 1.8.0_232.
   - NiFi 1.9.2
   - CentOS Linux 7.7.1908

   Can your team set up a similar stack and test?
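
   If it helps, a rough way to watch for a leak while that test runs (a sketch;
Linux only, run as the NiFi user or root, with the NiFi PID passed as an
argument) is to sample the open descriptor count over time:

import os
import sys
import time

pid = sys.argv[1]  # NiFi's process id
while True:
    # /proc/<pid>/fd holds one entry per open file descriptor
    count = len(os.listdir(f"/proc/{pid}/fd"))
    print(time.strftime("%H:%M:%S"), count, flush=True)
    time.sleep(30)

   A count that climbs steadily while the queued flowfile counts stay flat is a
good sign something is leaking handles.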

Ryan


Re: Potential 1.11.X showstopper

Posted by Joe Witt <jo...@gmail.com>.
received a direct reply - Elli cannot share.

I think unless someone else is able to replicate the behavior there isn't
much more we can tackle on this.

Thanks


Re: Potential 1.11.X showstopper

Posted by Joe Witt <jo...@gmail.com>.
Yes Elli it is possible.  Can we please get those lsof outputs in a JIRA?
As well as more details about configuration?

Thanks


Re: Potential 1.11.X showstopper

Posted by Andy LoPresto <al...@apache.org>.
I have no input on the specific issue you’re encountering, but a pattern we have seen to reduce the overhead of multiple remote input ports being required is to use a “central” remote input port and immediately follow it with a RouteOnAttribute to distribute specific flowfiles to the appropriate downstream flow / process group. Whatever sends data to this port can use an UpdateAttribute to add some “tracking/routing” attribute on the flowfiles before being sent. Inserting Merge/Split will likely affect your timing due to waiting for bins to fill, depending on your volume. S2S is pretty good at transmitting data on-demand with low overhead on one port; it’s when you have many remote input ports that there is substantial overhead. 
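
As a rough illustration (the attribute name and values here are made up), the sender could set a "destination" attribute with UpdateAttribute, and the RouteOnAttribute behind the central port would then carry one dynamic property per downstream group, each property becoming a relationship wired to that group:

to-group-a    ${destination:equals('group-a')}
to-group-b    ${destination:equals('group-b')}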


Andy LoPresto
alopresto@apache.org
alopresto.apache@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69


Re: Potential 1.11.X showstopper

Posted by Elli Schwarz <el...@yahoo.com.INVALID>.
We ran that command - it appears it's the site-to-sites that are causing the issue. We had a lot of remote process groups that weren't even being used (no data was being sent to that part of the dataflow), yet when running the lsof command they each had a large number of open files - almost 2k! - showing CLOSE_WAIT. Again, there were no flowfiles being sent to them, so could it be some kind of bug where keeping a remote process group open somehow opens files and doesn't close them? (BTW, the reason we had to upgrade from 1.9.2 to 1.11.0 was because we had upgraded our Java version and that caused an IllegalBlockingModeException - is it possible that whatever fixed that problem is now causing an issue with open files?)

We have now disabled all of the unused remote process groups. We still have several remote process groups that we are using, so if this is the issue it might be difficult to avoid, but at least we have decreased the number of remote process groups. Another approach we are trying is a MergeContent before we send to the NiFi instance having the most issues, so that fewer flow files are sent at once over site-to-site, and then splitting them after they are received.
Thank you!


Re: Potential 1.11.X showstopper

Posted by Mike Thomsen <mi...@gmail.com>.
Can you share a description of your flows in terms of average flowfile
size, queue size, data velocity, etc.?

Thanks,

Mike


Re: Potential 1.11.X showstopper

Posted by Elli Schwarz <el...@yahoo.com.INVALID>.
We seem to be experiencing the same problems. We recently upgraded several of our NiFis from 1.9.2 to 1.11.0, and now many of them are failing with "too many open files". Nothing else changed other than the upgrade, and our data volume is the same as before. The only solution we've been able to come up with is to run a script to check for this condition and restart NiFi. Any other ideas?
Thank you!
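
For anyone relying on the same workaround, a rough sketch of that kind of check (the install location, threshold, and schedule are assumptions to adapt; run it from cron as the NiFi user with the NiFi PID as an argument):

import os
import subprocess
import sys

NIFI_HOME = "/opt/nifi"  # assumed install location
THRESHOLD = 45000        # assumed: restart before a 50k descriptor limit is hit

pid = sys.argv[1]  # NiFi's process id
open_fds = len(os.listdir(f"/proc/{pid}/fd"))
if open_fds > THRESHOLD:
    print(f"NiFi has {open_fds} open descriptors, restarting", file=sys.stderr)
    subprocess.run([f"{NIFI_HOME}/bin/nifi.sh", "restart"])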


Re: Potential 1.11.X showstopper

Posted by Mike Thomsen <mi...@gmail.com>.
Without further details, this is what I did to see if it was something
other than the usual issue of not having enough file handles available,
such as a legitimate case of someone forgetting to close file objects
somewhere in the code itself.

1. Set up an 8-core/32GB VM on AWS with the Amazon AMI.
2. Pushed 1.11.1 RC1.
3. Raised the JVM memory settings to 6GB/12GB.
4. Disabled flowfile archiving because I only allocated 8GB of storage.
5. Set up a flow that used 2 GenerateFlowFile instances to generate massive
amounts of garbage data using all available cores. (All queues were set up
to hold 250k flow files.)
6. Kicked it off and let it run for probably about 20 minutes.

No apparent problem with closing and releasing resources here.
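
For what it's worth, a rough way to tell a too-low limit apart from an actual
leak (a sketch; Linux only, with the NiFi PID passed as an argument) is to
compare the process's descriptor limit with what it currently has open:

import os
import sys

pid = sys.argv[1]  # NiFi's process id
open_fds = len(os.listdir(f"/proc/{pid}/fd"))
with open(f"/proc/{pid}/limits") as limits:
    max_open = next(line for line in limits if line.startswith("Max open files"))
print("currently open:", open_fds)
print(max_open.strip())  # e.g. "Max open files   50000   50000   files"

If usage keeps climbing toward the limit even when the flow is idle, that points
at a leak rather than an undersized limit.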


Re: Potential 1.11.X showstopper

Posted by Joe Witt <jo...@gmail.com>.
these are usually very easy to find.

run "lsof -p <pid>" and share the results


thanks
