Posted to dev@nifi.apache.org by Isha Lamboo <is...@virtualsciences.nl> on 2023/06/12 12:36:33 UTC

DeleteHDFS behavior when idle

Hi all,

I have a question about behavior I see on one of our NiFi 1.18 clusters, which has a lot of xHDFS processors. When I look at the number of tasks in the Summary, the DeleteHDFS processors show a very high number of tasks (800-1000+) even when their incoming queues are empty. In contrast, the PutHDFS and FetchHDFS processors show no tasks when their incoming queues are empty. Even though the tasks take very little time (less than 100 ms per 5 minutes), I’m wondering whether this causes problems when the cluster is heavily loaded during peak hours.

Is this a bug or some feature related to deleting files? Should I submit a ticket?

Thanks,

Isha

RE: DeleteHDFS behavior when idle

Posted by Isha Lamboo <is...@virtualsciences.nl>.
Thanks Bryan and Mark,

I've created a Jira ticket describing things as well as I can. I see Bryan already has a better idea of the cause than my guess in the ticket, so I will add that as a comment.

Regards,

Isha

Re: DeleteHDFS behavior when idle

Posted by Bryan Bende <bb...@gmail.com>.
The processor has @TriggerWhenEmpty, so it keeps executing regardless of
whether the incoming queue has data. I believe this was done early on for
some processors that used Kerberos, to give the processor a chance to renew
the Kerberos ticket. However, we have since moved away from needing to do
this, so unless there is another reason for having it, I would think it can
be removed.
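
To illustrate the effect described above, here is a minimal toy model in Java. It is not NiFi code: the annotation, the ToyPut and ToyDelete classes, and the countTasks scheduler loop are all hypothetical stand-ins defined locally, and the real framework scheduling is more involved. The sketch only shows why a processor marked to run on an empty queue accumulates tasks while an unmarked one does not.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayDeque;
import java.util.Queue;

public class TriggerWhenEmptyDemo {

    // Hypothetical local stand-in for NiFi's @TriggerWhenEmpty marker annotation.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    public @interface TriggerWhenEmpty {}

    // Toy processor contract: each call to onTrigger counts as one task.
    public interface ToyProcessor {
        void onTrigger(Queue<String> incoming);
    }

    // Not annotated: the toy scheduler only runs it when work is queued.
    public static class ToyPut implements ToyProcessor {
        @Override
        public void onTrigger(Queue<String> incoming) {
            incoming.poll();
        }
    }

    // Annotated: the toy scheduler runs it on every tick, even when idle.
    @TriggerWhenEmpty
    public static class ToyDelete implements ToyProcessor {
        @Override
        public void onTrigger(Queue<String> incoming) {
            incoming.poll(); // returns null when the queue is empty
        }
    }

    // Assumed, simplified scheduling rule: invoke the processor on a tick
    // if its queue has data or if its class carries the marker annotation.
    public static int countTasks(ToyProcessor processor, Queue<String> incoming, int ticks) {
        boolean runWhenEmpty = processor.getClass().isAnnotationPresent(TriggerWhenEmpty.class);
        int tasks = 0;
        for (int i = 0; i < ticks; i++) {
            if (!incoming.isEmpty() || runWhenEmpty) {
                processor.onTrigger(incoming);
                tasks++;
            }
        }
        return tasks;
    }

    public static void main(String[] args) {
        // Both processors see an empty queue for 300 scheduling ticks.
        System.out.println("ToyPut tasks:    " + countTasks(new ToyPut(), new ArrayDeque<String>(), 300));
        System.out.println("ToyDelete tasks: " + countTasks(new ToyDelete(), new ArrayDeque<String>(), 300));
    }
}
```

In this model the unannotated processor reports no tasks on an empty queue, while the annotated one reports one task per tick, which mirrors the difference seen between PutHDFS/FetchHDFS and DeleteHDFS in the Summary.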

Re: DeleteHDFS behavior when idle

Posted by Mark Payne <ma...@hotmail.com>.
Isha,

If you have an incoming connection and you’re seeing this, then it’s a bug. If there is no incoming connection and this processor is used as a source processor, it’s normal. Either way, it has rather little overhead, and you can reduce the overhead further by increasing the Yield Duration in the processor settings. This is how long it will wait between invocations if there’s nothing for it to do.

Either way, it’s best to file a Jira to address the behavior of running unnecessarily when there’s an incoming Connection.

Thanks
-Mark
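
As a rough illustration of the Yield Duration advice above, the following Java sketch estimates an upper bound on idle invocations per statistics window. It assumes one thread on one node, that every empty invocation yields for the full Yield Duration, and that the invocation itself takes negligible time. The class and method names are hypothetical, and the 5-minute window and 1-second yield used in main are illustrative inputs rather than values taken from the cluster in question.

```java
import java.time.Duration;

public class YieldEstimate {

    // Upper bound on idle invocations in one window for a single thread on a
    // single node, assuming each empty run is followed by a full yield period.
    public static long maxIdleTasks(Duration window, Duration yieldDuration) {
        long yieldMillis = yieldDuration.toMillis();
        if (yieldMillis <= 0) {
            throw new IllegalArgumentException("yieldDuration must be at least 1 ms");
        }
        return window.toMillis() / yieldMillis;
    }

    public static void main(String[] args) {
        Duration window = Duration.ofMinutes(5);
        // A 1-second yield allows at most 300 idle runs per 5 minutes.
        System.out.println(maxIdleTasks(window, Duration.ofSeconds(1)));
        // Raising the yield to 10 seconds cuts that bound to 30.
        System.out.println(maxIdleTasks(window, Duration.ofSeconds(10)));
    }
}
```

Under these assumptions, a tenfold increase in Yield Duration gives a tenfold drop in idle tasks. In a cluster the Summary adds up the counts from all nodes, so the per-node bound would be multiplied by the node count, which is not stated in this thread.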

