Posted to users@nifi.apache.org by Anders Synstad <an...@synstad.net> on 2021/05/03 11:30:07 UTC

Re: NPE problem in PutSplunk processor

Apologies for the very late reply on this thread, but I finally had time to
see if I was able to reproduce the problem in our LAB environment.

The LAB environment is a 3-node cluster running NiFi v1.13.2 on Ubuntu 18.04
with the following Java version:

  ii  openjdk-11-jre-headless:amd64               11.0.9.1+1-0ubuntu1~18.04

In the end, the way to reproduce the problem in our LAB turned out to be
introducing a fair amount of packet loss towards the Splunk indexer peers:

  $ tc qdisc add dev eno1 root netem loss 25%
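
For reference, the impairment can be removed again afterwards with:

  $ tc qdisc del dev eno1 root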

With NiFi logging on DEBUG, we were able to get the following log messages and
stack trace:

2021-05-03 13:08:01,295 DEBUG [Timer-Driven Process Thread-9] o.a.nifi.processors.splunk.PutSplunk PutSplunk[id=8b9bdc76-9f55-3840-b6c6-78daea74993f] Sender is not connected, closing sender
2021-05-03 13:08:01,295 ERROR [Timer-Driven Process Thread-9] o.a.nifi.processors.splunk.PutSplunk PutSplunk[id=8b9bdc76-9f55-3840-b6c6-78daea74993f] PutSplunk[id=8b9bdc76-9f55-3840-b6c6-78daea74993f] failed to process session due to java.lang.NullPointerException; Processor Administratively Yielded for 1 sec: java.lang.NullPointerException
java.lang.NullPointerException: null
        at org.apache.nifi.processor.util.put.sender.SocketChannelSender.write(SocketChannelSender.java:98)
        at org.apache.nifi.processor.util.put.sender.ChannelSender.send(ChannelSender.java:84)
        at org.apache.nifi.processors.splunk.PutSplunk$2.process(PutSplunk.java:268)
        at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2365)
        at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:2333)
        at org.apache.nifi.processors.splunk.PutSplunk.processDelimitedMessages(PutSplunk.java:230)
        at org.apache.nifi.processors.splunk.PutSplunk.onTrigger(PutSplunk.java:166)
        at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1173)
        at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:214)
        at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
        at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)
2021-05-03 13:08:01,296 WARN [Timer-Driven Process Thread-9] o.a.n.controller.tasks.ConnectableTask Administratively Yielding PutSplunk[id=8b9bdc76-9f55-3840-b6c6-78daea74993f] due to uncaught Exception: java.lang.NullPointerException
java.lang.NullPointerException: null
        [stack trace identical to the one above]

I'm not a Java developer, but a wild guess is that it might be an unhandled
exception at line 268 in the PutSplunk.java code, related to the initial DEBUG
line?

https://github.com/apache/nifi/blob/aa741cc5967f62c3c38c2a47e712b7faa6fe19ff/nifi-nar-bundles/nifi-splunk-bundle/nifi-splunk-processors/src/main/java/org/apache/nifi/processors/splunk/PutSplunk.java#L268
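
To illustrate the guess (purely hypothetical, not the actual NiFi source; the
class and field names below are made up): the DEBUG line suggests the sender's
channel gets closed when the connection check fails, so if the write path can
still reach the channel afterwards, a guard along these lines would raise a
descriptive IOException instead of a bare NPE, which PutSplunk could then
route to its failure relationship:

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.SocketChannel;

    class GuardedSocketSender {
        // Hypothetical stand-in for the sender's channel field.
        private volatile SocketChannel channel;

        void write(final byte[] data) throws IOException {
            // Read the field once so a concurrent close() cannot null it
            // between the check and the write.
            final SocketChannel local = channel;
            if (local == null || !local.isOpen()) {
                throw new IOException("Socket channel is closed; cannot send");
            }
            final ByteBuffer buffer = ByteBuffer.wrap(data);
            while (buffer.hasRemaining()) {
                local.write(buffer);
            }
        }

        void close() throws IOException {
            final SocketChannel local = channel;
            channel = null;
            if (local != null) {
                local.close();
            }
        }
    }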

Hopefully this can help identify the problem we're experiencing.

When it comes to *why* we are experiencing packet loss in the first place,
that is something we'll need to investigate internally.


Regards,
Anders


On Tue, Jan 19, 2021 at 10:05 AM Anders Synstad <an...@synstad.net> wrote:

> Appreciate you looking into this.
>
> > Is the error occurring when the processor starts?
>
> No, it usually comes at random times while the processor is running.
> Presumably when "something" happens with the network or the connection to
> Splunk?
>
> > If that's when a flow file is being processed, is there anything special
> > about this flow file (like no content at all)?
>
> Not as far as I have been able to identify. I've run the same data in
> parallel, and it might fail in one pipeline and not the other.
>
> As far as I can tell, when I restart the failing PutSplunk processors,
> they'll reprocess the failed flowfiles, and those tend to work just fine
> the next time around.
>
> We have a number of PutSplunk processors. Some are getting these NPEs,
> while others are not. While not conclusive, the PutSplunk processors that
> seem to throw NPEs appear to be the busier ones. But that might just be
> because they have more chances to fail, since they process more flowfiles
> and data.
>
> In our lab environment, I've sometimes managed to trigger the same error
> (if "lucky") by doing a rolling restart of the Splunk cluster while
> PutSplunk is running. So that might be a way to reproduce it, but not
> consistently.
>
> The connection to the Splunk indexers goes through a load balancer, which
> adds an extra component that could do something with the connection.
>
>
> Regards,
> Anders
>
>
> On Tue, Jan 19, 2021 at 8:36 AM Pierre Villard <
> pierre.villard.fr@gmail.com> wrote:
>
>> Not sure what the best approach is here... Having an NPE without a
>> stack trace is definitely something we should improve on our side, even if
>> the root cause is external. I did look at the code very quickly and didn't
>> see anything obvious.
>>
>> Is the error occurring when the processor starts?
>> If that's when a flow file is being processed, is there anything special
>> about this flow file (like no content at all)?
>>
>> Thanks,
>> Pierre
>>
>> On Fri, Jan 15, 2021 at 4:37 PM Anders Synstad <an...@synstad.net>
>> wrote:
>>
>>> Thank you for the reply Pierre,
>>>
>>> I'm afraid there isn't a stack trace to post. While there are stack
>>> traces logged for other problems, the two log lines I've posted are all
>>> that is logged whenever this occurs.
>>>
>>> Enabling full debug logging on the installation in question is
>>> unfortunately not feasible.
>>>
>>> I found NIFI-4610, which appears to describe a similar problem but with
>>> a different processor (ExtractHL7Attributes). It might not be related, or
>>> maybe it's copy-and-pasted code with the same bug, or a shared library
>>> with a problem? Just throwing out random ideas.
>>>
>>>
>>> Regards,
>>> Anders
>>>
>>> On Fri, Jan 15, 2021 at 11:38 AM Pierre Villard <
>>> pierre.villard.fr@gmail.com> wrote:
>>>
>>>> Hi Anders,
>>>>
>>>> Do you have a full stack trace to share (probably available in the
>>>> nifi-app.log file)?
>>>> It sounds like something that could be easily fixed.
>>>>
>>>> Thanks,
>>>> Pierre
>>>>
>>>> On Fri, Jan 15, 2021 at 2:03 PM Anders Synstad <an...@synstad.net>
>>>> wrote:
>>>>
>>>>> Hi.
>>>>>
>>>>> Has anyone experienced NullPointerExceptions in PutSplunk? The only
>>>>> thing logged is the following:
>>>>>
>>>>> 2020-11-11 12:04:06,340 ERROR [Timer-Driven Process Thread-60]
>>>>> o.a.nifi.processors.splunk.PutSplunk PutSplunk[id=<uuid>]
>>>>> PutSplunk[id=<uuid>] failed to process session due to
>>>>> java.lang.NullPointerException; Processor Administratively Yielded for 1
>>>>> sec: java.lang.NullPointerException
>>>>> java.lang.NullPointerException: null
>>>>>
>>>>> The root cause of the NPE is probably something external to NiFi and
>>>>> isn't the main problem. The problem is that the flowfile isn't finalized
>>>>> correctly, so it's not sent to the failure relationship. It's just stuck
>>>>> in the incoming connection until the processor gets restarted and starts
>>>>> processing the incoming connection from the start again, so the incoming
>>>>> connection slowly builds up with these until it's full. We've experienced
>>>>> the problem on NiFi versions from 1.7 (probably earlier) all the way up
>>>>> to the latest 1.12.1.
>>>>>
>>>>> Is this a bug in PutSplunk not correctly handling this exception? I'm
>>>>> not fluent in Java, but I'm guessing maybe this happens around line 207
>>>>> in the PutSplunk code?
>>>>>
>>>>>
>>>>> https://github.com/apache/nifi/blob/aa741cc5967f62c3c38c2a47e712b7faa6fe19ff/nifi-nar-bundles/nifi-splunk-bundle/nifi-splunk-processors/src/main/java/org/apache/nifi/processors/splunk/PutSplunk.java#L207
>>>>>
>>>>> Any tips or hints would be greatly appreciated.
>>>>>
>>>>>
>>>>> Regards,
>>>>> Anders
>>>>>
>>>>>

NIFI Lookup processor service - NOT by GUID

Posted by John Gunvaldson <jg...@cox.net>.
All, 

General question on exploring the option of building a lookup service for NiFi.

Problem 1 --> When deploying a flow to another environment, the GUID is not maintained; the flow will often obtain a new GUID upon each deployment.

Problem 2 --> You have automated a service in another environment (let's call it Airflow, for example), and its job is to use NipyApi to find the processor and then schedule it, alter a property, change a metric, or something similar.

Using NipyApi with a GUID is no problem; you can pretty quickly identify the processor and do your work.

However, once you have moved that flow to another environment, the processor no longer has the same GUID; in fact, it changes after every deployment.

I am exploring some ideas on how to solve this.

1. Build a cache of everything in NiFi, traverse the cache along the expected process group path (Root Canvas -> Process Group A -> Process Group B), then look for the processor with the given name and return something identifiable - probably its current GUID? (See the sketch after this list.)

2. Other ideas - what kind of “Recursive Name Lookup Services” have others built that work in this scenario?
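
For idea 1, here is a rough sketch of the kind of path-based lookup I have in
mind, using NipyApi's canvas helpers (find_by_path, the endpoint URL, and the
group/processor names are placeholders of mine, not real API):

    # Rough sketch only; assumes NipyApi's canvas helpers behave as documented.
    import nipyapi

    nipyapi.utils.set_endpoint('http://localhost:8080/nifi-api')  # placeholder

    def find_by_path(group_path, processor_name):
        """Walk process groups by name from the root canvas and return the
        current GUID of the named processor, or None if the path breaks."""
        pg_id = nipyapi.canvas.get_root_pg_id()
        for name in group_path:
            # list_all_process_groups() returns all descendants, so filter on
            # parent_group_id to keep only direct children of the current group.
            child = next(
                (pg for pg in nipyapi.canvas.list_all_process_groups(pg_id)
                 if pg.component.name == name
                 and pg.component.parent_group_id == pg_id),
                None,
            )
            if child is None:
                return None
            pg_id = child.id
        for proc in nipyapi.canvas.list_all_processors(pg_id):
            if (proc.component.name == processor_name
                    and proc.component.parent_group_id == pg_id):
                return proc.id  # the GUID valid in *this* environment
        return None

    guid = find_by_path(['Process Group A', 'Process Group B'], 'MyProcessor')

The upside would be that the path and names stay stable across deployments,
even though the GUID changes every time.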


Thanks in Advance
~ John