Posted to users@nifi.apache.org by Jean-Sebastien Vachon <js...@brizodata.com> on 2020/02/14 20:00:54 UTC

Problem processing "huge" json objects

Hi all,

I am having some trouble processing 17 JSON objects (total size 20.6 GB) through an EvaluateJsonPath processor.
Originally the JVM had only 6 GB; I progressively increased the amount of RAM, and it still fails with the following settings:

 -Xms16g -Xmx40g -XX:MaxPermSize=6G -XX:PermSize=4G

The exact error message is:

2020-02-14 19:55:53,799 ERROR [Timer-Driven Process Thread-316] o.a.n.p.standard.EvaluateJsonPath EvaluateJsonPath[id=01701031-6cff-1beb-b313-ab0531781357] EvaluateJsonPath[id=01701031-6cff-1beb-b313-ab0531781357] failed to process session due to java.lang.OutOfMemoryError: Requested array size exceeds VM limit; Processor Administratively Yielded for 1 sec: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
java.lang.OutOfMemoryError: Requested array size exceeds VM limit
        at java.lang.StringCoding.encode(StringCoding.java:350)
        at java.lang.String.getBytes(String.java:941)
        at org.apache.nifi.processors.standard.EvaluateJsonPath.lambda$onTrigger$3(EvaluateJsonPath.java:331)
        at org.apache.nifi.processors.standard.EvaluateJsonPath$$Lambda$840/682258977.process(Unknown Source)
        at org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2665)
        at org.apache.nifi.processors.standard.EvaluateJsonPath.onTrigger(EvaluateJsonPath.java:329)
        at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
        at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1162)
        at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:205)
        at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
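
For reference, the frame that matters is String.getBytes(): the processor encodes the extracted content into a single byte array before writing it out. Java arrays are indexed by int, so one byte[] is capped just under Integer.MAX_VALUE entries (about 2 GB), and the encoder may request several bytes per character, so the allocation can fail regardless of -Xmx. A rough sketch of the arithmetic (sizes assumed from the 20.6 GB / 17 objects above; the exact cap is VM-dependent):

    // Sketch: why adding heap cannot fix this OOM. A single Java array is
    // int-indexed, so a byte[] length tops out near Integer.MAX_VALUE.
    public class ArrayLimitSketch {
        public static void main(String[] args) {
            long avgObjectChars = 20_600_000_000L / 17; // ~1.2 G chars per object (assumed)
            long worstCaseBytes = avgObjectChars * 3;   // UTF-8 may need up to 3 bytes/char
            long arrayCap = Integer.MAX_VALUE - 8L;     // practical HotSpot array-length limit
            System.out.println(worstCaseBytes > arrayCap); // true -> the encode() allocation fails
        }
    }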


Are there any other settings I can tune? If not, what are my options?

Thanks

Re: Problem processing "huge" json objects

Posted by Jean-Sebastien Vachon <js...@brizodata.com>.
Hi

Sorry for the late response. The JSON looks like this:

{
  "x": [ {}, {}, {} ]
}


Re: Problem processing "huge" json objects

Posted by Mike Thomsen <mi...@gmail.com>.
> JSON contains an array of objects

Like this:

[ { }, {} ]

Or like this?

{
  "x": [ {}, {}, {} ]
}

Because if the latter, I might have a custom NAR file I can share that I
had to use for a similar situation.
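
For anyone hitting the same wall, a splitter of that kind would presumably stream the array rather than parse the whole document. Here is a minimal, hypothetical sketch using Jackson's streaming API (the file name and per-element handling are placeholders, and this is not the actual NAR):

    // Hypothetical sketch: iterate the elements of { "x": [ {...}, {...} ] }
    // one at a time, so the full multi-GB document is never held in memory.
    import com.fasterxml.jackson.core.JsonParser;
    import com.fasterxml.jackson.core.JsonToken;
    import com.fasterxml.jackson.databind.JsonNode;
    import com.fasterxml.jackson.databind.ObjectMapper;
    import java.io.File;

    public class StreamSplitSketch {
        public static void main(String[] args) throws Exception {
            ObjectMapper mapper = new ObjectMapper();
            try (JsonParser p = mapper.getFactory().createParser(new File("huge.json"))) {
                JsonToken t;
                // advance to the first array in the document (the value of "x")
                while ((t = p.nextToken()) != null && t != JsonToken.START_ARRAY) { }
                // read one element of the array at a time
                while (p.nextToken() == JsonToken.START_OBJECT) {
                    JsonNode element = mapper.readTree(p);
                    // placeholder: emit the element as its own flowfile / DB row
                    System.out.println(element);
                }
            }
        }
    }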


Re: Problem processing "huge" json objects

Posted by Jean-Sebastien Vachon <js...@brizodata.com>.
The JSON contains an array of objects that are to be inserted into a DB (and copied over to S3 for archival)...
I used a Split processor to cut them down to smaller chunks and it worked.
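
For reference, the SplitJson setting for the two shapes discussed above would presumably look like this (SplitJson is the standard processor; note that it still parses each incoming document in full, so splitting only helps once the individual inputs fit comfortably in the heap):

    SplitJson
        JsonPath Expression : $.x   (for the wrapped shape { "x": [ {...}, {...} ] })
        JsonPath Expression : $     (for a bare top-level array [ {...}, {...} ])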

Thanks anyhow

Re: Problem processing "huge" json objects

Posted by Pierre Villard <pi...@gmail.com>.
Hi,

Can't you use the Record processors? What are you trying to achieve?

Thanks,
Pierre
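
For completeness, the record-oriented route being suggested here would typically pair a record reader with a record-aware sink, along these lines (processor and property names are from the standard NiFi distribution, but the flow itself is an assumed sketch, not what the original poster ran; JsonTreeReader iterates a bare top-level array one record at a time, while the wrapped { "x": [...] } shape would need the array hoisted out first):

    PutDatabaseRecord
        Record Reader  : JsonTreeReader   (streams records instead of one huge string)
        Statement Type : INSERT
        Table Name     : <target table>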
