Posted to users@nifi.apache.org by Jean-Sebastien Vachon <js...@brizodata.com> on 2020/02/14 20:00:54 UTC
Problem processing "huge" json objects
Hi all,
I am having some trouble processing 17 JSON objects (total size 20.6 GB) through an EvaluateJsonPath processor.
Originally the JVM had only 6 GB; I progressively increased the amount of RAM, but it still fails with the following settings:
-Xms16g -Xmx40g -XX:MaxPermSize=6G -XX:PermSize=4G
The exact error message is:
2020-02-14 19:55:53,799 ERROR [Timer-Driven Process Thread-316] o.a.n.p.standard.EvaluateJsonPath EvaluateJsonPath[id=01701031-6cff-1beb-b313-ab0531781357] EvaluateJsonPath[id=01701031-6cff-1beb-b313-ab0531781357] failed to process session due to java.lang.OutOfMemoryError: Requested array size exceeds VM limit; Processor Administratively Yielded for 1 sec: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
java.lang.OutOfMemoryError: Requested array size exceeds VM limit
at java.lang.StringCoding.encode(StringCoding.java:350)
at java.lang.String.getBytes(String.java:941)
at org.apache.nifi.processors.standard.EvaluateJsonPath.lambda$onTrigger$3(EvaluateJsonPath.java:331)
at org.apache.nifi.processors.standard.EvaluateJsonPath$$Lambda$840/682258977.process(Unknown Source)
at org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2665)
at org.apache.nifi.processors.standard.EvaluateJsonPath.onTrigger(EvaluateJsonPath.java:329)
at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1162)
at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:205)
at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Are there any other settings I can tune? If not, what are my options?
Thanks
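The telling detail in this stack trace is that the failure happens in String.getBytes(): EvaluateJsonPath materializes the matched result as one String and then encodes it into a single byte[], and Java arrays are indexed by int, so no amount of heap helps once the payload passes roughly 2 GiB. A minimal sketch of the arithmetic (the exact per-VM array-size headroom is an assumption here):

```java
// Why raising -Xmx cannot fix this error: a Java array is indexed by int,
// so a single byte[] tops out just below Integer.MAX_VALUE elements (~2 GiB).
// String.getBytes() on a 20.6 GB payload must allocate one such array.
class ArrayLimit {
    // Many JVMs refuse arrays a few elements below Integer.MAX_VALUE;
    // the exact headroom is VM-specific, so this constant is an assumption.
    static final long MAX_ARRAY_BYTES = Integer.MAX_VALUE - 8;

    static boolean fitsInOneByteArray(long payloadBytes) {
        return payloadBytes <= MAX_ARRAY_BYTES;
    }

    public static void main(String[] args) {
        long payload = 20_600_000_000L; // ~20.6 GB, as in the report
        System.out.println(fitsInOneByteArray(payload)); // prints "false"
    }
}
```

In other words, the 40 GB heap is plenty for the data; the limit being hit is the maximum size of a single array, which is independent of -Xmx.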
Re: Problem processing "huge" json objects
Posted by Jean-Sebastien Vachon <js...@brizodata.com>.
Hi
Sorry for the late response. The JSON looks like this:
{
"x": [ {}, {}, {} ]
}
________________________________
From: Mike Thomsen <mi...@gmail.com>
Sent: Saturday, February 15, 2020 2:54 PM
To: users@nifi.apache.org <us...@nifi.apache.org>
Subject: Re: Problem processing "huge" json objects
> JSON contains an array of objects
Like this:
[ { }, {} ]
Or like this?
{
"x": [ {}, {}, {} ]
}
Because if the latter, I might have a custom NAR file I can share that I had to use for a similar situation.
On Fri, Feb 14, 2020 at 3:58 PM Jean-Sebastien Vachon <js...@brizodata.com> wrote:
The JSON contains an array of objects that are to be inserted into a DB (and copied over to S3 for archival)...
I used a Split processor to cut them down to smaller chunks and it worked.
Thanks anyhow
________________________________
From: Pierre Villard <pi...@gmail.com>
Sent: Friday, February 14, 2020 3:28 PM
To: users@nifi.apache.org <us...@nifi.apache.org>
Subject: Re: Problem processing "huge" json objects
Hi,
Can't you use the Record processors? What are you trying to achieve?
Thanks,
Pierre
Re: Problem processing "huge" json objects
Posted by Mike Thomsen <mi...@gmail.com>.
> JSON contains an array of objects
Like this:
[ { }, {} ]
Or like this?
{
"x": [ {}, {}, {} ]
}
Because if the latter, I might have a custom NAR file I can share that I
had to use for a similar situation.
On Fri, Feb 14, 2020 at 3:58 PM Jean-Sebastien Vachon <jsvachon@brizodata.com> wrote:
> The JSON contains an array of objects that are to be inserted into a DB
> (and copied over to S3 for archival)...
> I used a Split processor to cut them down to smaller chunks and it worked.
>
> Thanks anyhow
> ------------------------------
> *From:* Pierre Villard <pi...@gmail.com>
> *Sent:* Friday, February 14, 2020 3:28 PM
> *To:* users@nifi.apache.org <us...@nifi.apache.org>
> *Subject:* Re: Problem processing "huge" json objects
>
> Hi,
>
> Can't you use the Record processors? What are you trying to achieve?
>
> Thanks,
> Pierre
Re: Problem processing "huge" json objects
Posted by Jean-Sebastien Vachon <js...@brizodata.com>.
The JSON contains an array of objects that are to be inserted into a DB (and copied over to S3 for archival)...
I used a Split processor to cut them down to smaller chunks and it worked.
Thanks anyhow
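Splitting upstream works because each resulting chunk then fits comfortably in a single byte[]. The core idea behind such a splitter, including streaming variants like the custom NAR mentioned elsewhere in the thread, is a single pass that tracks nesting depth and string state and emits each element of the wrapped array separately, never holding more than one element at a time. This is an illustrative standalone sketch, not NiFi or NAR code, and it assumes the {"x": [ ... ]} shape discussed elsewhere in the thread:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative splitter: walk the JSON text once, tracking brace/bracket
// depth and string state, and collect each element of the array under the
// top-level "x" key as its own chunk. A production version would read from
// a stream instead of a String to avoid buffering the whole payload.
class JsonArraySplitter {
    static List<String> splitTopLevelArray(String json) {
        List<String> chunks = new ArrayList<>();
        int depth = 0;          // combined {..} and [..] nesting depth
        boolean inString = false;
        int start = -1;         // index where the current element begins
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (inString) {
                if (c == '\\') i++;            // skip the escaped character
                else if (c == '"') inString = false;
                continue;
            }
            if (c == '"') {
                inString = true;
            } else if (c == '{' || c == '[') {
                // depth 2 == inside the array under the wrapping object
                if (c == '{' && depth == 2) start = i;
                depth++;
            } else if (c == '}' || c == ']') {
                depth--;
                if (c == '}' && depth == 2 && start >= 0) {
                    chunks.add(json.substring(start, i + 1));
                    start = -1;
                }
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        String json = "{\"x\":[{\"a\":1},{\"b\":2},{\"c\":3}]}";
        System.out.println(splitTopLevelArray(json)); // three chunks
    }
}
```

Each emitted chunk can then be routed through EvaluateJsonPath (or straight into the DB/S3 steps) individually, so no single array allocation ever approaches the VM limit.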
________________________________
From: Pierre Villard <pi...@gmail.com>
Sent: Friday, February 14, 2020 3:28 PM
To: users@nifi.apache.org <us...@nifi.apache.org>
Subject: Re: Problem processing "huge" json objects
Hi,
Can't you use the Record processors? What are you trying to achieve?
Thanks,
Pierre
Re: Problem processing "huge" json objects
Posted by Pierre Villard <pi...@gmail.com>.
Hi,
Can't you use the Record processors? What are you trying to achieve?
Thanks,
Pierre
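Record-oriented processors avoid treating the whole FlowFile content as one String. A SplitRecord-based variant of the flow might look like the fragment below; the processor and property names are from memory and should be verified against your NiFi version:

```
SplitRecord (illustrative configuration):
  Record Reader     = JsonTreeReader
  Record Writer     = JsonRecordSetWriter
  Records Per Split = 1000
```

Note that this helps when the payload is an array of many records; a single enormous record would still have to be parsed whole.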