Posted to users@nifi.apache.org by ph...@orange.com on 2016/04/07 15:26:31 UTC

nifi processor to parse+update the current json on the fly

Hello,
I have a dataflow (DF) with a processor that holds a JSON document, which needs to be transformed before it is sent to an ElasticSearch processor.
I have something like this:
$.element.attributes[0].value   -> not to be changed
$.element.attributes[0].type    -> not to be changed
...
$.element.attributes[10].value  -> I need to change the value
$.element.attributes[10].type   -> I need to change the type
...
$.element.attributes[15].value  -> not to be changed
$.element.attributes[15].type   -> not to be changed


It's not clear to me which processor is the right one to implement this:

EvaluateJsonPath seems nice because I can identify the JSON path to be changed, but can I replace values with it?
ReplaceText? It seems to do string replacement on the text, losing the JSON structure?

My requirement is to parse and update the current JSON on the fly (the same as https://github.com/jayway/JsonPath).
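To make the requirement concrete: "parse and update on the fly" means parsing the content, setting the values at specific paths, and serializing the result again. Below is a minimal Python sketch of that idea, assuming only simple paths (dotted keys and numeric indexes, as in the examples above) need to be supported. `parse_path`, `set_path`, and `update_json` are hypothetical helpers, not NiFi or Jayway JsonPath APIs, and the whole document is held in memory.

```python
import json
import re

# Hypothetical path syntax: only ".key" and "[index]" steps after "$",
# e.g. "$.element.attributes[10].value"; no wildcards, filters, or slices.
_TOKEN = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def parse_path(path):
    """Split a simple JsonPath like $.a.b[2].c into ['a', 'b', 2, 'c']."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$': " + path)
    steps, pos = [], 1
    while pos < len(path):
        m = _TOKEN.match(path, pos)
        if m is None:
            raise ValueError("unsupported path syntax at %d: %s" % (pos, path))
        key, index = m.groups()
        steps.append(key if key is not None else int(index))
        pos = m.end()
    return steps


def set_path(doc, path, value):
    """Set the value at `path` inside the parsed document `doc`, in place."""
    steps = parse_path(path)
    if not steps:
        raise ValueError("cannot replace the document root")
    target = doc
    for step in steps[:-1]:
        target = target[step]  # KeyError/IndexError if the path is absent
    target[steps[-1]] = value
    return doc


def update_json(text, updates):
    """Parse JSON text, apply {path: new_value} updates, re-serialize."""
    doc = json.loads(text)
    for path, value in updates.items():
        set_path(doc, path, value)
    return json.dumps(doc)
```

For example, `update_json(text, {"$.element.attributes[10].value": 42, "$.element.attributes[10].type": "number"})` changes only entry 10 and leaves the other entries as they were. Python 3.7+ keeps object key order through `json.loads`/`json.dumps`, but the original whitespace and formatting are not preserved.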

Do I need to develop my own processor?

Any help would be nice :)
Thx
Philippe



_________________________________________________________________________________________________________________________

Ce message et ses pieces jointes peuvent contenir des informations confidentielles ou privilegiees et ne doivent donc
pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce message par erreur, veuillez le signaler
a l'expediteur et le detruire ainsi que les pieces jointes. Les messages electroniques etant susceptibles d'alteration,
Orange decline toute responsabilite si ce message a ete altere, deforme ou falsifie. Merci.

This message and its attachments may contain confidential or privileged information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete this message and its attachments.
As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified.
Thank you.


Re: nifi processor to parse+update the current json on the fly

Posted by Thad Guidry <th...@gmail.com>.
Yep, I think Informatica's DataStage plugins have that ability too: they
let the user know it's not streaming, but filling and emptying, filling and
emptying.

Dunno about IBM's :)

Thad
+ThadGuidry <https://www.google.com/+ThadGuidry>

Re: nifi processor to parse+update the current json on the fly

Posted by Joe Witt <jo...@gmail.com>.
Awesome analysis/input, Thad, and great references. Reading through
them now. Flagging the library's lack of stream orientation is
great.

Somewhat relatedly, we've pondered adding an annotation that can be
placed on processors, so that those which are able to operate on input
and output streams without loading full objects into memory get a
visual flag/indicator in the UI. The idea is that it would help dataflow
managers at least realize when what they're doing can create memory
congestion/scalability issues. What do you think of that idea?

On Fri, Apr 8, 2016 at 10:03 PM, Thad Guidry <th...@gmail.com> wrote:

Re: nifi processor to parse+update the current json on the fly

Posted by Thad Guidry <th...@gmail.com>.
Frank's work utilizes the Jolt spec (Apache 2 license), which is a great way
to handle JSON-to-JSON transforms, in my opinion.

Jolt is not a good fit for process logic or rules (use Groovy or Java, etc.
for those), but for transforming JSON in a declarative way, Jolt beats the
pants off anything else out there. It is not stream based, though, and can
consume a lot of memory when your JSON payload is huge (say, 300 MB JSON
files), but it is fine for most JSON payloads in the wild.

"Two things to be aware of:

   1. Jolt is not "stream" based, so if you have a very large Json document
   to transform you need to have enough memory to hold it.
   2. The transform process will create and discard a lot of objects, so
   the garbage collector will have work to do."

A few more details about how it can be used are on its official
page:
http://bazaarvoice.github.io/jolt/

A demo of Jolt showing how you can transform JSON to JSON (click the
Transform button):
http://jolt-demo.appspot.com/#ritwickgupta

Here's the rough performance of Jolt in 2013, where an 80k JSON file is
shifted in about 5 secs (the author's notes on this slide are interesting):
https://docs.google.com/presentation/d/1sAiuiFC4Lzz4-064sg1p8EQt2ev0o442MfEbvrpD1ls/edit#slide=id.g9ac79e71_01
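For readers unfamiliar with Jolt, its "shift" operation copies values from input locations to output locations declared in a spec, and anything the spec does not mention is left out of the output. The Python below imitates only that idea, using a deliberately simplified spec format (a flat mapping of dotted input paths to dotted output paths) that is an assumption of this sketch, not Jolt's real spec syntax. Like Jolt, it builds the whole output in memory.

```python
def get_in(doc, dotted):
    """Follow a dotted path like 'a.b.0.c' (digit steps index into lists)."""
    node = doc
    for part in dotted.split("."):
        node = node[int(part)] if isinstance(node, list) else node[part]
    return node


def put_in(doc, dotted, value):
    """Create intermediate objects as needed and set the leaf value.

    Assumes output paths in the spec do not collide with each other.
    """
    parts = dotted.split(".")
    node = doc
    for part in parts[:-1]:
        node = node.setdefault(part, {})
    node[parts[-1]] = value


def shift(doc, spec):
    """Jolt-inspired shift over parsed JSON (dicts and lists).

    Copies each input path in `spec` to its output path in a new document.
    Input not mentioned in the spec is dropped; missing inputs are skipped.
    """
    out = {}
    for src, dst in spec.items():
        try:
            value = get_in(doc, src)
        except (KeyError, IndexError, ValueError, TypeError):
            continue  # input path absent or not traversable: skip it
        put_in(out, dst, value)
    return out
```

For instance, with the spec `{"rating.primary.value": "Rating"}`, the input `{"rating": {"primary": {"value": 3}}, "extra": 1}` becomes `{"Rating": 3}`: the spec says where data goes, and no procedural code is needed per field.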

Thad
+ThadGuidry <https://www.google.com/+ThadGuidry>

Re: nifi processor to parse+update the current json on the fly

Posted by Joe Witt <jo...@gmail.com>.
Agreed, Michal. What do you think about the above comment I made
regarding a current idea found on GitHub:

"I think, though, that what we need to do is finally tackle
'https://issues.apache.org/jira/browse/NIFI-361' and here is a great
example to base it on.  The work Frank started here
'https://github.com/fsauer65/NiFi-Extensions/tree/master/nifi-jsontransform-bundle'
is a great base but we'd need to verify licensing and that the UI
elements are license/copyright friendly as well.
"

On Fri, Apr 8, 2016 at 3:28 PM, Michal Klempa <mi...@gmail.com> wrote:

Re: nifi processor to parse+update the current json on the fly

Posted by Michal Klempa <mi...@gmail.com>.
In my view, we are missing a processor which would alter existing JSON
with some attributes (unlike AttributesToJSON, which replaces the content
as a whole). Something like AttributesDecorateJSON.
Or, for your need, something like a JSONReplace processor, which
would follow predefined replacement rules.
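The proposed AttributesDecorateJSON behaviour can be sketched as a merge rather than a replacement. This minimal Python sketch rests on assumptions of its own: flowfile attributes arrive as a plain string-to-string dict, only attributes whose names start with a configurable prefix (`json.` here, an invented convention) are copied, and the content is a JSON object. `decorate_json` is an illustrative name, not an existing NiFi processor or API.

```python
import json


def decorate_json(content, attributes, prefix="json.", target=None):
    """Hypothetical 'AttributesDecorateJSON' logic.

    Copies every attribute whose name starts with `prefix` into the
    existing JSON object (prefix stripped), instead of replacing the
    content as a whole. If `target` is given, the values go into that
    (possibly new) nested object instead of the root.
    """
    doc = json.loads(content)
    if not isinstance(doc, dict):
        raise ValueError("content must be a JSON object")
    dest = doc if target is None else doc.setdefault(target, {})
    # Sorted for a deterministic order when keys are added.
    for name in sorted(attributes):
        if name.startswith(prefix):
            dest[name[len(prefix):]] = attributes[name]
    return json.dumps(doc)
```

With `attributes={"json.source": "sensor-1", "filename": "f.json"}`, the content `{"id": 7}` becomes `{"id": 7, "source": "sensor-1"}`; `filename` is ignored because it lacks the prefix. A JSONReplace variant would follow the same parse, modify, serialize shape, driven by a list of path rules instead of attribute names.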

On Fri, Apr 8, 2016 at 7:53 AM,  <ph...@orange.com> wrote:

RE: nifi processor to parse+update the current json on the fly

Posted by ph...@orange.com.
Hello, thanks Thad, Joe, and all
for the different answers.
I understand how to know…
Philippe
Best regards

From: Thad Guidry [mailto:thadguidry@gmail.com]
Sent: Thursday, April 7, 2016 17:32
To: users@nifi.apache.org
Subject: Re: nifi processor to parse+update the current json on the fly





Re: nifi processor to parse+update the current json on the fly

Posted by Madhukar Thota <ma...@gmail.com>.
Here is an example of JSON-to-JSON conversion using Groovy with JsonSlurper:

http://funnifi.blogspot.com/2016/02/executescript-json-to-json-conversion.html

On Thu, Apr 7, 2016 at 11:31 AM, Thad Guidry <th...@gmail.com> wrote:


Re: nifi processor to parse+update the current json on the fly

Posted by Thad Guidry <th...@gmail.com>.
Philippe,

I would encourage you to just use Groovy with JsonSlurper in the
ExecuteScript processor. It's actually a blazing fast parser.

http://groovy-lang.org/json.html

http://docs.groovy-lang.org/latest/html/gapi/groovy/json/JsonSlurper.html
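The JsonSlurper approach is: parse the content into plain maps and lists, modify them, and serialize back (in Groovy, `groovy.json.JsonOutput.toJson` covers the last step). The same shape is sketched here in Python rather than Groovy, as a bytes-in, bytes-out function; in an ExecuteScript script, a body like this would typically run inside the callback passed to `session.write(...)`. The rewrite rule itself (turn digit-only string values into numbers) is a made-up example, not something from this thread.

```python
import json


def retype_attributes(raw: bytes) -> bytes:
    """Parse flowfile content, rewrite matching entries, re-serialize.

    Hypothetical rule for illustration: any entry in element.attributes
    whose type is "string" and whose value consists only of decimal
    digits becomes a number with type "long". Other entries are kept.
    """
    doc = json.loads(raw.decode("utf-8"))
    for attr in doc.get("element", {}).get("attributes", []):
        value = attr.get("value")
        if attr.get("type") == "string" and isinstance(value, str) and value.isdecimal():
            attr["value"] = int(value)
            attr["type"] = "long"
    return json.dumps(doc).encode("utf-8")
```

Selecting entries by their content rather than by position (such as index 10) keeps the script working if the order of the attributes array ever changes.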

Thad
+ThadGuidry <https://www.google.com/+ThadGuidry>

Re: nifi processor to parse+update the current json on the fly

Posted by Joe Witt <jo...@gmail.com>.
Philippe

As far as I know, here is the state of affairs for this:
1) You can use EvaluateJsonPath and ReplaceText in combination for
some cases, but it is more awkward and difficult than it should be.

2) You can use the scripting processors, such as ExecuteScript, to write a
Groovy, JavaScript, or other type of script to do precisely the manipulation
you want on the fly.
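Why option 1 is awkward can be seen in a small Python sketch: a regex replacement (roughly what ReplaceText does) operates on characters, so a pattern built from a value extracted at one path also matches every other field with the same text, while a parse, modify, serialize edit touches only the intended entry. Both helpers are hypothetical illustrations, not NiFi code; a real ReplaceText configuration would need a much more specific regex to target a single entry.

```python
import json
import re


def text_replace(content, old_value, new_value):
    """Regex replacement on raw text, roughly like ReplaceText.

    It sees characters, not JSON structure, so every "value" field whose
    text equals old_value is rewritten, not only the intended one.
    Assumes no whitespace between the "value" key and its colon.
    """
    pattern = r'"value":\s*' + re.escape(json.dumps(old_value))
    replacement = '"value": ' + json.dumps(new_value)
    # A function replacement stops re.sub from interpreting backslashes.
    return re.sub(pattern, lambda _match: replacement, content)


def structured_replace(content, index, new_value):
    """Parse, change exactly one entry of element.attributes, serialize."""
    doc = json.loads(content)
    doc["element"]["attributes"][index]["value"] = new_value
    return json.dumps(doc)
```

If two entries both hold `"x"` and only entry 1 should become `"y"`, `text_replace` rewrites both, while `structured_replace(content, 1, "y")` changes entry 1 alone.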

I think, though, that what we need to do is finally tackle
'https://issues.apache.org/jira/browse/NIFI-361', and here is a great
example to base it on. The work Frank started here
'https://github.com/fsauer65/NiFi-Extensions/tree/master/nifi-jsontransform-bundle'
is a great base, but we'd need to verify licensing and that the UI
elements are license/copyright friendly as well.

If you wanted to collaborate/contribute on this one, that would
be excellent.

Thanks
Joe

On Thu, Apr 7, 2016 at 9:26 AM,  <ph...@orange.com> wrote: