Posted to dev@nifi.apache.org by shweta <sh...@gmail.com> on 2015/12/14 17:58:48 UTC

How to iterate through complex JSON objects.

Hi All,

I have a JSON document that looks like the following:

item[{
    "tags": ["Java","Hadoop","nimbus"]
    "id": "2233"
    "title": "testing with Java"
    "comments":[{"post_id":"2233","body":"try option1"} ,
{"post_id":"2233","body":"try option2"} , {"post_id":"2233","body":"try
option3"}]},

{
    "tags": ["Java","Hadoop"]
    "id": "2232"
    "title": "testing with Java"
    "comments":[{"post_id":"2232","body":"try option1"} ,
{"post_id":"2232","body":"try option2"} , {"post_id":"2232","body":"try
option3"},

{
    "tags": ["Java"]
    "id": "2231"
    "title": "testing with Java"
    "comments":[{"post_id":"2231","body":"try option1"} ,
{"post_id":"2231","body":"try option2"} , {"post_id":"2231","body":"try
option3"}
]

I need to convert the JSON to CSV.

Id      Tags                       Title                 Body

2233  , <java><Hadoop><Nimbus>  ,  "testing with Java"  ,  <"try option1"><"try option2"><"try option3">

2232  , <java><Hadoop>          ,  "testing with Java"  ,  <"try option1"><"try option2"><"try option3">
.
.
I used a combination of EvaluateJsonPath and ReplaceText for this.
The first issue I'm facing is parsing the arrays (Tags and Body): I couldn't
figure out how to iterate over an array of JSON values.

For simplicity's sake, I configured EvaluateJsonPath with the values shown in
the image:

<http://apache-nifi-developer-list.39713.n7.nabble.com/file/n5776/EvaluateJsonPath.png> 

and the ReplaceText processor with the regular expression [\S\s]+ and the
Replacement Value
"${Body}","${Tags}","${Post_id}","${Title}".

It works for a single record.
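
A minimal sketch of what the ReplaceText configuration above does, written in plain Python rather than NiFi. The attribute values are hypothetical, and string.Template only stands in for NiFi's Expression Language because it happens to share the ${name} syntax. Because [\S\s]+ matches every character, newlines included, the entire content is swapped for one line, which may be why the setup only produces a single record:

```python
import re
from string import Template

# Hypothetical attribute values, as EvaluateJsonPath might set them for one record.
attributes = {
    "Body": "try option1",
    "Tags": "Java",
    "Post_id": "2233",
    "Title": "testing with Java",
}

# Stand-in for the Replacement Value; not NiFi's Expression Language.
replacement = Template('"${Body}","${Tags}","${Post_id}","${Title}"').substitute(attributes)


def replace_all(content: str) -> str:
    # [\S\s]+ matches any character, so the single match spans the whole content.
    # A function is used as the replacement so backslashes are never reinterpreted.
    return re.sub(r"[\S\s]+", lambda _match: replacement, content)


flowfile_content = '{"item": [{"id": "2233"},\n {"id": "2232"}]}'
csv_line = replace_all(flowfile_content)
print(csv_line)
```

However many items the content holds, the result is exactly one line built from the attribute values.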

Can you please provide pointers on how to iterate through the values of arrays?

Thanks,
Shweta








--
View this message in context: http://apache-nifi-developer-list.39713.n7.nabble.com/How-to-iterate-through-complex-JSON-objects-tp5776.html
Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.

Re: How to iterate through complex JSON objects.

Posted by Joe Witt <jo...@gmail.com>.
Igor,

The term ETL has a lot of baggage associated with it.  What NiFi was
built to do is dataflow management.  There are already a lot of tools
out there that address the typical relational database ETL space and
NiFi doesn't need to replicate all of those functions.  So probably
best to just focus on use cases/problems and see if NiFi handles them
nicely now, doesn't handle them nicely now but should be made to do
so, or doesn't handle them nicely and it should always be left to some
other system.

Thanks
Joe

On Wed, Dec 16, 2015 at 7:10 PM, Igor Kravzov <ig...@gmail.com> wrote:
> The question is "Is NiFi supposed to be a full ETL tool"?

Re: How to iterate through complex JSON objects.

Posted by Igor Kravzov <ig...@gmail.com>.
The question is "Is NiFi supposed to be a full ETL tool"?
On Dec 16, 2015 11:27 AM, "Angry Duck Studio" <an...@gmail.com>
wrote:

> Shweta,
>
> I think your issue demonstrates one of my minor complaints with NiFi --
> that you always have to think in terms of several little, built-in pieces
> to get a simple job done. Sometimes it's fun, like a puzzle, but other
> times, I don't feel like dealing with it. That's why I wrote this:
> https://github.com/mring33621/scripting-for-nifi. A short, custom JS or
> Groovy script could have handled your JSON data munging in a single stroke.
>
> -Matt

Re: How to iterate through complex JSON objects.

Posted by Matthew Burgess <ma...@gmail.com>.
All,

I have submitted a patch for NIFI-210 to offer scripting capabilities. My GitHub feature branch is at:

https://github.com/mattyb149/nifi/tree/script-processors


I would truly appreciate any comments, questions, or suggestions about this capability.

Regards,
Matt




On 12/16/15, 11:41 AM, "Joe Witt" <jo...@gmail.com> wrote:

>It is a fair criticism that sometimes the cohesion level of processors
>can be simply too much.  Early on I used to 'fight' to find the right
>abstraction and argue that others do the same.  But what I've found is
>that it is better to let it happen naturally and to offer options.
>Matt, I think your approach of giving yourself an option to break into
>scripting in the middle of the flow in a way that lets you mangle data
>as needed but benefitting from the strength of the framework is
>perfect.  Matt Burgess is working on NIFI-210 to incorporate those
>languages and many others.
>
>Thanks
>Joe

Re: How to iterate through complex JSON objects.

Posted by Joe Witt <jo...@gmail.com>.
It is a fair criticism that sometimes the cohesion level of processors
can be simply too much.  Early on I used to 'fight' to find the right
abstraction and argue that others do the same.  But what I've found is
that it is better to let it happen naturally and to offer options.
Matt, I think your approach of giving yourself an option to break into
scripting in the middle of the flow in a way that lets you mangle data
as needed but benefitting from the strength of the framework is
perfect.  Matt Burgess is working on NIFI-210 to incorporate those
languages and many others.

Thanks
Joe

On Wed, Dec 16, 2015 at 8:27 AM, Angry Duck Studio
<an...@gmail.com> wrote:
> Shweta,
>
> I think your issue demonstrates one of my minor complaints with NiFi --
> that you always have to think in terms of several little, built-in pieces
> to get a simple job done. Sometimes it's fun, like a puzzle, but other
> times, I don't feel like dealing with it. That's why I wrote this:
> https://github.com/mring33621/scripting-for-nifi. A short, custom JS or
> Groovy script could have handled your JSON data munging in a single stroke.
>
> -Matt

Re: How to iterate through complex JSON objects.

Posted by Angry Duck Studio <an...@gmail.com>.
Shweta,

I think your issue demonstrates one of my minor complaints with NiFi --
that you always have to think in terms of several little, built-in pieces
to get a simple job done. Sometimes it's fun, like a puzzle, but other
times, I don't feel like dealing with it. That's why I wrote this:
https://github.com/mring33621/scripting-for-nifi. A short, custom JS or
Groovy script could have handled your JSON data munging in a single stroke.
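
As a rough illustration of the single-script idea, here is a sketch in Python (not JS or Groovy, and not taken from the scripting-for-nifi project). It assumes the corrected JSON layout with a top-level "item" array; the function names are made up for this example, and the inner quotes around each body in the desired output are dropped for simplicity:

```python
import csv
import io
import json


def angle_join(values):
    # ["Java", "Hadoop"] becomes "<Java><Hadoop>"
    return "".join(f"<{value}>" for value in values)


def items_to_csv(document: str) -> str:
    # Assumes a top-level "item" array of objects with tags, id, title, comments.
    items = json.loads(document)["item"]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Id", "Tags", "Title", "Body"])
    for item in items:
        bodies = [comment["body"] for comment in item["comments"]]
        writer.writerow(
            [item["id"], angle_join(item["tags"]), item["title"], angle_join(bodies)]
        )
    return out.getvalue()


sample = json.dumps({"item": [
    {"tags": ["Java", "Hadoop"], "id": "2232", "title": "testing with Java",
     "comments": [{"post_id": "2232", "body": "try option1"},
                  {"post_id": "2232", "body": "try option2"}]},
]})
result = items_to_csv(sample)
print(result)
```

The csv module only quotes a field when it contains a comma, a quote, or a line break, so the plain values here are written unquoted.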

-Matt

On Tue, Dec 15, 2015 at 8:40 PM, shweta <sh...@gmail.com> wrote:

> Thanks Bryan!! In fact, I followed the exact approach you described. I just
> had no idea how to use the MergeContent processor, so I wrote a custom
> script to combine the different outputs and ran it using
> ExecuteStreamCommand.
> I will try the same with MergeContent.

Re: How to iterate through complex JSON objects.

Posted by shweta <sh...@gmail.com>.
Thanks Bryan!! In fact, I followed the exact approach you described. I just
had no idea how to use the MergeContent processor, so I wrote a custom
script to combine the different outputs and ran it using
ExecuteStreamCommand.
I will try the same with MergeContent.




Re: How to iterate through complex JSON objects.

Posted by Bryan Bende <bb...@gmail.com>.
As an alternative approach, could you use SplitJson first to split on the
item array?

You would get a FlowFile for each item, so when you use EvaluateJsonPath
each FlowFile would contain only a single item and you could extract the
id and title and use ReplaceText like you were already doing.

Then use MergeContent at the end to merge them back together, or, depending
on what you are doing, maybe they don't need to be merged.
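
The split, extract, replace, merge sequence described above can be sketched outside NiFi in plain Python. This is only an analogue under stated assumptions: each function loosely mirrors one processor, the function and attribute names are made up for illustration, and the sample document keeps just the id and title fields:

```python
import json


def split_items(document: str) -> list[str]:
    # Analogue of splitting on the item array: one JSON string per element.
    return [json.dumps(item) for item in json.loads(document)["item"]]


def extract_attributes(flowfile: str) -> dict[str, str]:
    # Analogue of evaluating $.id and $.title against a single item.
    item = json.loads(flowfile)
    return {"Post_id": item["id"], "Title": item["title"]}


def to_csv_line(attributes: dict[str, str]) -> str:
    # Analogue of replacing the whole content with one CSV line.
    return f'{attributes["Post_id"]},"{attributes["Title"]}"'


def merge(lines: list[str]) -> str:
    # Analogue of merging the fragments back together, one per line.
    return "\n".join(lines)


document = json.dumps({"item": [
    {"id": "2233", "title": "testing with Java"},
    {"id": "2232", "title": "testing with Java"},
    {"id": "2231", "title": "testing with Java"},
]})

merged = merge([to_csv_line(extract_attributes(f)) for f in split_items(document)])
print(merged)
```

Splitting first means every later step only ever sees scalar values, which sidesteps the array-handling problem entirely.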

On Tue, Dec 15, 2015 at 3:27 AM, shweta <sh...@gmail.com> wrote:

> Just figured out that by specifying the Return Type as Json in
> "EvaluateJsonPath" processor I got the entire array of values. So for JSON
> path expression  "$.item.*.id","$.item.*.title" , I got
> ["2233","2232","2231"],["testing with Java","testing with Java","testing
> with Java"]
> I'm just trying to figure out how I can transpose it and instead get
> something like this
>
> 2233, "testing with Java"
> 2232, "testing with Java"
> 2231, "testing with Java"
>
> to generate my desired csv.

Re: How to iterate through complex JSON objects.

Posted by shweta <sh...@gmail.com>.
I just figured out that by specifying the Return Type as json in the
EvaluateJsonPath processor, I get the entire array of values. So for the
JSONPath expressions "$.item.*.id" and "$.item.*.title", I got
["2233","2232","2231"],["testing with Java","testing with Java","testing
with Java"]
I'm now trying to figure out how I can transpose these and instead get
something like this:

2233, "testing with Java"
2232, "testing with Java"
2231, "testing with Java"

to generate my desired csv.
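
For illustration only, the transposition itself is a one-liner in a general-purpose language. This Python sketch (not a NiFi processor configuration; the transpose helper is made up for this example) pairs the two arrays element by element and refuses to proceed if their lengths differ:

```python
import json

# The two arrays as they come back when the return type is JSON.
ids = json.loads('["2233","2232","2231"]')
titles = json.loads('["testing with Java","testing with Java","testing with Java"]')


def transpose(*columns):
    # Turn column-oriented arrays into row tuples; mismatched lengths are an error.
    if len({len(column) for column in columns}) > 1:
        raise ValueError("columns differ in length")
    return list(zip(*columns))


rows = transpose(ids, titles)
lines = [f'{post_id}, "{title}"' for post_id, title in rows]
print("\n".join(lines))
```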




Re: How to iterate through complex JSON objects.

Posted by shweta <sh...@gmail.com>.
Hi Joe,

Thanks for the quick response. My bad, I did not verify the JSON. It was a
typo.
I could get the array of values I wanted.
But now the problem is that, since it's not a scalar value being returned, I'm
not able to store it in a variable. EvaluateJsonPath throws an exception
saying that it is unable to return a scalar value. The evaluated value shown
in the exception, however, is correct.
How can I store non-scalar values as FlowFile attributes?
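
One general way to think about this, sketched in plain Python rather than NiFi: FlowFile attributes are string key/value pairs, so a non-scalar value has to be serialized to text before it can be stored and parsed again when it is needed. The dictionary below is only a stand-in for a FlowFile's attribute map, and the attribute name "ids" is made up for this example:

```python
import json

# Stand-in for a FlowFile's attribute map: string keys, string values only.
attributes: dict[str, str] = {}

ids = ["2233", "2232", "2231"]

# Store the array as its JSON text form.
attributes["ids"] = json.dumps(ids)

# Later, parse the text back into a list when the values are needed.
restored = json.loads(attributes["ids"])
print(attributes["ids"], restored)
```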

Thanks,
Shweta




Re: How to iterate through complex JSON objects.

Posted by Joe Percivall <jo...@yahoo.com.INVALID>.
Hello Shweta,

Where did you get that JSON? I ask because it's not valid. I put it in a JSONPath evaluator[1] and cleaned it up:


{
"item":[
{
"tags": ["Java","Hadoop","nimbus"],
"id": "2233",
"title": "testing with Java",
"comments":[{"post_id":"2233","body":"try option1"} ,
{"post_id":"2233","body":"try option2"} , {"post_id":"2233","body":"try option3"}
]
},

{
"tags": ["Java","Hadoop"],
"id": "2232",
"title": "testing with Java",
"comments":[{"post_id":"2232","body":"try option1"} ,
{"post_id":"2232","body":"try option2"} , {"post_id":"2232","body":"try option3"}
]
},

{
"tags": ["Java"],
"id": "2231",
"title": "testing with Java",
"comments":[{"post_id":"2231","body":"try option1"} ,
{"post_id":"2231","body":"try option2"} , {"post_id":"2231","body":"try option3"}
]
}
]
}

You were missing the proper beginning and some commas.

As for "iterating" over them, you want to use the "*" path operator, like so:

$.item.*.id

This path returns the ids for each item:

'0' => "2233"
'1' => "2232"
'2' => "2231"
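
For readers more used to general-purpose code, the effect of the "*" operator in this path can be approximated in plain Python. This is only a sketch: it does not use a JSONPath library, the wildcard helper is made up for illustration, and the sample document is trimmed to the id and title fields:

```python
import json

document = json.loads("""
{
  "item": [
    {"id": "2233", "title": "testing with Java"},
    {"id": "2232", "title": "testing with Java"},
    {"id": "2231", "title": "testing with Java"}
  ]
}
""")


def wildcard(parsed, array_key, field):
    # Rough equivalent of $.<array_key>.*.<field>: visit every element of the
    # array and collect the named field from each one.
    return [element[field] for element in parsed[array_key]]


ids = wildcard(document, "item", "id")
for index, value in enumerate(ids):
    print(index, value)
```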


[1] http://jsonpath.com/
Let me know if you have any other issues,
Joe
 - - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: joepercivall@yahoo.com



