Posted to user@flink.apache.org by françois lacombe <fr...@dcbrain.com> on 2019/02/06 11:06:04 UTC

Get nested Rows from Json string

Hi all,

I currently get a JSON string with nested objects from my PostgreSQL source,
and it has to be converted into a Flink Row.
Nested JSON objects should become nested Rows.
An Avro schema defines the structure my source should conform to.

Given this JSON:
{
  "a":"b",
  "c":"d",
  "e":{
       "f":"g"
   }
}

According to my Avro schema, the expected result is ("b", "d", Row("g")).

I wrote a recursive method which iterates over the JSON objects and puts nested
Rows at the right indices in their parent, but here is what it outputs:
("b", "d", "g")
The child Row appears to be appended to the parent, and I don't understand why.
As a result, the job crashes, complaining that the arity of the top-level Row
doesn't match the serializers.
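
The recursive approach described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from this thread: a plain Object[] stands in for a Row, a LinkedHashMap stands in for an already-parsed JSON object, and the names NestedRowSketch, toRow and sample are hypothetical. Field order here follows map insertion order, whereas a real job would take it from the Avro schema.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in: Object[] plays the role of a Row, and a
// LinkedHashMap plays the role of an already-parsed JSON object.
final class NestedRowSketch {

    // Converts a parsed JSON object into a positional "row". A nested object
    // becomes a nested row that occupies exactly one slot in its parent.
    static Object[] toRow(Map<String, Object> json) {
        Object[] row = new Object[json.size()];
        int i = 0;
        for (Object value : json.values()) {
            if (value instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> child = (Map<String, Object>) value;
                row[i] = toRow(child); // one slot, whatever the child's size
            } else {
                row[i] = value;
            }
            i++;
        }
        return row;
    }

    // Builds the sample document from the message above.
    static Map<String, Object> sample() {
        Map<String, Object> e = new LinkedHashMap<>();
        e.put("f", "g");
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("a", "b");
        root.put("c", "d");
        root.put("e", e);
        return root;
    }

    public static void main(String[] args) {
        Object[] row = toRow(sample());
        System.out.println(row.length);                 // arity of the parent
        System.out.println(((Object[]) row[2]).length); // arity of the child
    }
}
```

With this structure the parent keeps three slots and the child one, so checking the arity (rather than the printed form) is a quick way to tell whether nesting was preserved.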

Are there native methods in Flink to achieve this?
I'm not comfortable with having written my own JSON processor for this
job.

Do you have any hints that could help?

All the best

François

-- 

 <http://www.dcbrain.com/>   <https://twitter.com/dcbrain_feed?lang=fr>   
<https://www.linkedin.com/company/dcbrain>   
<https://www.youtube.com/channel/UCSJrWPBLQ58fHPN8lP_SEGw>


 Think of the planet: print this message only if necessary

Re: Get nested Rows from Json string

Posted by françois lacombe <fr...@dcbrain.com>.
Hi Rong,

Thank you for the JIRA ticket.
I understand it may be fixed in an upcoming release. I'll comment on the
ticket if I have further input.

All the best

François


Re: Get nested Rows from Json string

Posted by Rong Rong <wa...@gmail.com>.
Hi François,

I did some research, and it seems this is in fact a stringification issue.
If you run one of the tests in AvroRowDeSerializationSchemaTest [1],
you will find that only MAP and ARRAY are stringified correctly (maps are
wrapped in "{}" and arrays in "[]").
Nested records, however, are not wrapped in "()".

I wasn't sure whether this counts as a bug in the toString method of the
Row type, so I filed a JIRA [2] for it. Feel free to comment on
the discussion.
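
To make the point above concrete, here is a minimal sketch. It assumes a simplified stand-in class: SimpleRow and its toNestedString method are hypothetical and are not Flink's Row. Its toString joins fields with commas and adds nothing around a nested row, which mirrors the behaviour described above, so the printed form looks flat even though the structure is still nested.

```java
import java.util.StringJoiner;

// Hypothetical stand-in for a row type; this is NOT Flink's Row class.
final class SimpleRow {
    private final Object[] fields;

    SimpleRow(int arity) {
        this.fields = new Object[arity];
    }

    static SimpleRow of(Object... values) {
        SimpleRow r = new SimpleRow(values.length);
        for (int i = 0; i < values.length; i++) {
            r.setField(i, values[i]);
        }
        return r;
    }

    void setField(int pos, Object value) { fields[pos] = value; }
    Object getField(int pos) { return fields[pos]; }
    int getArity() { return fields.length; }

    // Joins fields with commas and puts no delimiters around a nested row,
    // so a nested row is indistinguishable from extra top-level fields.
    @Override
    public String toString() {
        StringJoiner sj = new StringJoiner(",");
        for (Object f : fields) {
            sj.add(String.valueOf(f));
        }
        return sj.toString();
    }

    // Alternative rendering that wraps every row in parentheses.
    String toNestedString() {
        StringJoiner sj = new StringJoiner(",", "(", ")");
        for (Object f : fields) {
            sj.add(f instanceof SimpleRow
                    ? ((SimpleRow) f).toNestedString()
                    : String.valueOf(f));
        }
        return sj.toString();
    }
}
```

For SimpleRow.of("b", "d", SimpleRow.of("g")), getArity() is 3 and getField(2) is itself a row, yet toString() yields b,d,g while toNestedString() yields (b,d,(g)). In other words, setField does not append anything; only the printed form hides the nesting.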

--
Rong

[1]
https://github.com/apache/flink/blob/release-1.7/flink-formats/flink-avro/src/test/java/org/apache/flink/formats/avro/AvroRowDeSerializationSchemaTest.java
[2] https://issues.apache.org/jira/browse/FLINK-11569


Re: Get nested Rows from Json string

Posted by françois lacombe <fr...@dcbrain.com>.
Hi Rong,

Thank you for this answer.
I've changed the Rows to Maps, which eases the conversion process.

Nevertheless, I'd be interested in any explanation of why row1.setField(i,
row2) appends row2 at the end of row1.

All the best

François


Re: Get nested Rows from Json string

Posted by Rong Rong <wa...@gmail.com>.
Hi François,

I wasn't sure whether you are trying to process a JSON object or a JSON
string.
For a JSON string, this article [1] might help.
For a JSON object, I assume you are trying to convert it into a
TableSource and process it with the Table/SQL API; you could probably use the
example here [2].

BTW, a very remote hunch: this might just be a stringification issue in how
you print the row out.

--
Rong

[1]:
https://stackoverflow.com/questions/49380778/how-to-stream-a-json-using-flink
[2]:
https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/table/sourceSinks.html#table-sources-sinks
