Posted to user@flink.apache.org by Tamara Mendt <ta...@gmail.com> on 2015/05/28 12:24:58 UTC

JSON data source for Flink Job

Hello,

I have a JSON file containing multiple JSON objects and wish to use this as
a data source for a Flink Job.

What is the best way to do this?

Cheers,

Tamara

Re: JSON data source for Flink Job

Posted by Tamara Mendt <ta...@gmail.com>.
Ok great, thanks a lot =)



On Thu, May 28, 2015 at 12:39 PM, Stephan Ewen <se...@apache.org> wrote:

> Hi!
>
> This depends a bit on how the JSON is formatted.
>
> If you want the source to be parallelizable, you need to have a way of
> splitting the file at object boundaries. Is there a character on which you
> can split? If yes, you can use the TextInputFormat (with a custom line break
> character), take the strings and parse them to JSON with your favorite
> library (like Jackson or so).
>
> Stephan
>
>
> On Thu, May 28, 2015 at 12:24 PM, Tamara Mendt <ta...@gmail.com>
> wrote:
>
>> Hello,
>>
>> I have a JSON file containing multiple JSON objects and wish to use this
>> as a data source for a Flink Job.
>>
>> What is the best way to do this?
>>
>> Cheers,
>>
>> Tamara
>>
>
>


-- 
Tamara Mendt

Re: JSON data source for Flink Job

Posted by Stephan Ewen <se...@apache.org>.
Hi!

This depends a bit on how the JSON is formatted.

If you want the source to be parallelizable, you need to have a way of
splitting the file at object boundaries. Is there a character on which you
can split? If yes, you can use the TextInputFormat (with a custom line break
character), take the strings and parse them to JSON with your favorite
library (like Jackson or so).
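
To illustrate the split-then-parse idea outside of Flink, here is a minimal
sketch in plain Python. It is not Flink code: the function name
parse_delimited_json and the sample data are made up for illustration, and it
assumes the delimiter never occurs inside a JSON object (for example, one
object per line with no embedded newlines). In a Flink job, the splitting step
would be handled by the input format and the parsing step by a map function.

```python
import json


def parse_delimited_json(text, delimiter="\n"):
    """Split text on the delimiter and parse each non-empty piece as one JSON value.

    Assumption: the delimiter only appears between objects, never inside one.
    """
    records = []
    for chunk in text.split(delimiter):
        chunk = chunk.strip()
        if not chunk:
            continue  # skip empty pieces, e.g. after a trailing delimiter
        records.append(json.loads(chunk))
    return records


# Hypothetical sample: one JSON object per line.
sample = '{"id": 1, "name": "a"}\n{"id": 2, "name": "b"}\n'
records = parse_delimited_json(sample)
print(records)
```

Because each piece is parsed independently of the others, the pieces can be
handed to different workers, which is what makes the source parallelizable.
If the objects span several lines or are wrapped in a single top-level array,
there is no safe character to split on and this approach does not apply
directly.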

Stephan


On Thu, May 28, 2015 at 12:24 PM, Tamara Mendt <ta...@gmail.com> wrote:

> Hello,
>
> I have a JSON file containing multiple JSON objects and wish to use this
> as a data source for a Flink Job.
>
> What is the best way to do this?
>
> Cheers,
>
> Tamara
>