Posted to dev@apex.apache.org by Devendra Tagare <de...@datatorrent.com> on 2016/03/23 00:52:57 UTC

Streaming JSON parser

Hi All,

Starting this thread to get opinions on adding a streaming JSON parser for
converting JSON to a POJO. This parser would be in addition to the databind
parser (com.fasterxml.jackson.databind) we already have.

The advantages of a streaming JSON parser are:

1. The parser need not parse the entire input to set the fields of the POJO.
2. It can be used with multi-line JSON records, e.g. if a user is using the
AbstractFileInputOperator to read a file line by line and a JSON record spans
multiple lines, the existing parser will not work even if the required
fields are covered in a single line of input.
3. Streaming parsers have the lowest read/write overhead compared to databind
or tree-based parsers.

Please refer to http://wiki.fasterxml.com/JacksonStreamingApi for more details.
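
To make this concrete, here is a minimal sketch of the selective-read pattern
against the Jackson streaming API. The Employee POJO and its id/name fields are
purely illustrative, and error handling and nested records are glossed over.

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;

import java.io.IOException;

public class StreamingPojoSketch {

    // Hypothetical POJO holding only the fields we care about.
    public static class Employee {
        public long id;
        public String name;
    }

    // Pulls just "id" and "name" out of a (possibly multi-line) JSON record,
    // skipping everything else without building a tree or a full bean.
    public static Employee parse(String json) throws IOException {
        Employee emp = new Employee();
        JsonFactory factory = new JsonFactory();
        try (JsonParser parser = factory.createParser(json)) {
            parser.nextToken();                       // START_OBJECT
            while (parser.nextToken() != JsonToken.END_OBJECT) {
                String field = parser.getCurrentName();
                parser.nextToken();                   // move to the value
                if ("id".equals(field)) {
                    emp.id = parser.getLongValue();
                } else if ("name".equals(field)) {
                    emp.name = parser.getText();
                } else {
                    parser.skipChildren();            // ignore nested objects/arrays we don't need
                }
            }
        }
        return emp;
    }
}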

The disadvantages (from the documentation) are:

1. All content to read/write has to be processed in exactly the same order as
the input comes in (or the output is to go out) -- for random access, you need
to use Data Binding or the Tree Model (both of which actually use the Streaming
API for the underlying JSON reading/writing).
[Dev] This could be tricky if one row of input goes to one partition of the
parser and another row goes to a different partition.
[Dev] This also means that we cannot use it with the existing file
splitter, since different splits may not go to the same partition of the
parser.

2. No Java objects are created unless specifically requested, and even then
only very basic types are supported (Strings, byte[] for base64-encoded
binary content).
[Dev] This should be fine for the use cases we are covering.

Please send across your inputs and comments.

Thanks,
Dev

Re: Streaming JSON parser

Posted by Sandesh Hegde <sa...@datatorrent.com>.
Found another JSON parser called Boon.

They have presented a benchmark against Jackson; we can evaluate it for our
use cases:

https://github.com/boonproject/boon
https://github.com/RichardHightower/json-parsers-benchmark

Re: Streaming JSON parser

Posted by Chinmay Kolhatkar <ch...@apache.org>.
+1 for the Streaming JSON parser.

Re: Streaming JSON parser

Posted by Justin Mclean <ju...@me.com>.
Hi,

Would this help?

http://johnzon.incubator.apache.org
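
Johnzon is an implementation of the javax.json (JSR-353) API, so the
selective-read pattern would look roughly like the sketch below; the "name"
field is illustrative and the value is assumed to be a string.

import javax.json.Json;
import javax.json.stream.JsonParser;

import java.io.StringReader;

public class JohnzonStreamingSketch {
    // Reads only the "name" field from a JSON record using the JSR-353
    // streaming API, which Johnzon implements.
    public static String readName(String json) {
        try (JsonParser parser = Json.createParser(new StringReader(json))) {
            while (parser.hasNext()) {
                if (parser.next() == JsonParser.Event.KEY_NAME
                        && "name".equals(parser.getString())) {
                    parser.next();                 // advance to the value event
                    return parser.getString();     // assumes the value is a string
                }
            }
        }
        return null;
    }
}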

Justin

Re: Streaming JSON parser

Posted by Ashish Tadose <as...@gmail.com>.
+1 

This would be a good value add.
It would help achieve higher throughput for input operators that need to consume only selected JSON fields.

Thx,
Ashish

Re: Streaming JSON parser

Posted by Bhupesh Chawda <bh...@datatorrent.com>.
A multi-line JSON format is very common and is usually the case with REST
API results.
I think this could be a valuable addition.

Regarding the issues you mentioned, I think they can be solved by having
a custom file splitter that takes care of splitting on JSON record
boundaries.
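
A rough sketch of that boundary detection -- tracking brace depth while
ignoring braces inside string literals -- could look like the following; the
class and method names are illustrative only and it assumes top-level JSON
objects.

import java.util.ArrayList;
import java.util.List;

// Illustrative only: splits a character stream into complete top-level JSON
// objects by tracking brace depth, ignoring braces inside string literals.
public class JsonRecordBoundarySketch {

    public static List<String> splitRecords(String input) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;
        boolean inString = false;
        boolean escaped = false;

        for (char c : input.toCharArray()) {
            current.append(c);
            if (escaped) {
                escaped = false;
            } else if (c == '\\' && inString) {
                escaped = true;
            } else if (c == '"') {
                inString = !inString;
            } else if (!inString && c == '{') {
                depth++;
            } else if (!inString && c == '}') {
                depth--;
                if (depth == 0) {                 // a complete record ends here
                    records.add(current.toString().trim());
                    current.setLength(0);
                }
            }
        }
        return records;
    }
}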

+1 for a streaming JSON parser.

~Bhupesh
