Posted to user@drill.apache.org by Christopher Matta <cm...@mapr.com> on 2014/11/24 23:07:31 UTC

Validate your JSON files

I’ve been running into errors in Drill when a JSON record is invalid. To
reduce the number of these errors, I wrote a small, simple application
that opens a specified file, checks whether each line is a valid JSON record,
and reports an error if it’s not:

https://github.com/cjmatta/jsonr

Usage:

[cmatta@ip-172-16-1-173 jsonar]$ ./jsonar -f ../tweets/2014/11/24/21/tweets.json
Checking for valid JSON in ../tweets/2014/11/24/21/tweets.json
WARNING:root:JSON load error on line 16640 of ../tweets/2014/11/24/21/tweets.json
WARNING:root:JSON load error on line 16641 of ../tweets/2014/11/24/21/tweets.json
WARNING:root:JSON load error on line 16642 of ../tweets/2014/11/24/21/tweets.json
Checking line 17000
Done.
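
A line-by-line check of this kind can be sketched in Python as follows. This is a minimal illustration and not the code from the repository linked above: the function name find_invalid_lines is hypothetical, skipping blank lines is an assumption, and the warning text simply mirrors the output shown.

```python
import json
import logging


def find_invalid_lines(path):
    """Return the 1-based numbers of lines that do not parse as JSON.

    Assumes one JSON record per line (hypothetical helper, for illustration).
    """
    bad_lines = []
    with open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # assumption: blank lines are skipped, not flagged
            try:
                json.loads(line)
            except json.JSONDecodeError:
                logging.warning("JSON load error on line %d of %s", number, path)
                bad_lines.append(number)
    return bad_lines
```

Because each line is parsed on its own, a record that spans several lines would be reported as invalid even if it is well-formed JSON.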

Please check it out, use it, and contribute back if something is broken or
missing.

Chris Matta
cmatta@mapr.com
215-701-3146

Re: Validate your JSON files

Posted by Christopher Matta <cm...@mapr.com>.
After Ted’s feedback I’ve reimplemented the JSON validator using Jackson.
This is the same library Drill uses to read JSON, so it should be helpful for
checking whether your data on disk is valid JSON.

I wouldn’t call myself a Java developer by any means, so if there are things
I overlooked or missed, or improvements you want to add,
by all means send a pull request.

Future enhancements might include:

   - Counting how many records are valid
   - Maybe an “hdfs” flag to access data stored in HDFS/MapR-FS
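
Counting valid records, the first enhancement listed above, might look like the sketch below. It is written in Python with the standard json module rather than Java and Jackson, assumes one record per line, and uses a hypothetical count_records helper, so it shows the idea only.

```python
import json
from collections import Counter


def count_records(lines):
    """Tally valid and invalid JSON records from an iterable of text lines.

    Hypothetical helper: assumes one record per line and skips blank lines.
    """
    tally = Counter(valid=0, invalid=0)
    for line in lines:
        if not line.strip():
            continue
        try:
            json.loads(line)
        except json.JSONDecodeError:
            tally["invalid"] += 1
        else:
            tally["valid"] += 1
    return tally
```

Passing an open file object works, since iterating over a file yields its lines one at a time.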

https://github.com/cjmatta/jsonvalidator

Chris Matta
cmatta@mapr.com
215-701-3146

On Tue, Nov 25, 2014 at 7:31 AM, Christopher Matta <cm...@mapr.com> wrote:

> Ted, I'll take a look! Thanks.
>
> Chris Matta
> cmatta@mapr.com
> 215-701-3146
>
> On Tue, Nov 25, 2014 at 6:27 AM, Ted Dunning <te...@gmail.com> wrote:
>
>> Chris,
>>
>> Your tool could be updated to use Jackson and would then have the exact
>> same semantics as Drill.
>>
>> It is still great as it is.... just could be slightly greater.
>>
>> On Mon, Nov 24, 2014 at 11:09 PM, Steven Phillips <sphillips@maprtech.com> wrote:
>>
>> > No, Drill uses jackson to parse the json as a stream. It's fine if the json
>> > record has newline characters.
>> >
>> > Your validation tool is still useful, in the case where each json record is
>> > contained in a single line, which is common. Just be aware that it won't
>> > work in all cases.
>> >
>> > On Mon, Nov 24, 2014 at 3:04 PM, Christopher Matta <cm...@mapr.com> wrote:
>> >
>> > > Steven,
>> > > Yes it does, doesn't Drill also require that the entire JSON record be on
>> > > a single line?
>> > >
>> > > I wrote this for situations when the data set is too large to paste into a
>> > > web-based validator.
>> > >
>> > > Chris Matta
>> > > cmatta@mapr.com
>> > > 215-701-3146
>> > >
>> > > On Mon, Nov 24, 2014 at 6:01 PM, Steven Phillips <sphillips@maprtech.com> wrote:
>> > >
>> > > > Christopher,
>> > > >
>> > > > Does your validator require that the entire json record be on a single
>> > > > line?
>> > > >
>> > > > On Mon, Nov 24, 2014 at 2:57 PM, Aman Sinha <as...@maprtech.com> wrote:
>> > > >
>> > > > > BTW, there's a web based validator called jsonlint.com whose source is
>> > > > > available at :  https://github.com/arc90/jsonlintdotcom
>> > > > >
>> > > > > On Mon, Nov 24, 2014 at 2:07 PM, Christopher Matta <cmatta@mapr.com> wrote:
>> > > > >
>> > > > > > I’ve been running across errors in Drill when a JSON record is invalid. To
>> > > > > > reduce the number of these errors, I wrote this small, simple application
>> > > > > > that will open a specified file, check if each line is a valid JSON record,
>> > > > > > and error if it’s not:
>> > > > > >
>> > > > > > https://github.com/cjmatta/jsonr
>> > > > > >
>> > > > > > Usage:
>> > > > > >
>> > > > > > [cmatta@ip-172-16-1-173 jsonar]$ ./jsonar -f ../tweets/2014/11/24/21/tweets.json
>> > > > > > Checking for valid JSON in ../tweets/2014/11/24/21/tweets.json
>> > > > > > WARNING:root:JSON load error on line 16640 of ../tweets/2014/11/24/21/tweets.json
>> > > > > > WARNING:root:JSON load error on line 16641 of ../tweets/2014/11/24/21/tweets.json
>> > > > > > WARNING:root:JSON load error on line 16642 of ../tweets/2014/11/24/21/tweets.json
>> > > > > > Checking line 17000
>> > > > > > Done.
>> > > > > >
>> > > > > > Please check it out, use it, contribute back if there’s something broken or
>> > > > > > missing.
>> > > > > >
>> > > > > > Chris Matta
>> > > > > > cmatta@mapr.com
>> > > > > > 215-701-3146
>> > > >
>> > > > --
>> > > >  Steven Phillips
>> > > >  Software Engineer
>> > > >
>> > > >  mapr.com
>> >
>> > --
>> >  Steven Phillips
>> >  Software Engineer
>> >
>> >  mapr.com

Re: Validate your JSON files

Posted by Christopher Matta <cm...@mapr.com>.
Ted, I'll take a look! Thanks.

Chris Matta
cmatta@mapr.com
215-701-3146

Re: Validate your JSON files

Posted by Ted Dunning <te...@gmail.com>.
Chris,

Your tool could be updated to use Jackson and would then have the exact
same semantics as Drill.

It is still great as it is.... just could be slightly greater.


Re: Validate your JSON files

Posted by Steven Phillips <sp...@maprtech.com>.
No, Drill uses Jackson to parse the JSON as a stream. It's fine if the JSON
record contains newline characters.

Your validation tool is still useful in the case where each JSON record is
contained in a single line, which is common. Just be aware that it won't
work in all cases.
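
The difference between stream parsing and line-by-line parsing can be illustrated with a short Python sketch. It uses the standard library's json.JSONDecoder.raw_decode rather than Jackson, so it only approximates the behavior described above and does not reproduce Drill's exact semantics. The helper name iter_records is hypothetical, and the whole input is assumed to fit in memory as one string.

```python
import json


def iter_records(text):
    """Yield each top-level JSON value found in text.

    Records may span several lines; whitespace between records is ignored.
    Raises json.JSONDecodeError at the first malformed record.
    """
    decoder = json.JSONDecoder()
    position = 0
    length = len(text)
    while position < length:
        # raw_decode does not skip leading whitespace, so do it here.
        while position < length and text[position].isspace():
            position += 1
        if position == length:
            break
        # raw_decode returns the parsed value and the index where it ended.
        value, position = decoder.raw_decode(text, position)
        yield value
```

With this approach a pretty-printed record spread over many lines is accepted, whereas a per-line check would flag each of its lines as invalid.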

-- 
 Steven Phillips
 Software Engineer

 mapr.com

Re: Validate your JSON files

Posted by Christopher Matta <cm...@mapr.com>.
Steven,
Yes, it does. Doesn't Drill also require that the entire JSON record be on
a single line?

I wrote this for situations when the data set is too large to paste into a
web-based validator.

Chris Matta
cmatta@mapr.com
215-701-3146

Re: Validate your JSON files

Posted by Steven Phillips <sp...@maprtech.com>.
Christopher,

Does your validator require that the entire JSON record be on a single line?

-- 
 Steven Phillips
 Software Engineer

 mapr.com

Re: Validate your JSON files

Posted by Aman Sinha <as...@maprtech.com>.
BTW, there's a web-based validator called jsonlint.com whose source is
available at: https://github.com/arc90/jsonlintdotcom
