Posted to user@drill.apache.org by fritz wijaya <fr...@gmail.com> on 2015/05/06 08:49:30 UTC

Problem with malformed data from json

I recently started running Drill to explore data from our Hadoop cluster, but
I have a problem when running queries against our data source. The query
always fails due to malformed JSON data. Because the data itself is pretty
raw, it may contain some malformed JSON here and there. It's difficult to
clean the data ourselves due to its size (around hundreds of gigabytes of
text files).

My question is: is there any way to exclude/skip the malformed records and
put them into a separate result, just like Spark/Shark does? How can I solve
this problem elegantly?

Thank you.

Regards,
Fritz

Re: Problem with malformed data from json

Posted by Hanifi Gunes <hg...@maprtech.com>.

Currently, Drill does not support skipping bad records. It does, however,
pinpoint where the problem is.

-Hanifi
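
Since skipping is not available inside Drill according to the reply above, one
possible workaround is to pre-filter the files before querying them. The
following is only a minimal sketch, not a Drill feature: it assumes the data is
stored as one JSON record per line, and the function name split_json_lines and
the tab-separated layout of the rejects stream are hypothetical choices made
for illustration. It streams the input, so it does not load the whole file
into memory.

```python
import io
import json
from typing import TextIO, Tuple


def split_json_lines(src: TextIO, good: TextIO, bad: TextIO) -> Tuple[int, int]:
    """Copy parseable JSON lines to `good` and the rest to `bad`.

    Assumes one JSON record per line (an assumption, not something the
    thread states). Returns (good_count, bad_count).
    """
    n_good = n_bad = 0
    for line_no, line in enumerate(src, start=1):
        text = line.strip()
        if not text:
            continue  # ignore blank lines entirely
        try:
            json.loads(text)
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            # Keep the source line number so the bad record can be traced back.
            bad.write(f"{line_no}\t{text}\n")
            n_bad += 1
        else:
            good.write(text + "\n")
            n_good += 1
    return n_good, n_bad


if __name__ == "__main__":
    # Tiny in-memory demo; real use would pass open file handles instead.
    sample = io.StringIO('{"id": 1}\n{"id": 2,\n\n{"id": 3}\n')
    good_out, bad_out = io.StringIO(), io.StringIO()
    print(split_json_lines(sample, good_out, bad_out))
```

The cleaned output could then be queried as usual, while the rejects stream
serves as the separate result for malformed records. Records that span several
lines would need a different splitting strategy.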
