Posted to user@drill.apache.org by John Omernik <jo...@omernik.com> on 2015/08/20 14:51:01 UTC

Bad Data in Files

Hey all,

I am trying to read some data in CSV files that is pretty rough. I am
getting errors similar to


https://issues.apache.org/jira/browse/DRILL-3428

when bad data is encountered. In doing data exploration, I think the
ability to be made aware of where the bad data is located is VERY
important. But in addition to this JIRA, it would be nice if Drill could
nicely "move on" from bad lines. For example, if it comes across a line
that throws an error, perhaps it could stop, show the line, which file,
and the location, and then somehow find a way to "exclude" that line.
Perhaps, as I am reading it, I just say "yep, garbage, ignore it". Not
sure how to do this, but perhaps a SKIP(filename, lineno) that I can add
to a WHERE clause?
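
A rough sketch of the report-then-skip idea, done outside Drill in plain
Python. Nothing here is an existing Drill feature: scan_csv, its
parameters, and the "wrong number of fields" test for a bad line are all
hypothetical, chosen only to illustrate the behavior. It also assumes one
record per physical line (no quoted fields spanning lines).

```python
import csv
import io


def scan_csv(name, text, expected_fields, skip=frozenset()):
    """Split CSV text into good rows and bad-line reports.

    name            -- label used in reports (e.g. the file name)
    text            -- the file contents as a string
    expected_fields -- column count a well-formed row should have (assumed)
    skip            -- set of (name, lineno) pairs to drop silently
    """
    good, bad = [], []
    for lineno, line in enumerate(io.StringIO(text), start=1):
        if (name, lineno) in skip:
            continue  # already marked as "garbage, ignore it"
        raw = line.rstrip("\r\n")
        try:
            # Parse one physical line; strict=True makes malformed
            # quoting raise csv.Error instead of being guessed at.
            row = next(csv.reader([raw], strict=True))
        except csv.Error as exc:
            bad.append((name, lineno, raw, str(exc)))
            continue
        if len(row) != expected_fields:
            reason = "expected %d fields, got %d" % (expected_fields, len(row))
            bad.append((name, lineno, raw, reason))
            continue
        good.append(row)
    return good, bad


if __name__ == "__main__":
    sample = "a,b,c\n1,2,3\ngarbage line\n4,5,6\n"
    rows, problems = scan_csv("sample.csv", sample, expected_fields=3)
    for fname, lineno, raw, reason in problems:
        # Report which file, which line, and the offending text.
        print("%s:%d: %s: %r" % (fname, lineno, reason, raw))
```

The first pass reports each bad line with its file and line number; a
second pass with skip={("sample.csv", 3)} plays the role of the
SKIP(filename, lineno) filter and drops that line without reporting it.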


This could be expanded even further for better functionality, but even
this much would be very helpful to me as a data explorer in these cases.
I'd be interested in others' thoughts on the subject.

John

Re: Bad Data in Files

Posted by Neeraja Rentachintala <nr...@maprtech.com>.
Hi John,
There is another JIRA open on a similar topic. Does this sound like what
you are looking for? Please update the JIRA with your comments.
https://issues.apache.org/jira/browse/DRILL-3454

