Posted to user@drill.apache.org by John Omernik <jo...@omernik.com> on 2015/08/20 14:51:01 UTC
Bad Data in Files
Hey all,
I am trying to read some pretty rough data in CSV files. I am
getting errors similar to
https://issues.apache.org/jira/browse/DRILL-3428
when bad data is encountered. In data exploration, I think the ability to
see where the bad data is located is VERY important. But in addition to
this JIRA, it would be nice if Drill could gracefully "move on" from bad
lines. For example, if it comes across a line that throws an error, it
could stop, show the line, the file, and the location, and then offer some
way to "exclude" that line. As I am reading it, I could just say "yep,
garbage, ignore it." I'm not sure how to do this, but perhaps a
SKIP(filename, lineno) that I can add to a WHERE clause?
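To make the idea concrete, here is a minimal sketch of the behavior in plain Python. It is not Drill code and does not reflect any existing Drill feature. The names read_rows, expected_columns, and skip are hypothetical, and treating "a row with the wrong number of columns" as bad data is only an assumption for illustration.

```python
import csv
import io


def read_rows(files, expected_columns, skip=frozenset()):
    """Return (good, bad) rows from a mapping of file name -> CSV text.

    good: rows with the expected number of columns.
    bad:  (filename, lineno, row) tuples for rows that failed the check.
    skip: (filename, lineno) pairs to ignore silently, standing in for
          the hypothetical SKIP(filename, lineno) idea.
    """
    good, bad = [], []
    for filename, text in files.items():
        reader = csv.reader(io.StringIO(text))
        for row in reader:
            # line_num counts lines read from the source so far; it matches
            # the physical line number when no field contains a newline.
            lineno = reader.line_num
            if (filename, lineno) in skip:
                continue  # the user already said "garbage, ignore it"
            if len(row) != expected_columns:
                bad.append((filename, lineno, row))  # report, then move on
                continue
            good.append(row)
    return good, bad


# First pass: find out where the bad lines are.
files = {"a.csv": "1,2,3\noops\n4,5,6\n"}
good, bad = read_rows(files, expected_columns=3)

# Second pass: exclude the lines reported in the first pass.
skip = {(name, lineno) for name, lineno, _ in bad}
good_again, bad_again = read_rows(files, expected_columns=3, skip=skip)
```

The first pass reports each bad line with its file and line number instead of failing the whole read, and the second pass shows how an explicit skip list could then exclude those lines.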
This could be expanded even further for better functionality, but even this
much would be very helpful to a data explorer in these cases. I'd be
interested in others' thoughts on the subject.
John
Re: Bad Data in Files
Posted by Neeraja Rentachintala <nr...@maprtech.com>.
Hi John,
There is another JIRA open on a similar topic. Does this sound like what
you are looking for? Please update the JIRA with your comments.
https://issues.apache.org/jira/browse/DRILL-3454