Posted to user@drill.apache.org by François Méthot <fm...@gmail.com> on 2016/01/22 17:49:38 UTC

Query Return Error because of a single file

Hi Drill Community,

Using drill-embedded, I encountered this error while running a query on
folders containing thousands of Parquet files:

Error: SYSTEM ERROR: IOException: FAILED_TO_UNCOMPRESSED(5)

Fragment 1:9

After re-running the same query with the log level set to DEBUG, I
identified the files scanned by Fragment 1:9 and ran the same query on
each file individually until one of them reproduced the error.
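The isolation procedure described above can be automated. Below is a minimal Python sketch that assumes a hypothetical `query_file` callable, which you would write yourself around whatever client you use to submit queries. It should run the same query against one file and raise on failure. The list of paths would come from the DEBUG log for the failing fragment. Unlike a single query over the whole folder, the loop keeps going after a failure and reports every failing file together with its error.

```python
from typing import Callable, Iterable, List, Tuple


def find_failing_files(
    paths: Iterable[str],
    query_file: Callable[[str], object],
) -> List[Tuple[str, str]]:
    """Run query_file on each path and collect (path, error) for every failure.

    query_file is a hypothetical, caller-supplied callable: it should run the
    same query against a single file and raise an exception if it fails.
    """
    failures: List[Tuple[str, str]] = []
    for path in paths:
        try:
            query_file(path)
        except Exception as exc:  # record the failure and move on to the next file
            failures.append((path, f"{type(exc).__name__}: {exc}"))
    return failures


if __name__ == "__main__":
    # Stand-in for a real query, for demonstration only: one file "fails".
    def fake_query(path: str) -> None:
        if path == "bad.parquet":
            raise OSError("FAILED_TO_UNCOMPRESSED(5)")

    files = ["part-0.parquet", "bad.parquet", "part-2.parquet"]
    for path, err in find_failing_files(files, fake_query):
        print(f"{path}: {err}")
```

Passing the query as a callable keeps the search loop independent of how the query is actually submitted.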



It turned out that a column in one of the Parquet files is causing this
issue. Whether the problem lies in our Parquet writer or in the Drill
reader remains to be determined.

My question is:

Is there an option to make a fragment thread move on to the next file
when it encounters such an error, instead of failing the whole query
and discarding its results?

Also, in a case like this it would have been useful if the log clearly
stated which Parquet file caused the issue.

Thanks a lot

François

Re: Query Return Error because of a single file

Posted by Jason Altekruse <al...@gmail.com>.
Hello François,

Sorry that this question went unanswered for so long. We have received many
requests for this feature of skipping bad files, but we haven't come to a
consensus on how it should be implemented.

The problem largely comes from the ambiguity of what skipping "some" bad
files should mean, and from different users' expectations.

If the files are valid and a bug in Drill is preventing us from reading
them, simply skipping those files doesn't seem like the right behavior. In
this case, if you could post a small Parquet file that produces this
error, I can take a look at what is causing the issue, because I would
like to make sure it is fixed.

I completely agree that we should fail with a helpful message that points
users to the file that could not be read. We already catch many of these
cases today and add the filename to the failure message; we should add
the same handling where this read is failing. I have filed a JIRA for
fixing this, as well as for reviewing the other storage plugins for
similar cases that fail without useful context information [1].

[1] - https://issues.apache.org/jira/browse/DRILL-4426
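Two behaviours come up in this thread: failing with an error that names the unreadable file (what the JIRA above tracks) and, optionally, skipping bad files. The Python sketch below illustrates both policies for a generic per-file reader. It is not Drill's implementation (Drill's readers are written in Java), and `scan`, `read_file`, `skip_bad` and `FileReadError` are hypothetical names. Fail-fast is the default, reflecting the concern above that skipping could silently hide valid files that simply hit a reader bug.

```python
import logging
from typing import Callable, Iterable, Iterator, List, Optional, TypeVar

Row = TypeVar("Row")
log = logging.getLogger("scan_sketch")


class FileReadError(Exception):
    """Hypothetical error type that records which file could not be read."""

    def __init__(self, path: str, cause: Exception) -> None:
        super().__init__(f"Failed to read {path}: {type(cause).__name__}: {cause}")
        self.path = path


def scan(
    paths: Iterable[str],
    read_file: Callable[[str], Iterable[Row]],
    skip_bad: bool = False,
    skipped: Optional[List[str]] = None,
) -> Iterator[Row]:
    """Yield rows from each file in turn.

    read_file is a hypothetical per-file reader supplied by the caller.
    With skip_bad=False (the default) the first failure is re-raised with the
    offending path in the message. With skip_bad=True the file is logged,
    recorded in `skipped`, and the scan moves on to the next file.
    """
    for path in paths:
        try:
            # Materialize the file's rows so errors raised mid-file are caught here
            rows = list(read_file(path))
        except Exception as exc:
            if not skip_bad:
                # Fail the scan, but say which file was the problem
                raise FileReadError(path, exc) from exc
            log.warning("skipping unreadable file %s: %s", path, exc)
            if skipped is not None:
                skipped.append(path)
            continue
        yield from rows
```

Reading each file fully before yielding means a half-read bad file never contributes partial rows when skipping. A reader that streams large files would need a different trade-off, which is part of why "skip bad files" is harder to define than it sounds.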
