Posted to dev@spark.apache.org by Naresh Peshwe <na...@gmail.com> on 2019/07/01 01:52:20 UTC

Option for silent failure while reading a list of files.

Hi All,
When I try to read a list of parquet files from S3, my application errors
out if even one of the files is absent. Most of the solutions I found
suggest filtering the list down to files that exist before calling read.
Shouldn't Spark handle this by providing an option to continue without
throwing an error? If not, could you point me to the thread where this was
discussed?


Regards,
Naresh

Re: Option for silent failure while reading a list of files.

Posted by Naresh Peshwe <na...@gmail.com>.
Thanks for your reply. It makes sense why the option is not provided,
since the user is the one explicitly asking Spark to read those files.

Yes, I provide the list of files. I'll try the ignoreCorruptFiles option.
I'll also look into how I can avoid missing files, or at least check
whether each file is present before reading.
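
For reference, a rough sketch of that pre-read existence check using the
Hadoop FileSystem API (the bucket and file names below are placeholders,
not the actual input list):

  import org.apache.hadoop.fs.Path
  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder().appName("filter-missing-files").getOrCreate()
  val hadoopConf = spark.sparkContext.hadoopConfiguration

  // Placeholder paths; the real list comes from elsewhere in the pipeline.
  val candidatePaths = Seq(
    "s3a://some-bucket/data/part-00000.parquet",
    "s3a://some-bucket/data/part-00001.parquet")

  // Keep only the paths that actually exist, so read.parquet never sees a
  // missing file.
  val existingPaths = candidatePaths.filter { p =>
    val path = new Path(p)
    path.getFileSystem(hadoopConf).exists(path)
  }

  val df = spark.read.parquet(existingPaths: _*)

Each exists() call is an extra S3 round trip, so for very long lists it may
be cheaper to list the prefix once and intersect with the candidate paths.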

Regards,
Naresh

On Mon, Jul 1, 2019, 19:34 Steve Loughran <st...@cloudera.com.invalid>
wrote:

> Where is this list of files coming from?
>
> If you made the list, then yes, the expectation is generally "supply a
> list of files which are present", on the basis that the general convention
> is "missing files are considered bad".
>
> Though you could try setting spark.sql.files.ignoreCorruptFiles=true to
> see what happens.
>
> There has been past discussion on this topic: what if the set of files on
> S3 includes files which have been moved offline? The conclusion there was
> "you get to filter, sorry":
>
> https://issues.apache.org/jira/browse/SPARK-21797
>
>
>
> On Mon, Jul 1, 2019 at 2:52 AM Naresh Peshwe <na...@gmail.com>
> wrote:
>
>> Hi All,
>> When I try to read a list of parquet files from S3, my application errors
>> out if even one of the files is absent. Most of the solutions I found
>> suggest filtering the list down to files that exist before calling read.
>> Shouldn't Spark handle this by providing an option to continue without
>> throwing an error? If not, could you point me to the thread where this was
>> discussed?
>>
>>
>> Regards,
>> Naresh
>>
>

Re: Option for silent failure while reading a list of files.

Posted by Steve Loughran <st...@cloudera.com.INVALID>.
Where is this list of files coming from?

If you made the list, then yes, the expectation is generally "supply a list
of files which are present", on the basis that the general convention is
"missing files are considered bad".

Though you could try setting spark.sql.files.ignoreCorruptFiles=true to see
what happens.
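
For illustration, a minimal sketch of trying that option (the app name and
S3 paths below are placeholders, not anything from the job in question):

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder().appName("ignore-corrupt-files").getOrCreate()

  // Session-level setting; it can equally be passed via --conf on spark-submit.
  spark.conf.set("spark.sql.files.ignoreCorruptFiles", "true")

  // With the flag on, Spark continues past files it cannot read rather than
  // failing the job; whether that also covers files missing from S3 is what
  // the suggestion above asks to check.
  val df = spark.read.parquet(
    "s3a://some-bucket/data/part-00000.parquet",
    "s3a://some-bucket/data/part-00001.parquet")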

There has been past discussion on this topic: what if the set of files on
S3 includes files which have been moved offline? The conclusion there was
"you get to filter, sorry":

https://issues.apache.org/jira/browse/SPARK-21797



On Mon, Jul 1, 2019 at 2:52 AM Naresh Peshwe <na...@gmail.com>
wrote:

> Hi All,
> When I try to read a list of parquet files from S3, my application errors
> out if even one of the files is absent. Most of the solutions I found
> suggest filtering the list down to files that exist before calling read.
> Shouldn't Spark handle this by providing an option to continue without
> throwing an error? If not, could you point me to the thread where this was
> discussed?
>
>
> Regards,
> Naresh
>