Posted to user@spark.apache.org by Lian Jiang <ji...@gmail.com> on 2019/03/29 17:23:27 UTC

Spark generates corrupted Parquet files

Hi,

Occasionally, Spark generates Parquet files that are only 4 bytes long. The
content is just "PAR1". Our Spark ETL jobs cannot handle these corrupted
files and skip the whole partition containing such a poison-pill file, which
causes significant data loss.

Spark also generates 0-byte Parquet files, but Spark can handle those.
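
For reference, here is a rough sketch of how the two cases could be told
apart before a job reads the files. A complete Parquet file starts and ends
with the 4-byte magic "PAR1", with a 4-byte footer length just before the
trailing magic, so a 4-byte "PAR1" file looks like a header that was written
without any data or footer after it. The function name and category labels
below are made up for illustration, and the check only looks at local files
and magic bytes; it does not validate the footer itself.

```python
import os

MAGIC = b"PAR1"
# Leading magic + 4-byte footer length + trailing magic (assumes an
# unencrypted file; the footer metadata itself adds more bytes).
MIN_VALID_SIZE = 12


def classify_parquet_file(path):
    """Hypothetical helper: return 'empty', 'header_only', 'truncated',
    or 'plausible' based only on file size and magic bytes."""
    size = os.path.getsize(path)
    if size == 0:
        return "empty"
    with open(path, "rb") as f:
        head = f.read(4)
        if size == 4 and head == MAGIC:
            # The 4-byte "PAR1" poison-pill case described above.
            return "header_only"
        if size < MIN_VALID_SIZE:
            return "truncated"
        f.seek(-4, os.SEEK_END)
        tail = f.read(4)
    if head == MAGIC and tail == MAGIC:
        return "plausible"
    return "truncated"
```

On HDFS or an object store the same size and magic checks would have to go
through that filesystem's own API; this sketch only shows the idea.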

What could cause Spark to generate such 4-byte files? Any clue is
appreciated!