Posted to user@spark.apache.org by Arun Kumar <ar...@gmail.com> on 2013/10/29 12:03:37 UTC

Reading corrupted Hadoop sequence files

Hi,

I am trying to read some Hadoop sequence files. Some of the records cannot be
parsed and throw exceptions; the exception propagates, the whole job dies, and
the files cannot be loaded at all. Is there a way to catch the exception for
each record and return null when the corresponding object cannot be formed? I
have seen a solution along these lines for the Hadoop MapReduce framework (
http://stackoverflow.com/questions/14920236/how-to-prevent-hadoop-job-to-fail-on-corrupted-input-file).
How can I do this in Spark?
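
To make the question concrete, something like the sketch below is what I am
after. MyRecord.parse is just a placeholder for my own deserialization, and I
have assumed NullWritable keys and BytesWritable values; the real files may use
different classes:

import java.util.Arrays

import org.apache.hadoop.io.{BytesWritable, NullWritable}
import org.apache.spark.SparkContext

object SkipCorruptRecords {
  def main(args: Array[String]) {
    val sc = new SparkContext("local", "skip-corrupt-records")

    // Read the raw writables, then parse each value with a per-record
    // try/catch so a corrupt record becomes None instead of killing the job.
    val parsed = sc
      .sequenceFile("hdfs:///data/input", classOf[NullWritable], classOf[BytesWritable])
      .map { case (_, value) =>
        try {
          // MyRecord.parse is a placeholder for my own deserialization logic
          Some(MyRecord.parse(Arrays.copyOf(value.getBytes, value.getLength)))
        } catch {
          case e: Exception => None   // could not form the object for this record
        }
      }
      .flatMap(x => x)   // drop the Nones, keep the successfully parsed records

    println(parsed.count())
  }
}

I realize this would only catch exceptions thrown in my own parse step; if the
corruption breaks the sequence file reader itself, I suppose a custom
InputFormat like in the Stack Overflow answer above would still be needed.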

Thanks