You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@spark.apache.org by Michael Knapp <mi...@gmail.com> on 2015/08/03 19:10:15 UTC

EOFException when transmitting a class that extends Externalizable

Hi,

I am having a problem serializing a custom partitioner that I have written
that extends Externalizable.  The partitioner wraps a java TreeSet which
stores table splits.  There are thousands of splits.

I noticed earlier that my spark job was taking over 30 seconds just to
transmit a task to each worker, which is what motivated me to optimize the
serialization of the partitioner I wrote.  I read in your configuration
page that Kryo can only be used to serialize data in RDDs, but spark cannot
use Kryo to transmit the task closure to each executor.  That is why I
chose to use Externalizable for my custom partitioner.

I have unit tests that confirm my externalizable class can be serialized
using java serialization.  Unfortunately when it comes to the cluster
though, it stops working.  I added some logging to this and discovered that
it had serialized about 2000 splits out of 4000 before it encountered an
EOFException.  This is happening consistently on every node of my cluster.
I have no idea what could cause this or even how to get more information.

Would somebody please tell me what could possibly cause this or how to
troubleshoot it?

Mike Knapp