Posted to dev@spark.apache.org by "R. Tyler Croy" <rt...@brokenco.de> on 2019/05/19 16:27:13 UTC

Object serialization for workers

Greetings! I am looking into the possibility of JRuby support for Spark, and
could use some pointers (references?) to orient myself a bit better within the
codebase.

JRuby fat jars load just fine in Spark, but where things start to get
predictably dicey is object serialization: the operations applied to RDDs get
serialized and sent to the workers.
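
To make that concrete, the failure mode I'm anticipating looks roughly like
this (just a sketch, not real code from my spike; rubyCallable stands in for
whatever live JRuby object backs the user's block):

    import java.util.Arrays;

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.Function;

    public class ClosureProblem {
        // Spark serializes the Function instance -- and everything it
        // captures -- and ships it to the workers, so every captured
        // object must be java.io.Serializable. A live JRuby object is
        // tied to a driver-side org.jruby.Ruby runtime and doesn't
        // survive that round trip.
        static JavaRDD<String> brokenMap(JavaSparkContext sc, Object rubyCallable) {
            JavaRDD<String> lines = sc.parallelize(Arrays.asList("foo", "bar"));
            // Capturing rubyCallable drags the JRuby runtime into the
            // closure graph; serialization fails on the way out.
            Function<String, String> fn = s -> s + rubyCallable.toString();
            return lines.map(fn);
        }
    }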

Having worked on something similar for Apache Storm
(https://github.com/jruby-gradle/redstorm), what we ended up doing there was
shimming some classes to handle Ruby object/class serialization properly.
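
For reference, the Redstorm shim boiled down to this: the object Storm
actually serializes is a plain Java proxy carrying nothing but the Ruby class
name as a String, and the real Ruby object gets instantiated lazily on the
worker. From memory, something along these lines (the require'd script name
is made up; ScriptingContainer comes from jruby-embed):

    import java.io.Serializable;

    import org.jruby.embed.ScriptingContainer;

    // Only a plain String crosses the wire; the JRuby runtime and the
    // live Ruby object get (re)created lazily on whichever JVM ends up
    // invoking us.
    public class RubyProxy implements Serializable {
        private final String rubyClassName;        // e.g. "MyBolt"
        private transient ScriptingContainer ruby; // worker-local runtime
        private transient Object rubyInstance;     // never serialized

        public RubyProxy(String rubyClassName) {
            this.rubyClassName = rubyClassName;
        }

        protected Object rubyInstance() {
            if (rubyInstance == null) {
                ruby = new ScriptingContainer();
                // Made-up script name: assumes the Ruby class ships in
                // the fat jar and is on the load path.
                rubyInstance = ruby.runScriptlet(
                    "require 'my_bolt'; " + rubyClassName + ".new");
            }
            return rubyInstance;
        }
    }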

I'm expecting to do something similar in Spark, but I'm not entirely sure which
interfaces/classes describe the serialization of RDDs. I'm figuring that I'll
need to implement a Ruby equivalent of the org.apache.spark.api.java.function
namespace, but I'm not entirely sure where the pieces come together to turn
those into serialized objects.
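
If the same trick carries over, I'd guess the Ruby side ends up as proxies
implementing those interfaces, roughly like this (again just a sketch, with
my own made-up names and load-path assumptions):

    import org.apache.spark.api.java.function.Function;
    import org.jruby.embed.ScriptingContainer;

    // Function already extends java.io.Serializable; the only state we
    // actually serialize is the Ruby class name. The Ruby callable gets
    // rebuilt on the worker the first time call() runs there.
    public class RubyFunction<T, R> implements Function<T, R> {
        private final String rubyClassName;
        private transient ScriptingContainer ruby;
        private transient Object rubyCallable;

        public RubyFunction(String rubyClassName) {
            this.rubyClassName = rubyClassName;
        }

        @Override
        @SuppressWarnings("unchecked")
        public R call(T value) throws Exception {
            if (rubyCallable == null) {
                ruby = new ScriptingContainer();
                // Assumes the class is already defined somewhere on the
                // fat jar's load path.
                rubyCallable = ruby.runScriptlet(rubyClassName + ".new");
            }
            // Delegate to a conventional #call method on the Ruby side.
            return (R) ruby.callMethod(rubyCallable, "call", value);
        }
    }

Driver code would then read something like
rdd.map(new RubyFunction<String, String>("Upcaser")), so only the class name
string ever needs to serialize.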


Appreciate any direction you all might be able to share. In the meantime, I've
got my miner's cap on and am presently digging through core/ :)



Cheers

--
GitHub:  https://github.com/rtyler

GPG Key ID: 0F2298A980EE31ACCA0A7825E5C92681BEF6CEA2