Posted to user@spark.apache.org by Jaonary Rabarisoa <ja...@gmail.com> on 2014/07/02 09:01:43 UTC

Configure and run external process with RDD.pipe

Hi all,

I need to run a complex external process with a lot of dependencies from
Spark. The "pipe" and "addFile" functions seem to be my friends, but there
are still a few issues that I need to solve.
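To make things concrete, here is roughly what I can already do today,
assuming run.exe were pre-installed on every worker (which is exactly what
I would like to avoid). Paths and RDD names are just placeholders:

    // Ship a single support file to every worker.
    sc.addFile("hdfs:///tools/parameter_file")

    // Pipe the RDD through the external command: each record is written
    // to the process's stdin and each line of its stdout becomes a record
    // of the resulting RDD. I assume the file shipped with addFile is
    // visible from the piped process's working directory.
    val output = inputRdd.pipe("/opt/myalgo/run.exe parameter_file")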

More precisely, the process I want to run is a C++ executable that may depend
on some libraries and additional parameter files. I bundle everything into one
tar file, so I may have the following structure:

myalgo:
-- run.exe
-- libdepend_run.so
-- parameter_file


For example, my algo may be a support vector machine together with its trained
model file.

Now I need a way to deploy my bundled algo on every node and pipe my RDD
through the executable. My question is: is it possible to deploy my tar
file and extract it on every worker so that I can invoke my executable?
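In pseudo-code, what I would like to end up with is something like the
sketch below. wrapper.sh is a hypothetical script that I would also ship,
and I do not know whether the archive actually lands somewhere the piped
process can reach, hence my question:

    // Ship the whole bundle plus a small wrapper script to every worker.
    sc.addFile("hdfs:///tools/myalgo.tar.gz")
    sc.addFile("hdfs:///tools/wrapper.sh")

    // wrapper.sh (hypothetical) would do something like:
    //   tar xzf myalgo.tar.gz
    //   export LD_LIBRARY_PATH=./myalgo:$LD_LIBRARY_PATH
    //   exec ./myalgo/run.exe ./myalgo/parameter_file
    //
    // Calling it through bash avoids worrying about the execute bit.
    val scored = features.pipe("bash wrapper.sh")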

Any ideas would be helpful.

Cheers,