Posted to user@spark.apache.org by Jaonary Rabarisoa <ja...@gmail.com> on 2014/07/02 09:01:43 UTC
Configure and run external process with RDD.pipe
Hi all,
I need to run a complex external process with many dependencies from
Spark. The "pipe" and "addFile" functions seem to be my friends, but there
are a few issues I still need to solve.
Precisely, the process I want to run is a C++ executable that may depend on
some shared libraries and additional parameter files. I bundle everything into
one tar file, so I may have the following structure:
myalgo:
-- run.exe
-- libdepend_run.so
-- parameter_file
For example, my algo might be a support vector machine together with its
trained model file.
Now I need a way to deploy my bundled algo on every node and pipe my RDD
through the executable. My question is: is it possible to deploy my tar
file and extract it on every worker so that I can invoke my executable?
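One common pattern for this (a sketch, not an official Spark recipe) is to ship the tar plus a small wrapper script, and pipe each partition through the wrapper; the wrapper extracts the bundle into the task's working directory before invoking the executable. The runnable sketch below simulates the whole flow locally: run.exe is stood in for by a tiny shell script (since the real C++ binary isn't available here), and wrapper.sh is a hypothetical name.

```shell
# End-to-end sketch of the wrapper approach, runnable locally.
# run.exe is simulated by a tiny shell script so the flow can be
# demonstrated without the real C++ binary.

# 1. Build a dummy bundle with the layout from the mail.
mkdir -p myalgo
cat > myalgo/run.exe <<'EOF'
#!/bin/sh
# Stand-in for the real executable: echo each stdin line with a prefix.
while read line; do echo "processed: $line"; done
EOF
chmod +x myalgo/run.exe
touch myalgo/libdepend_run.so myalgo/parameter_file
tar czf myalgo.tar.gz myalgo

# 2. The wrapper each task would pipe through.
cat > wrapper.sh <<'EOF'
#!/bin/sh
# Extract the bundle into the task-local working directory once.
[ -d myalgo ] || tar xzf myalgo.tar.gz
# Expose the bundled shared library, then hand stdin/stdout over to
# the executable -- this is what RDD.pipe("./wrapper.sh") talks to.
export LD_LIBRARY_PATH="$PWD/myalgo:$LD_LIBRARY_PATH"
exec ./myalgo/run.exe ./myalgo/parameter_file
EOF
chmod +x wrapper.sh

# 3. Simulate what one task does: feed a partition through the pipe.
printf 'a\nb\n' | ./wrapper.sh
```

On the Spark side the driver would then call something like sc.addFile("myalgo.tar.gz") and sc.addFile("wrapper.sh") so both land in each executor's working directory, followed by rdd.pipe("./wrapper.sh"); the exact file layout on workers varies by deploy mode, so treat this as a starting point to adapt rather than a definitive solution.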
Any ideas will be helpful,
Cheers,