Posted to common-dev@hadoop.apache.org by "Owen O'Malley (JIRA)" <ji...@apache.org> on 2007/01/22 19:57:29 UTC

[jira] Resolved: (HADOOP-913) dynamically loading C++ mapper/reducer classes in map/reduce jobs

     [ https://issues.apache.org/jira/browse/HADOOP-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley resolved HADOOP-913.
----------------------------------

       Resolution: Duplicate
    Fix Version/s: 0.11.0

Duplicate of HADOOP-234.

> dynamically loading C++ mapper/reducer classes in map/reduce jobs
> ------------------------------------------------------------------
>
>                 Key: HADOOP-913
>                 URL: https://issues.apache.org/jira/browse/HADOOP-913
>             Project: Hadoop
>          Issue Type: New Feature
>            Reporter: Runping Qi
>             Fix For: 0.11.0
>
>
> It is highly desirable for the current map/reduce framework to be able to call functions written in C++ (or other languages).
> I am proposing a generic extension to the current framework to achieve this goal.
> The extension is an application-level solution, similar in spirit to
> HadoopStreaming, and thus has no impact on Hadoop core.
> It will maintain the native map/reduce execution model.
> The basic idea is to use sockets/RPC to cross the language barrier.
> In particular, we can implement a generic mapper/reducer class in Java as a proxy for calling functions in another language.
> The configure function of the class will create a process that opens a user-specified shared library and acts as an RPC server.
> The map function of the class will just invoke an RPC call with the key/value pair.
> Such an RPC call is expected to return a list of key/value pairs. The map function can then emit the outputs.
> Below is a sketch of the generic class:
>         public class MapRedCPPAdapter implements Mapper, Reducer {
>                 String sharedLibraryName;
>                 RPCProxy theServer;
>
>                 ...
>                 public void configure(JobConf job) {
>                         // start the process that loads the user-specified shared library
>                         sharedLibraryName = job.get("shared.lib.name");
>                         theServer = createServer(sharedLibraryName);
>                 }
>                 public void close() {
>                         theServer.stop();
>                 }
>                 public void map(key, value, output, reporter) {
>                         ArrayList pairs = invokeRemoteMap(theServer, key, value);
>                         emit(pairs);
>                 }
>                 public void reduce(key, values, output, reporter) {
>                         ArrayList pairs = invokeRemoteReduce(theServer, key, values);
>                         emit(pairs);
>                 }
>         }
> The cons of this approach are the overhead associated with
> RPC calls and the creation of an additional process per mapper/reducer task.
> The pros are that the extension is clean, generic, and simple. It is applicable to other foreign languages too.
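
For illustration only, the proxy idea described above might be sketched in plain Java as follows. This is not the code proposed in the issue and it uses no Hadoop classes. The class name ForeignMapProxy, the tab-separated line protocol with a blank-line terminator, and the in-process stub server (standing in for the separate process that would load the shared library) are all assumptions made for this example. Keys and values containing tabs or newlines would need escaping that the sketch omits.

```java
import java.io.BufferedReader;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical Java-side proxy: forwards each key/value pair over a socket. */
class ForeignMapProxy implements Closeable {
    private final Socket socket;
    private final BufferedReader in;
    private final PrintWriter out;

    ForeignMapProxy(InetAddress host, int port) throws IOException {
        socket = new Socket(host, port);
        in = new BufferedReader(
                new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
        out = new PrintWriter(
                new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8), true);
    }

    /** Sends one pair as "key<TAB>value" and reads reply pairs until a blank line. */
    List<String[]> map(String key, String value) throws IOException {
        out.println(key + "\t" + value);
        List<String[]> pairs = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null && !line.isEmpty()) {
            pairs.add(line.split("\t", 2));
        }
        return pairs;
    }

    @Override
    public void close() throws IOException {
        socket.close();
    }

    /** Stand-in for the foreign process: replies with (word, "1") for each word. */
    static ServerSocket startStubServer() throws IOException {
        ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
        Thread worker = new Thread(() -> {
            try (Socket s = server.accept();
                 BufferedReader r = new BufferedReader(
                         new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
                 PrintWriter w = new PrintWriter(
                         new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true)) {
                String line;
                while ((line = r.readLine()) != null) {
                    String[] kv = line.split("\t", 2);
                    String value = kv.length > 1 ? kv[1] : "";
                    for (String word : value.trim().split("\\s+")) {
                        if (!word.isEmpty()) {
                            w.println(word + "\t1");
                        }
                    }
                    w.println(); // a blank line ends the reply for this pair
                }
            } catch (IOException e) {
                // the connection was closed; nothing more to do in this sketch
            }
        });
        worker.setDaemon(true);
        worker.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = startStubServer();
             ForeignMapProxy proxy =
                     new ForeignMapProxy(server.getInetAddress(), server.getLocalPort())) {
            for (String[] kv : proxy.map("0", "to be or not")) {
                System.out.println(kv[0] + "\t" + kv[1]);
            }
        }
    }
}
```

Each call to map costs one round trip to the server, which is the per-record RPC overhead named among the cons above. A real adapter would presumably batch records or stream them to reduce that cost.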

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira