Posted to common-dev@hadoop.apache.org by "Runping Qi (JIRA)" <ji...@apache.org> on 2007/01/20 16:36:29 UTC

[jira] Created: (HADOOP-913) dynamically loading C++ mapper/reducer classes in map/reduce jobs

dynamically loading C++  mapper/reducer classes in map/reduce jobs
------------------------------------------------------------------

                 Key: HADOOP-913
                 URL: https://issues.apache.org/jira/browse/HADOOP-913
             Project: Hadoop
          Issue Type: New Feature
            Reporter: Runping Qi



It is highly desirable for the current map/reduce framework to be able to call functions written in C++ (or other languages).

I am proposing a generic extension to the current framework to achieve the above goal.
The extension is an application-level solution, similar in spirit to
HadoopStreaming, and thus has no impact on Hadoop core.
I will maintain the native map/reduce execution model. 

The basic idea is to use sockets/RPC to cross the language barrier.
In particular, we can implement a generic mapper/reducer class in Java as a proxy for calling functions in another language.
The configure function of the class will create a process that opens a user-specified shared library and acts as an RPC server.
The map function of the class will just invoke an RPC call with the key/value pair.
Such an RPC call is expected to return a list of key/value pairs, which the map function can then emit as its outputs.
The following is a sketch of the generic class:

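        public class MapRedCPPAdapter implements Mapper, Reducer {
                String sharedLibraryName;
                RPCProxy theServer;

                ...

                // Start the helper process that loads the user's shared library.
                public void configure(JobConf job) {
                        sharedLibraryName = job.get("shared.lib.name");
                        theServer = createServer(sharedLibraryName);
                }
                // Shut the helper process down when the task finishes.
                public void close() {
                        theServer.stop();
                }
                // Forward one key/value pair and emit whatever comes back.
                public void map(key, value, output, reporter) {
                        ArrayList pairs = invokeRemoteMap(theServer, key, value);
                        emit(pairs, output);
                }
                // Forward one key with all of its values and emit whatever comes back.
                public void reduce(key, values, output, reporter) {
                        ArrayList pairs = invokeRemoteReduce(theServer, key, values);
                        emit(pairs, output);
                }
        }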
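
The proposal does not specify a wire format for the RPC calls. As a minimal sketch only, the exchange could be framed as follows: the proxy sends one key/value pair, and the server replies with a count followed by that many pairs. The class name PairWire, the use of strings for keys and values, and the choice of DataOutputStream.writeUTF framing are all assumptions made for illustration; a C++ server would need to decode the same layout.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical framing for the proxy/server exchange; not part of Hadoop. */
final class PairWire {

    /** Request: the key, then the value, each as a length-prefixed string. */
    static void writeRequest(DataOutputStream out, String key, String value)
            throws IOException {
        out.writeUTF(key);
        out.writeUTF(value);
        out.flush();
    }

    /** Reads one request back as a two-element array {key, value}. */
    static String[] readRequest(DataInputStream in) throws IOException {
        String key = in.readUTF();
        String value = in.readUTF();
        return new String[] {key, value};
    }

    /** Response: a pair count, then that many key/value pairs. */
    static void writeResponse(DataOutputStream out, List<String[]> pairs)
            throws IOException {
        out.writeInt(pairs.size());
        for (String[] pair : pairs) {
            out.writeUTF(pair[0]);
            out.writeUTF(pair[1]);
        }
        out.flush();
    }

    /** Reads a response written by writeResponse. */
    static List<String[]> readResponse(DataInputStream in) throws IOException {
        int count = in.readInt();
        List<String[]> pairs = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            String key = in.readUTF();
            String value = in.readUTF();
            pairs.add(new String[] {key, value});
        }
        return pairs;
    }
}
```

In the adapter, invokeRemoteMap would then amount to writeRequest on the socket's output stream followed by readResponse on its input stream. An empty response (count 0) covers a map call that emits nothing.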

The cons of this approach are the overhead associated with
RPC calls and the creation of an additional process per mapper/reducer task.
The pros are that the extension is clean, generic, and simple. It is applicable to other foreign languages too.
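
To make the per-task process cost concrete, here is one way the configure and close steps might manage the helper process using java.lang.ProcessBuilder. This is a sketch under assumptions: the class name NativeServerLauncher, the server executable, and its --lib and --port flags are all hypothetical, since the proposal does not define how the server is started.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical lifecycle helper for the per-task server process. */
final class NativeServerLauncher {

    /** Builds the command line; the executable and its flags are made up. */
    static List<String> buildCommand(String serverExecutable,
                                     String sharedLibraryName, int port) {
        List<String> command = new ArrayList<>();
        command.add(serverExecutable);
        command.add("--lib=" + sharedLibraryName);
        command.add("--port=" + port);
        return command;
    }

    /** Starts the server; its output goes to the task's own stdout/stderr. */
    static Process start(List<String> command) throws IOException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.inheritIO();
        return builder.start();
    }

    /** Asks the server to terminate and waits until it has exited. */
    static void stop(Process server) throws InterruptedException {
        server.destroy();
        server.waitFor();
    }
}
```

One such process would be started in configure and stopped in close, so its startup cost is paid once per task rather than once per record.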




-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Resolved: (HADOOP-913) dynamically loading C++ mapper/reducer classes in map/reduce jobs

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley resolved HADOOP-913.
----------------------------------

       Resolution: Duplicate
    Fix Version/s: 0.11.0

Duplicate of HADOOP-234.

