Posted to common-user@hadoop.apache.org by Andreas Papadakis <at...@yahoo.gr> on 2007/03/23 20:51:38 UTC

Hadoop for perl

Hello,

I want to ask if there is an implementation of Hadoop for Perl... or something similar to Hadoop for Python.

Thank you

 		
---------------------------------
 Do you use Yahoo!?
 Tired of annoying messages (spam)? Yahoo! Mail offers the best possible protection against spam
 http://login.yahoo.com/config/mail?.intl=gr

Re: Hadoop for perl

Posted by Owen O'Malley <ow...@yahoo-inc.com>.
Sorry, it looks like the list ate my attachment

-- Owen

#ifndef HADOOP_PIPES_HH
#define HADOOP_PIPES_HH

#ifdef SWIG
%module (directors="1") HadoopPipes
%include "std_string.i"
%feature("director") Mapper;
%feature("director") Reducer;
%feature("director") Partitioner;
%feature("director") RecordReader;
%feature("director") RecordWriter;
%feature("director") Factory;
#else
#include <string>
#endif

namespace HadoopPipes {

/**
* This header defines the interface between application code and the
* foreign-code (C++) side of Hadoop Map/Reduce.
*/

/**
* A JobConf defines the properties for a job.
*/
class JobConf {
public:
   virtual bool hasKey(const std::string& key) const = 0;
   virtual const std::string& get(const std::string& key) const = 0;
   virtual int getInt(const std::string& key) const = 0;
   virtual float getFloat(const std::string& key) const = 0;
   virtual bool getBoolean(const std::string& key) const = 0;
   virtual ~JobConf() {}
};

/**
* Task context provides the information about the task and job.
*/
class TaskContext {
public:
   /**
    * Get the JobConf for the current task.
    */
   virtual const JobConf* getJobConf() = 0;

   /**
    * Get the current key.
    * @return the current key
    */
   virtual const std::string& getInputKey() = 0;

   /**
    * Get the current value.
    * @return the current value
    */
   virtual const std::string& getInputValue() = 0;

   /**
    * Generate an output record
    */
   virtual void emit(const std::string& key, const std::string& value) = 0;

   /**
    * Mark your task as having made progress without changing the status
    * message.
    */
   virtual void progress() = 0;

   /**
    * Set the status message and call progress.
    */
   virtual void setStatus(const std::string& status) = 0;

   /**
    * Get the name of the key class of the input to this task.
    */
   virtual const std::string& getInputKeyClass() = 0;

   /**
    * Get the name of the value class of the input to this task.
    */
   virtual const std::string& getInputValueClass() = 0;

   virtual ~TaskContext() {}
};

class MapContext: public TaskContext {
public:

   /**
    * Access the InputSplit of the mapper.
    */
   virtual const std::string& getInputSplit() = 0;

};

class ReduceContext: public TaskContext {
public:
   /**
    * Advance to the next value.
    */
   virtual bool nextValue() = 0;
};

class Closable {
public:
   virtual void close() {}
   virtual ~Closable() {}
};

/**
* The application's mapper class to do map.
*/
class Mapper: public Closable {
public:
   virtual void map(MapContext& context) = 0;
};

/**
* The application's reducer class to do reduce.
*/
class Reducer: public Closable {
public:
   virtual void reduce(ReduceContext& context) = 0;
};

/**
* User code to decide where each key should be sent.
*/
class Partitioner {
public:
   virtual int partition(const std::string& key, int numOfReduces) = 0;
   virtual ~Partitioner() {}
};

/**
* For applications that want to read the input directly for the map function
* they can define RecordReaders in C++.
*/
class RecordReader: public Closable {
public:
   virtual bool next(std::string& key, std::string& value) = 0;

   /**
    * The progress of the record reader through the split as a value between
    * 0.0 and 1.0.
    */
   virtual float getProgress() = 0;
};

/**
* An object to write key/value pairs as they are emitted from the reduce.
*/
class RecordWriter: public Closable {
public:
   virtual void emit(const std::string& key,
                     const std::string& value) = 0;
};

/**
* A factory to create the necessary application objects.
*/
class Factory {
public:
   virtual Mapper* createMapper(MapContext& context) const = 0;
   virtual Reducer* createReducer(ReduceContext& context) const = 0;

   /**
    * Create a combiner, if this application has one.
    * @return the new combiner or NULL, if one is not needed
    */
   virtual Reducer* createCombiner(MapContext& context) const {
     return NULL;
   }

   /**
    * Create an application partitioner object.
    * @return the new partitioner or NULL, if the default partitioner should be
    *     used.
    */
   virtual Partitioner* createPartitioner(MapContext& context) const {
     return NULL;
   }

   /**
    * Create an application record reader.
    * @return the new RecordReader or NULL, if the Java RecordReader should be
    *    used.
    */
   virtual RecordReader* createRecordReader(MapContext& context) const {
     return NULL;
   }

   /**
    * Create an application record writer.
    * @return the new RecordWriter or NULL, if the Java RecordWriter should be
    *    used.
    */
   virtual RecordWriter* createRecordWriter(ReduceContext& context) const {
     return NULL;
   }

   virtual ~Factory() {}
};

/**
* Start the event handling loop that runs the task. This will use the given
* factory to create Mappers and Reducers and so on.
* @return true, if the task succeeded.
*/
bool runTask(const Factory& factory);

} // namespace HadoopPipes

#endif


Re: Hadoop for perl

Posted by Owen O'Malley <ow...@yahoo-inc.com>.
On Mar 23, 2007, at 12:51 PM, Andreas Papadakis wrote:

> Hello,
>
> I want to ask if there is an implementation of Hadoop for perl...or  
> something similar to Hadoop for python

I'm working on http://issues.apache.org/jira/browse/HADOOP-234, which  
will provide c++ bindings that are being designed to be "swigable" so  
that they are available from python.

Here is the current interface:


Re: Hadoop for perl

Posted by Maximilian Schöfmann <sc...@googlemail.com>.
>
> > Well, there is one for Ruby - at least the MapReduce part: Starfish
> >
> > http://rufy.com/starfish/doc/
>
> Starfish isn't (or wasn't last time I looked) really much like
> MapReduce at all.
>

Hmm, you're right. After looking at the code, it seems the author is using
the term "MapReduce" a bit loosely - it might be useful for certain tasks
nonetheless.

Re: RE: Re: Hadoop for perl

Posted by Andreas Papadakis <at...@yahoo.gr>.
Actually, I am using the following configuration file

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>dist:22</value>
        <description>The name of the job.</description>
    </property>
    <property>
        <name>mapred.job.tracker</name>
        <value>dist:90007</value>
        <description>The name of the job.</description>
    </property>
    <property> 
        <name>mapred.map.tasks</name>
        <value>1</value>
        <description>
            define mapred.map tasks to be number of slave hosts
        </description> 
    </property> 
    
    <property> 
        <name>mapred.reduce.tasks</name>
        <value>1</value>
        <description>
            define mapred.reduce tasks to be number of slave hosts
        </description> 
    </property> 
    

    
    <property>
        <name>dfs.replication</name>
        <value>1</value>
</property>

</configuration>

and the following command:

bin/hadoop jar hadoop-*-examples.jar wordcount ./myfs/LICENSE.txt ./myFS/out 

I have already executed ./bin/start-all.sh and I have formatted ./myFS/
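Two details in the configuration above are worth flagging: mapred.job.tracker is set to dist:90007, which is not a valid TCP port (the maximum is 65535), so the JobTracker address is unusable as written, and the <description> fields have been copy-pasted from an unrelated property. A hedged sketch of how the same file might look with plausible values - the host name dist is kept from the original, and ports 9000/9001 are merely conventional choices, not required ones:

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>dist:9000</value>
        <description>The host and port of the NameNode.</description>
    </property>
    <property>
        <name>mapred.job.tracker</name>
        <value>dist:9001</value>
        <description>The host and port of the JobTracker (port at most 65535).</description>
    </property>
    <property>
        <name>mapred.map.tasks</name>
        <value>1</value>
        <description>Define mapred.map.tasks to be the number of slave hosts.</description>
    </property>
    <property>
        <name>mapred.reduce.tasks</name>
        <value>1</value>
        <description>Define mapred.reduce.tasks to be the number of slave hosts.</description>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
```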


Richard Yang <ri...@richardyang.net> wrote: 
From the error message, it looks like 'workingdirectory' is unreachable.  Is
the HDFS/DFS portion functioning normally?

Best Regards
 
Richard Yang
richardyang@richardyang.net
kusanagiyang@gmail.com
 
 
-----Original Message-----
From: Andreas Papadakis [mailto:atsia2003@yahoo.gr] 
Sent: Friday, March 23, 2007 1:18 PM
To: hadoop-user@lucene.apache.org
Subject: Re: Re: Hadoop for perl

Is there any other way to use Hadoop with Perl?

I read that I can use it with Python (using Jython)... I tried the example,
but Jython cannot find the module org.apache.hadoop.fs... (I don't know much
about Java.) Does anyone know where I can find that module and where I
should place it?

Also, I tried the Java example and I get too many errors...
I used the command
bin/hadoop jar hadoop-*-examples.jar wordcount  mydfs/LICENSE.txt mydfs/a

but I get the following errors:

java.lang.RuntimeException: java.net.SocketTimeoutException: timed out waiting for rpc response
        at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:247)
        at org.apache.hadoop.mapred.JobConf.setInputPath(JobConf.java:150)
        at org.apache.hadoop.examples.WordCount.main(WordCount.java:137)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:143)
        at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:40)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
Caused by: java.net.SocketTimeoutException: timed out waiting for rpc response
        at org.apache.hadoop.ipc.Client.call(Client.java:469)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:164)
        at org.apache.hadoop.dfs.$Proxy0.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:248)
        at org.apache.hadoop.dfs.DFSClient.<init>(DFSClient.java:106)
        at org.apache.hadoop.dfs.DistributedFileSystem.initialize(DistributedFileSystem.java:65)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:160)
        at org.apache.hadoop.fs.FileSystem.getNamed(FileSystem.java:119)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:91)
        at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:243)

Can anyone help me?
Thank you

Erik Hetzner wrote: At Fri, 23 Mar 2007 20:58:38
+0100,
Maximilian Schöfmann wrote:
>
> > I want to ask if there is an implementation of Hadoop for perl...or
> > something similar to Hadoop for python
>
> Well, there is one for Ruby - at least the MapReduce part: Starfish
>
> http://rufy.com/starfish/doc/

Starfish isn't (or wasn't last time I looked) really much like
MapReduce at all.

best,
Erik Hetzner


   

RE: Re: Hadoop for perl

Posted by Richard Yang <ri...@richardyang.net>.
From the error message, it looks like 'workingdirectory' is unreachable.  Is
the HDFS/DFS portion functioning normally?

Best Regards
 
Richard Yang
richardyang@richardyang.net
kusanagiyang@gmail.com
 
 
-----Original Message-----
From: Andreas Papadakis [mailto:atsia2003@yahoo.gr] 
Sent: Friday, March 23, 2007 1:18 PM
To: hadoop-user@lucene.apache.org
Subject: Re: Re: Hadoop for perl

Is there any other way to use Hadoop with Perl?

I read that I can use it with Python (using Jython)... I tried the example,
but Jython cannot find the module org.apache.hadoop.fs... (I don't know much
about Java.) Does anyone know where I can find that module and where I
should place it?

Also, I tried the Java example and I get too many errors...
I used the command
bin/hadoop jar hadoop-*-examples.jar wordcount  mydfs/LICENSE.txt mydfs/a

but I get the following errors:

java.lang.RuntimeException: java.net.SocketTimeoutException: timed out waiting for rpc response
        at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:247)
        at org.apache.hadoop.mapred.JobConf.setInputPath(JobConf.java:150)
        at org.apache.hadoop.examples.WordCount.main(WordCount.java:137)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:143)
        at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:40)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
Caused by: java.net.SocketTimeoutException: timed out waiting for rpc response
        at org.apache.hadoop.ipc.Client.call(Client.java:469)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:164)
        at org.apache.hadoop.dfs.$Proxy0.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:248)
        at org.apache.hadoop.dfs.DFSClient.<init>(DFSClient.java:106)
        at org.apache.hadoop.dfs.DistributedFileSystem.initialize(DistributedFileSystem.java:65)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:160)
        at org.apache.hadoop.fs.FileSystem.getNamed(FileSystem.java:119)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:91)
        at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:243)

Can anyone help me?
Thank you

Erik Hetzner <er...@ucop.edu> wrote: At Fri, 23 Mar 2007 20:58:38
+0100,
Maximilian Schöfmann wrote:
>
> > I want to ask if there is an implementation of Hadoop for perl...or
> > something similar to Hadoop for python
>
> Well, there is one for Ruby - at least the MapReduce part: Starfish
>
> http://rufy.com/starfish/doc/

Starfish isn't (or wasn't last time I looked) really much like
MapReduce at all.

best,
Erik Hetzner


 		



Re: Re: Hadoop for perl

Posted by Andreas Papadakis <at...@yahoo.gr>.
Is there any other way to use Hadoop with Perl?

I read that I can use it with Python (using Jython)... I tried the example but Jython cannot find the module org.apache.hadoop.fs... (I don't know much about Java.) Does anyone know where I can find that module and where I should place it?

Also, I tried the Java example and I get too many errors...
I used the command
bin/hadoop jar hadoop-*-examples.jar wordcount  mydfs/LICENSE.txt mydfs/a

but I get the following errors:

java.lang.RuntimeException: java.net.SocketTimeoutException: timed out waiting for rpc response
        at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:247)
        at org.apache.hadoop.mapred.JobConf.setInputPath(JobConf.java:150)
        at org.apache.hadoop.examples.WordCount.main(WordCount.java:137)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:143)
        at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:40)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
Caused by: java.net.SocketTimeoutException: timed out waiting for rpc response
        at org.apache.hadoop.ipc.Client.call(Client.java:469)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:164)
        at org.apache.hadoop.dfs.$Proxy0.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:248)
        at org.apache.hadoop.dfs.DFSClient.<init>(DFSClient.java:106)
        at org.apache.hadoop.dfs.DistributedFileSystem.initialize(DistributedFileSystem.java:65)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:160)
        at org.apache.hadoop.fs.FileSystem.getNamed(FileSystem.java:119)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:91)
        at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:243)

Can anyone help me?
Thank you

Erik Hetzner <er...@ucop.edu> wrote: At Fri, 23 Mar 2007 20:58:38 +0100,
Maximilian Schöfmann wrote:
>
> > I want to ask if there is an implementation of Hadoop for perl...or
> > something similar to Hadoop for python
>
> Well, there is one for Ruby - at least the MapReduce part: Starfish
>
> http://rufy.com/starfish/doc/

Starfish isn't (or wasn't last time I looked) really much like
MapReduce at all.

best,
Erik Hetzner


 		

Re: Hadoop for perl

Posted by Erik Hetzner <er...@ucop.edu>.
At Fri, 23 Mar 2007 20:58:38 +0100,
Maximilian Schöfmann <sc...@googlemail.com> wrote:
>
> > I want to ask if there is an implementation of Hadoop for perl...or
> > something similar to Hadoop for python
>
> Well, there is one for Ruby - at least the MapReduce part: Starfish
>
> http://rufy.com/starfish/doc/

Starfish isn’t (or wasn’t last time I looked) really much like
MapReduce at all.

best,
Erik Hetzner

Re: Hadoop for perl

Posted by Maximilian Schöfmann <sc...@googlemail.com>.
> I want to ask if there is an implementation of Hadoop for perl...or  
> something similar to Hadoop for python

Well, there is one for Ruby - at least the MapReduce part: Starfish

http://rufy.com/starfish/doc/

HTH,
Max