Posted to user@hbase.apache.org by stack <st...@duboce.net> on 2008/09/24 05:48:28 UTC

Fwd: Hadoop + Python = Happy

Here is the announcement of the Happy project, which has been mentioned on
this list a few times recently.
St.Ack

---------- Forwarded message ----------
From: Colin Evans <co...@metaweb.com>
Date: Tue, Sep 23, 2008 at 8:32 PM
Subject: Hadoop + Python = Happy
To: core-user@hadoop.apache.org


Freebase is finally open-sourcing our Jython-based framework for writing
map-reduce jobs on Hadoop.  Happy tightly embeds Jython into the Hadoop
APIs, files off a lot of the sharp edges, and makes writing map-reduce
programs a breeze.  This is the 0.1 release, but we've been using Happy at
Freebase for a while, so it is stable and full-featured.  Take a look and
let me know if it is useful.

The project and docs are here:

http://code.google.com/p/happy/
http://www.mqlx.com/~colin/happy.html

Here's an example word count program written in Happy:

---
import sys, happy, happy.log

happy.log.setLevel("debug")
log = happy.log.getLog("wordcount")

class WordCount(happy.HappyJob):
  def __init__(self, inputpath, outputpath):
      happy.HappyJob.__init__(self)
      self.inputpaths = inputpath
      self.outputpath = outputpath
      self.inputformat = "text"

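  def map(self, records, task):
      # Emit (word, "1") for every whitespace-separated token in each record.
      for _, value in records:
          for word in value.split():
              task.collect(word, "1")

  def reduce(self, key, values, task):
      # Every value is the "1" emitted by map, so counting the values
      # gives this word's frequency.
      count = 0
      for _ in values: count += 1
      task.collect(key, str(count))
      log.debug(key + ":" + str(count))
      # Accumulate job-wide totals that are read back from wc.run().
      happy.results["words"] = happy.results.setdefault("words", 0) + count
      happy.results["unique"] = happy.results.setdefault("unique", 0) + 1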

if __name__ == "__main__":
  if len(sys.argv) < 3:
      print "Usage: <inputpath> <outputpath>"
      sys.exit(-1)
  wc = WordCount(sys.argv[1], sys.argv[2])
  results = wc.run()
  print str(sum(results["words"])) + " total words"
  print str(sum(results["unique"])) + " unique words"
---
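For anyone new to map-reduce, the flow the WordCount job relies on (map emits key/value pairs, the framework groups them by key, and reduce sees each key with all of its values) can be sketched in plain Python without Hadoop or Happy. This is a minimal local simulation, not Happy's API: LocalTask, word_map, word_reduce and run_local are hypothetical names made up for illustration, and the code runs under ordinary CPython rather than Jython.

```python
from collections import defaultdict


class LocalTask(object):
    """Hypothetical stand-in for Happy's task object (not Happy's API):
    it only records the (key, value) pairs passed to collect()."""

    def __init__(self):
        self.output = []

    def collect(self, key, value):
        self.output.append((key, value))


def word_map(records, task):
    # Same logic as WordCount.map: emit (word, "1") for every token.
    for _, value in records:
        for word in value.split():
            task.collect(word, "1")


def word_reduce(key, values, task):
    # Summing int(v) instead of counting values keeps the result correct
    # even if a combiner has already pre-aggregated some counts.
    task.collect(key, str(sum(int(v) for v in values)))


def run_local(lines):
    """Simulate map -> shuffle/sort -> reduce in memory."""
    map_task = LocalTask()
    # Records are (key, value) pairs; a line index stands in for the
    # byte offset Hadoop uses as the key for text input. map ignores it.
    word_map(enumerate(lines), map_task)

    # Shuffle: group every emitted value under its key.
    groups = defaultdict(list)
    for key, value in map_task.output:
        groups[key].append(value)

    # Reduce: keys are handed to the reducer in sorted order.
    reduce_task = LocalTask()
    for key in sorted(groups):
        word_reduce(key, groups[key], reduce_task)
    return dict(reduce_task.output)


if __name__ == "__main__":
    counts = run_local(["the quick brown fox", "the lazy dog"])
    print(counts["the"])                          # -> 2
    print(sum(int(c) for c in counts.values()))   # total words -> 7
    print(len(counts))                            # unique words -> 6
```

One design note: the sketch's reducer sums int(v) rather than counting values. Counting works in the Happy example because every mapper emits "1", but it would undercount if a combiner merged partial counts before the reduce step.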


Thanks
Colin