Posted to common-user@hadoop.apache.org by "Bryan A. P. Pendleton" <bp...@geekdom.net> on 2007/07/20 19:57:55 UTC

Just another report of performance improvements with recent releases...

I've been using the Hadoop project off and on for the last year in some
ongoing work studying Wikipedia. One of the tasks I developed computes the
revision-to-revision diff across all edits in the Wikipedia history. From
the time I first developed the job (last summer) to the latest operation
(last week, running on the 0.13.0 release), I've seen a pretty remarkable
increase in performance. Even though the input size has more than
doubled, the time to run the job on Hadoop has dropped by half, for a
roughly 4x overall improvement in performance.  Thanks everyone!

-- 
Bryan A. P. Pendleton
Ph: (877) geek-1-bp