Posted to common-user@hadoop.apache.org by Rob Stewart <ro...@googlemail.com> on 2010/12/20 03:17:29 UTC

Hadoop Scalability - A Case Study: Concordance

Hi All,

I recently entered a Hadoop implementation in the SICSA Multicore
Challenge, held last week:
http://www.macs.hw.ac.uk/sicsawiki/index.php/Challenge_PhaseI

The aim was to implement the concordance application in whichever
language or framework you felt was best suited. We ended up comparing a
wide variety of approaches, including Erlang, Parallel Haskell, Java
with Fork/Join, and OpenMP, amongst others.
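
For anyone unfamiliar with the problem, a concordance is roughly an
index from each word (or short phrase) to the places it occurs in the
text, which maps quite naturally onto MapReduce. Purely as an
illustration (this is a minimal sketch, not the code I submitted; the
class names and the use of line byte-offsets as positions are my own
simplifications), a word-level concordance job might look like:

  import java.io.IOException;
  import java.util.StringTokenizer;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Mapper;
  import org.apache.hadoop.mapreduce.Reducer;

  // Mapper: emit each word keyed against the byte offset of the line
  // it appears on (TextInputFormat supplies that offset as the key).
  public class ConcordanceMapper
      extends Mapper<LongWritable, Text, Text, LongWritable> {
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      StringTokenizer tok = new StringTokenizer(line.toString());
      while (tok.hasMoreTokens()) {
        word.set(tok.nextToken().toLowerCase());
        context.write(word, offset);
      }
    }
  }

  // Reducer: gather the positions for each word into one index entry.
  class ConcordanceReducer
      extends Reducer<Text, LongWritable, Text, Text> {
    @Override
    protected void reduce(Text word, Iterable<LongWritable> offsets,
                          Context context)
        throws IOException, InterruptedException {
      StringBuilder positions = new StringBuilder();
      for (LongWritable off : offsets) {
        if (positions.length() > 0) positions.append(", ");
        positions.append(off.get());
      }
      context.write(word, new Text(positions.toString()));
    }
  }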

Whilst most implementations gave very low runtimes for very small
inputs, Hadoop was not able (and is not designed) to do so. Where the
Hadoop implementation shone, though, was in scaling with input size. I
have written a summary of my implementation and optimizations, and put
up a link to the complete set of slides I presented, at this address:

http://www.macs.hw.ac.uk/~rs46/multicore_challenge1/
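
For completeness, the driver for a job along the lines sketched above
(0.20-era API) would look roughly like the code below; again, the class
names and paths are illustrative only, not lifted from my submission.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  public class ConcordanceDriver {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      Job job = new Job(conf, "concordance");
      job.setJarByClass(ConcordanceDriver.class);
      job.setMapperClass(ConcordanceMapper.class);
      job.setReducerClass(ConcordanceReducer.class);
      // Map output (word -> offset) differs from the final output
      // (word -> list of offsets), so declare both sets of types.
      job.setMapOutputKeyClass(Text.class);
      job.setMapOutputValueClass(LongWritable.class);
      job.setOutputKeyClass(Text.class);
      job.setOutputValueClass(Text.class);
      FileInputFormat.addInputPath(job, new Path(args[0]));
      FileOutputFormat.setOutputPath(job, new Path(args[1]));
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
  }

It would then be launched with something like:

  hadoop jar concordance.jar ConcordanceDriver /input/bible.txt /output/concordance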

Perhaps the highlight of these results (running on 16 nodes) is:
Benchmark 1
---
Input File: Bible.txt - 801,541 words
Runtime: 36 seconds

Benchmark 2
---
Input File: ascii100MB.txt - 18,030,005 words
Runtime: 65 seconds

That is a 22.5x increase in input size (18,030,005 / 801,541 words),
but only a 1.8x increase in runtime (65 s / 36 s).
------------

Feedback would be welcome. It was interesting to see that some of the
shared-memory implementations were not able to process the 100 MB file
without out-of-memory errors. This was not a problem for Hadoop.
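
As I understand it, a big part of the reason is that Hadoop never tries
to hold the whole dataset in memory: map output is buffered only up to
the configured sort buffer and spilled to local disk beyond that.
Purely as an illustration of where that buffer is configured (not a
tuning I am recommending, nor a description of my own setup), the
0.20-era property, which defaults to 100 MB, can be set like so:

  Configuration conf = new Configuration();
  // Map-side sort buffer, in MB; output beyond this is spilled to disk.
  conf.setInt("io.sort.mb", 100);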

There is a plan to hold another Multicore Challenge in May 2011. If
anyone wants to make any inquiries, I suggest you get in touch with the
facilitator, Hans-Wolfgang Loidl, who is named at the bottom of this
page:
http://www.sicsa.ac.uk/news/sicsa-multicore-challenge

Regards,


Rob Stewart