Posted to common-dev@hadoop.apache.org by Jaesun Han <js...@gmail.com> on 2008/11/27 07:54:47 UTC

Hadoop Tutorial Workshop in South Korea

Hi, all

The Korea Hadoop Community is hosting a half-day Hadoop Tutorial Workshop
on November 28 (Friday) in Seoul, South Korea.
You can find the details and register for the workshop on our website.
http://www.hadoop.or.kr/?document_srl=1945

Time: Friday, November 28, 14:00 ~ 18:00
Location: Seoul National University School of Dentistry main building 121
Free and open event (but limited to 100 attendees)

Agenda
- Hadoop Overview
- Hadoop Installation & Management
- Managing a Hadoop Cluster
- MapReduce Programming
- Advanced MapReduce Programming


Looking forward to seeing you there!

Jason

Best practices for using Hadoop

Posted by Ricky Ho <rh...@adobe.com>.
I am trying to get some answers to these kinds of questions, as they pop up frequently ...

1) What kinds of problems fit Hadoop best, and which do not?

2) What is the dark side of Hadoop, where other parallel processing models (e.g. MPI, TupleSpace, etc.) fit better?

3) What is the demarcation point between choosing a Hadoop model versus a multi-threaded shared-memory model?

4) Given that we can partition and replicate an RDBMS table, making it as big as we like and spreading the workload across nodes, why isn't that good enough for scalability?  Why do we need BigTable or HBase, which require adopting a new data model?

5) Is there a general methodology that can transform any algorithm into map/reduce form?  (See the sketch after this list for what I mean by that form.)

6) How would one choose among the Hadoop Java API, Hadoop Streaming, and Pig?  It looks like a problem that can be solved in one can be solved in the others.  If so, Pig is more attractive because it offers higher-level semantics.
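
For concreteness, by "map/reduce form" in question 5 I mean roughly the shape of the canonical word-count example below.  This is just a minimal sketch against the org.apache.hadoop.mapreduce API (the older org.apache.hadoop.mapred interfaces differ slightly); the class names are illustrative and job setup/submission is omitted.

  import java.io.IOException;
  import java.util.StringTokenizer;

  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Mapper;
  import org.apache.hadoop.mapreduce.Reducer;

  // Classic word count: the map step emits (word, 1) for every token,
  // the reduce step sums the counts for each word.
  public class WordCount {

    public static class TokenizerMapper
        extends Mapper<Object, Text, Text, IntWritable> {

      private final static IntWritable ONE = new IntWritable(1);
      private final Text word = new Text();

      @Override
      public void map(Object key, Text value, Context context)
          throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
          word.set(itr.nextToken());
          context.write(word, ONE);   // emit (word, 1)
        }
      }
    }

    public static class IntSumReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {

      private final IntWritable result = new IntWritable();

      @Override
      public void reduce(Text key, Iterable<IntWritable> values, Context context)
          throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
          sum += val.get();           // sum all counts seen for this word
        }
        result.set(sum);
        context.write(key, result);   // emit (word, total count)
      }
    }

    // Job configuration (input/output paths, submission) omitted for brevity.
  }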

I would appreciate it if anyone who has come across these decisions could share their thoughts.

Rgds,
ricky