Posted to hdfs-dev@hadoop.apache.org by Mehdi Mirakhorli <mi...@gmail.com> on 2011/09/21 20:22:48 UTC

Research on Hadoop Design

Hi everyone,



We are a group of researchers at DePaul University investigating the use of
architectural tactics in open-source, high-performance, fault-tolerant
software systems. Obviously we are not nearly as familiar with Hadoop as the
developers are, so we hoped you could help us. We have focused on five
architectural tactics and have listed the occurrences we found in the
attached table.



Do you think we found all the major occurrences of these tactics in Hadoop?
Any insights you could provide would be very helpful.



Many thanks,

MM, JCH, YS, MC.







*Tactic*

*Number of tactic-related classes identified*

*Explanation*

*Packages / Location*

Heartbeat

10

The heartbeat with piggybacking is used in Hadoop to check the health status
of each task.

mapreduce\src\java\org\apache\hadoop\mapred\TaskTracker.java
mapreduce\src\java\org\apache\hadoop\mapreduce\server\tasktracker\TTConfig.java
mapreduce\src\contrib\mumak\src\java\org\apache\hadoop\mapred\SimulatorTaskTracker.java
mapreduce\src\contrib\mumak\src\java\org\apache\hadoop\mapred\SimulatorEngine.java
mapreduce\src\java\org\apache\hadoop\mapred\JobTracker.java
mapreduce\src\java\org\apache\hadoop\mapred\TaskTrackerManager.java
mapreduce\src\java\org\apache\hadoop\mapreduce\server\jobtracker\JTConfig.java
mapreduce\src\contrib\mumak\src\java\org\apache\hadoop\mapred\SimulatorJobTracker.java
mapreduce\src\java\org\apache\hadoop\mapred\InterTrackerProtocol.java
mapreduce\src\java\org\apache\hadoop\mapreduce\util\ConfigUtil.java

Resource Pooling

70

Thread pooling is used to increase performance.

mapred package

47

Block pooling is used to save and reuse data blocks

hdfs subsystem

3

Job pooling: to increase performance, the scheduler organizes jobs into
"pools" and shares resources fairly between these pools.

mapreduce\src\contrib\fairscheduler\src\java\org\apache\hadoop\mapred\Pool.java
mapreduce\src\contrib\fairscheduler\src\java\org\apache\hadoop\mapred\PoolManager.java
mapreduce\src\contrib\fairscheduler\src\java\org\apache\hadoop\mapred\PoolSchedulable.java

Scheduling

96

Three different scheduling services have been implemented to execute tasks
and jobs.
The scheduling strategies are the Fair Scheduler, the Dynamic Scheduler and
the Capacity Scheduler.

mapreduce\src\contrib\dynamic-scheduler\src\java\org\apache\hadoop\mapred\DynamicPriorityScheduler.java
mapreduce\src\contrib\dynamic-scheduler\
mapreduce\src\contrib\fairscheduler\
mapreduce\src\contrib\capacity-scheduler\

Audit Trail

7

An audit log is used to capture information about authorization and
authentication events (success/failure).

mapreduce\src\java\org\apache\hadoop\mapred\AuditLogger.java
mapreduce\src\java\org\apache\hadoop\mapred\JobTracker.java
mapreduce\src\java\org\apache\hadoop\mapred\JobInProgress.java
mapreduce\src\java\org\apache\hadoop\mapred\ACLsManager.java
common\src\test\core\org\apache\hadoop\ipc\MiniRPCBenchmark.java
common\src\java\org\apache\hadoop\ipc\Server.java
common\src\java\org\apache\hadoop\security\authorize\ServiceAuthorizationManager.java

Authenticate

36

Authentication is used to control user access.

*common\src\java\org\apache\hadoop\security*
mapreduce\src\java\org\apache\hadoop\mapred\pipes\OutputHandler.java
mapreduce\src\java\org\apache\hadoop\mapred\pipes\UpwardProtocol.java
common\src\java\org\apache\hadoop\ipc\Client.java
common\src\test\core\org\apache\hadoop\ipc\MiniRPCBenchmark.java
hdfs\src\java\org\apache\hadoop\hdfs\server\common\JspHelper.java
hdfs\src\java\org\apache\hadoop\hdfs\server\datanode\SecureDataNodeStarter.java
hdfs\src\java\org\apache\hadoop\hdfs\server\namenode\CancelDelegationTokenServlet.java
hdfs\src\java\org\apache\hadoop\hdfs\server\namenode\FSNamesystem.java
common\src\java\org\apache\hadoop\http\lib\StaticUserWebFilter.java
common\src\java\org\apache\hadoop\ipc\ConnectionHeader.java
mapreduce\src\java\org\apache\hadoop\mapred\pipes\Application.java
mapreduce\src\java\org\apache\hadoop\mapred\JobInProgress.java
mapreduce\src\java\org\apache\hadoop\mapred\JobTracker.java
mapreduce\src\java\org\apache\hadoop\mapred\TaskTracker.java
common\src\java\org\apache\hadoop\ipc\Server.java
common\src\java\org\apache\hadoop\ipc\metrics\RpcMetrics.java
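
For readers less familiar with the first tactic in the table, here is a
minimal sketch of a heartbeat with piggybacking. It is not Hadoop code:
HeartbeatSketch, Heartbeat, Coordinator and all method names are hypothetical
and chosen only for illustration. The sketch assumes that a worker sends a
periodic message carrying its task statuses, and that the coordinator's reply
carries any commands queued for that worker, so no separate status or command
messages are needed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names are Hadoop classes.
final class HeartbeatSketch {

    /** Periodic message from a worker; task statuses ride along (piggybacking). */
    static final class Heartbeat {
        final String workerId;
        final Map<String, String> taskStatuses; // taskId -> status

        Heartbeat(String workerId, Map<String, String> taskStatuses) {
            this.workerId = workerId;
            this.taskStatuses = taskStatuses;
        }
    }

    /** Coordinator side: tracks liveness and replies with queued commands. */
    static final class Coordinator {
        private final Map<String, Long> lastSeenMillis = new HashMap<>();
        private final Map<String, String> taskStatus = new HashMap<>();
        private final Map<String, List<String>> pendingActions = new HashMap<>();

        /** Queues a command to be delivered on the worker's next heartbeat. */
        void enqueueAction(String workerId, String action) {
            pendingActions.computeIfAbsent(workerId, k -> new ArrayList<>()).add(action);
        }

        /** Records the heartbeat and returns the commands piggybacked on the reply. */
        List<String> onHeartbeat(Heartbeat hb, long nowMillis) {
            lastSeenMillis.put(hb.workerId, nowMillis);
            taskStatus.putAll(hb.taskStatuses);
            List<String> actions = pendingActions.remove(hb.workerId);
            return actions == null ? new ArrayList<>() : actions;
        }

        /** Returns workers whose last heartbeat is older than the timeout. */
        List<String> expiredWorkers(long nowMillis, long timeoutMillis) {
            List<String> expired = new ArrayList<>();
            for (Map.Entry<String, Long> e : lastSeenMillis.entrySet()) {
                if (nowMillis - e.getValue() > timeoutMillis) {
                    expired.add(e.getKey());
                }
            }
            return expired;
        }

        String statusOf(String taskId) {
            return taskStatus.get(taskId);
        }
    }

    public static void main(String[] args) {
        Coordinator coordinator = new Coordinator();
        coordinator.enqueueAction("worker-1", "LAUNCH task-2");

        Map<String, String> statuses = new HashMap<>();
        statuses.put("task-1", "RUNNING");

        // One round trip delivers both the status report and the queued command.
        List<String> reply = coordinator.onHeartbeat(new Heartbeat("worker-1", statuses), 1_000L);
        System.out.println(reply);
        System.out.println(coordinator.statusOf("task-1"));

        // With no further heartbeats, the worker is considered lost after the timeout.
        System.out.println(coordinator.expiredWorkers(20_000L, 10_000L));
    }
}
```

The point of the design is that liveness detection, status reporting and
command delivery share a single periodic message. Timestamps are passed in
explicitly here so the timeout logic can be exercised without a real clock.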