Posted to hdfs-dev@hadoop.apache.org by Mehdi Mirakhorli <mi...@gmail.com> on 2011/09/21 20:22:48 UTC
Research on Hadoop Design
Hi everyone,
We are a group of researchers at DePaul University investigating the use of
tactics in open-source, high-performance, fault-tolerant software systems.
We are not nearly as familiar with Hadoop as its developers are, so we
hope you can help us. We have focused on five architectural tactics and
have listed the occurrences we found in the table below.
Do you think we found all the major occurrences of these tactics in Hadoop?
Any help or insight you could provide would be greatly appreciated.
Many thanks,
MM, JCH, YS, MC.
*Tactic*
*Number of tactic-related classes identified*
*Explanation*
*Packages / Location*
Heartbeat
10
The heartbeat with piggybacking is used in Hadoop to check the health status
of each task.
mapreduce\src\java\org\apache\hadoop\mapred\TaskTracker.java
mapreduce\src\java\org\apache\hadoop\mapreduce\server\tasktracker\TTConfig.java
mapreduce\src\contrib\mumak\src\java\org\apache\hadoop\mapred\SimulatorTaskTracker.java
mapreduce\src\contrib\mumak\src\java\org\apache\hadoop\mapred\SimulatorEngine.java
mapreduce\src\java\org\apache\hadoop\mapred\JobTracker.java
mapreduce\src\java\org\apache\hadoop\mapred\TaskTrackerManager.java
mapreduce\src\java\org\apache\hadoop\mapreduce\server\jobtracker\JTConfig.java
mapreduce\src\contrib\mumak\src\java\org\apache\hadoop\mapred\SimulatorJobTracker.java
mapreduce\src\java\org\apache\hadoop\mapred\InterTrackerProtocol.java
mapreduce\src\java\org\apache\hadoop\mapreduce\util\ConfigUtil.java
Resource Pooling
70
Thread pooling is used to improve performance.
mapred package
47
Block pooling is used to save and reuse data blocks
hdfs subsystem
3
Job pooling: to improve performance, the fair scheduler organizes jobs
into "pools" and shares resources fairly between these pools.
mapreduce\src\contrib\fairscheduler\src\java\org\apache\hadoop\mapred\Pool.java
mapreduce\src\contrib\fairscheduler\src\java\org\apache\hadoop\mapred\PoolManager.java
mapreduce\src\contrib\fairscheduler\src\java\org\apache\hadoop\mapred\PoolSchedulable.java
Scheduling
96
Three different scheduling services have been implemented to execute tasks
and jobs.
The scheduling strategies are fair scheduling, dynamic scheduling, and
capacity scheduling.
mapreduce\src\contrib\dynamic-scheduler\src\java\org\apache\hadoop\mapred\DynamicPriorityScheduler.java
mapreduce\src\contrib\dynamic-scheduler\
mapreduce\src\contrib\fairscheduler\
mapreduce\src\contrib\capacity-scheduler\
Audit Trail
7
An audit log is used to capture information about authorization/authentication
events (success/failure).
mapreduce\src\java\org\apache\hadoop\mapred\AuditLogger.java
mapreduce\src\java\org\apache\hadoop\mapred\JobTracker.java
mapreduce\src\java\org\apache\hadoop\mapred\JobInProgress.java
mapreduce\src\java\org\apache\hadoop\mapred\ACLsManager.java
common\src\test\core\org\apache\hadoop\ipc\MiniRPCBenchmark.java
common\src\java\org\apache\hadoop\ipc\Server.java
common\src\java\org\apache\hadoop\security\authorize\ServiceAuthorizationManager.java
Authenticate
36
Authentication is used to control user access.
common\src\java\org\apache\hadoop\security
mapreduce\src\java\org\apache\hadoop\mapred\pipes\OutputHandler.java
mapreduce\src\java\org\apache\hadoop\mapred\pipes\UpwardProtocol.java
common\src\java\org\apache\hadoop\ipc\Client.java
common\src\test\core\org\apache\hadoop\ipc\MiniRPCBenchmark.java
hdfs\src\java\org\apache\hadoop\hdfs\server\common\JspHelper.java
hdfs\src\java\org\apache\hadoop\hdfs\server\datanode\SecureDataNodeStarter.java
hdfs\src\java\org\apache\hadoop\hdfs\server\namenode\CancelDelegationTokenServlet.java
hdfs\src\java\org\apache\hadoop\hdfs\server\namenode\FSNamesystem.java
common\src\java\org\apache\hadoop\http\lib\StaticUserWebFilter.java
common\src\java\org\apache\hadoop\ipc\ConnectionHeader.java
mapreduce\src\java\org\apache\hadoop\mapred\pipes\Application.java
mapreduce\src\java\org\apache\hadoop\mapred\JobInProgress.java
mapreduce\src\java\org\apache\hadoop\mapred\JobTracker.java
mapreduce\src\java\org\apache\hadoop\mapred\TaskTracker.java
common\src\java\org\apache\hadoop\ipc\Server.java
common\src\java\org\apache\hadoop\ipc\metrics\RpcMetrics.java
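
To make concrete what we mean by "heartbeat with piggybacking" in the first
row of the table, here is a minimal sketch of the tactic in Java. It is not
taken from the Hadoop code base: the classes HeartbeatSketch, Heartbeat and
Monitor, their fields, and the timeout value are all hypothetical names chosen
for illustration. The sketch only assumes that a worker periodically sends a
liveness message and attaches task status to that same message, so that no
separate status call is needed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; none of these types exist in Hadoop.
public class HeartbeatSketch {

    /** Hypothetical heartbeat: a liveness signal plus piggybacked task statuses. */
    static final class Heartbeat {
        final String workerId;
        final long timestampMillis;
        final Map<String, String> taskStatuses; // the piggybacked payload

        Heartbeat(String workerId, long timestampMillis, Map<String, String> taskStatuses) {
            this.workerId = workerId;
            this.timestampMillis = timestampMillis;
            this.taskStatuses = taskStatuses;
        }
    }

    /** Hypothetical coordinator-side monitor that consumes heartbeats. */
    static final class Monitor {
        private final long timeoutMillis;
        private final Map<String, Long> lastSeen = new HashMap<>();
        private final Map<String, String> taskStatuses = new HashMap<>();

        Monitor(long timeoutMillis) {
            this.timeoutMillis = timeoutMillis;
        }

        /** Records liveness and, from the same message, the task statuses. */
        void receive(Heartbeat hb) {
            lastSeen.put(hb.workerId, hb.timestampMillis);
            taskStatuses.putAll(hb.taskStatuses);
        }

        /** Returns workers whose last heartbeat is older than the timeout. */
        List<String> expiredWorkers(long nowMillis) {
            List<String> expired = new ArrayList<>();
            for (Map.Entry<String, Long> entry : lastSeen.entrySet()) {
                if (nowMillis - entry.getValue() > timeoutMillis) {
                    expired.add(entry.getKey());
                }
            }
            return expired;
        }

        /** Returns the last reported status of a task, or null if unknown. */
        String statusOf(String taskId) {
            return taskStatuses.get(taskId);
        }
    }

    public static void main(String[] args) {
        Monitor monitor = new Monitor(10_000L); // assumed 10-second timeout

        Map<String, String> statuses = new HashMap<>();
        statuses.put("task-1", "RUNNING");
        monitor.receive(new Heartbeat("worker-a", 0L, statuses));

        System.out.println(monitor.statusOf("task-1"));      // status arrived with the heartbeat
        System.out.println(monitor.expiredWorkers(5_000L));  // within the timeout
        System.out.println(monitor.expiredWorkers(20_000L)); // past the timeout
    }
}
```

In this sketch a single message serves both purposes: the timestamp drives
failure detection, and the attached map carries per-task health. Whether the
classes we listed under Heartbeat follow this exact shape is something we
would be glad to have confirmed or corrected.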