Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2010/05/12 20:47:24 UTC

[Hadoop Wiki] Update of "IdeasOnLdapConfiguration" by SomeOtherAccount

http://wiki.apache.org/hadoop/IdeasOnLdapConfiguration

--------------------------------------------------

New page:
This is for HADOOP-5670.

First, a bit about LDAP (extremely simplified).

All objects in LDAP are defined by one or more object classes.  These object classes define the attributes that a given object can use.  The attributes in turn have definitions that determine what kinds of values they can hold: things such as strings and integers, but also whether or not the attribute can hold more than one value.  Object classes, attributes, and values are all defined in a schema definition.

One key feature of LDAP is the ability to search for objects using a simple, prefix-notation filter syntax.  Let's say we have an object class with this definition:

objectclass: node
hostname: string
domain: string

and in our LDAP server, we have placed the following objects:

hostname=myhost1
objectclass=node
domain=example.com

hostname=myhost2
objectclass=node
domain=example.com

We can now do an LDAP search with (&(objectclass=node)(hostname=myhost1)) to find the 'myhost1' object.  Similarly, we can search with (&(objectclass=node)(domain=example.com)) to find both the myhost1 and myhost2 objects.
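
To make this concrete, here is a minimal sketch of that first search in Java using JNDI (bundled with the JDK).  The server URL and base DN (dc=example,dc=com) are illustrative assumptions, not anything mandated by Hadoop:

import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.directory.InitialDirContext;
import javax.naming.directory.SearchControls;
import javax.naming.directory.SearchResult;

public class NodeLookup {
    public static void main(String[] args) throws Exception {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, "ldap://ldap.example.com:389");  // assumed server
        InitialDirContext ctx = new InitialDirContext(env);

        SearchControls controls = new SearchControls();
        controls.setSearchScope(SearchControls.SUBTREE_SCOPE);

        // The exact filter from the text: all 'node' objects named myhost1.
        NamingEnumeration<SearchResult> results = ctx.search(
            "dc=example,dc=com",  // assumed base DN
            "(&(objectclass=node)(hostname=myhost1))",
            controls);

        while (results.hasMore()) {
            SearchResult result = results.next();
            System.out.println(result.getNameInNamespace() + ": " + result.getAttributes());
        }
        ctx.close();
    }
}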

Let's apply these ideas to Hadoop.  Here are some rough objectclasses that we can use for demonstration purposes:

For generic properties: hadoopGlobalConfig
hadoop.tmp.dir: string
fs.default.name: string
dfs.block.size: integer
dfs.replication: integer
clusterName: string

For datanodes: hadoopDataNode
hostname: multi-string
dfs.data.dir: multi-string
dfs.datanode.du.reserved: integer
commonname: string

For tasktrackers: hadoopTaskTracker
commonname: string
hostname: multi-string
mapred.job.tracker: string
mapred.local.dir: multi-string
mapred.tasktracker.map.tasks.maximum: integer
mapred.tasktracker.reduce.tasks.maximum: integer

For the jobtracker: hadoopJobTracker
hostname: string
mapred.reduce.tasks: integer
mapred.reduce.slowstart.completed.maps: numeric
mapred.queue.names: multi-string
mapred.jobtracker.taskScheduler: string
mapred.system.dir: string

For the namenode: hadoopNameNode
commonname: string
dfs.http.address: string
hostname: string
dfs.name.dir: multi-string
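
Note that the attribute names above deliberately mirror Hadoop configuration keys, so copying a fetched entry into a Configuration object is mechanical.  Here is a minimal sketch, assuming JNDI and hadoop-common on the classpath; multi-valued attributes are joined with commas, which is how Hadoop already encodes lists:

import javax.naming.NamingEnumeration;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.Attributes;
import org.apache.hadoop.conf.Configuration;

public class LdapToConf {
    // Copy every attribute of an LDAP entry into a Hadoop Configuration,
    // skipping the bookkeeping attributes that are not configuration keys.
    public static void apply(Attributes attrs, Configuration conf)
            throws NamingException {
        NamingEnumeration<? extends Attribute> all = attrs.getAll();
        while (all.hasMore()) {
            Attribute attr = all.next();
            String key = attr.getID();
            if (key.equalsIgnoreCase("objectclass")
                    || key.equalsIgnoreCase("commonname")
                    || key.equalsIgnoreCase("clusterName")) {
                continue;
            }
            // Multi-valued attributes (e.g. dfs.data.dir) become
            // comma-separated lists, Hadoop's usual convention.
            StringBuilder value = new StringBuilder();
            for (int i = 0; i < attr.size(); i++) {
                if (i > 0) value.append(',');
                value.append(attr.get(i));
            }
            conf.set(key, value.toString());
        }
    }
}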

Let's define a simple grid:

clusterName=red
objectclass=hadoopGlobalConfig
hadoop.tmp.dir: /tmp
fs.default.name: hdfs://namenode:9000/
dfs.block.size: 128
dfs.replication: 3

commonname=master,clusterName=red
objectclass=hadoopNameNode,hadoopJobTracker
dfs.http.address: http://masternode:50070/
hostname: masternode
dfs.name.dir: /nn1,/nn2
mapred.reduce.tasks: 1
mapred.reduce.slowstart.completed.maps: .55
mapred.queue.names: big,small
mapred.jobtracker.taskScheduler: capacity
mapred.system.dir: /system/mapred

commonname=simplecomputenode,clusterName=red
objectclass=hadoopDataNode,hadoopTaskTracker
hostname: node1,node2,node3
dfs.data.dir: /hdfs1,/hdfs2,/hdfs3
dfs.datanode.du.reserved: 10
mapred.job.tracker: commonname=master,clusterName=red
mapred.local.dir: /mr1,/mr2,/mr3
mapred.tasktracker.map.tasks.maximum: 4
mapred.tasktracker.reduce.tasks.maximum: 4
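
Putting the pieces together, a daemon on node1 could resolve its effective configuration in two steps: apply the cluster-wide hadoopGlobalConfig entry first, then overlay its own node entry so per-node values win.  A sketch reusing the apply() helper from above, with the same assumed server URL and base DN:

import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.directory.InitialDirContext;
import javax.naming.directory.SearchControls;
import javax.naming.directory.SearchResult;
import org.apache.hadoop.conf.Configuration;

public class GridConfig {
    public static Configuration load(String cluster, String host) throws Exception {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, "ldap://ldap.example.com:389");  // assumed server
        InitialDirContext ctx = new InitialDirContext(env);
        SearchControls controls = new SearchControls();
        controls.setSearchScope(SearchControls.SUBTREE_SCOPE);

        // Cluster-wide defaults first, node-specific entry second;
        // later conf.set() calls override earlier ones.
        String[] filters = {
            "(&(objectclass=hadoopGlobalConfig)(clusterName=" + cluster + "))",
            "(&(objectclass=hadoopDataNode)(hostname=" + host + "))"
        };
        Configuration conf = new Configuration();
        for (String filter : filters) {
            NamingEnumeration<SearchResult> results =
                ctx.search("dc=example,dc=com", filter, controls);  // assumed base DN
            while (results.hasMore()) {
                LdapToConf.apply(results.next().getAttributes(), conf);
            }
        }
        ctx.close();
        return conf;
    }
}

Because hostname is multi-valued, the single simplecomputenode entry matches a search for node1, node2, or node3 alike, which is what lets many identical machines share one object.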