Posted to common-dev@hadoop.apache.org by "Yoram Arnon (JIRA)" <ji...@apache.org> on 2006/03/21 03:02:00 UTC

[jira] Commented: (HADOOP-88) Configuration: separate client config from server config (and from other-server config)

    [ http://issues.apache.org/jira/browse/HADOOP-88?page=comments#action_12371177 ] 

Yoram Arnon commented on HADOOP-88:
-----------------------------------

I'd also separate the dfs config from the map-reduce config - they have nothing in common, and dfs has a life of its own in that it could support apps other than map-reduce.

Taking this one step further, I'd separate name node config from data node config, and job tracker config from task tracker config. While it is useful to have them all bundled up when running on a single node, in a real system they typically run on distinct nodes, and definitely in different processes, so separate configs make sense.
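A minimal sketch of the split described above, assuming each daemon sees only its subsystem's settings (dfs or mapred) plus its own role's settings. The `ROLE_SUBSYSTEM` table, the function name, and all values are hypothetical. The property names are used only for illustration, and the layout is not how Hadoop's Configuration class actually works.

```python
from typing import Dict, Mapping

# Hypothetical layout: settings grouped per subsystem, then per daemon role.
ROLE_SUBSYSTEM = {
    "namenode": "dfs",
    "datanode": "dfs",
    "jobtracker": "mapred",
    "tasktracker": "mapred",
}


def config_for_role(role: str,
                    subsystem_conf: Mapping[str, Mapping[str, str]],
                    role_conf: Mapping[str, Mapping[str, str]]) -> Dict[str, str]:
    """Merge one subsystem's settings with one role's overrides.

    Settings that belong to the other subsystem or to other roles are never
    included, so e.g. a datanode cannot accidentally pick up namenode paths.
    """
    if role not in ROLE_SUBSYSTEM:
        raise KeyError(f"unknown role: {role}")
    merged: Dict[str, str] = dict(subsystem_conf.get(ROLE_SUBSYSTEM[role], {}))
    merged.update(role_conf.get(role, {}))  # role-specific values win
    return merged


# Placeholder values, not recommended settings.
SUBSYSTEM_CONF = {
    "dfs": {"fs.default.name": "namenode-host:9000"},
    "mapred": {"mapred.job.tracker": "jobtracker-host:9001"},
}
ROLE_CONF = {
    "namenode": {"dfs.name.dir": "/srv/hadoop/name"},
    "datanode": {"dfs.data.dir": "/srv/hadoop/data"},
    "jobtracker": {},
    "tasktracker": {"mapred.local.dir": "/srv/hadoop/mapred"},
}
```

On a single-node setup all four role entries can still live side by side; on a real cluster each machine would only need the entries for the roles it runs.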

As for the client config, it should be really, really easy:
 a config file, rather than a directory
 the config file can reside anywhere, including the application's own directory, and can have any name
 no reliance on environment variables
 the config file is specified on the command line (client -f <file>), allowing concurrent clients for multiple Hadoop clusters
 as few knobs as possible, simple to configure manually
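A minimal sketch of such a client, assuming the file uses the same `<configuration><property><name/><value/></property></configuration>` shape as hadoop-default.xml. Only the program name `client` and the `-f <file>` flag come from the list above. The function names and the argparse design are hypothetical, and extra elements such as descriptions are simply ignored.

```python
import argparse
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Union


def parse_config_xml(data: Union[str, bytes]) -> Dict[str, str]:
    """Collect name/value pairs from every <property> element."""
    root = ET.fromstring(data)
    conf: Dict[str, str] = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name and value is not None:
            conf[name.strip()] = value.strip()
    return conf


def load_config_file(path: str) -> Dict[str, str]:
    """Read one config file from any path; no environment variables are consulted."""
    with open(path, "rb") as f:  # bytes, so an XML encoding declaration is honored
        return parse_config_xml(f.read())


def parse_client_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="client")
    parser.add_argument("-f", dest="config_file", required=True,
                        help="config file of the target cluster (any name, any directory)")
    parser.add_argument("command", nargs="*", help="client command and its arguments")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_client_args()
    conf = load_config_file(args.config_file)
    print(f"{len(conf)} settings loaded from {args.config_file}")
```

Because the cluster is selected only by the `-f` argument, two invocations such as `client -f cluster-a.xml ls /` and `client -f cluster-b.xml ls /` can run concurrently against different clusters.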

> Configuration: separate client config from server config (and from other-server config)
> ---------------------------------------------------------------------------------------
>
>          Key: HADOOP-88
>          URL: http://issues.apache.org/jira/browse/HADOOP-88
>      Project: Hadoop
>         Type: Wish
>     Reporter: Michel Tourn
>     Assignee: Doug Cutting

>
> servers = JobTracker, NameNode, TaskTracker, DataNode
> clients = machines that run JobClient (to submit MapReduce jobs) or DFSShell (to browse DFS)
> Server machines are administered together.
> So it is OK to have all server config together (esp file paths and network ports).
> This is stored in hadoop-default.xml or hadoop-mycluster.xml
> Client machines:
> There may be as many client machines as there are MapRed developers.
> The temp space for DFS needs to be writable by the active user.
> So it should be possible to select the client temp space directory per machine and per user.
> (The global /tmp is not an option, as discussed elsewhere: the partition may be full.)
> Current situation: 
> Both the server and the clients have a copy of the server config: hadoop-default.xml
> But the XML property "dfs.data.dir" is being used as a LOCAL directory path
> on both the server machines (Data nodes) and the client machines.
> Effect:
> Exception in thread "main" java.io.IOException: No valid local directories in property: dfs.data.dir
>  at org.apache.hadoop.conf.Configuration.getFile(Configuration.java:286)
>  at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.newBackupFile(DFSClient.java:560)
>  ...
>  at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:267)
> Current Workaround:
> On the client use hadoop-site.xml to override dfs.data.dir
> One proposed solution:
> For the purpose of JobClient operations, use a different property in place of dfs.data.dir.
> (Ex: dfs.client.data.dir) 
> On the client, set this property in hadoop-site.xml so that it will override hadoop-default.xml 
> Another proposed solution:
> Handle the fact that the world is made of a federation of independent Hadoop systems.
> They can talk to each other (as peers) but they are administered separately.
> Each Hadoop system should have its own separate XML config file.
> Clients should be able to specify the Hadoop system they want to talk to.
> An advantage is that clients can then easily sync their local copy of a given Hadoop system config:
>  just pull its config file
> In this view of the world, a job client is also a kind of independent (serverless) Hadoop system.
> In this case the client config file may have its own dfs.data.dir, which is 
> separate from the dfs.data.dir in the server config file.
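The proposals in the issue can be combined into one small sketch: the client keeps a pulled copy of each Hadoop system's config, takes only the "where is the server" settings from the system it targets, reads its own local directory from the proposed `dfs.client.data.dir` (site values overriding defaults), and picks a writable per-user directory instead of the global /tmp. Everything here is an assumption for illustration: `dfs.client.data.dir` is only the name proposed above, the `REMOTE_KEYS` split and the `{user}` placeholder are hypothetical, and `fs.default.name` / `mapred.job.tracker` are used merely as example server-address keys.

```python
import os
from typing import Dict, List, Mapping

# Hypothetical: keys describing where the remote system lives. All other
# settings in the client view come from the client's own config files.
REMOTE_KEYS = ("fs.default.name", "mapred.job.tracker")

# Name proposed in the issue; assumed here, not an existing property.
CLIENT_DIR_KEY = "dfs.client.data.dir"


def client_view(system: str,
                systems: Mapping[str, Mapping[str, str]],
                client_default: Mapping[str, str],
                client_site: Mapping[str, str]) -> Dict[str, str]:
    """Build the config a client uses to talk to one named Hadoop system.

    Local settings: client_site overrides client_default.
    Remote settings: only REMOTE_KEYS are copied from the chosen system, so
    the server's own dfs.data.dir never ends up in the client's view.
    """
    if system not in systems:
        raise KeyError(f"unknown Hadoop system: {system}")
    view: Dict[str, str] = dict(client_default)
    view.update(client_site)
    remote = systems[system]
    for key in REMOTE_KEYS:
        if key in remote:
            view[key] = remote[key]
    return view


def pick_client_dir(conf: Mapping[str, str], user: str) -> str:
    """Return the first usable directory listed in dfs.client.data.dir.

    Entries are comma-separated; the hypothetical '{user}' placeholder is
    replaced so that each user gets a private, writable directory.
    """
    tried: List[str] = []
    for entry in conf.get(CLIENT_DIR_KEY, "").split(","):
        entry = entry.strip()
        if not entry:
            continue
        path = entry.replace("{user}", user)
        tried.append(path)
        try:
            os.makedirs(path, exist_ok=True)
        except OSError:
            continue  # e.g. no permission, or a regular file is in the way
        if os.access(path, os.W_OK):
            return path
    raise OSError(f"No valid local directories in property: {CLIENT_DIR_KEY} "
                  f"(tried: {', '.join(tried) or 'none'})")
```

Syncing with a given Hadoop system then amounts to replacing its entry in `systems` with a freshly pulled copy of its config file, without touching the client's own settings.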

-- 
This message is automatically generated by JIRA.