Posted to common-dev@hadoop.apache.org by "dhruba borthakur (JIRA)" <ji...@apache.org> on 2007/12/02 06:58:43 UTC

[jira] Updated: (HADOOP-2185) Server ports: to roll or not to roll.

     [ https://issues.apache.org/jira/browse/HADOOP-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-2185:
-------------------------------------

    Status: Open  (was: Patch Available)

While running the unit tests on trunk with this patch applied, the following test timed out:

    [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec
    [junit] Test org.apache.hadoop.dfs.TestHDFSServerPorts FAILED (timeout)

I will attach the stack trace to this JIRA.

> Server ports: to roll or not to roll.
> -------------------------------------
>
>                 Key: HADOOP-2185
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2185
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: conf, dfs, mapred
>    Affects Versions: 0.15.0
>            Reporter: Konstantin Shvachko
>            Assignee: Konstantin Shvachko
>             Fix For: 0.16.0
>
>         Attachments: FixedPorts.patch, FixedPorts2.patch
>
>
> I looked at the issues related to port rolling. My impression is that port rolling is required only for the unit tests to run.
> Even the name-node port should roll there, which it currently does not, so that two clusters can be started for testing, say, distcp.
> For real clusters, on the contrary, port rolling is not desired and sometimes even prohibited.
> So we should have a way to ban port rolling. My proposal is to
> # use ephemeral port 0 if port rolling is desired
> # if a specific port is specified then port rolling should not happen at all, meaning that the
> server either starts on that particular port or fails to start.
> The desired port is specified via configuration parameters.
> - Name-node: fs.default.name = host:port
> - Data-node: dfs.datanode.port
> - Job-tracker: mapred.job.tracker = host:port
> - Task-tracker: mapred.task.tracker.report.bindAddress = host
>   The task-tracker currently has no option to specify a port; it always uses the ephemeral port 0,
>   and therefore I propose to add one.
> - Secondary node does not need a port to listen on.
> For info servers we have two sets of config variables *.info.bindAddress and *.info.port
> except for the task tracker, which calls them *.http.bindAddress and *.http.port instead of "info".
> With respect to the info servers, I propose to eliminate the port parameters completely and use the form
> *.info.bindAddress = host:port
> Info servers should do the same thing, namely start or fail on the specified port if it is not 0,
> and start on any free port if it is ephemeral.
> For the task-tracker I would rename tasktracker.http.bindAddress to mapred.task.tracker.info.bindAddress
> For the data-node, the info server's dfs.datanode.info.bindAddress should be included in the default config.
> Is there a reason why it is not there?
> This is the summary of proposed changes:
> || Server || current name = value || proposed name = value ||
> | NameNode | fs.default.name = host:port | same |
> | | dfs.info.bindAddress = host | dfs.http.bindAddress = host:port |
> | DataNode | dfs.datanode.bindAddress = host | dfs.datanode.bindAddress = host:port |
> | | dfs.datanode.port = port | eliminate |
> | | dfs.datanode.info.bindAddress = host | dfs.datanode.http.bindAddress = host:port |
> | | dfs.datanode.info.port = port | eliminate |
> | JobTracker | mapred.job.tracker = host:port | same |
> | | mapred.job.tracker.info.bindAddress = host | mapred.job.tracker.http.bindAddress = host:port |
> | | mapred.job.tracker.info.port = port | eliminate |
> | TaskTracker | mapred.task.tracker.report.bindAddress = host | mapred.task.tracker.report.bindAddress = host:port |
> | | tasktracker.http.bindAddress = host | mapred.task.tracker.http.bindAddress = host:port |
> | | tasktracker.http.port = port | eliminate |
> | SecondaryNameNode | dfs.secondary.info.bindAddress = host | dfs.secondary.http.bindAddress = host:port |
> | | dfs.secondary.info.port = port | eliminate |
> Do we also want to set some uniform naming convention for the configuration variables?
> For example, using hdfs instead of dfs, or info instead of http, or systematically using either datanode
> or data.node, would make them look better in my opinion.
> So these are all +*api*+ changes. I would +*really*+ like some feedback on this, especially from 
> people who deal with configuration issues in practice.
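
The "start or fail" semantics proposed in the description above can be sketched roughly as follows. This is only an illustration using a plain java.net.ServerSocket, not the code in FixedPorts.patch or Hadoop's own server classes. The class FixedPortSketch and its methods parseBindAddress and bind are hypothetical names chosen for the example.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical sketch, not Hadoop code: illustrates "host:port" values
// where port 0 means "any free port" and any other port means
// "start on exactly this port or fail".
class FixedPortSketch {

    /** Parses a "host:port" configuration value (assumed format). */
    static InetSocketAddress parseBindAddress(String value) {
        int colon = value.lastIndexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Expected host:port, got: " + value);
        }
        String host = value.substring(0, colon);
        int port = Integer.parseInt(value.substring(colon + 1));
        return new InetSocketAddress(host, port);
    }

    /**
     * Binds to the configured address. With port 0 the OS picks a free
     * port; with a specific port there is no rolling to port + 1, so a
     * busy port surfaces as an IOException to the caller.
     */
    static ServerSocket bind(String value) throws IOException {
        InetSocketAddress address = parseBindAddress(value);
        ServerSocket socket = new ServerSocket();
        try {
            socket.bind(address);
        } catch (IOException e) {
            socket.close();
            throw e; // fail instead of trying another port
        }
        return socket;
    }
}
```

A unit test that needs two clusters side by side would pass a value such as 127.0.0.1:0 and read the actual port back with getLocalPort(), while a real cluster would configure a specific port and treat a bind failure as a startup error.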

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.