Posted to common-user@hadoop.apache.org by Saptarshi Guha <sa...@gmail.com> on 2010/01/09 18:44:07 UTC

Balancing a cluster when a new node is added

Hello,
I'm using Hadoop 0.20.1. I just added a new node to a 5-node
cluster (for a total of 6); there is already about 500 GB spread
across the original 5 nodes.
In order to distribute the data across the entire cluster (including
the new node) I ran

hadoop balancer
Time Stamp               Iteration#  Bytes Already Moved  Bytes Left
To Move  Bytes Being Moved
The cluster is balanced. Exiting...
Balancing took 356.0 milliseconds

Clearly the cluster is not balanced, but how do I force it to be so?
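For reference, the balancer accepts an optional threshold argument (a percentage, default 10): a datanode counts as balanced once its disk utilization is within that many percentage points of the cluster-wide average, so on a lightly used cluster the default can report "balanced" even with an empty new node. A tighter run might look like:

```shell
# Move blocks until every datanode's utilization is within 5 percentage
# points of the cluster-wide average (the default threshold is 10).
hadoop balancer -threshold 5
```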

Q2. On the DFS web UI, when I click on the existing nodes to see
their data, I can, but when I click on the new node, I can't connect.
Does this happen when there are no files? The datanode log for this
machine does not show any errors. I have managed to copy a small file
to this new machine (from the new machine itself, so the file is stored
on this machine's section of the DFS)


2010-01-09 12:20:57,681 INFO org.apache.hadoop.http.HttpServer:
listener.getLocalPort() returned 50075
webServer.getConnectors()[0].getLocalPort() returned 50075
2010-01-09 12:20:57,681 INFO org.apache.hadoop.http.HttpServer: Jetty
bound to port 50075
2010-01-09 12:20:57,681 INFO org.mortbay.log: jetty-6.1.14
2010-01-09 12:21:02,148 INFO org.mortbay.log: Started
SelectChannelConnector@0.0.0.0:50075
2010-01-09 12:21:02,152 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=DataNode, sessionId=null
2010-01-09 12:21:02,165 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
Initializing RPC Metrics with hostName=DataNode, port=50020
2010-01-09 12:21:02,167 INFO org.apache.hadoop.ipc.Server: IPC Server
Responder: starting
2010-01-09 12:21:02,168 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 0 on 50020: starting
2010-01-09 12:21:02,168 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 1 on 50020: starting
2010-01-09 12:21:02,168 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: dnRegistration =
DatanodeRegistration(altair.stat.purdue.edu:50010, storageID=,
infoPort=50075, ipcPort=50020)
2010-01-09 12:21:02,169 INFO org.apache.hadoop.ipc.Server: IPC Server
listener on 50020: starting
2010-01-09 12:21:02,170 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 2 on 50020: starting
2010-01-09 12:21:02,173 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: New storage id
DS-1908504044-127.0.0.1-50010-1263057662169 is assigned to data-node
128.210.141.105:50010
2010-01-09 12:21:02,173 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode:
DatanodeRegistration(X.X.X.X:50010,
storageID=DS-1908504044-127.0.0.1-50010-1263057662169, infoPort=50075,
ipcPort=50020)In DataNode.run, data =
FSDataset{dirpath='/ln/meraki/hdfs/dfs/data/current'}
2010-01-09 12:21:02,173 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: using
BLOCKREPORT_INTERVAL of 3600000msec Initial delay: 0msec
2010-01-09 12:21:02,187 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 0
blocks got processed in 2 msecs
2010-01-09 12:21:02,188 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Starting Periodic
block scanner.

Re: Balancing a cluster when a new node is added

Posted by Saptarshi Guha <sa...@gmail.com>.
Hi,
Yes, the config files are the same. I checked the namenode log;
for each of the 5 pre-existing nodes I see something like

2010-01-10 12:32:33,921 INFO org.apache.hadoop.hdfs.StateChange:
BLOCK* NameSystem.registerDatanode: node registration from
X.Y.Z.D:50010 storage DS-1908504044-127.0.0.1-50010-1263057662169

but not for the newly added node.
I just added the machine to the slaves file and restarted the cluster.
Is there something else I should do to the new node?
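One way to confirm what the namenode has and hasn't seen is to grep its log for registration events. Shown here against a tiny sample log so the command shape is clear; point it at your real namenode log (the path varies by installation):

```shell
# Build a small sample namenode log (stand-in for the real one).
cat > /tmp/nn-sample.log <<'EOF'
2010-01-10 12:32:33,921 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.registerDatanode: node registration from X.Y.Z.D:50010 storage DS-1908504044-127.0.0.1-50010-1263057662169
2010-01-10 12:32:34,102 INFO org.apache.hadoop.hdfs.StateChange: STATE* Network topology has 1 racks and 1 datanodes
EOF
# List every datanode registration the namenode logged; the new node's
# address should appear here if it ever registered.
grep 'NameSystem.registerDatanode' /tmp/nn-sample.log
```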

Regards
Saptarshi

On Sun, Jan 10, 2010 at 4:11 AM, Eli Collins <el...@cloudera.com> wrote:
> Have you verified this new DN's Hadoop configuration files are the same
> as the others'? Do you see any errors in the NN log when restarting HDFS
> on this new node?
>
> Thanks,
> Eli
>
> On Sat, Jan 9, 2010 at 9:44 AM, Saptarshi Guha <sa...@gmail.com> wrote:
>> Hello,
>> I'm using Hadoop 0.20.1. I just added a new node to a 5-node
>> cluster (for a total of 6); there is already about 500 GB spread
>> across the original 5 nodes.
>> In order to distribute the data across the entire cluster (including
>> the new node) I ran
>>
>> hadoop balancer
>> Time Stamp               Iteration#  Bytes Already Moved  Bytes Left
>> To Move  Bytes Being Moved
>> The cluster is balanced. Exiting...
>> Balancing took 356.0 milliseconds
>>
>> Clearly the cluster is not balanced, but how do I force it to be so?
>>
>> Q2. On the DFS web UI, when I click on the existing nodes to see
>> their data, I can, but when I click on the new node, I can't connect.
>> Does this happen when there are no files? The datanode log for this
>> machine does not show any errors. I have managed to copy a small file
>> to this new machine (from the new machine itself, so the file is stored
>> on this machine's section of the DFS)
>>
>> [datanode startup log snipped; it appears in full in the original message above]
>

Re: Balancing a cluster when a new node is added

Posted by Eli Collins <el...@cloudera.com>.
Have you verified this new DN's Hadoop configuration files are the same
as the others'? Do you see any errors in the NN log when restarting HDFS
on this new node?

Thanks,
Eli

On Sat, Jan 9, 2010 at 9:44 AM, Saptarshi Guha <sa...@gmail.com> wrote:
> Hello,
> I'm using Hadoop 0.20.1. I just added a new node to a 5-node
> cluster (for a total of 6); there is already about 500 GB spread
> across the original 5 nodes.
> In order to distribute the data across the entire cluster (including
> the new node) I ran
>
> hadoop balancer
> Time Stamp               Iteration#  Bytes Already Moved  Bytes Left
> To Move  Bytes Being Moved
> The cluster is balanced. Exiting...
> Balancing took 356.0 milliseconds
>
> Clearly the cluster is not balanced, but how do I force it to be so?
>
> Q2. On the DFS web UI, when I click on the existing nodes to see
> their data, I can, but when I click on the new node, I can't connect.
> Does this happen when there are no files? The datanode log for this
> machine does not show any errors. I have managed to copy a small file
> to this new machine (from the new machine itself, so the file is stored
> on this machine's section of the DFS)
>
> [datanode startup log snipped; it appears in full in the original message above]
>

Re: Balancing a cluster when a new node is added

Posted by Saptarshi Guha <sa...@gmail.com>.
I think that port itself is blocked. I'll contact the sysadmins.
Thanks

On Mon, Jan 11, 2010 at 7:46 PM, Saptarshi Guha
<sa...@gmail.com> wrote:
>> What happens if you give the balancer command a threshold?
>>
>>
> So I gave a threshold (0.20) and it started to run, and I got several
> errors like this:
>
> 10/01/11 19:43:56 WARN balancer.Balancer: Error moving block
> 795170313073485718 from spica:50010 to altair:50010 through
> 128.210.141.89:50010: No route to host
>
> (altair is the node I added).
> I don't know why there isn't a route to the host, since I can start
> the node remotely (via ssh), and as seen below the report shows it to
> be there. Is a "no route to host" error possible if that particular
> port (50010) is closed?
>
>> [rest of quoted message snipped; it appears in full in the message below]
>

Re: Balancing a cluster when a new node is added

Posted by Saptarshi Guha <sa...@gmail.com>.
> What happens if you give the balancer command a threshold?
>
>
So I gave a threshold (0.20) and it started to run, and I got several
errors like this:

10/01/11 19:43:56 WARN balancer.Balancer: Error moving block
795170313073485718 from spica:50010 to altair:50010 through
128.210.141.89:50010: No route to host

(altair is the node I added).
I don't know why there isn't a route to the host, since I can start
the node remotely (via ssh), and as seen below the report shows it to
be there. Is a "no route to host" error possible if that particular
port (50010) is closed?
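A "No route to host" usually points at routing or a host firewall rather than the daemon itself; ssh working only shows that port 22 is open. You can probe the datanode ports directly from one of the old nodes (hostname taken from the thread; nc flags may vary by platform):

```shell
# From an existing datanode, test whether altair's data-transfer port
# (50010) and IPC port (50020) accept TCP connections; -z only checks
# the connect, -w 5 sets a 5-second timeout.
nc -z -w 5 altair 50010 && echo "50010 open" || echo "50010 blocked"
nc -z -w 5 altair 50020 && echo "50020 open" || echo "50020 blocked"
```

If 50010 is filtered (e.g. by iptables), the datanode's outbound heartbeats to the namenode still succeed, so dfsadmin -report lists the node even though block transfers to it fail, which matches the symptoms here.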

>
>> Q2. On the DFS web UI, when I click on the existing nodes to see
>> their data, I can, but when I click on the new node, I can't connect.
>> Does this happen when there are no files? The datanode log for this
>> machine does not show any errors. I have managed to copy a small file
>> to this new machine (from the new machine itself, so the file is stored
>> on this machine's section of the DFS)
>
>
> Does the namenode actually recognize the new node?  What does dfsadmin
> -report tell you?

The report shows it to be present:
Name: A.B.C.D:50010
Decommission Status : Normal
Configured Capacity: 1056894091264 (984.31 GB)
DFS Used: 524288 (512 KB)
Non DFS Used: 55336439808 (51.54 GB)
DFS Remaining: 1001557127168(932.77 GB)
DFS Used%: 0%
DFS Remaining%: 94.76%
Last contact: Mon Jan 11 19:40:35 EST 2010

> Are you using a dfs.hosts (aka include) file?  Is it
> listed?  Are you using a dfs.hosts.exclude file?  Is it listed there on
> accident?
>
No dfs.hosts, nor excludes. I stopped the cluster (stop-dfs.sh), added
the machine (called altair) to the cluster (in the slaves file), and
brought it back up.
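As an aside, a full stop/start isn't required to add a node: on 0.20 you can bring up the daemons on the new machine alone (script names per the stock 0.20 layout; run from the Hadoop install directory):

```shell
# On the new node only: start the datanode (and tasktracker, if this
# node also runs MapReduce) without restarting the rest of the cluster.
bin/hadoop-daemon.sh start datanode
bin/hadoop-daemon.sh start tasktracker
```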


>
>

Re: Balancing a cluster when a new node is added

Posted by Allen Wittenauer <aw...@linkedin.com>.


On 1/9/10 9:44 AM, "Saptarshi Guha" <sa...@gmail.com> wrote:
> hadoop balancer
> Time Stamp               Iteration#  Bytes Already Moved  Bytes Left
> To Move  Bytes Being Moved
> The cluster is balanced. Exiting...
> Balancing took 356.0 milliseconds
> 
> Clearly the cluster is not balanced, but how do I force it to be so?

What happens if you give the balancer command a threshold?



> Q2. On the DFS web UI, when I click on the existing nodes to see
> their data, I can, but when I click on the new node, I can't connect.
> Does this happen when there are no files? The datanode log for this
> machine does not show any errors. I have managed to copy a small file
> to this new machine (from the new machine itself, so the file is stored
> on this machine's section of the DFS)


Does the namenode actually recognize the new node?  What does dfsadmin
-report tell you?  Are you using a dfs.hosts (aka include) file?  Is it
listed?  Are you using a dfs.hosts.exclude file?  Is it listed there by
accident?
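If an include/exclude file is in play, dfsadmin gives a quick check from the namenode's point of view:

```shell
# Show live/dead datanodes as the namenode currently sees them.
hadoop dfsadmin -report
# After editing dfs.hosts / dfs.hosts.exclude, make the namenode
# re-read them without a restart.
hadoop dfsadmin -refreshNodes
```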