Posted to mapreduce-user@hadoop.apache.org by divye sheth <di...@gmail.com> on 2014/03/04 14:54:40 UTC

Question on DFS Balancing

Hi,

I am new to the mailing list.

I am using Hadoop 0.20.2-append (r1056497). My question is related to
balancing. I have a 5-datanode cluster, and each node has 2 disks attached
to it. The second disk was added when the first disk was reaching its
capacity.

The scenario I am facing is this: when the new disk was added, Hadoop
automatically started writing some data to it. But over time I have noticed
that data is no longer being written to the second disk. I have also hit a
situation where the first disk on a datanode reached 100% utilization.

How can I overcome this scenario? Is it not Hadoop's job to balance disk
utilization between the multiple disks on a single datanode?

Thanks
Divye Sheth
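
[Editor's note: for context on why this can happen, datanodes of the 0.20
era choose the volume for each new block replica round-robin across the
dfs.data.dir entries, with no weighting by free space; HDFS-1804 later
added an available-space-aware policy. Below is a toy Python model of that
round-robin behaviour. The class, paths, and numbers are illustrative only,
not Hadoop's actual code.]

```python
# Simplified model of how a 0.20-era datanode picks a volume (disk) for a
# new block replica: plain round-robin over the dfs.data.dir entries, with
# no regard for how full each volume already is.

class Volume:
    def __init__(self, name, capacity, used):
        self.name, self.capacity, self.used = name, capacity, used

    def available(self):
        return self.capacity - self.used

class RoundRobinVolumePolicy:
    """Rotates through volumes; skips one only when the block cannot fit."""
    def __init__(self, volumes):
        self.volumes = volumes
        self.next_idx = 0

    def choose(self, block_size):
        for _ in range(len(self.volumes)):
            v = self.volumes[self.next_idx]
            self.next_idx = (self.next_idx + 1) % len(self.volumes)
            if v.available() >= block_size:
                v.used += block_size
                return v.name
        raise IOError("out of space on all volumes")

disk1 = Volume("/data1", capacity=100, used=95)   # old, nearly full disk
disk2 = Volume("/data2", capacity=100, used=5)    # newly added disk
policy = RoundRobinVolumePolicy([disk1, disk2])

# Blocks keep landing on the nearly full disk until it literally cannot
# fit another one -- round-robin never tries to even out utilization.
placements = [policy.choose(block_size=1) for _ in range(10)]
print(placements.count("/data1"), placements.count("/data2"))  # prints: 5 5
```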

Re: Question on DFS Balancing

Posted by Azuryy Yu <az...@gmail.com>.
It doesn't need any downtime. It works just like the Balancer, except that
this tool moves blocks peer to peer: you specify a source node and a
destination node, then start it.


On Wed, Mar 5, 2014 at 5:12 PM, divye sheth <di...@gmail.com> wrote:

> Does this require any downtime? I guess it should and any other
> precautions that I should take?
> Thanks Azuryy.
>
>
> On Wed, Mar 5, 2014 at 2:19 PM, Azuryy Yu <az...@gmail.com> wrote:
>
>> you can write a simple tool to move blocks peer to peer. I had such tool
>> before, but I cannot find it now.
>>
>> background: our cluster is not balanced, load balancer is very slow, so i
>> wrote this tool to move blocks from one node to another node.
>>
>>
>> On Wed, Mar 5, 2014 at 4:06 PM, divye sheth <di...@gmail.com> wrote:
>>
>>> I wont be in a position to fix that depending on HDFS-1804 as we are
>>> upgrading to CDH4 in the coming month. Just wanted a short term solution. I
>>> have read somewhere that manual movement of the blocks would help. Could
>>> some one guide me to the exact steps or precautions I should take while
>>> doing this? Data loss is a NO NO for me.
>>>
>>> Thanks
>>> Divye Sheth
>>>
>>>
>>> On Wed, Mar 5, 2014 at 1:28 PM, Azuryy Yu <az...@gmail.com> wrote:
>>>
>>>> Hi,
>>>> That probably break something if you apply the patch from 2.x to
>>>> 0.20.x, but it depends on.
>>>>
>>>> AFAIK, Balancer had a major refactor in HDFSv2, so you'd better fix it
>>>> by yourself based on HDFS-1804.
>>>>
>>>>
>>>>
>>>> On Wed, Mar 5, 2014 at 3:47 PM, divye sheth <di...@gmail.com>wrote:
>>>>
>>>>> Thanks Harsh. The jira is fixed in version 2.1.0 whereas I am using
>>>>> Hadoop 0.20.2 (we are in a process of upgrading) is there a workaround for
>>>>> the short term to balance the disk utilization? The patch in the Jira, if
>>>>> applied to the version that I am using, will it break anything?
>>>>>
>>>>> Thanks
>>>>> Divye Sheth
>>>>>
>>>>>
>>>>> On Wed, Mar 5, 2014 at 11:28 AM, Harsh J <ha...@cloudera.com> wrote:
>>>>>
>>>>>> You're probably looking for
>>>>>> https://issues.apache.org/jira/browse/HDFS-1804
>>>>>>
>>>>>> On Tue, Mar 4, 2014 at 5:54 AM, divye sheth <di...@gmail.com>
>>>>>> wrote:
>>>>>> > Hi,
>>>>>> >
>>>>>> > I am new to the mailing list.
>>>>>> >
>>>>>> > I am using Hadoop 0.20.2 with an append r1056497 version. The
>>>>>> question I
>>>>>> > have is related to balancing. I have a 5 datanode cluster and each
>>>>>> node has
>>>>>> > 2 disks attached to it. The second disk was added when the first
>>>>>> disk was
>>>>>> > reaching its capacity.
>>>>>> >
>>>>>> > Now the scenario that I am facing is, when the new disk was added
>>>>>> hadoop
>>>>>> > automatically moved over some data to the new disk. But over the
>>>>>> time I
>>>>>> > notice that data is no longer being written to the second disk. I
>>>>>> have also
>>>>>> > faced an issue on the datanode where the first disk had 100%
>>>>>> utilization.
>>>>>> >
>>>>>> > How can I overcome such scenario, is it not hadoop's job to balance
>>>>>> the disk
>>>>>> > utilization between multiple disks on single datanode?
>>>>>> >
>>>>>> > Thanks
>>>>>> > Divye Sheth
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Harsh J
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Question on DFS Balancing

Posted by divye sheth <di...@gmail.com>.
Does this require any downtime? I would guess it does. Are there any other
precautions that I should take?
Thanks, Azuryy.


On Wed, Mar 5, 2014 at 2:19 PM, Azuryy Yu <az...@gmail.com> wrote:

> you can write a simple tool to move blocks peer to peer. I had such tool
> before, but I cannot find it now.
>
> background: our cluster is not balanced, load balancer is very slow, so i
> wrote this tool to move blocks from one node to another node.
>
>
> On Wed, Mar 5, 2014 at 4:06 PM, divye sheth <di...@gmail.com> wrote:
>
>> I wont be in a position to fix that depending on HDFS-1804 as we are
>> upgrading to CDH4 in the coming month. Just wanted a short term solution. I
>> have read somewhere that manual movement of the blocks would help. Could
>> some one guide me to the exact steps or precautions I should take while
>> doing this? Data loss is a NO NO for me.
>>
>> Thanks
>> Divye Sheth
>>
>>
>> On Wed, Mar 5, 2014 at 1:28 PM, Azuryy Yu <az...@gmail.com> wrote:
>>
>>> Hi,
>>> That probably break something if you apply the patch from 2.x to 0.20.x,
>>> but it depends on.
>>>
>>> AFAIK, Balancer had a major refactor in HDFSv2, so you'd better fix it
>>> by yourself based on HDFS-1804.
>>>
>>>
>>>
>>> On Wed, Mar 5, 2014 at 3:47 PM, divye sheth <di...@gmail.com>wrote:
>>>
>>>> Thanks Harsh. The jira is fixed in version 2.1.0 whereas I am using
>>>> Hadoop 0.20.2 (we are in a process of upgrading) is there a workaround for
>>>> the short term to balance the disk utilization? The patch in the Jira, if
>>>> applied to the version that I am using, will it break anything?
>>>>
>>>> Thanks
>>>> Divye Sheth
>>>>
>>>>
>>>> On Wed, Mar 5, 2014 at 11:28 AM, Harsh J <ha...@cloudera.com> wrote:
>>>>
>>>>> You're probably looking for
>>>>> https://issues.apache.org/jira/browse/HDFS-1804
>>>>>
>>>>> On Tue, Mar 4, 2014 at 5:54 AM, divye sheth <di...@gmail.com>
>>>>> wrote:
>>>>> > Hi,
>>>>> >
>>>>> > I am new to the mailing list.
>>>>> >
>>>>> > I am using Hadoop 0.20.2 with an append r1056497 version. The
>>>>> question I
>>>>> > have is related to balancing. I have a 5 datanode cluster and each
>>>>> node has
>>>>> > 2 disks attached to it. The second disk was added when the first
>>>>> disk was
>>>>> > reaching its capacity.
>>>>> >
>>>>> > Now the scenario that I am facing is, when the new disk was added
>>>>> hadoop
>>>>> > automatically moved over some data to the new disk. But over the
>>>>> time I
>>>>> > notice that data is no longer being written to the second disk. I
>>>>> have also
>>>>> > faced an issue on the datanode where the first disk had 100%
>>>>> utilization.
>>>>> >
>>>>> > How can I overcome such scenario, is it not hadoop's job to balance
>>>>> the disk
>>>>> > utilization between multiple disks on single datanode?
>>>>> >
>>>>> > Thanks
>>>>> > Divye Sheth
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Harsh J
>>>>>
>>>>
>>>>
>>>
>>
>
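
[Editor's note: the "manual movement" discussed above is the classic
workaround of stopping the datanode and moving block files between its data
directories by hand. The Python sketch below illustrates the idea with a
self-contained demo on throwaway directories. The blk_<id> /
blk_<id>_<genstamp>.meta naming and the current/ layout are assumptions to
verify against your own dfs.data.dir, the datanode must be stopped for the
duration, and you should back up first.]

```python
import shutil
import tempfile
from pathlib import Path

def move_blocks(src_current: Path, dst_current: Path, max_blocks: int) -> int:
    """Move up to max_blocks block/meta pairs from src to dst; return count moved."""
    dst_current.mkdir(parents=True, exist_ok=True)
    moved = 0
    # Block data files look like blk_1234; checksum files like blk_1234_5678.meta.
    for blk in sorted(src_current.glob("blk_*")):
        if blk.suffix == ".meta":
            continue  # handled together with its data file
        metas = list(src_current.glob(blk.name + "_*.meta"))
        if not metas:
            continue  # never split a pair; skip orphans
        shutil.move(str(blk), str(dst_current / blk.name))
        for m in metas:
            shutil.move(str(m), str(dst_current / m.name))
        moved += 1
        if moved >= max_blocks:
            break
    return moved

# Self-contained demo against throwaway directories (no real HDFS involved).
root = Path(tempfile.mkdtemp())
src, dst = root / "data1" / "current", root / "data2" / "current"
src.mkdir(parents=True)
for i in (1, 2, 3):
    (src / f"blk_{i}").write_bytes(b"x" * 10)
    (src / f"blk_{i}_100.meta").write_bytes(b"m")
moved = move_blocks(src, dst, max_blocks=2)
print("moved", moved, "block/meta pairs")  # prints: moved 2 block/meta pairs
```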


Re: Question on DFS Balancing

Posted by Azuryy Yu <az...@gmail.com>.
You can write a simple tool to move blocks peer to peer. I had such a tool
before, but I cannot find it now.

Background: our cluster was not balanced and the Balancer was very slow, so
I wrote this tool to move blocks from one node to another.


On Wed, Mar 5, 2014 at 4:06 PM, divye sheth <di...@gmail.com> wrote:

> I wont be in a position to fix that depending on HDFS-1804 as we are
> upgrading to CDH4 in the coming month. Just wanted a short term solution. I
> have read somewhere that manual movement of the blocks would help. Could
> some one guide me to the exact steps or precautions I should take while
> doing this? Data loss is a NO NO for me.
>
> Thanks
> Divye Sheth
>
>
> On Wed, Mar 5, 2014 at 1:28 PM, Azuryy Yu <az...@gmail.com> wrote:
>
>> Hi,
>> That probably break something if you apply the patch from 2.x to 0.20.x,
>> but it depends on.
>>
>> AFAIK, Balancer had a major refactor in HDFSv2, so you'd better fix it by
>> yourself based on HDFS-1804.
>>
>>
>>
>> On Wed, Mar 5, 2014 at 3:47 PM, divye sheth <di...@gmail.com> wrote:
>>
>>> Thanks Harsh. The jira is fixed in version 2.1.0 whereas I am using
>>> Hadoop 0.20.2 (we are in a process of upgrading) is there a workaround for
>>> the short term to balance the disk utilization? The patch in the Jira, if
>>> applied to the version that I am using, will it break anything?
>>>
>>> Thanks
>>> Divye Sheth
>>>
>>>
>>> On Wed, Mar 5, 2014 at 11:28 AM, Harsh J <ha...@cloudera.com> wrote:
>>>
>>>> You're probably looking for
>>>> https://issues.apache.org/jira/browse/HDFS-1804
>>>>
>>>> On Tue, Mar 4, 2014 at 5:54 AM, divye sheth <di...@gmail.com>
>>>> wrote:
>>>> > Hi,
>>>> >
>>>> > I am new to the mailing list.
>>>> >
>>>> > I am using Hadoop 0.20.2 with an append r1056497 version. The
>>>> question I
>>>> > have is related to balancing. I have a 5 datanode cluster and each
>>>> node has
>>>> > 2 disks attached to it. The second disk was added when the first disk
>>>> was
>>>> > reaching its capacity.
>>>> >
>>>> > Now the scenario that I am facing is, when the new disk was added
>>>> hadoop
>>>> > automatically moved over some data to the new disk. But over the time
>>>> I
>>>> > notice that data is no longer being written to the second disk. I
>>>> have also
>>>> > faced an issue on the datanode where the first disk had 100%
>>>> utilization.
>>>> >
>>>> > How can I overcome such scenario, is it not hadoop's job to balance
>>>> the disk
>>>> > utilization between multiple disks on single datanode?
>>>> >
>>>> > Thanks
>>>> > Divye Sheth
>>>>
>>>>
>>>>
>>>> --
>>>> Harsh J
>>>>
>>>
>>>
>>
>
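
[Editor's note: to make the shape of such a tool concrete, here is a rough
Python sketch of its planning half: given used space on an overfull source
and an underfull destination, greedily pick blocks (largest first) to move
until utilization is roughly even. The actual block transfer and namenode
bookkeeping are the hard, version-specific parts and are deliberately not
shown; all names and numbers are illustrative.]

```python
def plan_moves(src_used, dst_used, capacity, block_sizes, tolerance=0.05):
    """Pick indices of blocks to move from source to destination.

    Assumes src_used >= dst_used and equal capacities. Stops once the
    utilization gap |src% - dst%| is within tolerance.
    Returns (chosen_indices, src_used_after, dst_used_after).
    """
    chosen = []
    # Biggest blocks first: fewest moves to close the gap.
    for idx, size in sorted(enumerate(block_sizes), key=lambda p: -p[1]):
        if (src_used - dst_used) / capacity <= tolerance:
            break
        # Only move a block if it does not overshoot past the destination.
        if src_used - size >= dst_used + size:
            src_used -= size
            dst_used += size
            chosen.append(idx)
    return chosen, src_used, dst_used

# Source disk at 90%, destination at 10%, five candidate blocks.
moves, src_after, dst_after = plan_moves(
    src_used=90, dst_used=10, capacity=100, block_sizes=[30, 20, 10, 5, 5])
print(moves, src_after, dst_after)  # prints: [0, 2] 50 50
```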

Re: Question on DFS Balancing

Posted by Azuryy Yu <az...@gmail.com>.
you can write a simple tool to move blocks peer to peer. I had such tool
before, but I cannot find it now.

background: our cluster is not balanced, load balancer is very slow, so i
wrote this tool to move blocks from one node to another node.


On Wed, Mar 5, 2014 at 4:06 PM, divye sheth <di...@gmail.com> wrote:

> I wont be in a position to fix that depending on HDFS-1804 as we are
> upgrading to CDH4 in the coming month. Just wanted a short term solution. I
> have read somewhere that manual movement of the blocks would help. Could
> some one guide me to the exact steps or precautions I should take while
> doing this? Data loss is a NO NO for me.
>
> Thanks
> Divye Sheth
>
>
> On Wed, Mar 5, 2014 at 1:28 PM, Azuryy Yu <az...@gmail.com> wrote:
>
>> Hi,
>> That probably break something if you apply the patch from 2.x to 0.20.x,
>> but it depends on.
>>
>> AFAIK, Balancer had a major refactor in HDFSv2, so you'd better fix it by
>> yourself based on HDFS-1804.
>>
>>
>>
>> On Wed, Mar 5, 2014 at 3:47 PM, divye sheth <di...@gmail.com> wrote:
>>
>>> Thanks Harsh. The jira is fixed in version 2.1.0 whereas I am using
>>> Hadoop 0.20.2 (we are in a process of upgrading) is there a workaround for
>>> the short term to balance the disk utilization? The patch in the Jira, if
>>> applied to the version that I am using, will it break anything?
>>>
>>> Thanks
>>> Divye Sheth
>>>
>>>
>>> On Wed, Mar 5, 2014 at 11:28 AM, Harsh J <ha...@cloudera.com> wrote:
>>>
>>>> You're probably looking for
>>>> https://issues.apache.org/jira/browse/HDFS-1804
>>>>
>>>> On Tue, Mar 4, 2014 at 5:54 AM, divye sheth <di...@gmail.com>
>>>> wrote:
>>>> > Hi,
>>>> >
>>>> > I am new to the mailing list.
>>>> >
>>>> > I am using Hadoop 0.20.2 with an append r1056497 version. The
>>>> question I
>>>> > have is related to balancing. I have a 5 datanode cluster and each
>>>> node has
>>>> > 2 disks attached to it. The second disk was added when the first disk
>>>> was
>>>> > reaching its capacity.
>>>> >
>>>> > Now the scenario that I am facing is, when the new disk was added
>>>> hadoop
>>>> > automatically moved over some data to the new disk. But over the time
>>>> I
>>>> > notice that data is no longer being written to the second disk. I
>>>> have also
>>>> > faced an issue on the datanode where the first disk had 100%
>>>> utilization.
>>>> >
>>>> > How can I overcome such scenario, is it not hadoop's job to balance
>>>> the disk
>>>> > utilization between multiple disks on single datanode?
>>>> >
>>>> > Thanks
>>>> > Divye Sheth
>>>>
>>>>
>>>>
>>>> --
>>>> Harsh J
>>>>
>>>
>>>
>>
>

Re: Question on DFS Balancing

Posted by Azuryy Yu <az...@gmail.com>.
you can write a simple tool to move blocks peer to peer. I had such tool
before, but I cannot find it now.

background: our cluster is not balanced, load balancer is very slow, so i
wrote this tool to move blocks from one node to another node.


On Wed, Mar 5, 2014 at 4:06 PM, divye sheth <di...@gmail.com> wrote:

> I wont be in a position to fix that depending on HDFS-1804 as we are
> upgrading to CDH4 in the coming month. Just wanted a short term solution. I
> have read somewhere that manual movement of the blocks would help. Could
> some one guide me to the exact steps or precautions I should take while
> doing this? Data loss is a NO NO for me.
>
> Thanks
> Divye Sheth
>
>
> On Wed, Mar 5, 2014 at 1:28 PM, Azuryy Yu <az...@gmail.com> wrote:
>
>> Hi,
>> That probably break something if you apply the patch from 2.x to 0.20.x,
>> but it depends on.
>>
>> AFAIK, Balancer had a major refactor in HDFSv2, so you'd better fix it by
>> yourself based on HDFS-1804.
>>
>>
>>
>> On Wed, Mar 5, 2014 at 3:47 PM, divye sheth <di...@gmail.com> wrote:
>>
>>> Thanks Harsh. The jira is fixed in version 2.1.0 whereas I am using
>>> Hadoop 0.20.2 (we are in a process of upgrading) is there a workaround for
>>> the short term to balance the disk utilization? The patch in the Jira, if
>>> applied to the version that I am using, will it break anything?
>>>
>>> Thanks
>>> Divye Sheth
>>>
>>>
>>> On Wed, Mar 5, 2014 at 11:28 AM, Harsh J <ha...@cloudera.com> wrote:
>>>
>>>> You're probably looking for
>>>> https://issues.apache.org/jira/browse/HDFS-1804
>>>>
>>>> On Tue, Mar 4, 2014 at 5:54 AM, divye sheth <di...@gmail.com>
>>>> wrote:
>>>> > Hi,
>>>> >
>>>> > I am new to the mailing list.
>>>> >
>>>> > I am using Hadoop 0.20.2 with an append r1056497 version. The
>>>> question I
>>>> > have is related to balancing. I have a 5 datanode cluster and each
>>>> node has
>>>> > 2 disks attached to it. The second disk was added when the first disk
>>>> was
>>>> > reaching its capacity.
>>>> >
>>>> > Now the scenario that I am facing is, when the new disk was added
>>>> hadoop
>>>> > automatically moved over some data to the new disk. But over the time
>>>> I
>>>> > notice that data is no longer being written to the second disk. I
>>>> have also
>>>> > faced an issue on the datanode where the first disk had 100%
>>>> utilization.
>>>> >
>>>> > How can I overcome such scenario, is it not hadoop's job to balance
>>>> the disk
>>>> > utilization between multiple disks on single datanode?
>>>> >
>>>> > Thanks
>>>> > Divye Sheth
>>>>
>>>>
>>>>
>>>> --
>>>> Harsh J
>>>>
>>>
>>>
>>
>

Re: Question on DFS Balancing

Posted by divye sheth <di...@gmail.com>.
I won't be in a position to fix that based on HDFS-1804, as we are upgrading
to CDH4 in the coming month; I just wanted a short-term solution. I have read
somewhere that manually moving the blocks would help. Could someone guide me
through the exact steps or precautions I should take while doing this? Data
loss is a no-no for me.

Thanks
Divye Sheth


On Wed, Mar 5, 2014 at 1:28 PM, Azuryy Yu <az...@gmail.com> wrote:

> Hi,
> That probably break something if you apply the patch from 2.x to 0.20.x,
> but it depends on.
>
> AFAIK, Balancer had a major refactor in HDFSv2, so you'd better fix it by
> yourself based on HDFS-1804.
>
>
>
> On Wed, Mar 5, 2014 at 3:47 PM, divye sheth <di...@gmail.com> wrote:
>
>> Thanks Harsh. The jira is fixed in version 2.1.0 whereas I am using
>> Hadoop 0.20.2 (we are in a process of upgrading) is there a workaround for
>> the short term to balance the disk utilization? The patch in the Jira, if
>> applied to the version that I am using, will it break anything?
>>
>> Thanks
>> Divye Sheth
>>
>>
>> On Wed, Mar 5, 2014 at 11:28 AM, Harsh J <ha...@cloudera.com> wrote:
>>
>>> You're probably looking for
>>> https://issues.apache.org/jira/browse/HDFS-1804
>>>
>>> On Tue, Mar 4, 2014 at 5:54 AM, divye sheth <di...@gmail.com>
>>> wrote:
>>> > Hi,
>>> >
>>> > I am new to the mailing list.
>>> >
>>> > I am using Hadoop 0.20.2 with an append r1056497 version. The question
>>> I
>>> > have is related to balancing. I have a 5 datanode cluster and each
>>> node has
>>> > 2 disks attached to it. The second disk was added when the first disk
>>> was
>>> > reaching its capacity.
>>> >
>>> > Now the scenario that I am facing is, when the new disk was added
>>> hadoop
>>> > automatically moved over some data to the new disk. But over the time I
>>> > notice that data is no longer being written to the second disk. I have
>>> also
>>> > faced an issue on the datanode where the first disk had 100%
>>> utilization.
>>> >
>>> > How can I overcome such scenario, is it not hadoop's job to balance
>>> the disk
>>> > utilization between multiple disks on single datanode?
>>> >
>>> > Thanks
>>> > Divye Sheth
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>
>>
>

Re: Question on DFS Balancing

Posted by Azuryy Yu <az...@gmail.com>.
Hi,
Applying the patch from 2.x to 0.20.x would probably break something, though
it depends on the patch.

AFAIK, the Balancer had a major refactor in HDFS v2, so you had better fix
it yourself based on HDFS-1804.



On Wed, Mar 5, 2014 at 3:47 PM, divye sheth <di...@gmail.com> wrote:

> Thanks Harsh. The jira is fixed in version 2.1.0 whereas I am using Hadoop
> 0.20.2 (we are in a process of upgrading) is there a workaround for the
> short term to balance the disk utilization? The patch in the Jira, if
> applied to the version that I am using, will it break anything?
>
> Thanks
> Divye Sheth
>
>
> On Wed, Mar 5, 2014 at 11:28 AM, Harsh J <ha...@cloudera.com> wrote:
>
>> You're probably looking for
>> https://issues.apache.org/jira/browse/HDFS-1804
>>
>> On Tue, Mar 4, 2014 at 5:54 AM, divye sheth <di...@gmail.com> wrote:
>> > Hi,
>> >
>> > I am new to the mailing list.
>> >
>> > I am using Hadoop 0.20.2 with an append r1056497 version. The question I
>> > have is related to balancing. I have a 5 datanode cluster and each node
>> has
>> > 2 disks attached to it. The second disk was added when the first disk
>> was
>> > reaching its capacity.
>> >
>> > Now the scenario that I am facing is, when the new disk was added hadoop
>> > automatically moved over some data to the new disk. But over the time I
>> > notice that data is no longer being written to the second disk. I have
>> also
>> > faced an issue on the datanode where the first disk had 100%
>> utilization.
>> >
>> > How can I overcome such scenario, is it not hadoop's job to balance the
>> disk
>> > utilization between multiple disks on single datanode?
>> >
>> > Thanks
>> > Divye Sheth
>>
>>
>>
>> --
>> Harsh J
>>
>
>

Re: Question on DFS Balancing

Posted by Harsh J <ha...@cloudera.com>.
You can safely move block files between disks. Follow the instructions
here: http://wiki.apache.org/hadoop/FAQ#On_an_individual_data_node.2C_how_do_you_balance_the_blocks_on_the_disk.3F
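
For anyone reading this in the archives, that FAQ procedure amounts to:
stop the datanode, move some block files together with their checksum files
between the configured dfs.data.dir directories, and restart. A rough sketch
of the move step (the helper name and paths are illustrative, not part of
Hadoop):

```shell
# move_blocks SRC DST N: move up to N block files, each together with its
# blk_<id>_<genstamp>.meta checksum file, from one dfs.data.dir "current"
# directory to another. The datanode must be stopped first and restarted
# afterwards so it re-scans both directories.
move_blocks() {
  src=$1; dst=$2; n=$3
  for blk in $(ls "$src"/blk_* 2>/dev/null | grep -v '\.meta$' | head -n "$n"); do
    base=$(basename "$blk")
    mv "$blk" "$dst"/
    # A block separated from its .meta file is reported as corrupt,
    # so always move the pair together.
    for meta in "$src/${base}"_*.meta; do
      [ -f "$meta" ] && mv "$meta" "$dst"/
    done
  done
}

# e.g., after `bin/hadoop-daemon.sh stop datanode`:
#   move_blocks /data1/dfs/data/current /data2/dfs/data/current 100
```

Moving blocks under a live datanode risks exactly the data loss the original
poster wants to avoid, so the stop/restart around the move is not optional.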

On Tue, Mar 4, 2014 at 11:47 PM, divye sheth <di...@gmail.com> wrote:
> Thanks Harsh. The jira is fixed in version 2.1.0 whereas I am using Hadoop
> 0.20.2 (we are in a process of upgrading) is there a workaround for the
> short term to balance the disk utilization? The patch in the Jira, if
> applied to the version that I am using, will it break anything?
>
> Thanks
> Divye Sheth
>
>
> On Wed, Mar 5, 2014 at 11:28 AM, Harsh J <ha...@cloudera.com> wrote:
>>
>> You're probably looking for
>> https://issues.apache.org/jira/browse/HDFS-1804
>>
>> On Tue, Mar 4, 2014 at 5:54 AM, divye sheth <di...@gmail.com> wrote:
>> > Hi,
>> >
>> > I am new to the mailing list.
>> >
>> > I am using Hadoop 0.20.2 with an append r1056497 version. The question I
>> > have is related to balancing. I have a 5 datanode cluster and each node
>> > has
>> > 2 disks attached to it. The second disk was added when the first disk
>> > was
>> > reaching its capacity.
>> >
>> > Now the scenario that I am facing is, when the new disk was added hadoop
>> > automatically moved over some data to the new disk. But over the time I
>> > notice that data is no longer being written to the second disk. I have
>> > also
>> > faced an issue on the datanode where the first disk had 100%
>> > utilization.
>> >
>> > How can I overcome such scenario, is it not hadoop's job to balance the
>> > disk
>> > utilization between multiple disks on single datanode?
>> >
>> > Thanks
>> > Divye Sheth
>>
>>
>>
>> --
>> Harsh J
>
>



-- 
Harsh J

Re: Question on DFS Balancing

Posted by divye sheth <di...@gmail.com>.
Thanks Harsh. The JIRA is fixed in version 2.1.0, whereas I am using Hadoop
0.20.2 (we are in the process of upgrading). Is there a short-term workaround
to balance the disk utilization? Will the patch in the JIRA break anything if
applied to the version that I am using?

Thanks
Divye Sheth


On Wed, Mar 5, 2014 at 11:28 AM, Harsh J <ha...@cloudera.com> wrote:

> You're probably looking for
> https://issues.apache.org/jira/browse/HDFS-1804
>
> On Tue, Mar 4, 2014 at 5:54 AM, divye sheth <di...@gmail.com> wrote:
> > Hi,
> >
> > I am new to the mailing list.
> >
> > I am using Hadoop 0.20.2 with an append r1056497 version. The question I
> > have is related to balancing. I have a 5 datanode cluster and each node
> has
> > 2 disks attached to it. The second disk was added when the first disk was
> > reaching its capacity.
> >
> > Now the scenario that I am facing is, when the new disk was added hadoop
> > automatically moved over some data to the new disk. But over the time I
> > notice that data is no longer being written to the second disk. I have
> also
> > faced an issue on the datanode where the first disk had 100% utilization.
> >
> > How can I overcome such scenario, is it not hadoop's job to balance the
> disk
> > utilization between multiple disks on single datanode?
> >
> > Thanks
> > Divye Sheth
>
>
>
> --
> Harsh J
>

Re: Question on DFS Balancing

Posted by Harsh J <ha...@cloudera.com>.
You're probably looking for https://issues.apache.org/jira/browse/HDFS-1804

On Tue, Mar 4, 2014 at 5:54 AM, divye sheth <di...@gmail.com> wrote:
> Hi,
>
> I am new to the mailing list.
>
> I am using Hadoop 0.20.2 with an append r1056497 version. The question I
> have is related to balancing. I have a 5 datanode cluster and each node has
> 2 disks attached to it. The second disk was added when the first disk was
> reaching its capacity.
>
> Now the scenario that I am facing is, when the new disk was added hadoop
> automatically moved over some data to the new disk. But over the time I
> notice that data is no longer being written to the second disk. I have also
> faced an issue on the datanode where the first disk had 100% utilization.
>
> How can I overcome such scenario, is it not hadoop's job to balance the disk
> utilization between multiple disks on single datanode?
>
> Thanks
> Divye Sheth



-- 
Harsh J

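[Editor's note: for readers on a release that includes HDFS-1804 (Hadoop 2.1.0+), the fix it delivered is a pluggable volume-choosing policy, switchable from the default round-robin to one that prefers volumes with more free space. A sketch of the relevant hdfs-site.xml settings follows; the property names are those introduced by that JIRA, and the threshold values shown are illustrative defaults, not a recommendation.]

```xml
<!-- hdfs-site.xml: prefer volumes with more available space (Hadoop 2.1.0+) -->
<property>
  <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
  <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>

<!-- Volumes whose free space differs by no more than this many bytes
     (10 GB here) are treated as balanced and picked round-robin. -->
<property>
  <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold</name>
  <value>10737418240</value>
</property>

<!-- When volumes are imbalanced, the fraction of new block allocations
     directed to the volumes with more free space. -->
<property>
  <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-preference-fraction</name>
  <value>0.75</value>
</property>
```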