Posted to hdfs-user@hadoop.apache.org by Georgi Ivanov <iv...@vesseltracker.com> on 2014/09/03 09:55:31 UTC

HDFS balance

Hi,
We have an 11-node cluster.
Every hour a cron job uploads one file (~1 GB) to Hadoop from node1
(a plain hadoop fs -put).

Because the first replica is always stored on the node where the
command is executed, node1 is filling up.
I run a re-balance every day, but that does not seem to be enough.
The effect is:
host1: 4.7 TB / 5.3 TB
host[2-10]: 4.1 TB / 5.3 TB

So I am always running out of space on host1.

One option would be to spread the job across all the nodes and execute
it on a random host each hour.
I don't really like that solution, as it involves NFS mounts, security
issues, etc.

Is there a better solution?

Thanks in advance.
George
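The setup described above can be sketched as two cron entries. This is a hedged illustration: the local path /data/incoming/latest.dat and the HDFS target /ingest/ are hypothetical, and the balancer schedule/threshold are the stock defaults, not values from the original post.

```shell
# Hourly upload (currently run on node1, which is why node1 fills up
# first: the HDFS client places the first replica on the local DataNode).
# m  h  dom mon dow  command
0  *  *   *   *    hadoop fs -put /data/incoming/latest.dat /ingest/

# Daily re-balance. The default threshold of 10 percentage points of
# utilization can leave nodes hundreds of GB apart on 5.3 TB disks,
# which is consistent with the imbalance reported above.
30 2  *   *   *    hdfs balancer -threshold 10
```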


RE: HDFS balance

Posted by Jamal B <jm...@gmail.com>.
Yes. We do it all the time.

The node you move this cron job to only needs to have the Hadoop
environment set up and proper connectivity to the cluster it is
writing to.
On Sep 3, 2014 10:51 AM, "John Lilley" <jo...@redpoint.net> wrote:

> Can you run the load from an "edge node" that is not a DataNode?
> john
>
> John Lilley
> Chief Architect, RedPoint Global Inc.
> 1515 Walnut Street | Suite 300 | Boulder, CO 80302
> T: +1 303 541 1516  | M: +1 720 938 5761 | F: +1 781-705-2077
> Skype: jlilley.redpoint | john.lilley@redpoint.net | www.redpoint.net
>
>
> -----Original Message-----
> From: Georgi Ivanov [mailto:ivanov@vesseltracker.com]
> Sent: Wednesday, September 03, 2014 1:56 AM
> To: user@hadoop.apache.org
> Subject: HDFS balance
>
> Hi,
> We have 11 nodes cluster.
> Every hour a cron job is started to upload one file( ~1GB) to Hadoop on
> node1. (plain hadoop fs -put)
>
> This way node1 is getting full because the first replica is always stored
> on the node where the command is executed.
> Every day i am running re-balance, but this seems to be not enough.
> The effect of this is :
> host1 4.7TB/5.3TB
> host[2-10] : 4.1/5.3
>
> So i am always out of space on host1.
>
> What i can do is , spread the job to all the nodes and execute the job on
> random host.
> I don't really like this solution as it involves some NFS mounts, security
> issues etc.
>
> Is there any better solution ?
>
> Thanks in advance.
> George
>
>
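The client-only setup Jamal describes can be sketched as follows. This assumes a tarball install with hypothetical hostnames and paths; on a packaged distribution you would install the vendor's client package instead of unpacking a tarball.

```shell
# On the edge node: unpack the Hadoop distribution but start no
# DataNode/NodeManager daemons -- it acts purely as a client.
tar xzf hadoop-2.4.1.tar.gz -C /opt

# Copy the cluster's client configuration so fs.defaultFS points at
# the cluster's NameNode (paths here are illustrative).
scp node1:/etc/hadoop/conf/core-site.xml /opt/hadoop-2.4.1/etc/hadoop/
scp node1:/etc/hadoop/conf/hdfs-site.xml /opt/hadoop-2.4.1/etc/hadoop/

# The same cron command then works unchanged. Because this host runs
# no DataNode, HDFS places the first replica on a randomly chosen
# DataNode rather than always on the local one.
/opt/hadoop-2.4.1/bin/hadoop fs -put /data/incoming/latest.dat /ingest/
```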


RE: HDFS balance

Posted by John Lilley <jo...@redpoint.net>.
Can you run the load from an "edge node" that is not a DataNode?
john

John Lilley
Chief Architect, RedPoint Global Inc.
1515 Walnut Street | Suite 300 | Boulder, CO 80302
T: +1 303 541 1516  | M: +1 720 938 5761 | F: +1 781-705-2077
Skype: jlilley.redpoint | john.lilley@redpoint.net | www.redpoint.net


-----Original Message-----
From: Georgi Ivanov [mailto:ivanov@vesseltracker.com] 
Sent: Wednesday, September 03, 2014 1:56 AM
To: user@hadoop.apache.org
Subject: HDFS balance

Hi,
We have 11 nodes cluster.
Every hour a cron job is started to upload one file( ~1GB) to Hadoop on node1. (plain hadoop fs -put)

This way node1 is getting full because the first replica is always stored on the node where the command is executed.
Every day i am running re-balance, but this seems to be not enough.
The effect of this is :
host1 4.7TB/5.3TB
host[2-10] : 4.1/5.3

So i am always out of space on host1.

What i can do is , spread the job to all the nodes and execute the job on random host.
I don't really like this solution as it involves some NFS mounts, security issues etc.

Is there any better solution ?

Thanks in advance.
George


Re: HDFS balance

Posted by AnilKumar B <ak...@gmail.com>.
It is better to create a client/gateway node (where no DataNode is
running) and schedule your cron job from that machine.

Thanks & Regards,
B Anil Kumar.


On Wed, Sep 3, 2014 at 1:25 PM, Georgi Ivanov <iv...@vesseltracker.com>
wrote:

> Hi,
> We have 11 nodes cluster.
> Every hour a cron job is started to upload one file( ~1GB) to Hadoop on
> node1. (plain hadoop fs -put)
>
> This way node1 is getting full because the first replica is always
> stored on the node where the command is executed.
> Every day i am running re-balance, but this seems to be not enough.
> The effect of this is :
> host1 4.7TB/5.3TB
> host[2-10] : 4.1/5.3
>
> So i am always out of space on host1.
>
> What i can do is , spread the job to all the nodes and execute the job
> on random host.
> I don't really like this solution as it involves some NFS mounts,
> security issues etc.
>
> Is there any better solution ?
>
> Thanks in advance.
> George
>
>
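Independently of moving the upload off node1, the daily re-balance itself can be made more aggressive. A hedged sketch, using stock HDFS commands; the bandwidth and threshold values are illustrative, not recommendations from this thread:

```shell
# Raise the per-DataNode balancing bandwidth at runtime (the default,
# dfs.datanode.balance.bandwidthPerSec, is a conservative 1 MB/s in
# older releases). The value is in bytes per second; no restart needed.
hdfs dfsadmin -setBalancerBandwidth 10485760   # 10 MB/s

# Run the balancer with a tighter threshold: keep moving blocks until
# every DataNode's utilization is within 5 percentage points of the
# cluster average, instead of the default 10.
hdfs balancer -threshold 5
```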
