Posted to hdfs-user@hadoop.apache.org by Vitaliy Semochkin <vi...@gmail.com> on 2010/07/14 13:16:37 UTC

hdfs system crashes when loading files bigger than local space left

Hi,

I have a small cluster with 5 nodes, and one node is acting as both NameNode and
DataNode at the same time.
On that node I load an amount of data (100 GB) into HDFS that is bigger than the
local space left on the node.

Sometimes Hadoop allows loading more data than the local space left on the node.
Sometimes Hadoop crashes and I have to reformat HDFS.

The only robust solution I found for this problem is to:
remove the namenode from the slaves list before loading data
restart the cluster
upload the data
add the namenode back to the slaves list
restart the cluster

In this case I have never had an HDFS crash.
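For reference, the workaround described above can be sketched as shell steps. This is a sketch only, assuming a Hadoop 0.20-era tarball install run from $HADOOP_HOME with the stock start/stop scripts; the hostname follows the examples later in the thread, and the local/HDFS paths are placeholders:

```shell
cd "$HADOOP_HOME"
# drop the NN host from the slaves file so it runs no datanode
grep -v '^namenode.mycompany.com$' conf/slaves > conf/slaves.tmp \
  && mv conf/slaves.tmp conf/slaves
bin/stop-dfs.sh && bin/start-dfs.sh      # restart HDFS without that datanode
bin/hadoop fs -put /local/data /data     # upload while the NN is not a DN
echo 'namenode.mycompany.com' >> conf/slaves   # put the NN host back
bin/stop-dfs.sh && bin/start-dfs.sh
```

These commands require a live cluster, so they are shown as an operational sketch rather than something runnable standalone.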

Did anyone find a more elegant solution to this problem?

Thanks in Advance,
Vitaliy

Re: hdfs system crashes when loading files bigger than local space left

Posted by Vitaliy Semochkin <vi...@gmail.com>.
On Fri, Jul 16, 2010 at 10:07 PM, Allen Wittenauer <awittenauer@linkedin.com
> wrote:

>
> On Jul 16, 2010, at 3:15 AM, Vitaliy Semochkin wrote:
> > > That is likely way too small.
> > Will setting it to 512 MB be better, given that the whole volume is only 190 GB?
>
> I'd recommend at least 5gb.  I'm also assuming this same disk space isn't
> getting used for MapReduce.

Thank you for the advice. I'll increase the amount to 6 GB (hopefully it will be
enough).
The same disk is used for MapReduce, but M/R is not executed during loading.
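For anyone following along, the reservation being discussed is the `dfs.datanode.du.reserved` property, which takes a value in bytes. A 6 GB reservation would look roughly like this in hdfs-site.xml (a sketch based on the thread's numbers, not a snippet taken from it):

```xml
<!-- hdfs-site.xml: reserve ~6 GB per volume for non-HDFS use -->
<property>
  <name>dfs.datanode.du.reserved</name>
  <value>6442450944</value> <!-- 6 * 1024^3 bytes -->
</property>
```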


> > Does Hadoop distinguish whether the client uploading data is running on a
> > datanode or not?
> > Let's say I execute
>
> Yes.
>
> > hadoop fs -put someFile hdfs://namenode.mycompany.com/
> >
> > from namenode.mycompany.com and from some other PC. Will it be any
> > different for Hadoop, and will Hadoop organize the data in a more balanced
> > way in the latter case?
>
> Yes.
>
> Again, namenode is irrelevant.

I was doing it from namenode which was acting as datanode as well.


> Do not do puts from a datanode if you want the data to be reasonably
> balanced.

Thank you very much. I will perform puts from a PC outside the Hadoop
cluster.
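A put from a machine outside the cluster would look like the command discussed earlier in the thread; the balancer invocation below is an additional suggestion for an already-skewed cluster, not something mentioned in the thread:

```shell
# from a client host that runs no datanode: first replicas are then
# scattered across the cluster instead of landing on the writer's node
hadoop fs -put someFile hdfs://namenode.mycompany.com/

# if DFS is already unbalanced, the balancer can even it out;
# -threshold is the allowed deviation in percent of disk usage
hadoop balancer -threshold 10
```

Both commands require a live cluster, so they are shown as a command sketch.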


Regards,
Vitaliy S

Re: hdfs system crashes when loading files bigger than local space left

Posted by Allen Wittenauer <aw...@linkedin.com>.
On Jul 16, 2010, at 3:15 AM, Vitaliy Semochkin wrote:
> > That is likely way too small.
> Will setting it to 512 MB be better, given that the whole volume is only 190 GB?

I'd recommend at least 5 GB.  I'm also assuming this same disk space isn't getting used for MapReduce.

> Does Hadoop distinguish whether the client uploading data is running on a datanode or not?
> Let's say I execute

Yes.

> hadoop fs -put someFile hdfs://namenode.mycompany.com/
> 
> from namenode.mycompany.com and from some other PC. Will it be any different for Hadoop, and will Hadoop organize the data in a more balanced way in the latter case?

Yes.

Again, namenode is irrelevant. Do not do puts from a datanode if you want the data to be reasonably balanced.

Re: hdfs system crashes when loading files bigger than local space left

Posted by Vitaliy Semochkin <vi...@gmail.com>.
On Thu, Jul 15, 2010 at 9:26 PM, Allen Wittenauer
<aw...@linkedin.com>wrote:

>
> On Jul 15, 2010, at 1:11 AM, Vitaliy Semochkin wrote:
>
> > > a) Have you set a reserved size for hdfs?
> > Yes. I set 128 MB as the reserved size.
>
> That is likely way too small.

Will setting it to 512 MB be better, given that the whole volume is only 190 GB?


> > b) Are you loading data from the datanode?
> > Yes. But the datanode is running on the same node as the namenode (I have a
> > very small cluster, only 5 servers, and dedicating one node solely to the
> > namenode/jobtracker seemed unreasonable to me).
>
> Where the NN is running is irrelevant to this particular problem.
>
> The problem is that if you start your data load on a machine also running a
> datanode process, the data will get put onto that node first.  This will
> cause your DFS to be majorly unbalanced.
>
> It is much better to load the data from another host outside the grid.
>

Does Hadoop distinguish whether the client uploading data is running on a
datanode or not?
Let's say I execute

hadoop fs -put someFile hdfs://namenode.mycompany.com/

from namenode.mycompany.com and from some other PC. Will it be any different
for Hadoop, and will Hadoop organize the data in a more balanced way in the
latter case?

Thank you very much for the replies,
Vitaliy S

Re: hdfs system crashes when loading files bigger than local space left

Posted by Allen Wittenauer <aw...@linkedin.com>.
On Jul 15, 2010, at 1:11 AM, Vitaliy Semochkin wrote:

> > a) Have you set a reserved size for hdfs?
> Yes. I set 128 MB as the reserved size.

That is likely way too small.

> b) Are you loading data from the datanode?
> Yes. But the datanode is running on the same node as the namenode (I have a very small cluster, only 5 servers, and dedicating one node solely to the namenode/jobtracker seemed unreasonable to me).

Where the NN is running is irrelevant to this particular problem.

The problem is that if you start your data load on a machine also running a datanode process, the data will get put onto that node first.  This will cause your DFS to be majorly unbalanced.

It is much better to load the data from another host outside the grid.
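Allen's explanation can be illustrated with a toy simulation. This is a deliberately simplified, rack-unaware model written for this note (not Hadoop's actual placement code, and the node names are made up): when the writer is itself a datanode, it receives the first replica of every block and fills up far faster than its peers.

```python
import random

def place_replicas(datanodes, writer=None, replication=3):
    # Simplified model of HDFS default block placement: the first replica
    # lands on the writer's own node when the writer runs on a datanode,
    # otherwise on a random node; the remaining replicas go to distinct
    # random other nodes (racks are ignored in this toy model).
    nodes = list(datanodes)
    first = writer if writer in nodes else random.choice(nodes)
    others = [n for n in nodes if n != first]
    return [first] + random.sample(others, replication - 1)

# Simulate writing 1000 blocks from a host that is also a datanode.
random.seed(0)
cluster = ["node1", "node2", "node3", "node4", "node5"]
counts = {n: 0 for n in cluster}
for _ in range(1000):
    for n in place_replicas(cluster, writer="node1"):
        counts[n] += 1

# node1 holds a copy of every single block; the other four nodes
# share the remaining 2000 replicas among themselves.
print(counts)
```

Loading from a host outside the cluster corresponds to `writer=None` above, which spreads first replicas evenly.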


Re: hdfs system crashes when loading files bigger than local space left

Posted by Vitaliy Semochkin <vi...@gmail.com>.
> a) Have you set a reserved size for hdfs?
Yes. I set 128 MB as the reserved size.

> b) Are you loading data from the datanode?
Yes. But the datanode is running on the same node as the namenode (I have a
very small cluster, only 5 servers, and dedicating one node solely to the
namenode/jobtracker seemed unreasonable to me).

On Wed, Jul 14, 2010 at 7:23 PM, Allen Wittenauer
<aw...@linkedin.com>wrote:

>
> On Jul 14, 2010, at 4:16 AM, Vitaliy Semochkin wrote:
> > Sometimes Hadoop allows loading more data than the local space left on
> > the node.
> > Sometimes Hadoop crashes and I have to reformat HDFS.
>
> a) Have you set a reserved size for hdfs?
>
> b) Are you loading data from the datanode?
>
>
>

Re: hdfs system crashes when loading files bigger than local space left

Posted by Allen Wittenauer <aw...@linkedin.com>.
On Jul 14, 2010, at 4:16 AM, Vitaliy Semochkin wrote:
> Sometimes Hadoop allows loading more data than the local space left on the node.
> Sometimes Hadoop crashes and I have to reformat HDFS.

a) Have you set a reserved size for hdfs?

b) Are you loading data from the datanode?