Posted to hdfs-user@hadoop.apache.org by Vitaliy Semochkin <vi...@gmail.com> on 2010/07/15 10:11:12 UTC

Re: hdfs system crashes when loading files bigger than local space left

>a) Have you set a reserved size for hdfs?
Yes. I set 128 MB as the reserved size.

b) Are you loading data from the datanode?
Yes. But the datanode is running on the same node as the namenode (I have a very
small cluster, only 5 servers, and dedicating one node solely to the
namenode/jobtracker seemed unreasonable to me).

On Wed, Jul 14, 2010 at 7:23 PM, Allen Wittenauer
<aw...@linkedin.com> wrote:

>
> On Jul 14, 2010, at 4:16 AM, Vitaliy Semochkin wrote:
> > Sometimes hadoop allows loading an amount of data bigger than the local space
> > on the node.
> > Sometimes hadoop crashes and I have to reformat hdfs.
>
> a) Have you set a reserved size for hdfs?
>
> b) Are you loading data from the datanode?
>
>
>

Re: hdfs system crashes when loading files bigger than local space left

Posted by Vitaliy Semochkin <vi...@gmail.com>.
On Fri, Jul 16, 2010 at 10:07 PM, Allen Wittenauer
<awittenauer@linkedin.com> wrote:

>
> On Jul 16, 2010, at 3:15 AM, Vitaliy Semochkin wrote:
> > That is likely way too small.
> > Will setting 512 MB be better, given that the whole volume is only 190 GB?
>
> I'd recommend at least 5 GB. I'm also assuming this same disk space isn't
> being used for MapReduce.

Thank you for the advice. I'll increase the amount to 6 GB (I hope it will be
enough).
The same disk is used for MapReduce, but M/R jobs are not running while data is
being loaded.
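
For reference, this reserved space is normally configured through the
dfs.datanode.du.reserved property in hdfs-site.xml on each datanode (the value
is in bytes, per volume); a minimal sketch of the 6 GB setting described above,
with the rest of the configuration assumed:

  <!-- hdfs-site.xml on every datanode; restart the datanode after changing -->
  <property>
    <name>dfs.datanode.du.reserved</name>
    <!-- reserve roughly 6 GB per volume for non-HDFS use: 6 * 1024^3 bytes -->
    <value>6442450944</value>
  </property>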


> > Does hadoop distinguish between a client that uploads data from a datanode
> > and one that does not?
> > Let's say I execute
>
> Yes.
>
> hadoop fs -put someFile hdfs://namenode.mycompany.com/
> >
> > from namenode.mycompany.com and from some other PC. Will it make any
> > difference to hadoop, and will hadoop organize the data in a more balanced
> > way in the latter case?
>
> Yes.
>
> Again, the namenode is irrelevant.

I was doing it from the namenode, which was acting as a datanode as well.


> Do not do puts from a datanode if you want the data to be reasonably
> balanced.

Thank you very much. I will perform the put from a PC outside the hadoop
cluster.


Regards,
Vitaliy S

Re: hdfs system crashes when loading files bigger than local space left

Posted by Allen Wittenauer <aw...@linkedin.com>.
On Jul 16, 2010, at 3:15 AM, Vitaliy Semochkin wrote:
> That is likely way too small.
> Will setting 512 MB be better, given that the whole volume is only 190 GB?

I'd recommend at least 5 GB. I'm also assuming this same disk space isn't being used for MapReduce.
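
One way to sanity-check how much space each datanode actually has left is the
dfsadmin report; a minimal sketch, assuming it is run from a machine whose
Hadoop client is configured for this cluster:

  # Prints configured capacity, DFS used, non-DFS used and remaining space
  # for the cluster as a whole and for each datanode.
  hadoop dfsadmin -report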

> Does hadoop distinguish between a client that uploads data from a datanode and one that does not?
> Let's say I execute

Yes.

> hadoop fs -put someFile hdfs://namenode.mycompany.com/
> 
> from namenode.mycompany.com and from some other PC. Will it make any difference to hadoop, and will hadoop organize the data in a more balanced way in the latter case?

Yes.

Again, the namenode is irrelevant. Do not do puts from a datanode if you want the data to be reasonably balanced.

Re: hdfs system crashes when loading files bigger than local space left

Posted by Vitaliy Semochkin <vi...@gmail.com>.
On Thu, Jul 15, 2010 at 9:26 PM, Allen Wittenauer
<aw...@linkedin.com> wrote:

>
> On Jul 15, 2010, at 1:11 AM, Vitaliy Semochkin wrote:
>
> > >a) Have you set a reserved size for hdfs?
> > Yes. I set 128 MB as the reserved size.
>
> That is likely way too small.

Will setting 512 MB be better, given that the whole volume is only 190 GB?


> > b) Are you loading data from the datanode?
> > Yes. But the datanode is running on the same node as the namenode (I have a
> > very small cluster, only 5 servers, and dedicating one node solely to the
> > namenode/jobtracker seemed unreasonable to me).
>
> Where the NN is running is irrelevant to this particular problem.
>
> The problem is that if you start your data load on a machine also running a
> datanode process, the data will get put onto that node first.  This will
> cause your DFS to be majorly unbalanced.
>
> It is much better to load the data from another host outside the grid.
>

Does hadoop distinguish between a client that uploads data from a datanode and
one that does not?
Let's say I execute

hadoop fs -put someFile hdfs://namenode.mycompany.com/

from namenode.mycompany.com and from some other PC. Will it make any difference
to hadoop, and will hadoop organize the data in a more balanced way in the
latter case?

Thank you very much for the replies,
Vitaliy S

Re: hdfs system crashes when loading files bigger than local space left

Posted by Allen Wittenauer <aw...@linkedin.com>.
On Jul 15, 2010, at 1:11 AM, Vitaliy Semochkin wrote:

> >a) Have you set a reserved size for hdfs?
> Yes. I set 128 MB as the reserved size.

That is likely way too small.

> b) Are you loading data from the datanode?
> Yes. But the datanode is running on the same node as the namenode (I have a very small cluster, only 5 servers, and dedicating one node solely to the namenode/jobtracker seemed unreasonable to me).

Where the NN is running is irrelevant to this particular problem.

The problem is that if you start your data load on a machine also running a datanode process, the data will get put onto that node first.  This will cause your DFS to be majorly unbalanced.

It is much better to load the data from another host outside the grid.
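
A minimal sketch of such a load, assuming the external host has the Hadoop
client installed and configured, and reusing the placeholder namenode address
from earlier in the thread:

  # Run the upload from a machine that is not a datanode, so the first replica
  # of each block is not pinned to the local node and blocks are spread across
  # the cluster.
  hadoop fs -put someFile hdfs://namenode.mycompany.com/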