Posted to user@hadoop.apache.org by Francisco de Freitas <ch...@gmail.com> on 2018/04/18 08:06:43 UTC

Journal node edits directory

We currently run journalnodes together with datanodes and they share the
same mount point for both the data dir and edits dir.

We ran into an issue where this shared volume filled up with datanode data,
so the journal node was unable to start due to insufficient space.

Where would you recommend placing the journal node edits directory? Is it
expected to grow very large, and does it need to be on a separate partition?
Or can I use e.g. tmpfs for it? Our 1PB namespace with 5 journal nodes has
about 5.4GB of journal node edits (on each journal node).

Thanks for any tips and best practices.

Re: Journal node edits directory

Posted by Francisco de Freitas <ch...@gmail.com>.
Hi Anu,

thanks a lot for the tips. Much appreciated. I'll try to implement those
changes.

Regards,
Francisco

On Wed, 18 Apr 2018 at 18:56 Anu Engineer <ae...@hortonworks.com> wrote:

> I would start off by asking that Journal nodes be on separate machines,
> maybe along with namenodes.
>
> If that is not possible, at least give the journalnode process dedicated
> disks that are not shared with your datanode process.
>
>
>
> >Is it expected to grow very large and/or needs to be in a separate
> partition?
>
> It is not the size of the journals that will hurt you. The datanode is a
> very high bandwidth application; that is, it writes lots of data but can
> afford to be slower.
>
> Journal nodes, by contrast, do not write much data, but if they are left
> waiting for I/O to complete because of datanode I/O, your namenodes may
> become slow, which means that your cluster will be slower. In other words,
> journal I/O is latency sensitive.
>
>
>
> Thanks
>
> Anu

Re: Journal node edits directory

Posted by Anu Engineer <ae...@hortonworks.com>.
I would start off by asking that Journal nodes be on separate machines, maybe along with namenodes.
If that is not possible, at least give the journalnode process dedicated disks that are not shared with your datanode process.

>Is it expected to grow very large and/or needs to be in a separate partition?
It is not the size of the journals that will hurt you. The datanode is a very high bandwidth application; that is, it writes lots of data but can afford to be slower.
Journal nodes, by contrast, do not write much data, but if they are left waiting for I/O to complete because of datanode I/O, your namenodes may become slow, which means that your cluster will be slower. In other words, journal I/O is latency sensitive.
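
As a rough illustration of the "dedicated disks" advice, here is a minimal Python sketch (not from the thread itself) that checks whether a journal edits directory sits on the same filesystem as any datanode data directory, and whether it is low on free space. The function names and the threshold are hypothetical; on a real cluster the paths would be whatever you have set for dfs.journalnode.edits.dir and dfs.datanode.data.dir in hdfs-site.xml.

```python
import os
import shutil


def same_device(path_a, path_b):
    """Return True if both paths are on the same filesystem (same st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def check_journal_dir(journal_dir, datanode_dirs, min_free_bytes):
    """Return a list of warnings for a journal edits dir.

    Warns when the dir shares a filesystem with a datanode data dir,
    or when free space on its filesystem is below min_free_bytes
    (a hypothetical threshold chosen by the operator).
    """
    warnings = []
    for data_dir in datanode_dirs:
        if same_device(journal_dir, data_dir):
            warnings.append(f"{journal_dir} shares a filesystem with {data_dir}")
    free = shutil.disk_usage(journal_dir).free
    if free < min_free_bytes:
        warnings.append(f"{journal_dir} has only {free} bytes free")
    return warnings
```

Note that this only detects the shared-mount case from the original question. Two partitions on one physical disk report different device ids but still compete for I/O, so the check cannot confirm that the disks are truly dedicated.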

Thanks
Anu
