Posted to mapreduce-user@hadoop.apache.org by Chris Embree <ce...@gmail.com> on 2013/07/09 04:12:19 UTC

How bad is this? :)

Hey Hadoop smart folks....

I have a tendency to seek optimum performance given my understanding, and
that led me to a "brilliant" decision.  We settled on ext4 as the underlying
FS for HDFS.  Greedy for speed, I thought: let's turn the journal off and
gain the speed benefit.  After all, I have 3 copies of the data.

How much does this bother you, given we have a 21-node prod and only a
10-node dev cluster?

I'm embarrassed to say I did not capture good pre- and post-change I/O
numbers.  In my simple brain, not writing a journal just screams improved I/O.

Don't be shy, tell me how badly I have done bad things. (I originally said
"screwed the pooch" but I reconsidered our > USA audience. ;)

If I'm not incredibly wrong, should we consider higher speed (less safe)
file systems?

Correct/support my thinking.
Chris
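
[For reference: toggling the ext4 journal is done with tune2fs on an unmounted filesystem; the device and mount point below are illustrative placeholders, not taken from the thread.]

```shell
# Illustrative only -- substitute your real data partition for /dev/sdb1.
# The filesystem must be unmounted before the journal feature is changed.
umount /data/1
tune2fs -O ^has_journal /dev/sdb1    # drop the journal
e2fsck -f /dev/sdb1                  # full check afterwards, as a sanity pass
mount /dev/sdb1 /data/1
# To restore the journal later:
#   tune2fs -O has_journal /dev/sdb1
```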

Re: How bad is this? :)

Posted by Adam Faris <af...@linkedin.com>.
Hi Chris,

You should use a utility like iozone (http://www.iozone.org/) to benchmark the drives while tuning your filesystem.  You may be surprised at what the measured values show you. :)
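
[A typical iozone invocation for this kind of before/after comparison might look like the sketch below; the mount point, file size, and record size are illustrative, not from the thread.]

```shell
# Sequential write/rewrite (-i 0) and read/reread (-i 1) on a hypothetical
# data mount, with a 4 GB test file and 1 MB records.  Use a file size
# larger than RAM so the page cache doesn't dominate the numbers.
iozone -i 0 -i 1 -s 4g -r 1m -f /data/1/iozone.tmp
```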

We use ext4 for storing HDFS blocks on our compute nodes, and journaling has been left on.  We also have 'writeback' enabled and journal commits are delayed by 30 seconds.  Slide 21 here has suggestions for tuning ext4: http://www.slideshare.net/allenwittenauer/2012-lihadoopperf

Be warned that with these settings, even with 3 copies of each block, it's still possible to lose data in a power loss.  About 2.5 years ago we had a datacenter power failure, and I think we lost 6-10 files to block corruption.  Those files were actively being written when the power went out, so we ended up rerunning those jobs.  Balancing performance against exposure is something to keep in mind when making these kinds of changes.
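
[The options Adam describes would correspond roughly to an /etc/fstab entry like this; the device and mount point are placeholders.]

```shell
# data=writeback relaxes the ordering between data and metadata writes;
# commit=30 delays journal commits to every 30 seconds (default is 5).
# Both trade crash-safety for throughput, as the anecdote above shows.
/dev/sdb1  /data/1  ext4  noatime,data=writeback,commit=30  0 0
```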

-- Adam

On Jul 9, 2013, at 12:25 AM, Harsh J <ha...@cloudera.com> wrote:

> This is what I remember: If you disable journalling, running fsck
> after a crash will (be required and) take longer. Certainly not a good
> idea to have an extra wait after the cluster loses power and is being
> restarted, etc.
> 
> On Tue, Jul 9, 2013 at 7:42 AM, Chris Embree <ce...@gmail.com> wrote:
>> Hey Hadoop smart folks....
>> 
>> I have a tendency to seek optimum performance given my understanding, so
>> that led to me "brilliant" decision.  We settled on EXT4 for our underlying
>> FS for HDFS.   Greedy for speed I thought, let's turn the journal off and
>> gain the speed benefits.  After all, I have 3 copies of the data.
>> 
>> How much does this bother you, given we have a 21 node prod and only 10 node
>> dev cluster.
>> 
>> I'm embarrassed to say I did not capture good pre and post change I/O.  In
>> my simple brain, not writing to journal just screams improved I/O.
>> 
>> Don't be shy, tell me how badly I have done bad things. (I originally said
>> "screwed the pooch" but I reconsidered our > USA audience. ;)
>> 
>> If I'm not incredibly wrong, should we consider higher speed (less safe)
>> file systems?
>> 
>> Correct/support my thinking.
>> Chris
> 
> 
> 
> --
> Harsh J


Re: How bad is this? :)

Posted by Harsh J <ha...@cloudera.com>.
This is what I remember: If you disable journalling, running fsck
after a crash will (be required and) take longer. Certainly not a good
idea to have an extra wait after the cluster loses power and is being
restarted, etc.
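
[Concretely: with no journal there is nothing to replay after an unclean shutdown, so each data partition needs a full forced check before it can be mounted again. The device name below is illustrative.]

```shell
# Full forced check (-f) on an unmounted ext4 partition, auto-answering
# prompts (-y).  Without a journal, fsck must walk all metadata, which on
# a multi-terabyte HDFS data disk can take a long time per disk.
fsck.ext4 -f -y /dev/sdb1
```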

On Tue, Jul 9, 2013 at 7:42 AM, Chris Embree <ce...@gmail.com> wrote:
> Hey Hadoop smart folks....
>
> I have a tendency to seek optimum performance given my understanding, so
> that led to me "brilliant" decision.  We settled on EXT4 for our underlying
> FS for HDFS.   Greedy for speed I thought, let's turn the journal off and
> gain the speed benefits.  After all, I have 3 copies of the data.
>
> How much does this bother you, given we have a 21 node prod and only 10 node
> dev cluster.
>
> I'm embarrassed to say I did not capture good pre and post change I/O.  In
> my simple brain, not writing to journal just screams improved I/O.
>
> Don't be shy, tell me how badly I have done bad things. (I originally said
> "screwed the pooch" but I reconsidered our > USA audience. ;)
>
> If I'm not incredibly wrong, should we consider higher speed (less safe)
> file systems?
>
> Correct/support my thinking.
> Chris



--
Harsh J
