You are viewing a plain text version of this content. The canonical link for it is here.
Posted to mapreduce-user@hadoop.apache.org by Ted <r6...@gmail.com> on 2013/05/23 11:49:47 UTC

pauses during startup (maybe network related?)

Hi I'm running hadoop on my local laptop for development and
everything "works" but there's some annoying pauses during the startup
which causes the entire hadoop startup process to take up to 4 minutes
and I'm wondering what it is and if I can do anything about it.

I'm running everything on 1 machines, on fedora linux, hadoop-1.1.2,
oracle jkd1.7.0_17, the machine is a dual core i5, and I have 8gb of
ram and an SSD so it shouldn't be slow.

When the system pauses, there is no cpu usage, no disk usage and no
network usage (although I suspect it's waiting for the network to
resolve or return something).

Here's some snippets from the namenode logs during startup where you
can see it just pauses for around 30 seconds or more with out errors
or anything :

...
2013-05-23 19:26:37,660 INFO
org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from
hadoop-metrics2.properties
2013-05-23 19:26:37,676 INFO
org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source
MetricsSystem,sub=Stats registered.
2013-05-23 19:27:54,144 INFO
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot
period at 10 second(s).
2013-05-23 19:27:54,144 INFO
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics
system started
...
2013-05-23 19:27:54,341 WARN
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: The
dfs.support.append option is in your configuration, however append is
not supported. This configuration option is no longer required to
enable sync.
2013-05-23 19:27:54,341 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s),
accessTokenLifetime=0 min(s)
2013-05-23 19:28:19,918 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
FSNamesystemStateMBean and NameNodeMXBean
2013-05-23 19:28:19,937 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names
occuring more than 10 times
...
2013-05-23 19:28:26,801 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 28 on 9000: starting
2013-05-23 19:28:26,833 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 31 on 9000: starting
2013-05-23 19:30:10,644 INFO org.apache.hadoop.hdfs.StateChange:
BLOCK* NameSystem.registerDatanode: node registration from
127.0.0.1:50010 storage DS-651015167-192.168.1.5-50010-1369140176513
2013-05-23 19:30:10,650 INFO org.apache.hadoop.net.NetworkTopology:
Adding a new node: /default-rack/127.0.0.1:50010


I already start the system with : export
HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"
I only allocate : export HADOOP_HEAPSIZE=512 (but it's an empty hadoop
system, maybe just 1 or 2 test files less than 100k, and there's no
CPU usage so it doesn't look like it's GC thrashing)

I should mention again, there's no errors and the system runs fine and
relatively speedy once started (considering it's on my laptop).

Does anyone know what's causing these pauses? (and how I can get rid of them)
Thanks.
-- 
Ted.

Re: pauses during startup (maybe network related?)

Posted by Ted <r6...@gmail.com>.
just an FYI in case anyone finds this thread in a web search....

I just edited my /etc/hosts file and added a mapping of

     127.0.0.1 <my machine name>

and everything starts up almost instantly, the difference is night and day :)

On 5/24/13, Harsh J <ha...@cloudera.com> wrote:
> You are spot on about the DNS lookup slowing things down. I've faced
> the same issue (before I had a local network DNS set up for the WiFi
> network I use).
>
>> but I'm still more just miffed at how it's knowing I'm a 192 address when
>> I told it to use localhost.
>
> There's a few configs you need to additionally change to make a
> perfect localhost setup. Otherwise, there are defaults in Apache
> Hadoop that bind to 0.0.0.0 and report the current system hostname
> (which changes if you get onto a network), causing what you're seeing.
>
> On Fri, May 24, 2013 at 7:42 AM, Ted <r6...@gmail.com> wrote:
>> thanks, I'm almost 100% sure it's network related now.
>>
>> What I tested was unpluggin my network :), the entire system starts in
>> just a few seconds.
>>
>> I decided to search on "reverse dns" in google and I see other people
>> have complained about very slow reverse dns lookups (some related to
>> hadoop / hbase too).
>>
>> I'm not sure why this is happenning yet though. I thought 127.0.0.1 or
>> localhost would have just resolved instantly - but it appears it's
>> some how finding my real IP instead, i.e. 192.168.1.5 seems to show up
>> in the log entries even though all my configurations say
>> localhost/127.0.0.1 and my /etc/hosts file has and entry for
>> localhost/127.0.0.1
>>
>> I think if I make a /etc/hosts entry for 192.168.1.5 everything will
>> be quick, that's what I'm going to test later. The only problem is I'm
>> on an dynamic IP... I've considered just making entries for all
>> reasonable permutations like 192.168.1.1 through 192.168.1.20... but
>> I'm still more just miffed at how it's knowing I'm a 192 address when
>> I told it to use localhost.
>>
>> On 5/24/13, Chris Nauroth <cn...@hortonworks.com> wrote:
>>> Hi Ted,
>>>
>>> 2013-05-23 19:28:19,937 INFO
>>> org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names
>>> occuring more than 10 times
>>> ...
>>> 2013-05-23 19:28:26,801 INFO org.apache.hadoop.ipc.Server: IPC Server
>>> handler 28 on 9000: starting
>>>
>>> There are a couple of relevant activities that happen during namenode
>>> startup in between these 2 log statements.  It loads the current fsimage
>>> (persistent copy of file system metadata), merges in the edits log
>>> (transaction log containing all file system metadata changes since the
>>> last
>>> checkpoint), and then saves back a new fsimage file after that merge.
>>>  Current versions of the Hadoop codebase will print some information to
>>> logs about the volume of activity during this checkpointing process, so
>>> I
>>> recommend looking for that in your logs to see if this explains it.
>>>  Depending on whether or not your have a large number of transactions
>>> queued since your last checkpoint, this whole process can cause namenode
>>> startup to take several minutes.
>>>
>>> If this becomes a regular problem, then you can run SecondaryNameNode or
>>> BackupNode to perform periodic checkpoints in addition to the checkpoint
>>> that occurs on namenode restart.  This is probably overkill for a dev
>>> environment on your laptop though.
>>>
>>> Hope this helps,
>>>
>>> Chris Nauroth
>>> Hortonworks
>>> http://hortonworks.com/
>>>
>>>
>>>
>>> On Thu, May 23, 2013 at 2:49 AM, Ted <r6...@gmail.com> wrote:
>>>
>>>> Hi I'm running hadoop on my local laptop for development and
>>>> everything "works" but there's some annoying pauses during the startup
>>>> which causes the entire hadoop startup process to take up to 4 minutes
>>>> and I'm wondering what it is and if I can do anything about it.
>>>>
>>>> I'm running everything on 1 machines, on fedora linux, hadoop-1.1.2,
>>>> oracle jkd1.7.0_17, the machine is a dual core i5, and I have 8gb of
>>>> ram and an SSD so it shouldn't be slow.
>>>>
>>>> When the system pauses, there is no cpu usage, no disk usage and no
>>>> network usage (although I suspect it's waiting for the network to
>>>> resolve or return something).
>>>>
>>>> Here's some snippets from the namenode logs during startup where you
>>>> can see it just pauses for around 30 seconds or more with out errors
>>>> or anything :
>>>>
>>>> ...
>>>> 2013-05-23 19:26:37,660 INFO
>>>> org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from
>>>> hadoop-metrics2.properties
>>>> 2013-05-23 19:26:37,676 INFO
>>>> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source
>>>> MetricsSystem,sub=Stats registered.
>>>> 2013-05-23 19:27:54,144 INFO
>>>> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot
>>>> period at 10 second(s).
>>>> 2013-05-23 19:27:54,144 INFO
>>>> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics
>>>> system started
>>>> ...
>>>> 2013-05-23 19:27:54,341 WARN
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: The
>>>> dfs.support.append option is in your configuration, however append is
>>>> not supported. This configuration option is no longer required to
>>>> enable sync.
>>>> 2013-05-23 19:27:54,341 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
>>>> isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s),
>>>> accessTokenLifetime=0 min(s)
>>>> 2013-05-23 19:28:19,918 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
>>>> FSNamesystemStateMBean and NameNodeMXBean
>>>> 2013-05-23 19:28:19,937 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names
>>>> occuring more than 10 times
>>>> ...
>>>> 2013-05-23 19:28:26,801 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>> handler 28 on 9000: starting
>>>> 2013-05-23 19:28:26,833 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>> handler 31 on 9000: starting
>>>> 2013-05-23 19:30:10,644 INFO org.apache.hadoop.hdfs.StateChange:
>>>> BLOCK* NameSystem.registerDatanode: node registration from
>>>> 127.0.0.1:50010 storage DS-651015167-192.168.1.5-50010-1369140176513
>>>> 2013-05-23 19:30:10,650 INFO org.apache.hadoop.net.NetworkTopology:
>>>> Adding a new node: /default-rack/127.0.0.1:50010
>>>>
>>>>
>>>> I already start the system with : export
>>>> HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"
>>>> I only allocate : export HADOOP_HEAPSIZE=512 (but it's an empty hadoop
>>>> system, maybe just 1 or 2 test files less than 100k, and there's no
>>>> CPU usage so it doesn't look like it's GC thrashing)
>>>>
>>>> I should mention again, there's no errors and the system runs fine and
>>>> relatively speedy once started (considering it's on my laptop).
>>>>
>>>> Does anyone know what's causing these pauses? (and how I can get rid of
>>>> them)
>>>> Thanks.
>>>> --
>>>> Ted.
>>>>
>>>
>>
>>
>> --
>> Ted.
>
>
>
> --
> Harsh J
>


-- 
Ted.

Re: pauses during startup (maybe network related?)

Posted by Ted <r6...@gmail.com>.
just an FYI in case anyone finds this thread in a web search....

I just edited my /etc/hosts file and added a mapping of

     127.0.0.1 <my machine name>

and everything starts up almost instantly, the difference is night and day :)

On 5/24/13, Harsh J <ha...@cloudera.com> wrote:
> You are spot on about the DNS lookup slowing things down. I've faced
> the same issue (before I had a local network DNS set up for the WiFi
> network I use).
>
>> but I'm still more just miffed at how it's knowing I'm a 192 address when
>> I told it to use localhost.
>
> There's a few configs you need to additionally change to make a
> perfect localhost setup. Otherwise, there are defaults in Apache
> Hadoop that bind to 0.0.0.0 and report the current system hostname
> (which changes if you get onto a network), causing what you're seeing.
>
> On Fri, May 24, 2013 at 7:42 AM, Ted <r6...@gmail.com> wrote:
>> thanks, I'm almost 100% sure it's network related now.
>>
>> What I tested was unpluggin my network :), the entire system starts in
>> just a few seconds.
>>
>> I decided to search on "reverse dns" in google and I see other people
>> have complained about very slow reverse dns lookups (some related to
>> hadoop / hbase too).
>>
>> I'm not sure why this is happenning yet though. I thought 127.0.0.1 or
>> localhost would have just resolved instantly - but it appears it's
>> some how finding my real IP instead, i.e. 192.168.1.5 seems to show up
>> in the log entries even though all my configurations say
>> localhost/127.0.0.1 and my /etc/hosts file has and entry for
>> localhost/127.0.0.1
>>
>> I think if I make a /etc/hosts entry for 192.168.1.5 everything will
>> be quick, that's what I'm going to test later. The only problem is I'm
>> on an dynamic IP... I've considered just making entries for all
>> reasonable permutations like 192.168.1.1 through 192.168.1.20... but
>> I'm still more just miffed at how it's knowing I'm a 192 address when
>> I told it to use localhost.
>>
>> On 5/24/13, Chris Nauroth <cn...@hortonworks.com> wrote:
>>> Hi Ted,
>>>
>>> 2013-05-23 19:28:19,937 INFO
>>> org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names
>>> occuring more than 10 times
>>> ...
>>> 2013-05-23 19:28:26,801 INFO org.apache.hadoop.ipc.Server: IPC Server
>>> handler 28 on 9000: starting
>>>
>>> There are a couple of relevant activities that happen during namenode
>>> startup in between these 2 log statements.  It loads the current fsimage
>>> (persistent copy of file system metadata), merges in the edits log
>>> (transaction log containing all file system metadata changes since the
>>> last
>>> checkpoint), and then saves back a new fsimage file after that merge.
>>>  Current versions of the Hadoop codebase will print some information to
>>> logs about the volume of activity during this checkpointing process, so
>>> I
>>> recommend looking for that in your logs to see if this explains it.
>>>  Depending on whether or not your have a large number of transactions
>>> queued since your last checkpoint, this whole process can cause namenode
>>> startup to take several minutes.
>>>
>>> If this becomes a regular problem, then you can run SecondaryNameNode or
>>> BackupNode to perform periodic checkpoints in addition to the checkpoint
>>> that occurs on namenode restart.  This is probably overkill for a dev
>>> environment on your laptop though.
>>>
>>> Hope this helps,
>>>
>>> Chris Nauroth
>>> Hortonworks
>>> http://hortonworks.com/
>>>
>>>
>>>
>>> On Thu, May 23, 2013 at 2:49 AM, Ted <r6...@gmail.com> wrote:
>>>
>>>> Hi I'm running hadoop on my local laptop for development and
>>>> everything "works" but there's some annoying pauses during the startup
>>>> which causes the entire hadoop startup process to take up to 4 minutes
>>>> and I'm wondering what it is and if I can do anything about it.
>>>>
>>>> I'm running everything on 1 machines, on fedora linux, hadoop-1.1.2,
>>>> oracle jkd1.7.0_17, the machine is a dual core i5, and I have 8gb of
>>>> ram and an SSD so it shouldn't be slow.
>>>>
>>>> When the system pauses, there is no cpu usage, no disk usage and no
>>>> network usage (although I suspect it's waiting for the network to
>>>> resolve or return something).
>>>>
>>>> Here's some snippets from the namenode logs during startup where you
>>>> can see it just pauses for around 30 seconds or more with out errors
>>>> or anything :
>>>>
>>>> ...
>>>> 2013-05-23 19:26:37,660 INFO
>>>> org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from
>>>> hadoop-metrics2.properties
>>>> 2013-05-23 19:26:37,676 INFO
>>>> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source
>>>> MetricsSystem,sub=Stats registered.
>>>> 2013-05-23 19:27:54,144 INFO
>>>> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot
>>>> period at 10 second(s).
>>>> 2013-05-23 19:27:54,144 INFO
>>>> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics
>>>> system started
>>>> ...
>>>> 2013-05-23 19:27:54,341 WARN
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: The
>>>> dfs.support.append option is in your configuration, however append is
>>>> not supported. This configuration option is no longer required to
>>>> enable sync.
>>>> 2013-05-23 19:27:54,341 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
>>>> isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s),
>>>> accessTokenLifetime=0 min(s)
>>>> 2013-05-23 19:28:19,918 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
>>>> FSNamesystemStateMBean and NameNodeMXBean
>>>> 2013-05-23 19:28:19,937 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names
>>>> occuring more than 10 times
>>>> ...
>>>> 2013-05-23 19:28:26,801 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>> handler 28 on 9000: starting
>>>> 2013-05-23 19:28:26,833 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>> handler 31 on 9000: starting
>>>> 2013-05-23 19:30:10,644 INFO org.apache.hadoop.hdfs.StateChange:
>>>> BLOCK* NameSystem.registerDatanode: node registration from
>>>> 127.0.0.1:50010 storage DS-651015167-192.168.1.5-50010-1369140176513
>>>> 2013-05-23 19:30:10,650 INFO org.apache.hadoop.net.NetworkTopology:
>>>> Adding a new node: /default-rack/127.0.0.1:50010
>>>>
>>>>
>>>> I already start the system with : export
>>>> HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"
>>>> I only allocate : export HADOOP_HEAPSIZE=512 (but it's an empty hadoop
>>>> system, maybe just 1 or 2 test files less than 100k, and there's no
>>>> CPU usage so it doesn't look like it's GC thrashing)
>>>>
>>>> I should mention again, there's no errors and the system runs fine and
>>>> relatively speedy once started (considering it's on my laptop).
>>>>
>>>> Does anyone know what's causing these pauses? (and how I can get rid of
>>>> them)
>>>> Thanks.
>>>> --
>>>> Ted.
>>>>
>>>
>>
>>
>> --
>> Ted.
>
>
>
> --
> Harsh J
>


-- 
Ted.

Re: pauses during startup (maybe network related?)

Posted by Ted <r6...@gmail.com>.
just an FYI in case anyone finds this thread in a web search....

I just edited my /etc/hosts file and added a mapping of

     127.0.0.1 <my machine name>

and everything starts up almost instantly, the difference is night and day :)

On 5/24/13, Harsh J <ha...@cloudera.com> wrote:
> You are spot on about the DNS lookup slowing things down. I've faced
> the same issue (before I had a local network DNS set up for the WiFi
> network I use).
>
>> but I'm still more just miffed at how it's knowing I'm a 192 address when
>> I told it to use localhost.
>
> There's a few configs you need to additionally change to make a
> perfect localhost setup. Otherwise, there are defaults in Apache
> Hadoop that bind to 0.0.0.0 and report the current system hostname
> (which changes if you get onto a network), causing what you're seeing.
>
> On Fri, May 24, 2013 at 7:42 AM, Ted <r6...@gmail.com> wrote:
>> thanks, I'm almost 100% sure it's network related now.
>>
>> What I tested was unpluggin my network :), the entire system starts in
>> just a few seconds.
>>
>> I decided to search on "reverse dns" in google and I see other people
>> have complained about very slow reverse dns lookups (some related to
>> hadoop / hbase too).
>>
>> I'm not sure why this is happenning yet though. I thought 127.0.0.1 or
>> localhost would have just resolved instantly - but it appears it's
>> some how finding my real IP instead, i.e. 192.168.1.5 seems to show up
>> in the log entries even though all my configurations say
>> localhost/127.0.0.1 and my /etc/hosts file has and entry for
>> localhost/127.0.0.1
>>
>> I think if I make a /etc/hosts entry for 192.168.1.5 everything will
>> be quick, that's what I'm going to test later. The only problem is I'm
>> on an dynamic IP... I've considered just making entries for all
>> reasonable permutations like 192.168.1.1 through 192.168.1.20... but
>> I'm still more just miffed at how it's knowing I'm a 192 address when
>> I told it to use localhost.
>>
>> On 5/24/13, Chris Nauroth <cn...@hortonworks.com> wrote:
>>> Hi Ted,
>>>
>>> 2013-05-23 19:28:19,937 INFO
>>> org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names
>>> occuring more than 10 times
>>> ...
>>> 2013-05-23 19:28:26,801 INFO org.apache.hadoop.ipc.Server: IPC Server
>>> handler 28 on 9000: starting
>>>
>>> There are a couple of relevant activities that happen during namenode
>>> startup in between these 2 log statements.  It loads the current fsimage
>>> (persistent copy of file system metadata), merges in the edits log
>>> (transaction log containing all file system metadata changes since the
>>> last
>>> checkpoint), and then saves back a new fsimage file after that merge.
>>>  Current versions of the Hadoop codebase will print some information to
>>> logs about the volume of activity during this checkpointing process, so
>>> I
>>> recommend looking for that in your logs to see if this explains it.
>>>  Depending on whether or not your have a large number of transactions
>>> queued since your last checkpoint, this whole process can cause namenode
>>> startup to take several minutes.
>>>
>>> If this becomes a regular problem, then you can run SecondaryNameNode or
>>> BackupNode to perform periodic checkpoints in addition to the checkpoint
>>> that occurs on namenode restart.  This is probably overkill for a dev
>>> environment on your laptop though.
>>>
>>> Hope this helps,
>>>
>>> Chris Nauroth
>>> Hortonworks
>>> http://hortonworks.com/
>>>
>>>
>>>
>>> On Thu, May 23, 2013 at 2:49 AM, Ted <r6...@gmail.com> wrote:
>>>
>>>> Hi I'm running hadoop on my local laptop for development and
>>>> everything "works" but there's some annoying pauses during the startup
>>>> which causes the entire hadoop startup process to take up to 4 minutes
>>>> and I'm wondering what it is and if I can do anything about it.
>>>>
>>>> I'm running everything on 1 machines, on fedora linux, hadoop-1.1.2,
>>>> oracle jkd1.7.0_17, the machine is a dual core i5, and I have 8gb of
>>>> ram and an SSD so it shouldn't be slow.
>>>>
>>>> When the system pauses, there is no cpu usage, no disk usage and no
>>>> network usage (although I suspect it's waiting for the network to
>>>> resolve or return something).
>>>>
>>>> Here's some snippets from the namenode logs during startup where you
>>>> can see it just pauses for around 30 seconds or more with out errors
>>>> or anything :
>>>>
>>>> ...
>>>> 2013-05-23 19:26:37,660 INFO
>>>> org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from
>>>> hadoop-metrics2.properties
>>>> 2013-05-23 19:26:37,676 INFO
>>>> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source
>>>> MetricsSystem,sub=Stats registered.
>>>> 2013-05-23 19:27:54,144 INFO
>>>> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot
>>>> period at 10 second(s).
>>>> 2013-05-23 19:27:54,144 INFO
>>>> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics
>>>> system started
>>>> ...
>>>> 2013-05-23 19:27:54,341 WARN
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: The
>>>> dfs.support.append option is in your configuration, however append is
>>>> not supported. This configuration option is no longer required to
>>>> enable sync.
>>>> 2013-05-23 19:27:54,341 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
>>>> isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s),
>>>> accessTokenLifetime=0 min(s)
>>>> 2013-05-23 19:28:19,918 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
>>>> FSNamesystemStateMBean and NameNodeMXBean
>>>> 2013-05-23 19:28:19,937 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names
>>>> occuring more than 10 times
>>>> ...
>>>> 2013-05-23 19:28:26,801 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>> handler 28 on 9000: starting
>>>> 2013-05-23 19:28:26,833 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>> handler 31 on 9000: starting
>>>> 2013-05-23 19:30:10,644 INFO org.apache.hadoop.hdfs.StateChange:
>>>> BLOCK* NameSystem.registerDatanode: node registration from
>>>> 127.0.0.1:50010 storage DS-651015167-192.168.1.5-50010-1369140176513
>>>> 2013-05-23 19:30:10,650 INFO org.apache.hadoop.net.NetworkTopology:
>>>> Adding a new node: /default-rack/127.0.0.1:50010
>>>>
>>>>
>>>> I already start the system with : export
>>>> HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"
>>>> I only allocate : export HADOOP_HEAPSIZE=512 (but it's an empty hadoop
>>>> system, maybe just 1 or 2 test files less than 100k, and there's no
>>>> CPU usage so it doesn't look like it's GC thrashing)
>>>>
>>>> I should mention again, there's no errors and the system runs fine and
>>>> relatively speedy once started (considering it's on my laptop).
>>>>
>>>> Does anyone know what's causing these pauses? (and how I can get rid of
>>>> them)
>>>> Thanks.
>>>> --
>>>> Ted.
>>>>
>>>
>>
>>
>> --
>> Ted.
>
>
>
> --
> Harsh J
>


-- 
Ted.

Re: pauses during startup (maybe network related?)

Posted by Ted <r6...@gmail.com>.
just an FYI in case anyone finds this thread in a web search....

I just edited my /etc/hosts file and added a mapping of

     127.0.0.1 <my machine name>

and everything starts up almost instantly, the difference is night and day :)

On 5/24/13, Harsh J <ha...@cloudera.com> wrote:
> You are spot on about the DNS lookup slowing things down. I've faced
> the same issue (before I had a local network DNS set up for the WiFi
> network I use).
>
>> but I'm still more just miffed at how it's knowing I'm a 192 address when
>> I told it to use localhost.
>
> There's a few configs you need to additionally change to make a
> perfect localhost setup. Otherwise, there are defaults in Apache
> Hadoop that bind to 0.0.0.0 and report the current system hostname
> (which changes if you get onto a network), causing what you're seeing.
>
> On Fri, May 24, 2013 at 7:42 AM, Ted <r6...@gmail.com> wrote:
>> thanks, I'm almost 100% sure it's network related now.
>>
>> What I tested was unpluggin my network :), the entire system starts in
>> just a few seconds.
>>
>> I decided to search on "reverse dns" in google and I see other people
>> have complained about very slow reverse dns lookups (some related to
>> hadoop / hbase too).
>>
>> I'm not sure why this is happenning yet though. I thought 127.0.0.1 or
>> localhost would have just resolved instantly - but it appears it's
>> some how finding my real IP instead, i.e. 192.168.1.5 seems to show up
>> in the log entries even though all my configurations say
>> localhost/127.0.0.1 and my /etc/hosts file has and entry for
>> localhost/127.0.0.1
>>
>> I think if I make a /etc/hosts entry for 192.168.1.5 everything will
>> be quick, that's what I'm going to test later. The only problem is I'm
>> on an dynamic IP... I've considered just making entries for all
>> reasonable permutations like 192.168.1.1 through 192.168.1.20... but
>> I'm still more just miffed at how it's knowing I'm a 192 address when
>> I told it to use localhost.
>>
>> On 5/24/13, Chris Nauroth <cn...@hortonworks.com> wrote:
>>> Hi Ted,
>>>
>>> 2013-05-23 19:28:19,937 INFO
>>> org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names
>>> occuring more than 10 times
>>> ...
>>> 2013-05-23 19:28:26,801 INFO org.apache.hadoop.ipc.Server: IPC Server
>>> handler 28 on 9000: starting
>>>
>>> There are a couple of relevant activities that happen during namenode
>>> startup in between these 2 log statements.  It loads the current fsimage
>>> (persistent copy of file system metadata), merges in the edits log
>>> (transaction log containing all file system metadata changes since the
>>> last
>>> checkpoint), and then saves back a new fsimage file after that merge.
>>>  Current versions of the Hadoop codebase will print some information to
>>> logs about the volume of activity during this checkpointing process, so
>>> I
>>> recommend looking for that in your logs to see if this explains it.
>>>  Depending on whether or not your have a large number of transactions
>>> queued since your last checkpoint, this whole process can cause namenode
>>> startup to take several minutes.
>>>
>>> If this becomes a regular problem, then you can run SecondaryNameNode or
>>> BackupNode to perform periodic checkpoints in addition to the checkpoint
>>> that occurs on namenode restart.  This is probably overkill for a dev
>>> environment on your laptop though.
>>>
>>> Hope this helps,
>>>
>>> Chris Nauroth
>>> Hortonworks
>>> http://hortonworks.com/
>>>
>>>
>>>
>>> On Thu, May 23, 2013 at 2:49 AM, Ted <r6...@gmail.com> wrote:
>>>
>>>> Hi I'm running hadoop on my local laptop for development and
>>>> everything "works" but there's some annoying pauses during the startup
>>>> which causes the entire hadoop startup process to take up to 4 minutes
>>>> and I'm wondering what it is and if I can do anything about it.
>>>>
>>>> I'm running everything on 1 machines, on fedora linux, hadoop-1.1.2,
>>>> oracle jkd1.7.0_17, the machine is a dual core i5, and I have 8gb of
>>>> ram and an SSD so it shouldn't be slow.
>>>>
>>>> When the system pauses, there is no cpu usage, no disk usage and no
>>>> network usage (although I suspect it's waiting for the network to
>>>> resolve or return something).
>>>>
>>>> Here's some snippets from the namenode logs during startup where you
>>>> can see it just pauses for around 30 seconds or more with out errors
>>>> or anything :
>>>>
>>>> ...
>>>> 2013-05-23 19:26:37,660 INFO
>>>> org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from
>>>> hadoop-metrics2.properties
>>>> 2013-05-23 19:26:37,676 INFO
>>>> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source
>>>> MetricsSystem,sub=Stats registered.
>>>> 2013-05-23 19:27:54,144 INFO
>>>> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot
>>>> period at 10 second(s).
>>>> 2013-05-23 19:27:54,144 INFO
>>>> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics
>>>> system started
>>>> ...
>>>> 2013-05-23 19:27:54,341 WARN
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: The
>>>> dfs.support.append option is in your configuration, however append is
>>>> not supported. This configuration option is no longer required to
>>>> enable sync.
>>>> 2013-05-23 19:27:54,341 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
>>>> isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s),
>>>> accessTokenLifetime=0 min(s)
>>>> 2013-05-23 19:28:19,918 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
>>>> FSNamesystemStateMBean and NameNodeMXBean
>>>> 2013-05-23 19:28:19,937 INFO
>>>> org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names
>>>> occuring more than 10 times
>>>> ...
>>>> 2013-05-23 19:28:26,801 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>> handler 28 on 9000: starting
>>>> 2013-05-23 19:28:26,833 INFO org.apache.hadoop.ipc.Server: IPC Server
>>>> handler 31 on 9000: starting
>>>> 2013-05-23 19:30:10,644 INFO org.apache.hadoop.hdfs.StateChange:
>>>> BLOCK* NameSystem.registerDatanode: node registration from
>>>> 127.0.0.1:50010 storage DS-651015167-192.168.1.5-50010-1369140176513
>>>> 2013-05-23 19:30:10,650 INFO org.apache.hadoop.net.NetworkTopology:
>>>> Adding a new node: /default-rack/127.0.0.1:50010
>>>>
>>>>
>>>> I already start the system with : export
>>>> HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"
>>>> I only allocate : export HADOOP_HEAPSIZE=512 (but it's an empty hadoop
>>>> system, maybe just 1 or 2 test files less than 100k, and there's no
>>>> CPU usage so it doesn't look like it's GC thrashing)
>>>>
>>>> I should mention again, there's no errors and the system runs fine and
>>>> relatively speedy once started (considering it's on my laptop).
>>>>
>>>> Does anyone know what's causing these pauses? (and how I can get rid of
>>>> them)
>>>> Thanks.
>>>> --
>>>> Ted.
>>>>
>>>
>>
>>
>> --
>> Ted.
>
>
>
> --
> Harsh J
>


-- 
Ted.

Re: pauses during startup (maybe network related?)

Posted by Harsh J <ha...@cloudera.com>.
You are spot on about the DNS lookup slowing things down. I've faced
the same issue (before I had a local network DNS set up for the WiFi
network I use).

> but I'm still more just miffed at how it's knowing I'm a 192 address when I told it to use localhost.

There's a few configs you need to additionally change to make a
perfect localhost setup. Otherwise, there are defaults in Apache
Hadoop that bind to 0.0.0.0 and report the current system hostname
(which changes if you get onto a network), causing what you're seeing.

On Fri, May 24, 2013 at 7:42 AM, Ted <r6...@gmail.com> wrote:
> thanks, I'm almost 100% sure it's network related now.
>
> What I tested was unpluggin my network :), the entire system starts in
> just a few seconds.
>
> I decided to search on "reverse dns" in google and I see other people
> have complained about very slow reverse dns lookups (some related to
> hadoop / hbase too).
>
> I'm not sure why this is happenning yet though. I thought 127.0.0.1 or
> localhost would have just resolved instantly - but it appears it's
> some how finding my real IP instead, i.e. 192.168.1.5 seems to show up
> in the log entries even though all my configurations say
> localhost/127.0.0.1 and my /etc/hosts file has and entry for
> localhost/127.0.0.1
>
> I think if I make a /etc/hosts entry for 192.168.1.5 everything will
> be quick, that's what I'm going to test later. The only problem is I'm
> on an dynamic IP... I've considered just making entries for all
> reasonable permutations like 192.168.1.1 through 192.168.1.20... but
> I'm still more just miffed at how it's knowing I'm a 192 address when
> I told it to use localhost.
>
> On 5/24/13, Chris Nauroth <cn...@hortonworks.com> wrote:
>> Hi Ted,
>>
>> 2013-05-23 19:28:19,937 INFO
>> org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names
>> occuring more than 10 times
>> ...
>> 2013-05-23 19:28:26,801 INFO org.apache.hadoop.ipc.Server: IPC Server
>> handler 28 on 9000: starting
>>
>> There are a couple of relevant activities that happen during namenode
>> startup in between these 2 log statements.  It loads the current fsimage
>> (persistent copy of file system metadata), merges in the edits log
>> (transaction log containing all file system metadata changes since the last
>> checkpoint), and then saves back a new fsimage file after that merge.
>>  Current versions of the Hadoop codebase will print some information to
>> logs about the volume of activity during this checkpointing process, so I
>> recommend looking for that in your logs to see if this explains it.
>>  Depending on whether or not your have a large number of transactions
>> queued since your last checkpoint, this whole process can cause namenode
>> startup to take several minutes.
>>
>> If this becomes a regular problem, then you can run SecondaryNameNode or
>> BackupNode to perform periodic checkpoints in addition to the checkpoint
>> that occurs on namenode restart.  This is probably overkill for a dev
>> environment on your laptop though.
>>
>> Hope this helps,
>>
>> Chris Nauroth
>> Hortonworks
>> http://hortonworks.com/
>>
>>
>>
>> On Thu, May 23, 2013 at 2:49 AM, Ted <r6...@gmail.com> wrote:
>>
>>> Hi I'm running hadoop on my local laptop for development and
>>> everything "works" but there's some annoying pauses during the startup
>>> which causes the entire hadoop startup process to take up to 4 minutes
>>> and I'm wondering what it is and if I can do anything about it.
>>>
>>> I'm running everything on 1 machines, on fedora linux, hadoop-1.1.2,
>>> oracle jkd1.7.0_17, the machine is a dual core i5, and I have 8gb of
>>> ram and an SSD so it shouldn't be slow.
>>>
>>> When the system pauses, there is no cpu usage, no disk usage and no
>>> network usage (although I suspect it's waiting for the network to
>>> resolve or return something).
>>>
>>> Here's some snippets from the namenode logs during startup where you
>>> can see it just pauses for around 30 seconds or more with out errors
>>> or anything :
>>>
>>> ...
>>> 2013-05-23 19:26:37,660 INFO
>>> org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from
>>> hadoop-metrics2.properties
>>> 2013-05-23 19:26:37,676 INFO
>>> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source
>>> MetricsSystem,sub=Stats registered.
>>> 2013-05-23 19:27:54,144 INFO
>>> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot
>>> period at 10 second(s).
>>> 2013-05-23 19:27:54,144 INFO
>>> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics
>>> system started
>>> ...
>>> 2013-05-23 19:27:54,341 WARN
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: The
>>> dfs.support.append option is in your configuration, however append is
>>> not supported. This configuration option is no longer required to
>>> enable sync.
>>> 2013-05-23 19:27:54,341 INFO
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
>>> isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s),
>>> accessTokenLifetime=0 min(s)
>>> 2013-05-23 19:28:19,918 INFO
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
>>> FSNamesystemStateMBean and NameNodeMXBean
>>> 2013-05-23 19:28:19,937 INFO
>>> org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names
>>> occuring more than 10 times
>>> ...
>>> 2013-05-23 19:28:26,801 INFO org.apache.hadoop.ipc.Server: IPC Server
>>> handler 28 on 9000: starting
>>> 2013-05-23 19:28:26,833 INFO org.apache.hadoop.ipc.Server: IPC Server
>>> handler 31 on 9000: starting
>>> 2013-05-23 19:30:10,644 INFO org.apache.hadoop.hdfs.StateChange:
>>> BLOCK* NameSystem.registerDatanode: node registration from
>>> 127.0.0.1:50010 storage DS-651015167-192.168.1.5-50010-1369140176513
>>> 2013-05-23 19:30:10,650 INFO org.apache.hadoop.net.NetworkTopology:
>>> Adding a new node: /default-rack/127.0.0.1:50010
>>>
>>>
>>> I already start the system with : export
>>> HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"
>>> I only allocate : export HADOOP_HEAPSIZE=512 (but it's an empty hadoop
>>> system, maybe just 1 or 2 test files less than 100k, and there's no
>>> CPU usage so it doesn't look like it's GC thrashing)
>>>
>>> I should mention again, there's no errors and the system runs fine and
>>> relatively speedy once started (considering it's on my laptop).
>>>
>>> Does anyone know what's causing these pauses? (and how I can get rid of
>>> them)
>>> Thanks.
>>> --
>>> Ted.
>>>
>>
>
>
> --
> Ted.



-- 
Harsh J

Re: pauses during startup (maybe network related?)

Posted by Harsh J <ha...@cloudera.com>.
You are spot on about the DNS lookup slowing things down. I've faced
the same issue (before I had a local network DNS set up for the WiFi
network I use).

> but I'm still more just miffed at how it's knowing I'm a 192 address when I told it to use localhost.

There's a few configs you need to additionally change to make a
perfect localhost setup. Otherwise, there are defaults in Apache
Hadoop that bind to 0.0.0.0 and report the current system hostname
(which changes if you get onto a network), causing what you're seeing.

On Fri, May 24, 2013 at 7:42 AM, Ted <r6...@gmail.com> wrote:
> thanks, I'm almost 100% sure it's network related now.
>
> What I tested was unpluggin my network :), the entire system starts in
> just a few seconds.
>
> I decided to search on "reverse dns" in google and I see other people
> have complained about very slow reverse dns lookups (some related to
> hadoop / hbase too).
>
> I'm not sure why this is happenning yet though. I thought 127.0.0.1 or
> localhost would have just resolved instantly - but it appears it's
> some how finding my real IP instead, i.e. 192.168.1.5 seems to show up
> in the log entries even though all my configurations say
> localhost/127.0.0.1 and my /etc/hosts file has and entry for
> localhost/127.0.0.1
>
> I think if I make a /etc/hosts entry for 192.168.1.5 everything will
> be quick, that's what I'm going to test later. The only problem is I'm
> on an dynamic IP... I've considered just making entries for all
> reasonable permutations like 192.168.1.1 through 192.168.1.20... but
> I'm still more just miffed at how it's knowing I'm a 192 address when
> I told it to use localhost.
>
> On 5/24/13, Chris Nauroth <cn...@hortonworks.com> wrote:
>> Hi Ted,
>>
>> 2013-05-23 19:28:19,937 INFO
>> org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names
>> occuring more than 10 times
>> ...
>> 2013-05-23 19:28:26,801 INFO org.apache.hadoop.ipc.Server: IPC Server
>> handler 28 on 9000: starting
>>
>> There are a couple of relevant activities that happen during namenode
>> startup in between these 2 log statements.  It loads the current fsimage
>> (persistent copy of file system metadata), merges in the edits log
>> (transaction log containing all file system metadata changes since the last
>> checkpoint), and then saves back a new fsimage file after that merge.
>>  Current versions of the Hadoop codebase will print some information to
>> logs about the volume of activity during this checkpointing process, so I
>> recommend looking for that in your logs to see if this explains it.
>>  Depending on whether or not your have a large number of transactions
>> queued since your last checkpoint, this whole process can cause namenode
>> startup to take several minutes.
>>
>> If this becomes a regular problem, then you can run SecondaryNameNode or
>> BackupNode to perform periodic checkpoints in addition to the checkpoint
>> that occurs on namenode restart.  This is probably overkill for a dev
>> environment on your laptop though.
>>
>> Hope this helps,
>>
>> Chris Nauroth
>> Hortonworks
>> http://hortonworks.com/
>>
>>
>>
>> On Thu, May 23, 2013 at 2:49 AM, Ted <r6...@gmail.com> wrote:
>>
>>> Hi I'm running hadoop on my local laptop for development and
>>> everything "works" but there's some annoying pauses during the startup
>>> which causes the entire hadoop startup process to take up to 4 minutes
>>> and I'm wondering what it is and if I can do anything about it.
>>>
>>> I'm running everything on 1 machines, on fedora linux, hadoop-1.1.2,
>>> oracle jkd1.7.0_17, the machine is a dual core i5, and I have 8gb of
>>> ram and an SSD so it shouldn't be slow.
>>>
>>> When the system pauses, there is no cpu usage, no disk usage and no
>>> network usage (although I suspect it's waiting for the network to
>>> resolve or return something).
>>>
>>> Here's some snippets from the namenode logs during startup where you
>>> can see it just pauses for around 30 seconds or more with out errors
>>> or anything :
>>>
>>> ...
>>> 2013-05-23 19:26:37,660 INFO
>>> org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from
>>> hadoop-metrics2.properties
>>> 2013-05-23 19:26:37,676 INFO
>>> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source
>>> MetricsSystem,sub=Stats registered.
>>> 2013-05-23 19:27:54,144 INFO
>>> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot
>>> period at 10 second(s).
>>> 2013-05-23 19:27:54,144 INFO
>>> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics
>>> system started
>>> ...
>>> 2013-05-23 19:27:54,341 WARN
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: The
>>> dfs.support.append option is in your configuration, however append is
>>> not supported. This configuration option is no longer required to
>>> enable sync.
>>> 2013-05-23 19:27:54,341 INFO
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
>>> isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s),
>>> accessTokenLifetime=0 min(s)
>>> 2013-05-23 19:28:19,918 INFO
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
>>> FSNamesystemStateMBean and NameNodeMXBean
>>> 2013-05-23 19:28:19,937 INFO
>>> org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names
>>> occuring more than 10 times
>>> ...
>>> 2013-05-23 19:28:26,801 INFO org.apache.hadoop.ipc.Server: IPC Server
>>> handler 28 on 9000: starting
>>> 2013-05-23 19:28:26,833 INFO org.apache.hadoop.ipc.Server: IPC Server
>>> handler 31 on 9000: starting
>>> 2013-05-23 19:30:10,644 INFO org.apache.hadoop.hdfs.StateChange:
>>> BLOCK* NameSystem.registerDatanode: node registration from
>>> 127.0.0.1:50010 storage DS-651015167-192.168.1.5-50010-1369140176513
>>> 2013-05-23 19:30:10,650 INFO org.apache.hadoop.net.NetworkTopology:
>>> Adding a new node: /default-rack/127.0.0.1:50010
>>>
>>>
>>> I already start the system with : export
>>> HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"
>>> I only allocate : export HADOOP_HEAPSIZE=512 (but it's an empty hadoop
>>> system, maybe just 1 or 2 test files less than 100k, and there's no
>>> CPU usage so it doesn't look like it's GC thrashing)
>>>
>>> I should mention again, there's no errors and the system runs fine and
>>> relatively speedy once started (considering it's on my laptop).
>>>
>>> Does anyone know what's causing these pauses? (and how I can get rid of
>>> them)
>>> Thanks.
>>> --
>>> Ted.
>>>
>>
>
>
> --
> Ted.



-- 
Harsh J

Re: pauses during startup (maybe network related?)

Posted by Harsh J <ha...@cloudera.com>.
You are spot on about the DNS lookup slowing things down. I've faced
the same issue (before I had a local network DNS set up for the WiFi
network I use).

> but I'm still more just miffed at how it's knowing I'm a 192 address when I told it to use localhost.

There's a few configs you need to additionally change to make a
perfect localhost setup. Otherwise, there are defaults in Apache
Hadoop that bind to 0.0.0.0 and report the current system hostname
(which changes if you get onto a network), causing what you're seeing.

On Fri, May 24, 2013 at 7:42 AM, Ted <r6...@gmail.com> wrote:
> thanks, I'm almost 100% sure it's network related now.
>
> What I tested was unpluggin my network :), the entire system starts in
> just a few seconds.
>
> I decided to search on "reverse dns" in google and I see other people
> have complained about very slow reverse dns lookups (some related to
> hadoop / hbase too).
>
> I'm not sure why this is happenning yet though. I thought 127.0.0.1 or
> localhost would have just resolved instantly - but it appears it's
> some how finding my real IP instead, i.e. 192.168.1.5 seems to show up
> in the log entries even though all my configurations say
> localhost/127.0.0.1 and my /etc/hosts file has and entry for
> localhost/127.0.0.1
>
> I think if I make a /etc/hosts entry for 192.168.1.5 everything will
> be quick, that's what I'm going to test later. The only problem is I'm
> on an dynamic IP... I've considered just making entries for all
> reasonable permutations like 192.168.1.1 through 192.168.1.20... but
> I'm still more just miffed at how it's knowing I'm a 192 address when
> I told it to use localhost.
>
> On 5/24/13, Chris Nauroth <cn...@hortonworks.com> wrote:
>> Hi Ted,
>>
>> 2013-05-23 19:28:19,937 INFO
>> org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names
>> occuring more than 10 times
>> ...
>> 2013-05-23 19:28:26,801 INFO org.apache.hadoop.ipc.Server: IPC Server
>> handler 28 on 9000: starting
>>
>> There are a couple of relevant activities that happen during namenode
>> startup in between these 2 log statements.  It loads the current fsimage
>> (persistent copy of file system metadata), merges in the edits log
>> (transaction log containing all file system metadata changes since the last
>> checkpoint), and then saves back a new fsimage file after that merge.
>>  Current versions of the Hadoop codebase will print some information to
>> logs about the volume of activity during this checkpointing process, so I
>> recommend looking for that in your logs to see if this explains it.
>>  Depending on whether or not your have a large number of transactions
>> queued since your last checkpoint, this whole process can cause namenode
>> startup to take several minutes.
>>
>> If this becomes a regular problem, then you can run SecondaryNameNode or
>> BackupNode to perform periodic checkpoints in addition to the checkpoint
>> that occurs on namenode restart.  This is probably overkill for a dev
>> environment on your laptop though.
>>
>> Hope this helps,
>>
>> Chris Nauroth
>> Hortonworks
>> http://hortonworks.com/
>>
>>
>>
>> On Thu, May 23, 2013 at 2:49 AM, Ted <r6...@gmail.com> wrote:
>>
>>> Hi I'm running hadoop on my local laptop for development and
>>> everything "works" but there's some annoying pauses during the startup
>>> which causes the entire hadoop startup process to take up to 4 minutes
>>> and I'm wondering what it is and if I can do anything about it.
>>>
>>> I'm running everything on 1 machines, on fedora linux, hadoop-1.1.2,
>>> oracle jkd1.7.0_17, the machine is a dual core i5, and I have 8gb of
>>> ram and an SSD so it shouldn't be slow.
>>>
>>> When the system pauses, there is no cpu usage, no disk usage and no
>>> network usage (although I suspect it's waiting for the network to
>>> resolve or return something).
>>>
>>> Here's some snippets from the namenode logs during startup where you
>>> can see it just pauses for around 30 seconds or more with out errors
>>> or anything :
>>>
>>> ...
>>> 2013-05-23 19:26:37,660 INFO
>>> org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from
>>> hadoop-metrics2.properties
>>> 2013-05-23 19:26:37,676 INFO
>>> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source
>>> MetricsSystem,sub=Stats registered.
>>> 2013-05-23 19:27:54,144 INFO
>>> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot
>>> period at 10 second(s).
>>> 2013-05-23 19:27:54,144 INFO
>>> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics
>>> system started
>>> ...
>>> 2013-05-23 19:27:54,341 WARN
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: The
>>> dfs.support.append option is in your configuration, however append is
>>> not supported. This configuration option is no longer required to
>>> enable sync.
>>> 2013-05-23 19:27:54,341 INFO
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
>>> isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s),
>>> accessTokenLifetime=0 min(s)
>>> 2013-05-23 19:28:19,918 INFO
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
>>> FSNamesystemStateMBean and NameNodeMXBean
>>> 2013-05-23 19:28:19,937 INFO
>>> org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names
>>> occuring more than 10 times
>>> ...
>>> 2013-05-23 19:28:26,801 INFO org.apache.hadoop.ipc.Server: IPC Server
>>> handler 28 on 9000: starting
>>> 2013-05-23 19:28:26,833 INFO org.apache.hadoop.ipc.Server: IPC Server
>>> handler 31 on 9000: starting
>>> 2013-05-23 19:30:10,644 INFO org.apache.hadoop.hdfs.StateChange:
>>> BLOCK* NameSystem.registerDatanode: node registration from
>>> 127.0.0.1:50010 storage DS-651015167-192.168.1.5-50010-1369140176513
>>> 2013-05-23 19:30:10,650 INFO org.apache.hadoop.net.NetworkTopology:
>>> Adding a new node: /default-rack/127.0.0.1:50010
>>>
>>>
>>> I already start the system with : export
>>> HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"
>>> I only allocate : export HADOOP_HEAPSIZE=512 (but it's an empty hadoop
>>> system, maybe just 1 or 2 test files less than 100k, and there's no
>>> CPU usage so it doesn't look like it's GC thrashing)
>>>
>>> I should mention again, there's no errors and the system runs fine and
>>> relatively speedy once started (considering it's on my laptop).
>>>
>>> Does anyone know what's causing these pauses? (and how I can get rid of
>>> them)
>>> Thanks.
>>> --
>>> Ted.
>>>
>>
>
>
> --
> Ted.



-- 
Harsh J

Re: pauses during startup (maybe network related?)

Posted by Harsh J <ha...@cloudera.com>.
You are spot on about the DNS lookup slowing things down. I've faced
the same issue (before I had a local network DNS set up for the WiFi
network I use).

> but I'm still more just miffed at how it's knowing I'm a 192 address when I told it to use localhost.

There's a few configs you need to additionally change to make a
perfect localhost setup. Otherwise, there are defaults in Apache
Hadoop that bind to 0.0.0.0 and report the current system hostname
(which changes if you get onto a network), causing what you're seeing.

On Fri, May 24, 2013 at 7:42 AM, Ted <r6...@gmail.com> wrote:
> thanks, I'm almost 100% sure it's network related now.
>
> What I tested was unpluggin my network :), the entire system starts in
> just a few seconds.
>
> I decided to search on "reverse dns" in google and I see other people
> have complained about very slow reverse dns lookups (some related to
> hadoop / hbase too).
>
> I'm not sure why this is happenning yet though. I thought 127.0.0.1 or
> localhost would have just resolved instantly - but it appears it's
> some how finding my real IP instead, i.e. 192.168.1.5 seems to show up
> in the log entries even though all my configurations say
> localhost/127.0.0.1 and my /etc/hosts file has and entry for
> localhost/127.0.0.1
>
> I think if I make a /etc/hosts entry for 192.168.1.5 everything will
> be quick, that's what I'm going to test later. The only problem is I'm
> on an dynamic IP... I've considered just making entries for all
> reasonable permutations like 192.168.1.1 through 192.168.1.20... but
> I'm still more just miffed at how it's knowing I'm a 192 address when
> I told it to use localhost.
>
> On 5/24/13, Chris Nauroth <cn...@hortonworks.com> wrote:
>> Hi Ted,
>>
>> 2013-05-23 19:28:19,937 INFO
>> org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names
>> occuring more than 10 times
>> ...
>> 2013-05-23 19:28:26,801 INFO org.apache.hadoop.ipc.Server: IPC Server
>> handler 28 on 9000: starting
>>
>> There are a couple of relevant activities that happen during namenode
>> startup in between these 2 log statements.  It loads the current fsimage
>> (persistent copy of file system metadata), merges in the edits log
>> (transaction log containing all file system metadata changes since the last
>> checkpoint), and then saves back a new fsimage file after that merge.
>>  Current versions of the Hadoop codebase will print some information to
>> logs about the volume of activity during this checkpointing process, so I
>> recommend looking for that in your logs to see if this explains it.
>>  Depending on whether or not your have a large number of transactions
>> queued since your last checkpoint, this whole process can cause namenode
>> startup to take several minutes.
>>
>> If this becomes a regular problem, then you can run SecondaryNameNode or
>> BackupNode to perform periodic checkpoints in addition to the checkpoint
>> that occurs on namenode restart.  This is probably overkill for a dev
>> environment on your laptop though.
>>
>> Hope this helps,
>>
>> Chris Nauroth
>> Hortonworks
>> http://hortonworks.com/
>>
>>
>>
>> On Thu, May 23, 2013 at 2:49 AM, Ted <r6...@gmail.com> wrote:
>>
>>> Hi I'm running hadoop on my local laptop for development and
>>> everything "works" but there's some annoying pauses during the startup
>>> which causes the entire hadoop startup process to take up to 4 minutes
>>> and I'm wondering what it is and if I can do anything about it.
>>>
>>> I'm running everything on 1 machines, on fedora linux, hadoop-1.1.2,
>>> oracle jkd1.7.0_17, the machine is a dual core i5, and I have 8gb of
>>> ram and an SSD so it shouldn't be slow.
>>>
>>> When the system pauses, there is no cpu usage, no disk usage and no
>>> network usage (although I suspect it's waiting for the network to
>>> resolve or return something).
>>>
>>> Here's some snippets from the namenode logs during startup where you
>>> can see it just pauses for around 30 seconds or more with out errors
>>> or anything :
>>>
>>> ...
>>> 2013-05-23 19:26:37,660 INFO
>>> org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from
>>> hadoop-metrics2.properties
>>> 2013-05-23 19:26:37,676 INFO
>>> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source
>>> MetricsSystem,sub=Stats registered.
>>> 2013-05-23 19:27:54,144 INFO
>>> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot
>>> period at 10 second(s).
>>> 2013-05-23 19:27:54,144 INFO
>>> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics
>>> system started
>>> ...
>>> 2013-05-23 19:27:54,341 WARN
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: The
>>> dfs.support.append option is in your configuration, however append is
>>> not supported. This configuration option is no longer required to
>>> enable sync.
>>> 2013-05-23 19:27:54,341 INFO
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
>>> isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s),
>>> accessTokenLifetime=0 min(s)
>>> 2013-05-23 19:28:19,918 INFO
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
>>> FSNamesystemStateMBean and NameNodeMXBean
>>> 2013-05-23 19:28:19,937 INFO
>>> org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names
>>> occuring more than 10 times
>>> ...
>>> 2013-05-23 19:28:26,801 INFO org.apache.hadoop.ipc.Server: IPC Server
>>> handler 28 on 9000: starting
>>> 2013-05-23 19:28:26,833 INFO org.apache.hadoop.ipc.Server: IPC Server
>>> handler 31 on 9000: starting
>>> 2013-05-23 19:30:10,644 INFO org.apache.hadoop.hdfs.StateChange:
>>> BLOCK* NameSystem.registerDatanode: node registration from
>>> 127.0.0.1:50010 storage DS-651015167-192.168.1.5-50010-1369140176513
>>> 2013-05-23 19:30:10,650 INFO org.apache.hadoop.net.NetworkTopology:
>>> Adding a new node: /default-rack/127.0.0.1:50010
>>>
>>>
>>> I already start the system with : export
>>> HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"
>>> I only allocate : export HADOOP_HEAPSIZE=512 (but it's an empty hadoop
>>> system, maybe just 1 or 2 test files less than 100k, and there's no
>>> CPU usage so it doesn't look like it's GC thrashing)
>>>
>>> I should mention again, there's no errors and the system runs fine and
>>> relatively speedy once started (considering it's on my laptop).
>>>
>>> Does anyone know what's causing these pauses? (and how I can get rid of
>>> them)
>>> Thanks.
>>> --
>>> Ted.
>>>
>>
>
>
> --
> Ted.



-- 
Harsh J

Re: pauses during startup (maybe network related?)

Posted by Ted <r6...@gmail.com>.
thanks, I'm almost 100% sure it's network related now.

What I tested was unpluggin my network :), the entire system starts in
just a few seconds.

I decided to search on "reverse dns" in google and I see other people
have complained about very slow reverse dns lookups (some related to
hadoop / hbase too).

I'm not sure why this is happenning yet though. I thought 127.0.0.1 or
localhost would have just resolved instantly - but it appears it's
some how finding my real IP instead, i.e. 192.168.1.5 seems to show up
in the log entries even though all my configurations say
localhost/127.0.0.1 and my /etc/hosts file has and entry for
localhost/127.0.0.1

I think if I make a /etc/hosts entry for 192.168.1.5 everything will
be quick, that's what I'm going to test later. The only problem is I'm
on an dynamic IP... I've considered just making entries for all
reasonable permutations like 192.168.1.1 through 192.168.1.20... but
I'm still more just miffed at how it's knowing I'm a 192 address when
I told it to use localhost.

On 5/24/13, Chris Nauroth <cn...@hortonworks.com> wrote:
> Hi Ted,
>
> 2013-05-23 19:28:19,937 INFO
> org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names
> occuring more than 10 times
> ...
> 2013-05-23 19:28:26,801 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 28 on 9000: starting
>
> There are a couple of relevant activities that happen during namenode
> startup in between these 2 log statements.  It loads the current fsimage
> (persistent copy of file system metadata), merges in the edits log
> (transaction log containing all file system metadata changes since the last
> checkpoint), and then saves back a new fsimage file after that merge.
>  Current versions of the Hadoop codebase will print some information to
> logs about the volume of activity during this checkpointing process, so I
> recommend looking for that in your logs to see if this explains it.
>  Depending on whether or not your have a large number of transactions
> queued since your last checkpoint, this whole process can cause namenode
> startup to take several minutes.
>
> If this becomes a regular problem, then you can run SecondaryNameNode or
> BackupNode to perform periodic checkpoints in addition to the checkpoint
> that occurs on namenode restart.  This is probably overkill for a dev
> environment on your laptop though.
>
> Hope this helps,
>
> Chris Nauroth
> Hortonworks
> http://hortonworks.com/
>
>
>
> On Thu, May 23, 2013 at 2:49 AM, Ted <r6...@gmail.com> wrote:
>
>> Hi I'm running hadoop on my local laptop for development and
>> everything "works" but there's some annoying pauses during the startup
>> which causes the entire hadoop startup process to take up to 4 minutes
>> and I'm wondering what it is and if I can do anything about it.
>>
>> I'm running everything on 1 machines, on fedora linux, hadoop-1.1.2,
>> oracle jkd1.7.0_17, the machine is a dual core i5, and I have 8gb of
>> ram and an SSD so it shouldn't be slow.
>>
>> When the system pauses, there is no cpu usage, no disk usage and no
>> network usage (although I suspect it's waiting for the network to
>> resolve or return something).
>>
>> Here's some snippets from the namenode logs during startup where you
>> can see it just pauses for around 30 seconds or more with out errors
>> or anything :
>>
>> ...
>> 2013-05-23 19:26:37,660 INFO
>> org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from
>> hadoop-metrics2.properties
>> 2013-05-23 19:26:37,676 INFO
>> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source
>> MetricsSystem,sub=Stats registered.
>> 2013-05-23 19:27:54,144 INFO
>> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot
>> period at 10 second(s).
>> 2013-05-23 19:27:54,144 INFO
>> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics
>> system started
>> ...
>> 2013-05-23 19:27:54,341 WARN
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: The
>> dfs.support.append option is in your configuration, however append is
>> not supported. This configuration option is no longer required to
>> enable sync.
>> 2013-05-23 19:27:54,341 INFO
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
>> isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s),
>> accessTokenLifetime=0 min(s)
>> 2013-05-23 19:28:19,918 INFO
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
>> FSNamesystemStateMBean and NameNodeMXBean
>> 2013-05-23 19:28:19,937 INFO
>> org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names
>> occuring more than 10 times
>> ...
>> 2013-05-23 19:28:26,801 INFO org.apache.hadoop.ipc.Server: IPC Server
>> handler 28 on 9000: starting
>> 2013-05-23 19:28:26,833 INFO org.apache.hadoop.ipc.Server: IPC Server
>> handler 31 on 9000: starting
>> 2013-05-23 19:30:10,644 INFO org.apache.hadoop.hdfs.StateChange:
>> BLOCK* NameSystem.registerDatanode: node registration from
>> 127.0.0.1:50010 storage DS-651015167-192.168.1.5-50010-1369140176513
>> 2013-05-23 19:30:10,650 INFO org.apache.hadoop.net.NetworkTopology:
>> Adding a new node: /default-rack/127.0.0.1:50010
>>
>>
>> I already start the system with : export
>> HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"
>> I only allocate : export HADOOP_HEAPSIZE=512 (but it's an empty hadoop
>> system, maybe just 1 or 2 test files less than 100k, and there's no
>> CPU usage so it doesn't look like it's GC thrashing)
>>
>> I should mention again, there's no errors and the system runs fine and
>> relatively speedy once started (considering it's on my laptop).
>>
>> Does anyone know what's causing these pauses? (and how I can get rid of
>> them)
>> Thanks.
>> --
>> Ted.
>>
>


-- 
Ted.

Re: pauses during startup (maybe network related?)

Posted by Ted <r6...@gmail.com>.
thanks, I'm almost 100% sure it's network related now.

What I tested was unpluggin my network :), the entire system starts in
just a few seconds.

I decided to search on "reverse dns" in google and I see other people
have complained about very slow reverse dns lookups (some related to
hadoop / hbase too).

I'm not sure why this is happenning yet though. I thought 127.0.0.1 or
localhost would have just resolved instantly - but it appears it's
some how finding my real IP instead, i.e. 192.168.1.5 seems to show up
in the log entries even though all my configurations say
localhost/127.0.0.1 and my /etc/hosts file has and entry for
localhost/127.0.0.1

I think if I make a /etc/hosts entry for 192.168.1.5 everything will
be quick, that's what I'm going to test later. The only problem is I'm
on an dynamic IP... I've considered just making entries for all
reasonable permutations like 192.168.1.1 through 192.168.1.20... but
I'm still more just miffed at how it's knowing I'm a 192 address when
I told it to use localhost.

On 5/24/13, Chris Nauroth <cn...@hortonworks.com> wrote:
> Hi Ted,
>
> 2013-05-23 19:28:19,937 INFO
> org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names
> occuring more than 10 times
> ...
> 2013-05-23 19:28:26,801 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 28 on 9000: starting
>
> There are a couple of relevant activities that happen during namenode
> startup in between these 2 log statements.  It loads the current fsimage
> (persistent copy of file system metadata), merges in the edits log
> (transaction log containing all file system metadata changes since the last
> checkpoint), and then saves back a new fsimage file after that merge.
>  Current versions of the Hadoop codebase will print some information to
> logs about the volume of activity during this checkpointing process, so I
> recommend looking for that in your logs to see if this explains it.
>  Depending on whether or not your have a large number of transactions
> queued since your last checkpoint, this whole process can cause namenode
> startup to take several minutes.
>
> If this becomes a regular problem, then you can run SecondaryNameNode or
> BackupNode to perform periodic checkpoints in addition to the checkpoint
> that occurs on namenode restart.  This is probably overkill for a dev
> environment on your laptop though.
>
> Hope this helps,
>
> Chris Nauroth
> Hortonworks
> http://hortonworks.com/
>
>
>
> On Thu, May 23, 2013 at 2:49 AM, Ted <r6...@gmail.com> wrote:
>
>> Hi I'm running hadoop on my local laptop for development and
>> everything "works" but there's some annoying pauses during the startup
>> which causes the entire hadoop startup process to take up to 4 minutes
>> and I'm wondering what it is and if I can do anything about it.
>>
>> I'm running everything on 1 machines, on fedora linux, hadoop-1.1.2,
>> oracle jkd1.7.0_17, the machine is a dual core i5, and I have 8gb of
>> ram and an SSD so it shouldn't be slow.
>>
>> When the system pauses, there is no cpu usage, no disk usage and no
>> network usage (although I suspect it's waiting for the network to
>> resolve or return something).
>>
>> Here's some snippets from the namenode logs during startup where you
>> can see it just pauses for around 30 seconds or more with out errors
>> or anything :
>>
>> ...
>> 2013-05-23 19:26:37,660 INFO
>> org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from
>> hadoop-metrics2.properties
>> 2013-05-23 19:26:37,676 INFO
>> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source
>> MetricsSystem,sub=Stats registered.
>> 2013-05-23 19:27:54,144 INFO
>> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot
>> period at 10 second(s).
>> 2013-05-23 19:27:54,144 INFO
>> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics
>> system started
>> ...
>> 2013-05-23 19:27:54,341 WARN
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: The
>> dfs.support.append option is in your configuration, however append is
>> not supported. This configuration option is no longer required to
>> enable sync.
>> 2013-05-23 19:27:54,341 INFO
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
>> isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s),
>> accessTokenLifetime=0 min(s)
>> 2013-05-23 19:28:19,918 INFO
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
>> FSNamesystemStateMBean and NameNodeMXBean
>> 2013-05-23 19:28:19,937 INFO
>> org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names
>> occuring more than 10 times
>> ...
>> 2013-05-23 19:28:26,801 INFO org.apache.hadoop.ipc.Server: IPC Server
>> handler 28 on 9000: starting
>> 2013-05-23 19:28:26,833 INFO org.apache.hadoop.ipc.Server: IPC Server
>> handler 31 on 9000: starting
>> 2013-05-23 19:30:10,644 INFO org.apache.hadoop.hdfs.StateChange:
>> BLOCK* NameSystem.registerDatanode: node registration from
>> 127.0.0.1:50010 storage DS-651015167-192.168.1.5-50010-1369140176513
>> 2013-05-23 19:30:10,650 INFO org.apache.hadoop.net.NetworkTopology:
>> Adding a new node: /default-rack/127.0.0.1:50010
>>
>>
>> I already start the system with : export
>> HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"
>> I only allocate : export HADOOP_HEAPSIZE=512 (but it's an empty hadoop
>> system, maybe just 1 or 2 test files less than 100k, and there's no
>> CPU usage so it doesn't look like it's GC thrashing)
>>
>> I should mention again, there's no errors and the system runs fine and
>> relatively speedy once started (considering it's on my laptop).
>>
>> Does anyone know what's causing these pauses? (and how I can get rid of
>> them)
>> Thanks.
>> --
>> Ted.
>>
>


-- 
Ted.

Re: pauses during startup (maybe network related?)

Posted by Ted <r6...@gmail.com>.
thanks, I'm almost 100% sure it's network related now.

What I tested was unpluggin my network :), the entire system starts in
just a few seconds.

I decided to search on "reverse dns" in google and I see other people
have complained about very slow reverse dns lookups (some related to
hadoop / hbase too).

I'm not sure why this is happenning yet though. I thought 127.0.0.1 or
localhost would have just resolved instantly - but it appears it's
some how finding my real IP instead, i.e. 192.168.1.5 seems to show up
in the log entries even though all my configurations say
localhost/127.0.0.1 and my /etc/hosts file has and entry for
localhost/127.0.0.1

I think if I make a /etc/hosts entry for 192.168.1.5 everything will
be quick, that's what I'm going to test later. The only problem is I'm
on an dynamic IP... I've considered just making entries for all
reasonable permutations like 192.168.1.1 through 192.168.1.20... but
I'm still more just miffed at how it's knowing I'm a 192 address when
I told it to use localhost.

On 5/24/13, Chris Nauroth <cn...@hortonworks.com> wrote:
> Hi Ted,
>
> 2013-05-23 19:28:19,937 INFO
> org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names
> occuring more than 10 times
> ...
> 2013-05-23 19:28:26,801 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 28 on 9000: starting
>
> There are a couple of relevant activities that happen during namenode
> startup in between these 2 log statements.  It loads the current fsimage
> (persistent copy of file system metadata), merges in the edits log
> (transaction log containing all file system metadata changes since the last
> checkpoint), and then saves back a new fsimage file after that merge.
>  Current versions of the Hadoop codebase will print some information to
> logs about the volume of activity during this checkpointing process, so I
> recommend looking for that in your logs to see if this explains it.
>  Depending on whether or not your have a large number of transactions
> queued since your last checkpoint, this whole process can cause namenode
> startup to take several minutes.
>
> If this becomes a regular problem, then you can run SecondaryNameNode or
> BackupNode to perform periodic checkpoints in addition to the checkpoint
> that occurs on namenode restart.  This is probably overkill for a dev
> environment on your laptop though.
>
> Hope this helps,
>
> Chris Nauroth
> Hortonworks
> http://hortonworks.com/
>
>
>
> On Thu, May 23, 2013 at 2:49 AM, Ted <r6...@gmail.com> wrote:
>
>> Hi I'm running hadoop on my local laptop for development and
>> everything "works" but there's some annoying pauses during the startup
>> which causes the entire hadoop startup process to take up to 4 minutes
>> and I'm wondering what it is and if I can do anything about it.
>>
>> I'm running everything on 1 machines, on fedora linux, hadoop-1.1.2,
>> oracle jkd1.7.0_17, the machine is a dual core i5, and I have 8gb of
>> ram and an SSD so it shouldn't be slow.
>>
>> When the system pauses, there is no cpu usage, no disk usage and no
>> network usage (although I suspect it's waiting for the network to
>> resolve or return something).
>>
>> Here's some snippets from the namenode logs during startup where you
>> can see it just pauses for around 30 seconds or more with out errors
>> or anything :
>>
>> ...
>> 2013-05-23 19:26:37,660 INFO
>> org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from
>> hadoop-metrics2.properties
>> 2013-05-23 19:26:37,676 INFO
>> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source
>> MetricsSystem,sub=Stats registered.
>> 2013-05-23 19:27:54,144 INFO
>> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot
>> period at 10 second(s).
>> 2013-05-23 19:27:54,144 INFO
>> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics
>> system started
>> ...
>> 2013-05-23 19:27:54,341 WARN
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: The
>> dfs.support.append option is in your configuration, however append is
>> not supported. This configuration option is no longer required to
>> enable sync.
>> 2013-05-23 19:27:54,341 INFO
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
>> isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s),
>> accessTokenLifetime=0 min(s)
>> 2013-05-23 19:28:19,918 INFO
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
>> FSNamesystemStateMBean and NameNodeMXBean
>> 2013-05-23 19:28:19,937 INFO
>> org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names
>> occuring more than 10 times
>> ...
>> 2013-05-23 19:28:26,801 INFO org.apache.hadoop.ipc.Server: IPC Server
>> handler 28 on 9000: starting
>> 2013-05-23 19:28:26,833 INFO org.apache.hadoop.ipc.Server: IPC Server
>> handler 31 on 9000: starting
>> 2013-05-23 19:30:10,644 INFO org.apache.hadoop.hdfs.StateChange:
>> BLOCK* NameSystem.registerDatanode: node registration from
>> 127.0.0.1:50010 storage DS-651015167-192.168.1.5-50010-1369140176513
>> 2013-05-23 19:30:10,650 INFO org.apache.hadoop.net.NetworkTopology:
>> Adding a new node: /default-rack/127.0.0.1:50010
>>
>>
>> I already start the system with : export
>> HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"
>> I only allocate : export HADOOP_HEAPSIZE=512 (but it's an empty hadoop
>> system, maybe just 1 or 2 test files less than 100k, and there's no
>> CPU usage so it doesn't look like it's GC thrashing)
>>
>> I should mention again, there's no errors and the system runs fine and
>> relatively speedy once started (considering it's on my laptop).
>>
>> Does anyone know what's causing these pauses? (and how I can get rid of
>> them)
>> Thanks.
>> --
>> Ted.
>>
>


-- 
Ted.

Re: pauses during startup (maybe network related?)

Posted by Ted <r6...@gmail.com>.
thanks, I'm almost 100% sure it's network related now.

What I tested was unpluggin my network :), the entire system starts in
just a few seconds.

I decided to search on "reverse dns" in google and I see other people
have complained about very slow reverse dns lookups (some related to
hadoop / hbase too).

I'm not sure why this is happenning yet though. I thought 127.0.0.1 or
localhost would have just resolved instantly - but it appears it's
some how finding my real IP instead, i.e. 192.168.1.5 seems to show up
in the log entries even though all my configurations say
localhost/127.0.0.1 and my /etc/hosts file has and entry for
localhost/127.0.0.1

I think if I make a /etc/hosts entry for 192.168.1.5 everything will
be quick, that's what I'm going to test later. The only problem is I'm
on an dynamic IP... I've considered just making entries for all
reasonable permutations like 192.168.1.1 through 192.168.1.20... but
I'm still more just miffed at how it's knowing I'm a 192 address when
I told it to use localhost.

On 5/24/13, Chris Nauroth <cn...@hortonworks.com> wrote:
> Hi Ted,
>
> 2013-05-23 19:28:19,937 INFO
> org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names
> occuring more than 10 times
> ...
> 2013-05-23 19:28:26,801 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 28 on 9000: starting
>
> There are a couple of relevant activities that happen during namenode
> startup in between these 2 log statements.  It loads the current fsimage
> (persistent copy of file system metadata), merges in the edits log
> (transaction log containing all file system metadata changes since the last
> checkpoint), and then saves back a new fsimage file after that merge.
>  Current versions of the Hadoop codebase will print some information to
> logs about the volume of activity during this checkpointing process, so I
> recommend looking for that in your logs to see if this explains it.
>  Depending on whether or not your have a large number of transactions
> queued since your last checkpoint, this whole process can cause namenode
> startup to take several minutes.
>
> If this becomes a regular problem, then you can run SecondaryNameNode or
> BackupNode to perform periodic checkpoints in addition to the checkpoint
> that occurs on namenode restart.  This is probably overkill for a dev
> environment on your laptop though.
>
> Hope this helps,
>
> Chris Nauroth
> Hortonworks
> http://hortonworks.com/
>
>
>
> On Thu, May 23, 2013 at 2:49 AM, Ted <r6...@gmail.com> wrote:
>
>> Hi I'm running hadoop on my local laptop for development and
>> everything "works" but there's some annoying pauses during the startup
>> which causes the entire hadoop startup process to take up to 4 minutes
>> and I'm wondering what it is and if I can do anything about it.
>>
>> I'm running everything on 1 machines, on fedora linux, hadoop-1.1.2,
>> oracle jkd1.7.0_17, the machine is a dual core i5, and I have 8gb of
>> ram and an SSD so it shouldn't be slow.
>>
>> When the system pauses, there is no cpu usage, no disk usage and no
>> network usage (although I suspect it's waiting for the network to
>> resolve or return something).
>>
>> Here's some snippets from the namenode logs during startup where you
>> can see it just pauses for around 30 seconds or more with out errors
>> or anything :
>>
>> ...
>> 2013-05-23 19:26:37,660 INFO
>> org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from
>> hadoop-metrics2.properties
>> 2013-05-23 19:26:37,676 INFO
>> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source
>> MetricsSystem,sub=Stats registered.
>> 2013-05-23 19:27:54,144 INFO
>> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot
>> period at 10 second(s).
>> 2013-05-23 19:27:54,144 INFO
>> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics
>> system started
>> ...
>> 2013-05-23 19:27:54,341 WARN
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: The
>> dfs.support.append option is in your configuration, however append is
>> not supported. This configuration option is no longer required to
>> enable sync.
>> 2013-05-23 19:27:54,341 INFO
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
>> isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s),
>> accessTokenLifetime=0 min(s)
>> 2013-05-23 19:28:19,918 INFO
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
>> FSNamesystemStateMBean and NameNodeMXBean
>> 2013-05-23 19:28:19,937 INFO
>> org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names
>> occuring more than 10 times
>> ...
>> 2013-05-23 19:28:26,801 INFO org.apache.hadoop.ipc.Server: IPC Server
>> handler 28 on 9000: starting
>> 2013-05-23 19:28:26,833 INFO org.apache.hadoop.ipc.Server: IPC Server
>> handler 31 on 9000: starting
>> 2013-05-23 19:30:10,644 INFO org.apache.hadoop.hdfs.StateChange:
>> BLOCK* NameSystem.registerDatanode: node registration from
>> 127.0.0.1:50010 storage DS-651015167-192.168.1.5-50010-1369140176513
>> 2013-05-23 19:30:10,650 INFO org.apache.hadoop.net.NetworkTopology:
>> Adding a new node: /default-rack/127.0.0.1:50010
>>
>>
>> I already start the system with : export
>> HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"
>> I only allocate : export HADOOP_HEAPSIZE=512 (but it's an empty hadoop
>> system, maybe just 1 or 2 test files less than 100k, and there's no
>> CPU usage so it doesn't look like it's GC thrashing)
>>
>> I should mention again, there's no errors and the system runs fine and
>> relatively speedy once started (considering it's on my laptop).
>>
>> Does anyone know what's causing these pauses? (and how I can get rid of
>> them)
>> Thanks.
>> --
>> Ted.
>>
>


-- 
Ted.

Re: pauses during startup (maybe network related?)

Posted by Chris Nauroth <cn...@hortonworks.com>.
Hi Ted,

2013-05-23 19:28:19,937 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names
occuring more than 10 times
...
2013-05-23 19:28:26,801 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 28 on 9000: starting

There are a couple of relevant activities that happen during namenode
startup in between these 2 log statements.  It loads the current fsimage
(persistent copy of file system metadata), merges in the edits log
(transaction log containing all file system metadata changes since the last
checkpoint), and then saves back a new fsimage file after that merge.
 Current versions of the Hadoop codebase will print some information to
logs about the volume of activity during this checkpointing process, so I
recommend looking for that in your logs to see if this explains it.
 Depending on whether or not your have a large number of transactions
queued since your last checkpoint, this whole process can cause namenode
startup to take several minutes.

If this becomes a regular problem, then you can run SecondaryNameNode or
BackupNode to perform periodic checkpoints in addition to the checkpoint
that occurs on namenode restart.  This is probably overkill for a dev
environment on your laptop though.

Hope this helps,

Chris Nauroth
Hortonworks
http://hortonworks.com/



On Thu, May 23, 2013 at 2:49 AM, Ted <r6...@gmail.com> wrote:

> Hi I'm running hadoop on my local laptop for development and
> everything "works" but there's some annoying pauses during the startup
> which causes the entire hadoop startup process to take up to 4 minutes
> and I'm wondering what it is and if I can do anything about it.
>
> I'm running everything on 1 machines, on fedora linux, hadoop-1.1.2,
> oracle jkd1.7.0_17, the machine is a dual core i5, and I have 8gb of
> ram and an SSD so it shouldn't be slow.
>
> When the system pauses, there is no cpu usage, no disk usage and no
> network usage (although I suspect it's waiting for the network to
> resolve or return something).
>
> Here's some snippets from the namenode logs during startup where you
> can see it just pauses for around 30 seconds or more with out errors
> or anything :
>
> ...
> 2013-05-23 19:26:37,660 INFO
> org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from
> hadoop-metrics2.properties
> 2013-05-23 19:26:37,676 INFO
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source
> MetricsSystem,sub=Stats registered.
> 2013-05-23 19:27:54,144 INFO
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot
> period at 10 second(s).
> 2013-05-23 19:27:54,144 INFO
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics
> system started
> ...
> 2013-05-23 19:27:54,341 WARN
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: The
> dfs.support.append option is in your configuration, however append is
> not supported. This configuration option is no longer required to
> enable sync.
> 2013-05-23 19:27:54,341 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s),
> accessTokenLifetime=0 min(s)
> 2013-05-23 19:28:19,918 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
> FSNamesystemStateMBean and NameNodeMXBean
> 2013-05-23 19:28:19,937 INFO
> org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names
> occuring more than 10 times
> ...
> 2013-05-23 19:28:26,801 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 28 on 9000: starting
> 2013-05-23 19:28:26,833 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 31 on 9000: starting
> 2013-05-23 19:30:10,644 INFO org.apache.hadoop.hdfs.StateChange:
> BLOCK* NameSystem.registerDatanode: node registration from
> 127.0.0.1:50010 storage DS-651015167-192.168.1.5-50010-1369140176513
> 2013-05-23 19:30:10,650 INFO org.apache.hadoop.net.NetworkTopology:
> Adding a new node: /default-rack/127.0.0.1:50010
>
>
> I already start the system with : export
> HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"
> I only allocate : export HADOOP_HEAPSIZE=512 (but it's an empty hadoop
> system, maybe just 1 or 2 test files less than 100k, and there's no
> CPU usage so it doesn't look like it's GC thrashing)
>
> I should mention again, there's no errors and the system runs fine and
> relatively speedy once started (considering it's on my laptop).
>
> Does anyone know what's causing these pauses? (and how I can get rid of
> them)
> Thanks.
> --
> Ted.
>

Re: pauses during startup (maybe network related?)

Posted by Chris Nauroth <cn...@hortonworks.com>.
Hi Ted,

2013-05-23 19:28:19,937 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names
occuring more than 10 times
...
2013-05-23 19:28:26,801 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 28 on 9000: starting

There are a couple of relevant activities that happen during namenode
startup in between these 2 log statements.  It loads the current fsimage
(persistent copy of file system metadata), merges in the edits log
(transaction log containing all file system metadata changes since the last
checkpoint), and then saves back a new fsimage file after that merge.
 Current versions of the Hadoop codebase will print some information to
logs about the volume of activity during this checkpointing process, so I
recommend looking for that in your logs to see if this explains it.
 Depending on whether or not your have a large number of transactions
queued since your last checkpoint, this whole process can cause namenode
startup to take several minutes.

If this becomes a regular problem, then you can run SecondaryNameNode or
BackupNode to perform periodic checkpoints in addition to the checkpoint
that occurs on namenode restart.  This is probably overkill for a dev
environment on your laptop though.

Hope this helps,

Chris Nauroth
Hortonworks
http://hortonworks.com/



On Thu, May 23, 2013 at 2:49 AM, Ted <r6...@gmail.com> wrote:

> Hi I'm running hadoop on my local laptop for development and
> everything "works" but there's some annoying pauses during the startup
> which causes the entire hadoop startup process to take up to 4 minutes
> and I'm wondering what it is and if I can do anything about it.
>
> I'm running everything on 1 machines, on fedora linux, hadoop-1.1.2,
> oracle jkd1.7.0_17, the machine is a dual core i5, and I have 8gb of
> ram and an SSD so it shouldn't be slow.
>
> When the system pauses, there is no cpu usage, no disk usage and no
> network usage (although I suspect it's waiting for the network to
> resolve or return something).
>
> Here's some snippets from the namenode logs during startup where you
> can see it just pauses for around 30 seconds or more with out errors
> or anything :
>
> ...
> 2013-05-23 19:26:37,660 INFO
> org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from
> hadoop-metrics2.properties
> 2013-05-23 19:26:37,676 INFO
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source
> MetricsSystem,sub=Stats registered.
> 2013-05-23 19:27:54,144 INFO
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot
> period at 10 second(s).
> 2013-05-23 19:27:54,144 INFO
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics
> system started
> ...
> 2013-05-23 19:27:54,341 WARN
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: The
> dfs.support.append option is in your configuration, however append is
> not supported. This configuration option is no longer required to
> enable sync.
> 2013-05-23 19:27:54,341 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s),
> accessTokenLifetime=0 min(s)
> 2013-05-23 19:28:19,918 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
> FSNamesystemStateMBean and NameNodeMXBean
> 2013-05-23 19:28:19,937 INFO
> org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names
> occuring more than 10 times
> ...
> 2013-05-23 19:28:26,801 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 28 on 9000: starting
> 2013-05-23 19:28:26,833 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 31 on 9000: starting
> 2013-05-23 19:30:10,644 INFO org.apache.hadoop.hdfs.StateChange:
> BLOCK* NameSystem.registerDatanode: node registration from
> 127.0.0.1:50010 storage DS-651015167-192.168.1.5-50010-1369140176513
> 2013-05-23 19:30:10,650 INFO org.apache.hadoop.net.NetworkTopology:
> Adding a new node: /default-rack/127.0.0.1:50010
>
>
> I already start the system with : export
> HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"
> I only allocate : export HADOOP_HEAPSIZE=512 (but it's an empty hadoop
> system, maybe just 1 or 2 test files less than 100k, and there's no
> CPU usage so it doesn't look like it's GC thrashing)
>
> I should mention again, there's no errors and the system runs fine and
> relatively speedy once started (considering it's on my laptop).
>
> Does anyone know what's causing these pauses? (and how I can get rid of
> them)
> Thanks.
> --
> Ted.
>

Re: pauses during startup (maybe network related?)

Posted by Chris Nauroth <cn...@hortonworks.com>.
Hi Ted,

2013-05-23 19:28:19,937 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names
occuring more than 10 times
...
2013-05-23 19:28:26,801 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 28 on 9000: starting

There are a couple of relevant activities that happen during namenode
startup in between these 2 log statements.  It loads the current fsimage
(persistent copy of file system metadata), merges in the edits log
(transaction log containing all file system metadata changes since the last
checkpoint), and then saves back a new fsimage file after that merge.
 Current versions of the Hadoop codebase will print some information to
logs about the volume of activity during this checkpointing process, so I
recommend looking for that in your logs to see if this explains it.
 Depending on whether or not your have a large number of transactions
queued since your last checkpoint, this whole process can cause namenode
startup to take several minutes.

If this becomes a regular problem, then you can run SecondaryNameNode or
BackupNode to perform periodic checkpoints in addition to the checkpoint
that occurs on namenode restart.  This is probably overkill for a dev
environment on your laptop though.

Hope this helps,

Chris Nauroth
Hortonworks
http://hortonworks.com/



On Thu, May 23, 2013 at 2:49 AM, Ted <r6...@gmail.com> wrote:

> Hi I'm running hadoop on my local laptop for development and
> everything "works" but there's some annoying pauses during the startup
> which causes the entire hadoop startup process to take up to 4 minutes
> and I'm wondering what it is and if I can do anything about it.
>
> I'm running everything on 1 machines, on fedora linux, hadoop-1.1.2,
> oracle jkd1.7.0_17, the machine is a dual core i5, and I have 8gb of
> ram and an SSD so it shouldn't be slow.
>
> When the system pauses, there is no cpu usage, no disk usage and no
> network usage (although I suspect it's waiting for the network to
> resolve or return something).
>
> Here's some snippets from the namenode logs during startup where you
> can see it just pauses for around 30 seconds or more with out errors
> or anything :
>
> ...
> 2013-05-23 19:26:37,660 INFO
> org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from
> hadoop-metrics2.properties
> 2013-05-23 19:26:37,676 INFO
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source
> MetricsSystem,sub=Stats registered.
> 2013-05-23 19:27:54,144 INFO
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot
> period at 10 second(s).
> 2013-05-23 19:27:54,144 INFO
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics
> system started
> ...
> 2013-05-23 19:27:54,341 WARN
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: The
> dfs.support.append option is in your configuration, however append is
> not supported. This configuration option is no longer required to
> enable sync.
> 2013-05-23 19:27:54,341 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s),
> accessTokenLifetime=0 min(s)
> 2013-05-23 19:28:19,918 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
> FSNamesystemStateMBean and NameNodeMXBean
> 2013-05-23 19:28:19,937 INFO
> org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names
> occuring more than 10 times
> ...
> 2013-05-23 19:28:26,801 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 28 on 9000: starting
> 2013-05-23 19:28:26,833 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 31 on 9000: starting
> 2013-05-23 19:30:10,644 INFO org.apache.hadoop.hdfs.StateChange:
> BLOCK* NameSystem.registerDatanode: node registration from
> 127.0.0.1:50010 storage DS-651015167-192.168.1.5-50010-1369140176513
> 2013-05-23 19:30:10,650 INFO org.apache.hadoop.net.NetworkTopology:
> Adding a new node: /default-rack/127.0.0.1:50010
>
>
> I already start the system with : export
> HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"
> I only allocate : export HADOOP_HEAPSIZE=512 (but it's an empty hadoop
> system, maybe just 1 or 2 test files less than 100k, and there's no
> CPU usage so it doesn't look like it's GC thrashing)
>
> I should mention again, there's no errors and the system runs fine and
> relatively speedy once started (considering it's on my laptop).
>
> Does anyone know what's causing these pauses? (and how I can get rid of
> them)
> Thanks.
> --
> Ted.
>

Re: pauses during startup (maybe network related?)

Posted by Chris Nauroth <cn...@hortonworks.com>.
Hi Ted,

2013-05-23 19:28:19,937 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names
occuring more than 10 times
...
2013-05-23 19:28:26,801 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 28 on 9000: starting

There are a couple of relevant activities that happen during namenode
startup in between these 2 log statements.  It loads the current fsimage
(persistent copy of file system metadata), merges in the edits log
(transaction log containing all file system metadata changes since the last
checkpoint), and then saves back a new fsimage file after that merge.
 Current versions of the Hadoop codebase will print some information to
logs about the volume of activity during this checkpointing process, so I
recommend looking for that in your logs to see if this explains it.
 Depending on whether or not your have a large number of transactions
queued since your last checkpoint, this whole process can cause namenode
startup to take several minutes.

If this becomes a regular problem, then you can run SecondaryNameNode or
BackupNode to perform periodic checkpoints in addition to the checkpoint
that occurs on namenode restart.  This is probably overkill for a dev
environment on your laptop though.

Hope this helps,

Chris Nauroth
Hortonworks
http://hortonworks.com/



On Thu, May 23, 2013 at 2:49 AM, Ted <r6...@gmail.com> wrote:

> Hi I'm running hadoop on my local laptop for development and
> everything "works" but there's some annoying pauses during the startup
> which causes the entire hadoop startup process to take up to 4 minutes
> and I'm wondering what it is and if I can do anything about it.
>
> I'm running everything on 1 machines, on fedora linux, hadoop-1.1.2,
> oracle jkd1.7.0_17, the machine is a dual core i5, and I have 8gb of
> ram and an SSD so it shouldn't be slow.
>
> When the system pauses, there is no cpu usage, no disk usage and no
> network usage (although I suspect it's waiting for the network to
> resolve or return something).
>
> Here's some snippets from the namenode logs during startup where you
> can see it just pauses for around 30 seconds or more with out errors
> or anything :
>
> ...
> 2013-05-23 19:26:37,660 INFO
> org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from
> hadoop-metrics2.properties
> 2013-05-23 19:26:37,676 INFO
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source
> MetricsSystem,sub=Stats registered.
> 2013-05-23 19:27:54,144 INFO
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot
> period at 10 second(s).
> 2013-05-23 19:27:54,144 INFO
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics
> system started
> ...
> 2013-05-23 19:27:54,341 WARN
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: The
> dfs.support.append option is in your configuration, however append is
> not supported. This configuration option is no longer required to
> enable sync.
> 2013-05-23 19:27:54,341 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s),
> accessTokenLifetime=0 min(s)
> 2013-05-23 19:28:19,918 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
> FSNamesystemStateMBean and NameNodeMXBean
> 2013-05-23 19:28:19,937 INFO
> org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names
> occuring more than 10 times
> ...
> 2013-05-23 19:28:26,801 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 28 on 9000: starting
> 2013-05-23 19:28:26,833 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 31 on 9000: starting
> 2013-05-23 19:30:10,644 INFO org.apache.hadoop.hdfs.StateChange:
> BLOCK* NameSystem.registerDatanode: node registration from
> 127.0.0.1:50010 storage DS-651015167-192.168.1.5-50010-1369140176513
> 2013-05-23 19:30:10,650 INFO org.apache.hadoop.net.NetworkTopology:
> Adding a new node: /default-rack/127.0.0.1:50010
>
>
> I already start the system with : export
> HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"
> I only allocate : export HADOOP_HEAPSIZE=512 (but it's an empty hadoop
> system, maybe just 1 or 2 test files less than 100k, and there's no
> CPU usage so it doesn't look like it's GC thrashing)
>
> I should mention again, there's no errors and the system runs fine and
> relatively speedy once started (considering it's on my laptop).
>
> Does anyone know what's causing these pauses? (and how I can get rid of
> them)
> Thanks.
> --
> Ted.
>