You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-user@hadoop.apache.org by Gopi Krishna M <mg...@gmail.com> on 2013/04/04 14:33:47 UTC
Re: one minute delay in running a simple ls command on hadoop (maybe
near security groups..): hadoop 0.23.5
in case anybody has some idea on what I should look for in debugging this..
below is a strace of the simple bin/hadoop dfs -ls / command run on the
machine running namenode
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
49.56 0.000853 45 19 14 execve
36.20 0.000623 1 925 rt_sigprocmask
5.75 0.000099 2 62 brk
5.29 0.000091 9 10 getgid
3.20 0.000055 1 81 rt_sigaction
0.00 0.000000 0 71 read
0.00 0.000000 0 3 write
0.00 0.000000 0 49 11 open
0.00 0.000000 0 87 close
0.00 0.000000 0 71 31 stat
0.00 0.000000 0 36 fstat
0.00 0.000000 0 12 lseek
0.00 0.000000 0 72 mmap
0.00 0.000000 0 36 mprotect
0.00 0.000000 0 9 munmap
0.00 0.000000 0 19 rt_sigreturn
0.00 0.000000 0 2 2 ioctl
0.00 0.000000 0 40 31 access
0.00 0.000000 0 19 pipe
0.00 0.000000 0 8 dup2
0.00 0.000000 0 2 getpid
0.00 0.000000 0 20 clone
0.00 0.000000 0 38 19 wait4
0.00 0.000000 0 2 uname
0.00 0.000000 0 24 2 fcntl
0.00 0.000000 0 16 getdents
0.00 0.000000 0 2 readlink
0.00 0.000000 0 2 gettimeofday
0.00 0.000000 0 5 getrlimit
0.00 0.000000 0 10 getuid
0.00 0.000000 0 10 geteuid
0.00 0.000000 0 10 getegid
0.00 0.000000 0 2 getppid
0.00 0.000000 0 2 getpgrp
0.00 0.000000 0 5 arch_prctl
0.00 0.000000 0 2 1 futex
0.00 0.000000 0 1 set_tid_address
0.00 0.000000 0 10 2 openat
0.00 0.000000 0 1 set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00 0.001721 1795 113 total
On Sun, Mar 31, 2013 at 11:47 AM, Gopi Krishna M <mg...@gmail.com> wrote:
> Hi..
>
> I have installed hadoop 0.23.5 and is working fine on two of my
> installations. In a new installation on Windows Azure VMs, I am seeing an
> inordinate delay (of 1 minute) when doing a simple dfs ls command.
>
> This is a 3 node cluster, running 64 bit Ubuntu 12.04.1 LTS, Sun jre
>
> command : bin/hadoop dfs -ls /
>
> With log set to DEBUG, following is the output on the console. The logs
> in bold show the sleep/timeout/delay of *1 minute*. Any pointers on what
> to check for / what is the reason for this delay?
>
> 13/03/31 05:48:55 DEBUG lib.MutableMetricsFactory: field
> org.apache.hadoop.metrics2.lib.MutableRate
> org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess
> with annotation @org.apache.
> hadoop.metrics2.annotation.Metric(valueName=Time, about=, value=[Rate of
> successful kerberos logins
> and latency (milliseconds)], always=false, type=DEFAULT, sampleName=Ops)
> 13/03/31 05:48:55 DEBUG lib.MutableMetricsFactory: field
> org.apache.hadoop.metrics2.lib.MutableRate
> org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure
> with annotation @org.apache.
> hadoop.metrics2.annotation.Metric(valueName=Time, about=, value=[Rate of
> failed kerberos logins and
> latency (milliseconds)], always=false, type=DEFAULT, sampleName=Ops)
> 13/03/31 05:48:55 DEBUG impl.MetricsSystemImpl: UgiMetrics, User and group
> related metrics
> 13/03/31 05:49:55 DEBUG security.Groups: Creating new Groups object
> *13/03/31 05:49:55 DEBUG security.Groups: Group mapping
> impl=org.apache.hadoop.security.ShellBasedUni*
> *xGroupsMapping; cacheTimeout=300000*
> *13/03/31 05:50:55 DEBUG security.UserGroupInformation: hadoop login*
> 13/03/31 05:50:55 DEBUG security.UserGroupInformation: hadoop login commit
> 13/03/31 05:50:55 DEBUG security.UserGroupInformation: using local
> user:UnixPrincipal: hduser
> 13/03/31 05:50:55 DEBUG security.UserGroupInformation: UGI
> loginUser:hduser (auth:SIMPLE)
> 13/03/31 05:50:56 DEBUG hdfs.DFSClient: Short circuit read is false
> 13/03/31 05:50:56 DEBUG ipc.Client: The ping interval is 60000 ms.
> 13/03/31 05:50:56 DEBUG ipc.Client: Use SIMPLE authentication for protocol
> ClientProtocol
> 13/03/31 05:50:56 DEBUG ipc.Client: Connecting to XXXXXX:9000
>
>
> Thanks
> Gopi
>
>
Re: one minute delay in running a simple ls command on hadoop (maybe
near security groups..): hadoop 0.23.5
Posted by Gopi Krishna M <mg...@gmail.com>.
Hi Marcos,
executed it from shell.. returns immediately, no delay and the output is
just hduser..
hduser@hmaster-asia:~/installations/hadoop-0.23.5$ bash -c "id -Gn hduser"
hduser
one important thing I noticed on logs is (posted in my first mail), it
always get stuck *twice for exactly 60secs*. This looks like some timeout.
Went through settings which are close to 60secs, but no luck yet.
Output of strace shows futex as the last call after which it takes 2 mins
and then print the ls output. Not clear if it is stuck on futex there as
that seems to be the last system call. I tried with 0.23.6 but got the
same behavior.
*
*
*strace -t bin/hadoop dfs -ls / ; date*
.......
15:09:29 openat(AT_FDCWD,
"/home/hduser/installations/hadoop-0.23.5/share/hadoop/mapreduce/", O_RDON
LY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
15:09:29 getdents(3, /* 11 entries */, 32768) = 584
15:09:29 getdents(3, /* 0 entries */, 32768) = 0
15:09:29 close(3) = 0
15:09:29 mmap(NULL, 1052672, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7
ff716300000
15:09:29 mprotect(0x7ff716300000, 4096, PROT_NONE) = 0
15:09:29 clone(child_stack=0x7ff7163ffff0,
flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_T
HREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID,
parent_tidptr=0x7ff716400
9d0, tls=0x7ff716400700, child_tidptr=0x7ff7164009d0) = 42720
*15:09:29 futex*(0x7ff7164009d0, FUTEX_WAIT, 42720, NULLFound 5 items
drwxr-xr-x - hduser supergroup 0 2013-03-26 12:28
/aea8d83f-c0a6-4646-a88d-f9a89be92486
drwxr-xr-x - hduser supergroup 0 2013-03-06 12:43 /in
drwxr-xr-x - hduser supergroup 0 2013-03-04 12:57 /out1
drwxrwx--- - hduser supergroup 0 2013-03-04 12:32 /tmp
drwx------ - hduser supergroup 0 2013-03-04 12:49 /user
<unfinished ... exit status 0>
Thu Apr 4* 15:11:32 UTC 2013*
~/installations/hadoop-0.23.5$
On Thu, Apr 4, 2013 at 6:36 PM, MARCOS MEDRADO RUBINELLI <
marcosm@buscapecompany.com> wrote:
> Gopi,
>
> The namenode is essentially running bash -c "id -Gn hduser" and waiting
> for the response. You could try executing from the shell it to see if it
> does take a long time in Azure, or if the output is too complex to parse.
>
> Regards,
> Marcos
>
>
> In 04-04-2013 09:33, Gopi Krishna M wrote:
>
> in case anybody has some idea on what I should look for in debugging
> this.. below is a strace of the simple bin/hadoop dfs -ls / command run on
> the machine running namenode
>
> % time seconds usecs/call calls errors syscall
> ------ ----------- ----------- --------- --------- ----------------
> 49.56 0.000853 45 19 14 execve
> 36.20 0.000623 1 925 rt_sigprocmask
> 5.75 0.000099 2 62 brk
> 5.29 0.000091 9 10 getgid
> 3.20 0.000055 1 81 rt_sigaction
> 0.00 0.000000 0 71 read
> 0.00 0.000000 0 3 write
> 0.00 0.000000 0 49 11 open
> 0.00 0.000000 0 87 close
> 0.00 0.000000 0 71 31 stat
> 0.00 0.000000 0 36 fstat
> 0.00 0.000000 0 12 lseek
> 0.00 0.000000 0 72 mmap
> 0.00 0.000000 0 36 mprotect
> 0.00 0.000000 0 9 munmap
> 0.00 0.000000 0 19 rt_sigreturn
> 0.00 0.000000 0 2 2 ioctl
> 0.00 0.000000 0 40 31 access
> 0.00 0.000000 0 19 pipe
> 0.00 0.000000 0 8 dup2
> 0.00 0.000000 0 2 getpid
> 0.00 0.000000 0 20 clone
> 0.00 0.000000 0 38 19 wait4
> 0.00 0.000000 0 2 uname
> 0.00 0.000000 0 24 2 fcntl
> 0.00 0.000000 0 16 getdents
> 0.00 0.000000 0 2 readlink
> 0.00 0.000000 0 2 gettimeofday
> 0.00 0.000000 0 5 getrlimit
> 0.00 0.000000 0 10 getuid
> 0.00 0.000000 0 10 geteuid
> 0.00 0.000000 0 10 getegid
> 0.00 0.000000 0 2 getppid
> 0.00 0.000000 0 2 getpgrp
> 0.00 0.000000 0 5 arch_prctl
> 0.00 0.000000 0 2 1 futex
> 0.00 0.000000 0 1 set_tid_address
> 0.00 0.000000 0 10 2 openat
> 0.00 0.000000 0 1 set_robust_list
> ------ ----------- ----------- --------- --------- ----------------
> 100.00 0.001721 1795 113 total
>
>
> On Sun, Mar 31, 2013 at 11:47 AM, Gopi Krishna M <mg...@gmail.com> wrote:
>
>> Hi..
>>
>> I have installed hadoop 0.23.5 and is working fine on two of my
>> installations. In a new installation on Windows Azure VMs, I am seeing an
>> inordinate delay (of 1 minute) when doing a simple dfs ls command.
>>
>> This is a 3 node cluster, running 64 bit Ubuntu 12.04.1 LTS, Sun jre
>>
>> command : bin/hadoop dfs -ls /
>>
>> With log set to DEBUG, following is the output on the console. The
>> logs in bold show the sleep/timeout/delay of *1 minute*. Any pointers
>> on what to check for / what is the reason for this delay?
>>
>> 13/03/31 05:48:55 DEBUG lib.MutableMetricsFactory: field
>> org.apache.hadoop.metrics2.lib.MutableRate
>> org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess
>> with annotation @org.apache.
>> hadoop.metrics2.annotation.Metric(valueName=Time, about=, value=[Rate of
>> successful kerberos logins
>> and latency (milliseconds)], always=false, type=DEFAULT, sampleName=Ops)
>> 13/03/31 05:48:55 DEBUG lib.MutableMetricsFactory: field
>> org.apache.hadoop.metrics2.lib.MutableRate
>> org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure
>> with annotation @org.apache.
>> hadoop.metrics2.annotation.Metric(valueName=Time, about=, value=[Rate of
>> failed kerberos logins and
>> latency (milliseconds)], always=false, type=DEFAULT, sampleName=Ops)
>> 13/03/31 05:48:55 DEBUG impl.MetricsSystemImpl: UgiMetrics, User and
>> group related metrics
>> 13/03/31 05:49:55 DEBUG security.Groups: Creating new Groups object
>> *13/03/31 05:49:55 DEBUG security.Groups: Group mapping
>> impl=org.apache.hadoop.security.ShellBasedUni*
>> *xGroupsMapping; cacheTimeout=300000*
>> *13/03/31 05:50:55 DEBUG security.UserGroupInformation: hadoop login*
>> 13/03/31 05:50:55 DEBUG security.UserGroupInformation: hadoop login commit
>> 13/03/31 05:50:55 DEBUG security.UserGroupInformation: using local
>> user:UnixPrincipal: hduser
>> 13/03/31 05:50:55 DEBUG security.UserGroupInformation: UGI
>> loginUser:hduser (auth:SIMPLE)
>> 13/03/31 05:50:56 DEBUG hdfs.DFSClient: Short circuit read is false
>> 13/03/31 05:50:56 DEBUG ipc.Client: The ping interval is 60000 ms.
>> 13/03/31 05:50:56 DEBUG ipc.Client: Use SIMPLE authentication for
>> protocol ClientProtocol
>> 13/03/31 05:50:56 DEBUG ipc.Client: Connecting to XXXXXX:9000
>>
>>
>> Thanks
>> Gopi
>>
>>
>
>
Re: one minute delay in running a simple ls command on hadoop (maybe
near security groups..): hadoop 0.23.5
Posted by Gopi Krishna M <mg...@gmail.com>.
Hi Marcos,
executed it from shell.. returns immediately, no delay and the output is
just hduser..
hduser@hmaster-asia:~/installations/hadoop-0.23.5$ bash -c "id -Gn hduser"
hduser
one important thing I noticed on logs is (posted in my first mail), it
always get stuck *twice for exactly 60secs*. This looks like some timeout.
Went through settings which are close to 60secs, but no luck yet.
Output of strace shows futex as the last call after which it takes 2 mins
and then print the ls output. Not clear if it is stuck on futex there as
that seems to be the last system call. I tried with 0.23.6 but got the
same behavior.
*
*
*strace -t bin/hadoop dfs -ls / ; date*
.......
15:09:29 openat(AT_FDCWD,
"/home/hduser/installations/hadoop-0.23.5/share/hadoop/mapreduce/", O_RDON
LY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
15:09:29 getdents(3, /* 11 entries */, 32768) = 584
15:09:29 getdents(3, /* 0 entries */, 32768) = 0
15:09:29 close(3) = 0
15:09:29 mmap(NULL, 1052672, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7
ff716300000
15:09:29 mprotect(0x7ff716300000, 4096, PROT_NONE) = 0
15:09:29 clone(child_stack=0x7ff7163ffff0,
flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_T
HREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID,
parent_tidptr=0x7ff716400
9d0, tls=0x7ff716400700, child_tidptr=0x7ff7164009d0) = 42720
*15:09:29 futex*(0x7ff7164009d0, FUTEX_WAIT, 42720, NULLFound 5 items
drwxr-xr-x - hduser supergroup 0 2013-03-26 12:28
/aea8d83f-c0a6-4646-a88d-f9a89be92486
drwxr-xr-x - hduser supergroup 0 2013-03-06 12:43 /in
drwxr-xr-x - hduser supergroup 0 2013-03-04 12:57 /out1
drwxrwx--- - hduser supergroup 0 2013-03-04 12:32 /tmp
drwx------ - hduser supergroup 0 2013-03-04 12:49 /user
<unfinished ... exit status 0>
Thu Apr 4* 15:11:32 UTC 2013*
~/installations/hadoop-0.23.5$
On Thu, Apr 4, 2013 at 6:36 PM, MARCOS MEDRADO RUBINELLI <
marcosm@buscapecompany.com> wrote:
> Gopi,
>
> The namenode is essentially running bash -c "id -Gn hduser" and waiting
> for the response. You could try executing from the shell it to see if it
> does take a long time in Azure, or if the output is too complex to parse.
>
> Regards,
> Marcos
>
>
> In 04-04-2013 09:33, Gopi Krishna M wrote:
>
> in case anybody has some idea on what I should look for in debugging
> this.. below is a strace of the simple bin/hadoop dfs -ls / command run on
> the machine running namenode
>
> % time seconds usecs/call calls errors syscall
> ------ ----------- ----------- --------- --------- ----------------
> 49.56 0.000853 45 19 14 execve
> 36.20 0.000623 1 925 rt_sigprocmask
> 5.75 0.000099 2 62 brk
> 5.29 0.000091 9 10 getgid
> 3.20 0.000055 1 81 rt_sigaction
> 0.00 0.000000 0 71 read
> 0.00 0.000000 0 3 write
> 0.00 0.000000 0 49 11 open
> 0.00 0.000000 0 87 close
> 0.00 0.000000 0 71 31 stat
> 0.00 0.000000 0 36 fstat
> 0.00 0.000000 0 12 lseek
> 0.00 0.000000 0 72 mmap
> 0.00 0.000000 0 36 mprotect
> 0.00 0.000000 0 9 munmap
> 0.00 0.000000 0 19 rt_sigreturn
> 0.00 0.000000 0 2 2 ioctl
> 0.00 0.000000 0 40 31 access
> 0.00 0.000000 0 19 pipe
> 0.00 0.000000 0 8 dup2
> 0.00 0.000000 0 2 getpid
> 0.00 0.000000 0 20 clone
> 0.00 0.000000 0 38 19 wait4
> 0.00 0.000000 0 2 uname
> 0.00 0.000000 0 24 2 fcntl
> 0.00 0.000000 0 16 getdents
> 0.00 0.000000 0 2 readlink
> 0.00 0.000000 0 2 gettimeofday
> 0.00 0.000000 0 5 getrlimit
> 0.00 0.000000 0 10 getuid
> 0.00 0.000000 0 10 geteuid
> 0.00 0.000000 0 10 getegid
> 0.00 0.000000 0 2 getppid
> 0.00 0.000000 0 2 getpgrp
> 0.00 0.000000 0 5 arch_prctl
> 0.00 0.000000 0 2 1 futex
> 0.00 0.000000 0 1 set_tid_address
> 0.00 0.000000 0 10 2 openat
> 0.00 0.000000 0 1 set_robust_list
> ------ ----------- ----------- --------- --------- ----------------
> 100.00 0.001721 1795 113 total
>
>
> On Sun, Mar 31, 2013 at 11:47 AM, Gopi Krishna M <mg...@gmail.com> wrote:
>
>> Hi..
>>
>> I have installed hadoop 0.23.5 and is working fine on two of my
>> installations. In a new installation on Windows Azure VMs, I am seeing an
>> inordinate delay (of 1 minute) when doing a simple dfs ls command.
>>
>> This is a 3 node cluster, running 64 bit Ubuntu 12.04.1 LTS, Sun jre
>>
>> command : bin/hadoop dfs -ls /
>>
>> With log set to DEBUG, following is the output on the console. The
>> logs in bold show the sleep/timeout/delay of *1 minute*. Any pointers
>> on what to check for / what is the reason for this delay?
>>
>> 13/03/31 05:48:55 DEBUG lib.MutableMetricsFactory: field
>> org.apache.hadoop.metrics2.lib.MutableRate
>> org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess
>> with annotation @org.apache.
>> hadoop.metrics2.annotation.Metric(valueName=Time, about=, value=[Rate of
>> successful kerberos logins
>> and latency (milliseconds)], always=false, type=DEFAULT, sampleName=Ops)
>> 13/03/31 05:48:55 DEBUG lib.MutableMetricsFactory: field
>> org.apache.hadoop.metrics2.lib.MutableRate
>> org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure
>> with annotation @org.apache.
>> hadoop.metrics2.annotation.Metric(valueName=Time, about=, value=[Rate of
>> failed kerberos logins and
>> latency (milliseconds)], always=false, type=DEFAULT, sampleName=Ops)
>> 13/03/31 05:48:55 DEBUG impl.MetricsSystemImpl: UgiMetrics, User and
>> group related metrics
>> 13/03/31 05:49:55 DEBUG security.Groups: Creating new Groups object
>> *13/03/31 05:49:55 DEBUG security.Groups: Group mapping
>> impl=org.apache.hadoop.security.ShellBasedUni*
>> *xGroupsMapping; cacheTimeout=300000*
>> *13/03/31 05:50:55 DEBUG security.UserGroupInformation: hadoop login*
>> 13/03/31 05:50:55 DEBUG security.UserGroupInformation: hadoop login commit
>> 13/03/31 05:50:55 DEBUG security.UserGroupInformation: using local
>> user:UnixPrincipal: hduser
>> 13/03/31 05:50:55 DEBUG security.UserGroupInformation: UGI
>> loginUser:hduser (auth:SIMPLE)
>> 13/03/31 05:50:56 DEBUG hdfs.DFSClient: Short circuit read is false
>> 13/03/31 05:50:56 DEBUG ipc.Client: The ping interval is 60000 ms.
>> 13/03/31 05:50:56 DEBUG ipc.Client: Use SIMPLE authentication for
>> protocol ClientProtocol
>> 13/03/31 05:50:56 DEBUG ipc.Client: Connecting to XXXXXX:9000
>>
>>
>> Thanks
>> Gopi
>>
>>
>
>
Re: one minute delay in running a simple ls command on hadoop (maybe
near security groups..): hadoop 0.23.5
Posted by Gopi Krishna M <mg...@gmail.com>.
Hi Marcos,
executed it from shell.. returns immediately, no delay and the output is
just hduser..
hduser@hmaster-asia:~/installations/hadoop-0.23.5$ bash -c "id -Gn hduser"
hduser
one important thing I noticed on logs is (posted in my first mail), it
always get stuck *twice for exactly 60secs*. This looks like some timeout.
Went through settings which are close to 60secs, but no luck yet.
Output of strace shows futex as the last call after which it takes 2 mins
and then print the ls output. Not clear if it is stuck on futex there as
that seems to be the last system call. I tried with 0.23.6 but got the
same behavior.
*
*
*strace -t bin/hadoop dfs -ls / ; date*
.......
15:09:29 openat(AT_FDCWD,
"/home/hduser/installations/hadoop-0.23.5/share/hadoop/mapreduce/", O_RDON
LY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
15:09:29 getdents(3, /* 11 entries */, 32768) = 584
15:09:29 getdents(3, /* 0 entries */, 32768) = 0
15:09:29 close(3) = 0
15:09:29 mmap(NULL, 1052672, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7
ff716300000
15:09:29 mprotect(0x7ff716300000, 4096, PROT_NONE) = 0
15:09:29 clone(child_stack=0x7ff7163ffff0,
flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_T
HREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID,
parent_tidptr=0x7ff716400
9d0, tls=0x7ff716400700, child_tidptr=0x7ff7164009d0) = 42720
*15:09:29 futex*(0x7ff7164009d0, FUTEX_WAIT, 42720, NULLFound 5 items
drwxr-xr-x - hduser supergroup 0 2013-03-26 12:28
/aea8d83f-c0a6-4646-a88d-f9a89be92486
drwxr-xr-x - hduser supergroup 0 2013-03-06 12:43 /in
drwxr-xr-x - hduser supergroup 0 2013-03-04 12:57 /out1
drwxrwx--- - hduser supergroup 0 2013-03-04 12:32 /tmp
drwx------ - hduser supergroup 0 2013-03-04 12:49 /user
<unfinished ... exit status 0>
Thu Apr 4* 15:11:32 UTC 2013*
~/installations/hadoop-0.23.5$
On Thu, Apr 4, 2013 at 6:36 PM, MARCOS MEDRADO RUBINELLI <
marcosm@buscapecompany.com> wrote:
> Gopi,
>
> The namenode is essentially running bash -c "id -Gn hduser" and waiting
> for the response. You could try executing from the shell it to see if it
> does take a long time in Azure, or if the output is too complex to parse.
>
> Regards,
> Marcos
>
>
> In 04-04-2013 09:33, Gopi Krishna M wrote:
>
> in case anybody has some idea on what I should look for in debugging
> this.. below is a strace of the simple bin/hadoop dfs -ls / command run on
> the machine running namenode
>
> % time seconds usecs/call calls errors syscall
> ------ ----------- ----------- --------- --------- ----------------
> 49.56 0.000853 45 19 14 execve
> 36.20 0.000623 1 925 rt_sigprocmask
> 5.75 0.000099 2 62 brk
> 5.29 0.000091 9 10 getgid
> 3.20 0.000055 1 81 rt_sigaction
> 0.00 0.000000 0 71 read
> 0.00 0.000000 0 3 write
> 0.00 0.000000 0 49 11 open
> 0.00 0.000000 0 87 close
> 0.00 0.000000 0 71 31 stat
> 0.00 0.000000 0 36 fstat
> 0.00 0.000000 0 12 lseek
> 0.00 0.000000 0 72 mmap
> 0.00 0.000000 0 36 mprotect
> 0.00 0.000000 0 9 munmap
> 0.00 0.000000 0 19 rt_sigreturn
> 0.00 0.000000 0 2 2 ioctl
> 0.00 0.000000 0 40 31 access
> 0.00 0.000000 0 19 pipe
> 0.00 0.000000 0 8 dup2
> 0.00 0.000000 0 2 getpid
> 0.00 0.000000 0 20 clone
> 0.00 0.000000 0 38 19 wait4
> 0.00 0.000000 0 2 uname
> 0.00 0.000000 0 24 2 fcntl
> 0.00 0.000000 0 16 getdents
> 0.00 0.000000 0 2 readlink
> 0.00 0.000000 0 2 gettimeofday
> 0.00 0.000000 0 5 getrlimit
> 0.00 0.000000 0 10 getuid
> 0.00 0.000000 0 10 geteuid
> 0.00 0.000000 0 10 getegid
> 0.00 0.000000 0 2 getppid
> 0.00 0.000000 0 2 getpgrp
> 0.00 0.000000 0 5 arch_prctl
> 0.00 0.000000 0 2 1 futex
> 0.00 0.000000 0 1 set_tid_address
> 0.00 0.000000 0 10 2 openat
> 0.00 0.000000 0 1 set_robust_list
> ------ ----------- ----------- --------- --------- ----------------
> 100.00 0.001721 1795 113 total
>
>
> On Sun, Mar 31, 2013 at 11:47 AM, Gopi Krishna M <mg...@gmail.com> wrote:
>
>> Hi..
>>
>> I have installed hadoop 0.23.5 and is working fine on two of my
>> installations. In a new installation on Windows Azure VMs, I am seeing an
>> inordinate delay (of 1 minute) when doing a simple dfs ls command.
>>
>> This is a 3 node cluster, running 64 bit Ubuntu 12.04.1 LTS, Sun jre
>>
>> command : bin/hadoop dfs -ls /
>>
>> With log set to DEBUG, following is the output on the console. The
>> logs in bold show the sleep/timeout/delay of *1 minute*. Any pointers
>> on what to check for / what is the reason for this delay?
>>
>> 13/03/31 05:48:55 DEBUG lib.MutableMetricsFactory: field
>> org.apache.hadoop.metrics2.lib.MutableRate
>> org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess
>> with annotation @org.apache.
>> hadoop.metrics2.annotation.Metric(valueName=Time, about=, value=[Rate of
>> successful kerberos logins
>> and latency (milliseconds)], always=false, type=DEFAULT, sampleName=Ops)
>> 13/03/31 05:48:55 DEBUG lib.MutableMetricsFactory: field
>> org.apache.hadoop.metrics2.lib.MutableRate
>> org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure
>> with annotation @org.apache.
>> hadoop.metrics2.annotation.Metric(valueName=Time, about=, value=[Rate of
>> failed kerberos logins and
>> latency (milliseconds)], always=false, type=DEFAULT, sampleName=Ops)
>> 13/03/31 05:48:55 DEBUG impl.MetricsSystemImpl: UgiMetrics, User and
>> group related metrics
>> 13/03/31 05:49:55 DEBUG security.Groups: Creating new Groups object
>> *13/03/31 05:49:55 DEBUG security.Groups: Group mapping
>> impl=org.apache.hadoop.security.ShellBasedUni*
>> *xGroupsMapping; cacheTimeout=300000*
>> *13/03/31 05:50:55 DEBUG security.UserGroupInformation: hadoop login*
>> 13/03/31 05:50:55 DEBUG security.UserGroupInformation: hadoop login commit
>> 13/03/31 05:50:55 DEBUG security.UserGroupInformation: using local
>> user:UnixPrincipal: hduser
>> 13/03/31 05:50:55 DEBUG security.UserGroupInformation: UGI
>> loginUser:hduser (auth:SIMPLE)
>> 13/03/31 05:50:56 DEBUG hdfs.DFSClient: Short circuit read is false
>> 13/03/31 05:50:56 DEBUG ipc.Client: The ping interval is 60000 ms.
>> 13/03/31 05:50:56 DEBUG ipc.Client: Use SIMPLE authentication for
>> protocol ClientProtocol
>> 13/03/31 05:50:56 DEBUG ipc.Client: Connecting to XXXXXX:9000
>>
>>
>> Thanks
>> Gopi
>>
>>
>
>
Re: one minute delay in running a simple ls command on hadoop (maybe
near security groups..): hadoop 0.23.5
Posted by Gopi Krishna M <mg...@gmail.com>.
Hi Marcos,
executed it from shell.. returns immediately, no delay and the output is
just hduser..
hduser@hmaster-asia:~/installations/hadoop-0.23.5$ bash -c "id -Gn hduser"
hduser
one important thing I noticed on logs is (posted in my first mail), it
always get stuck *twice for exactly 60secs*. This looks like some timeout.
Went through settings which are close to 60secs, but no luck yet.
Output of strace shows futex as the last call after which it takes 2 mins
and then print the ls output. Not clear if it is stuck on futex there as
that seems to be the last system call. I tried with 0.23.6 but got the
same behavior.
*
*
*strace -t bin/hadoop dfs -ls / ; date*
.......
15:09:29 openat(AT_FDCWD,
"/home/hduser/installations/hadoop-0.23.5/share/hadoop/mapreduce/", O_RDON
LY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
15:09:29 getdents(3, /* 11 entries */, 32768) = 584
15:09:29 getdents(3, /* 0 entries */, 32768) = 0
15:09:29 close(3) = 0
15:09:29 mmap(NULL, 1052672, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7
ff716300000
15:09:29 mprotect(0x7ff716300000, 4096, PROT_NONE) = 0
15:09:29 clone(child_stack=0x7ff7163ffff0,
flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_T
HREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID,
parent_tidptr=0x7ff716400
9d0, tls=0x7ff716400700, child_tidptr=0x7ff7164009d0) = 42720
*15:09:29 futex*(0x7ff7164009d0, FUTEX_WAIT, 42720, NULLFound 5 items
drwxr-xr-x - hduser supergroup 0 2013-03-26 12:28
/aea8d83f-c0a6-4646-a88d-f9a89be92486
drwxr-xr-x - hduser supergroup 0 2013-03-06 12:43 /in
drwxr-xr-x - hduser supergroup 0 2013-03-04 12:57 /out1
drwxrwx--- - hduser supergroup 0 2013-03-04 12:32 /tmp
drwx------ - hduser supergroup 0 2013-03-04 12:49 /user
<unfinished ... exit status 0>
Thu Apr 4* 15:11:32 UTC 2013*
~/installations/hadoop-0.23.5$
On Thu, Apr 4, 2013 at 6:36 PM, MARCOS MEDRADO RUBINELLI <
marcosm@buscapecompany.com> wrote:
> Gopi,
>
> The namenode is essentially running bash -c "id -Gn hduser" and waiting
> for the response. You could try executing from the shell it to see if it
> does take a long time in Azure, or if the output is too complex to parse.
>
> Regards,
> Marcos
>
>
> In 04-04-2013 09:33, Gopi Krishna M wrote:
>
> in case anybody has some idea on what I should look for in debugging
> this.. below is a strace of the simple bin/hadoop dfs -ls / command run on
> the machine running namenode
>
> % time seconds usecs/call calls errors syscall
> ------ ----------- ----------- --------- --------- ----------------
> 49.56 0.000853 45 19 14 execve
> 36.20 0.000623 1 925 rt_sigprocmask
> 5.75 0.000099 2 62 brk
> 5.29 0.000091 9 10 getgid
> 3.20 0.000055 1 81 rt_sigaction
> 0.00 0.000000 0 71 read
> 0.00 0.000000 0 3 write
> 0.00 0.000000 0 49 11 open
> 0.00 0.000000 0 87 close
> 0.00 0.000000 0 71 31 stat
> 0.00 0.000000 0 36 fstat
> 0.00 0.000000 0 12 lseek
> 0.00 0.000000 0 72 mmap
> 0.00 0.000000 0 36 mprotect
> 0.00 0.000000 0 9 munmap
> 0.00 0.000000 0 19 rt_sigreturn
> 0.00 0.000000 0 2 2 ioctl
> 0.00 0.000000 0 40 31 access
> 0.00 0.000000 0 19 pipe
> 0.00 0.000000 0 8 dup2
> 0.00 0.000000 0 2 getpid
> 0.00 0.000000 0 20 clone
> 0.00 0.000000 0 38 19 wait4
> 0.00 0.000000 0 2 uname
> 0.00 0.000000 0 24 2 fcntl
> 0.00 0.000000 0 16 getdents
> 0.00 0.000000 0 2 readlink
> 0.00 0.000000 0 2 gettimeofday
> 0.00 0.000000 0 5 getrlimit
> 0.00 0.000000 0 10 getuid
> 0.00 0.000000 0 10 geteuid
> 0.00 0.000000 0 10 getegid
> 0.00 0.000000 0 2 getppid
> 0.00 0.000000 0 2 getpgrp
> 0.00 0.000000 0 5 arch_prctl
> 0.00 0.000000 0 2 1 futex
> 0.00 0.000000 0 1 set_tid_address
> 0.00 0.000000 0 10 2 openat
> 0.00 0.000000 0 1 set_robust_list
> ------ ----------- ----------- --------- --------- ----------------
> 100.00 0.001721 1795 113 total
>
>
> On Sun, Mar 31, 2013 at 11:47 AM, Gopi Krishna M <mg...@gmail.com> wrote:
>
>> Hi..
>>
>> I have installed hadoop 0.23.5 and is working fine on two of my
>> installations. In a new installation on Windows Azure VMs, I am seeing an
>> inordinate delay (of 1 minute) when doing a simple dfs ls command.
>>
>> This is a 3 node cluster, running 64 bit Ubuntu 12.04.1 LTS, Sun jre
>>
>> command : bin/hadoop dfs -ls /
>>
>> With log set to DEBUG, following is the output on the console. The
>> logs in bold show the sleep/timeout/delay of *1 minute*. Any pointers
>> on what to check for / what is the reason for this delay?
>>
>> 13/03/31 05:48:55 DEBUG lib.MutableMetricsFactory: field
>> org.apache.hadoop.metrics2.lib.MutableRate
>> org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess
>> with annotation @org.apache.
>> hadoop.metrics2.annotation.Metric(valueName=Time, about=, value=[Rate of
>> successful kerberos logins
>> and latency (milliseconds)], always=false, type=DEFAULT, sampleName=Ops)
>> 13/03/31 05:48:55 DEBUG lib.MutableMetricsFactory: field
>> org.apache.hadoop.metrics2.lib.MutableRate
>> org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure
>> with annotation @org.apache.
>> hadoop.metrics2.annotation.Metric(valueName=Time, about=, value=[Rate of
>> failed kerberos logins and
>> latency (milliseconds)], always=false, type=DEFAULT, sampleName=Ops)
>> 13/03/31 05:48:55 DEBUG impl.MetricsSystemImpl: UgiMetrics, User and
>> group related metrics
>> 13/03/31 05:49:55 DEBUG security.Groups: Creating new Groups object
>> *13/03/31 05:49:55 DEBUG security.Groups: Group mapping
>> impl=org.apache.hadoop.security.ShellBasedUni*
>> *xGroupsMapping; cacheTimeout=300000*
>> *13/03/31 05:50:55 DEBUG security.UserGroupInformation: hadoop login*
>> 13/03/31 05:50:55 DEBUG security.UserGroupInformation: hadoop login commit
>> 13/03/31 05:50:55 DEBUG security.UserGroupInformation: using local
>> user:UnixPrincipal: hduser
>> 13/03/31 05:50:55 DEBUG security.UserGroupInformation: UGI
>> loginUser:hduser (auth:SIMPLE)
>> 13/03/31 05:50:56 DEBUG hdfs.DFSClient: Short circuit read is false
>> 13/03/31 05:50:56 DEBUG ipc.Client: The ping interval is 60000 ms.
>> 13/03/31 05:50:56 DEBUG ipc.Client: Use SIMPLE authentication for
>> protocol ClientProtocol
>> 13/03/31 05:50:56 DEBUG ipc.Client: Connecting to XXXXXX:9000
>>
>>
>> Thanks
>> Gopi
>>
>>
>
>
Re: one minute delay in running a simple ls command on hadoop
(maybe near security groups..): hadoop 0.23.5
Posted by MARCOS MEDRADO RUBINELLI <ma...@buscapecompany.com>.
Gopi,
The namenode is essentially running bash -c "id -Gn hduser" and waiting for the response. You could try executing from the shell it to see if it does take a long time in Azure, or if the output is too complex to parse.
Regards,
Marcos
In 04-04-2013 09:33, Gopi Krishna M wrote:
in case anybody has some idea on what I should look for in debugging this.. below is a strace of the simple bin/hadoop dfs -ls / command run on the machine running namenode
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
49.56 0.000853 45 19 14 execve
36.20 0.000623 1 925 rt_sigprocmask
5.75 0.000099 2 62 brk
5.29 0.000091 9 10 getgid
3.20 0.000055 1 81 rt_sigaction
0.00 0.000000 0 71 read
0.00 0.000000 0 3 write
0.00 0.000000 0 49 11 open
0.00 0.000000 0 87 close
0.00 0.000000 0 71 31 stat
0.00 0.000000 0 36 fstat
0.00 0.000000 0 12 lseek
0.00 0.000000 0 72 mmap
0.00 0.000000 0 36 mprotect
0.00 0.000000 0 9 munmap
0.00 0.000000 0 19 rt_sigreturn
0.00 0.000000 0 2 2 ioctl
0.00 0.000000 0 40 31 access
0.00 0.000000 0 19 pipe
0.00 0.000000 0 8 dup2
0.00 0.000000 0 2 getpid
0.00 0.000000 0 20 clone
0.00 0.000000 0 38 19 wait4
0.00 0.000000 0 2 uname
0.00 0.000000 0 24 2 fcntl
0.00 0.000000 0 16 getdents
0.00 0.000000 0 2 readlink
0.00 0.000000 0 2 gettimeofday
0.00 0.000000 0 5 getrlimit
0.00 0.000000 0 10 getuid
0.00 0.000000 0 10 geteuid
0.00 0.000000 0 10 getegid
0.00 0.000000 0 2 getppid
0.00 0.000000 0 2 getpgrp
0.00 0.000000 0 5 arch_prctl
0.00 0.000000 0 2 1 futex
0.00 0.000000 0 1 set_tid_address
0.00 0.000000 0 10 2 openat
0.00 0.000000 0 1 set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00 0.001721 1795 113 total
On Sun, Mar 31, 2013 at 11:47 AM, Gopi Krishna M <mg...@gmail.com>> wrote:
Hi..
I have installed hadoop 0.23.5 and is working fine on two of my installations. In a new installation on Windows Azure VMs, I am seeing an inordinate delay (of 1 minute) when doing a simple dfs ls command.
This is a 3 node cluster, running 64 bit Ubuntu 12.04.1 LTS, Sun jre
command : bin/hadoop dfs -ls /
With log set to DEBUG, following is the output on the console. The logs in bold show the sleep/timeout/delay of 1 minute. Any pointers on what to check for / what is the reason for this delay?
13/03/31 05:48:55 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate
org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation @org.apache.
hadoop.metrics2.annotation.Metric(valueName=Time, about=, value=[Rate of successful kerberos logins
and latency (milliseconds)], always=false, type=DEFAULT, sampleName=Ops)
13/03/31 05:48:55 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate
org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with annotation @org.apache.
hadoop.metrics2.annotation.Metric(valueName=Time, about=, value=[Rate of failed kerberos logins and
latency (milliseconds)], always=false, type=DEFAULT, sampleName=Ops)
13/03/31 05:48:55 DEBUG impl.MetricsSystemImpl: UgiMetrics, User and group related metrics
13/03/31 05:49:55 DEBUG security.Groups: Creating new Groups object
13/03/31 05:49:55 DEBUG security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUni
xGroupsMapping; cacheTimeout=300000
13/03/31 05:50:55 DEBUG security.UserGroupInformation: hadoop login
13/03/31 05:50:55 DEBUG security.UserGroupInformation: hadoop login commit
13/03/31 05:50:55 DEBUG security.UserGroupInformation: using local user:UnixPrincipal: hduser
13/03/31 05:50:55 DEBUG security.UserGroupInformation: UGI loginUser:hduser (auth:SIMPLE)
13/03/31 05:50:56 DEBUG hdfs.DFSClient: Short circuit read is false
13/03/31 05:50:56 DEBUG ipc.Client: The ping interval is 60000 ms.
13/03/31 05:50:56 DEBUG ipc.Client: Use SIMPLE authentication for protocol ClientProtocol
13/03/31 05:50:56 DEBUG ipc.Client: Connecting to XXXXXX:9000
Thanks
Gopi
Re: one minute delay in running a simple ls command on hadoop
(maybe near security groups..): hadoop 0.23.5
Posted by MARCOS MEDRADO RUBINELLI <ma...@buscapecompany.com>.
Gopi,
The namenode is essentially running bash -c "id -Gn hduser" and waiting for the response. You could try executing from the shell it to see if it does take a long time in Azure, or if the output is too complex to parse.
Regards,
Marcos
In 04-04-2013 09:33, Gopi Krishna M wrote:
in case anybody has some idea on what I should look for in debugging this.. below is a strace of the simple bin/hadoop dfs -ls / command run on the machine running namenode
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
49.56 0.000853 45 19 14 execve
36.20 0.000623 1 925 rt_sigprocmask
5.75 0.000099 2 62 brk
5.29 0.000091 9 10 getgid
3.20 0.000055 1 81 rt_sigaction
0.00 0.000000 0 71 read
0.00 0.000000 0 3 write
0.00 0.000000 0 49 11 open
0.00 0.000000 0 87 close
0.00 0.000000 0 71 31 stat
0.00 0.000000 0 36 fstat
0.00 0.000000 0 12 lseek
0.00 0.000000 0 72 mmap
0.00 0.000000 0 36 mprotect
0.00 0.000000 0 9 munmap
0.00 0.000000 0 19 rt_sigreturn
0.00 0.000000 0 2 2 ioctl
0.00 0.000000 0 40 31 access
0.00 0.000000 0 19 pipe
0.00 0.000000 0 8 dup2
0.00 0.000000 0 2 getpid
0.00 0.000000 0 20 clone
0.00 0.000000 0 38 19 wait4
0.00 0.000000 0 2 uname
0.00 0.000000 0 24 2 fcntl
0.00 0.000000 0 16 getdents
0.00 0.000000 0 2 readlink
0.00 0.000000 0 2 gettimeofday
0.00 0.000000 0 5 getrlimit
0.00 0.000000 0 10 getuid
0.00 0.000000 0 10 geteuid
0.00 0.000000 0 10 getegid
0.00 0.000000 0 2 getppid
0.00 0.000000 0 2 getpgrp
0.00 0.000000 0 5 arch_prctl
0.00 0.000000 0 2 1 futex
0.00 0.000000 0 1 set_tid_address
0.00 0.000000 0 10 2 openat
0.00 0.000000 0 1 set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00 0.001721 1795 113 total
On Sun, Mar 31, 2013 at 11:47 AM, Gopi Krishna M <mg...@gmail.com>> wrote:
Hi..
I have installed hadoop 0.23.5 and is working fine on two of my installations. In a new installation on Windows Azure VMs, I am seeing an inordinate delay (of 1 minute) when doing a simple dfs ls command.
This is a 3 node cluster, running 64 bit Ubuntu 12.04.1 LTS, Sun jre
command : bin/hadoop dfs -ls /
With log set to DEBUG, following is the output on the console. The logs in bold show the sleep/timeout/delay of 1 minute. Any pointers on what to check for / what is the reason for this delay?
13/03/31 05:48:55 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate
org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation @org.apache.
hadoop.metrics2.annotation.Metric(valueName=Time, about=, value=[Rate of successful kerberos logins
and latency (milliseconds)], always=false, type=DEFAULT, sampleName=Ops)
13/03/31 05:48:55 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate
org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with annotation @org.apache.
hadoop.metrics2.annotation.Metric(valueName=Time, about=, value=[Rate of failed kerberos logins and
latency (milliseconds)], always=false, type=DEFAULT, sampleName=Ops)
13/03/31 05:48:55 DEBUG impl.MetricsSystemImpl: UgiMetrics, User and group related metrics
13/03/31 05:49:55 DEBUG security.Groups: Creating new Groups object
13/03/31 05:49:55 DEBUG security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUni
xGroupsMapping; cacheTimeout=300000
13/03/31 05:50:55 DEBUG security.UserGroupInformation: hadoop login
13/03/31 05:50:55 DEBUG security.UserGroupInformation: hadoop login commit
13/03/31 05:50:55 DEBUG security.UserGroupInformation: using local user:UnixPrincipal: hduser
13/03/31 05:50:55 DEBUG security.UserGroupInformation: UGI loginUser:hduser (auth:SIMPLE)
13/03/31 05:50:56 DEBUG hdfs.DFSClient: Short circuit read is false
13/03/31 05:50:56 DEBUG ipc.Client: The ping interval is 60000 ms.
13/03/31 05:50:56 DEBUG ipc.Client: Use SIMPLE authentication for protocol ClientProtocol
13/03/31 05:50:56 DEBUG ipc.Client: Connecting to XXXXXX:9000
Thanks
Gopi
Re: one minute delay in running a simple ls command on hadoop
(maybe near security groups..): hadoop 0.23.5
Posted by MARCOS MEDRADO RUBINELLI <ma...@buscapecompany.com>.
Gopi,
The namenode is essentially running bash -c "id -Gn hduser" and waiting for the response. You could try executing from the shell it to see if it does take a long time in Azure, or if the output is too complex to parse.
Regards,
Marcos
In 04-04-2013 09:33, Gopi Krishna M wrote:
in case anybody has some idea on what I should look for in debugging this.. below is a strace of the simple bin/hadoop dfs -ls / command run on the machine running namenode
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
49.56 0.000853 45 19 14 execve
36.20 0.000623 1 925 rt_sigprocmask
5.75 0.000099 2 62 brk
5.29 0.000091 9 10 getgid
3.20 0.000055 1 81 rt_sigaction
0.00 0.000000 0 71 read
0.00 0.000000 0 3 write
0.00 0.000000 0 49 11 open
0.00 0.000000 0 87 close
0.00 0.000000 0 71 31 stat
0.00 0.000000 0 36 fstat
0.00 0.000000 0 12 lseek
0.00 0.000000 0 72 mmap
0.00 0.000000 0 36 mprotect
0.00 0.000000 0 9 munmap
0.00 0.000000 0 19 rt_sigreturn
0.00 0.000000 0 2 2 ioctl
0.00 0.000000 0 40 31 access
0.00 0.000000 0 19 pipe
0.00 0.000000 0 8 dup2
0.00 0.000000 0 2 getpid
0.00 0.000000 0 20 clone
0.00 0.000000 0 38 19 wait4
0.00 0.000000 0 2 uname
0.00 0.000000 0 24 2 fcntl
0.00 0.000000 0 16 getdents
0.00 0.000000 0 2 readlink
0.00 0.000000 0 2 gettimeofday
0.00 0.000000 0 5 getrlimit
0.00 0.000000 0 10 getuid
0.00 0.000000 0 10 geteuid
0.00 0.000000 0 10 getegid
0.00 0.000000 0 2 getppid
0.00 0.000000 0 2 getpgrp
0.00 0.000000 0 5 arch_prctl
0.00 0.000000 0 2 1 futex
0.00 0.000000 0 1 set_tid_address
0.00 0.000000 0 10 2 openat
0.00 0.000000 0 1 set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00 0.001721 1795 113 total
On Sun, Mar 31, 2013 at 11:47 AM, Gopi Krishna M <mg...@gmail.com>> wrote:
Hi..
I have installed hadoop 0.23.5 and is working fine on two of my installations. In a new installation on Windows Azure VMs, I am seeing an inordinate delay (of 1 minute) when doing a simple dfs ls command.
This is a 3 node cluster, running 64 bit Ubuntu 12.04.1 LTS, Sun jre
command : bin/hadoop dfs -ls /
With log set to DEBUG, following is the output on the console. The logs in bold show the sleep/timeout/delay of 1 minute. Any pointers on what to check for / what is the reason for this delay?
13/03/31 05:48:55 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate
org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation @org.apache.
hadoop.metrics2.annotation.Metric(valueName=Time, about=, value=[Rate of successful kerberos logins
and latency (milliseconds)], always=false, type=DEFAULT, sampleName=Ops)
13/03/31 05:48:55 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate
org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with annotation @org.apache.
hadoop.metrics2.annotation.Metric(valueName=Time, about=, value=[Rate of failed kerberos logins and
latency (milliseconds)], always=false, type=DEFAULT, sampleName=Ops)
13/03/31 05:48:55 DEBUG impl.MetricsSystemImpl: UgiMetrics, User and group related metrics
13/03/31 05:49:55 DEBUG security.Groups: Creating new Groups object
13/03/31 05:49:55 DEBUG security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUni
xGroupsMapping; cacheTimeout=300000
13/03/31 05:50:55 DEBUG security.UserGroupInformation: hadoop login
13/03/31 05:50:55 DEBUG security.UserGroupInformation: hadoop login commit
13/03/31 05:50:55 DEBUG security.UserGroupInformation: using local user:UnixPrincipal: hduser
13/03/31 05:50:55 DEBUG security.UserGroupInformation: UGI loginUser:hduser (auth:SIMPLE)
13/03/31 05:50:56 DEBUG hdfs.DFSClient: Short circuit read is false
13/03/31 05:50:56 DEBUG ipc.Client: The ping interval is 60000 ms.
13/03/31 05:50:56 DEBUG ipc.Client: Use SIMPLE authentication for protocol ClientProtocol
13/03/31 05:50:56 DEBUG ipc.Client: Connecting to XXXXXX:9000
Thanks
Gopi
Re: one minute delay in running a simple ls command on hadoop
(maybe near security groups..): hadoop 0.23.5
Posted by MARCOS MEDRADO RUBINELLI <ma...@buscapecompany.com>.
Gopi,
The namenode is essentially running bash -c "id -Gn hduser" and waiting for the response. You could try executing from the shell it to see if it does take a long time in Azure, or if the output is too complex to parse.
Regards,
Marcos
In 04-04-2013 09:33, Gopi Krishna M wrote:
in case anybody has some idea on what I should look for in debugging this.. below is a strace of the simple bin/hadoop dfs -ls / command run on the machine running namenode
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
49.56 0.000853 45 19 14 execve
36.20 0.000623 1 925 rt_sigprocmask
5.75 0.000099 2 62 brk
5.29 0.000091 9 10 getgid
3.20 0.000055 1 81 rt_sigaction
0.00 0.000000 0 71 read
0.00 0.000000 0 3 write
0.00 0.000000 0 49 11 open
0.00 0.000000 0 87 close
0.00 0.000000 0 71 31 stat
0.00 0.000000 0 36 fstat
0.00 0.000000 0 12 lseek
0.00 0.000000 0 72 mmap
0.00 0.000000 0 36 mprotect
0.00 0.000000 0 9 munmap
0.00 0.000000 0 19 rt_sigreturn
0.00 0.000000 0 2 2 ioctl
0.00 0.000000 0 40 31 access
0.00 0.000000 0 19 pipe
0.00 0.000000 0 8 dup2
0.00 0.000000 0 2 getpid
0.00 0.000000 0 20 clone
0.00 0.000000 0 38 19 wait4
0.00 0.000000 0 2 uname
0.00 0.000000 0 24 2 fcntl
0.00 0.000000 0 16 getdents
0.00 0.000000 0 2 readlink
0.00 0.000000 0 2 gettimeofday
0.00 0.000000 0 5 getrlimit
0.00 0.000000 0 10 getuid
0.00 0.000000 0 10 geteuid
0.00 0.000000 0 10 getegid
0.00 0.000000 0 2 getppid
0.00 0.000000 0 2 getpgrp
0.00 0.000000 0 5 arch_prctl
0.00 0.000000 0 2 1 futex
0.00 0.000000 0 1 set_tid_address
0.00 0.000000 0 10 2 openat
0.00 0.000000 0 1 set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00 0.001721 1795 113 total
On Sun, Mar 31, 2013 at 11:47 AM, Gopi Krishna M <mg...@gmail.com>> wrote:
Hi..
I have installed hadoop 0.23.5 and is working fine on two of my installations. In a new installation on Windows Azure VMs, I am seeing an inordinate delay (of 1 minute) when doing a simple dfs ls command.
This is a 3 node cluster, running 64 bit Ubuntu 12.04.1 LTS, Sun jre
command : bin/hadoop dfs -ls /
With log set to DEBUG, following is the output on the console. The logs in bold show the sleep/timeout/delay of 1 minute. Any pointers on what to check for / what is the reason for this delay?
13/03/31 05:48:55 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate
org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation @org.apache.
hadoop.metrics2.annotation.Metric(valueName=Time, about=, value=[Rate of successful kerberos logins
and latency (milliseconds)], always=false, type=DEFAULT, sampleName=Ops)
13/03/31 05:48:55 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate
org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with annotation @org.apache.
hadoop.metrics2.annotation.Metric(valueName=Time, about=, value=[Rate of failed kerberos logins and
latency (milliseconds)], always=false, type=DEFAULT, sampleName=Ops)
13/03/31 05:48:55 DEBUG impl.MetricsSystemImpl: UgiMetrics, User and group related metrics
13/03/31 05:49:55 DEBUG security.Groups: Creating new Groups object
13/03/31 05:49:55 DEBUG security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUni
xGroupsMapping; cacheTimeout=300000
13/03/31 05:50:55 DEBUG security.UserGroupInformation: hadoop login
13/03/31 05:50:55 DEBUG security.UserGroupInformation: hadoop login commit
13/03/31 05:50:55 DEBUG security.UserGroupInformation: using local user:UnixPrincipal: hduser
13/03/31 05:50:55 DEBUG security.UserGroupInformation: UGI loginUser:hduser (auth:SIMPLE)
13/03/31 05:50:56 DEBUG hdfs.DFSClient: Short circuit read is false
13/03/31 05:50:56 DEBUG ipc.Client: The ping interval is 60000 ms.
13/03/31 05:50:56 DEBUG ipc.Client: Use SIMPLE authentication for protocol ClientProtocol
13/03/31 05:50:56 DEBUG ipc.Client: Connecting to XXXXXX:9000
Thanks
Gopi