Posted to user@nutch.apache.org by Martin Aesch <ma...@googlemail.com> on 2013/12/07 23:19:21 UTC

Re: Cannot run program "/bin/ls": java.io.IOException: error=11, Resource temporarily unavailable

Dear Jon, dear nutchers,

thanks - do you by chance remember more about the background of that
problem? I am using Nutch 1.7 and am currently seeing the same issue
while parsing.

Nutch 1.7 in pseudo-distributed mode, 32 GB RAM total, 768 MB per
mapper/reducer task, 8 GB for Hadoop, 2 GB for Nutch; 6 mappers, at most
100,000 URLs per segment. According to the log, each URL takes 0-1 ms to
parse.

Suddenly, the 1-minute load of the machine climbs to 200 and higher; I
have even seen 1/5/15-minute averages of 200/500/1200. Yet CPU usage is
moderate, I/O wait is low, and the machine is ~50 percent idle.

Currently, I am running under the same conditions but with only 10k URLs
per segment. So far, after 30 generate-fetch-parse-update cycles, no
problem.

I am already a veteran of ulimit problems and have set the values very
high (ulimit -n 250000, ulimit -u 320000). Now I am out of ideas.
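
One more thing worth verifying: ulimit values raised in an interactive
shell do not necessarily reach daemons that were started elsewhere. A
quick check of the limits a running TaskTracker actually has (a sketch
assuming Linux and a Hadoop 1.x TaskTracker process; adjust the pgrep
pattern to your setup):

# Show the process/file limits the running TaskTracker really got:
for pid in $(pgrep -f TaskTracker); do
  echo "=== pid $pid ==="
  grep -E 'Max (processes|open files)' "/proc/$pid/limits"
done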

Any ideas, suggestions?

Cheers,
Martin

-----Original Message-----
From: Jon Uhal <jo...@gmail.com>
Reply-to: user@nutch.apache.org
To: user@nutch.apache.org
Subject: Cannot run program "/bin/ls": java.io.IOException: error=11,
Resource temporarily unavailable
Date: Wed, 20 Nov 2013 16:47:33 -0500

I just wanted to leave this here since it took me way too long to figure
out. For some people this might be an obvious problem, but since it
wasn't obvious to me, I want to make sure anyone else who hits it can
find this answer.

I kept getting the following error while running a crawl. It happened
consistently for me, but I couldn't find any similar issues or solutions
on the usual sites. The closest thing I could find was this:
http://www.nosql.se/2011/10/hadoop-tasktracker-java-lang-outofmemoryerror/

Below is the error I was seeing. This is just one of several exceptions
that would occur during the parse; in the end, the parse step would hit
too many errors and exceed the Nutch error limit, failing the job.

13/11/20 20:14:19 INFO parse.ParseSegment: ParseSegment: segment:
test/segments/20131120201240
13/11/20 20:14:20 INFO mapred.FileInputFormat: Total input paths to process
: 2
13/11/20 20:14:21 INFO mapred.JobClient: Running job: job_201311202006_0017
13/11/20 20:14:22 INFO mapred.JobClient:  map 0% reduce 0%
13/11/20 20:14:34 INFO mapred.JobClient:  map 40% reduce 0%
13/11/20 20:14:36 INFO mapred.JobClient:  map 50% reduce 0%
13/11/20 20:14:36 INFO mapred.JobClient: Task Id :
attempt_201311202006_0017_m_000001_0, Status : FAILED
java.lang.RuntimeException: Error while running command to get file
permissions : java.io.IOException: Cannot run program "/bin/ls":
java.io.IOException: error=11, Resource temporarily unavailable
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:200)
        at org.apache.hadoop.util.Shell.run(Shell.java:182)
        at
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:375)
        at org.apache.hadoop.util.Shell.execCommand(Shell.java:461)
        at org.apache.hadoop.util.Shell.execCommand(Shell.java:444)
        at org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:712)
        at
org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:448)
        at
org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus.getOwner(RawLocalFileSystem.java:431)
        at
org.apache.hadoop.mapred.TaskLog.obtainLogDirOwner(TaskLog.java:267)
        at
org.apache.hadoop.mapred.TaskLogsTruncater.truncateLogs(TaskLogsTruncater.java:124)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:260)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.io.IOException: java.io.IOException: error=11, Resource
temporarily unavailable
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
        at java.lang.ProcessImpl.start(ProcessImpl.java:65)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
        ... 15 more

        at
org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:473)
        at
org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus.getOwner(RawLocalFileSystem.java:431)
        at
org.apache.hadoop.mapred.TaskLog.obtainLogDirOwner(TaskLog.java:267)
        at
org.apache.hadoop.mapred.TaskLogsTruncater.truncateLogs(TaskLogsTruncater.java:124)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:260)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)

13/11/20 20:14:44 INFO mapred.JobClient:  map 75% reduce 16%
13/11/20 20:14:47 INFO mapred.JobClient:  map 100% reduce 16%
13/11/20 20:14:53 INFO mapred.JobClient:  map 100% reduce 72%
13/11/20 20:14:54 INFO mapred.JobClient:  map 100% reduce 100%
13/11/20 20:14:56 INFO mapred.JobClient: Job complete: job_201311202006_0017

I can't remember exactly what led me to suspect the number of available
file handles, but that is where I found my solution.

Originally, my system was set for 1024 open files:

$ ulimit -n
1024

Bumping this up to 8096 has fixed my issue for the moment. How to change
the limit differs from system to system, so I won't try to cover each
one, but increasing it was critical to getting my crawl to run through
more than one iteration.
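
For reference, on many Linux systems the limit can be raised
persistently through /etc/security/limits.conf (a sketch only; the exact
mechanism varies by distribution, and "hadoop" below is a placeholder
for whichever user runs the crawl or the Hadoop daemons):

$ ulimit -n        # current soft limit for this shell
$ ulimit -n 8192   # raise it for this shell only

# /etc/security/limits.conf - takes effect at the next PAM login:
hadoop  soft  nofile  8192
hadoop  hard  nofile  16384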

Hope this helps anyone with the same problem.



Re: Cannot run program "/bin/ls": java.io.IOException: error=11, Resource temporarily unavailable

Posted by Martin Aesch <ma...@googlemail.com>.
Thanks, Tejas, I adjusted my settings.

I dug around further and "found" the reason for the sudden increase in
the number of threads, described in
https://issues.apache.org/jira/browse/NUTCH-1640

Working smoothly now, great!

Martin


-----Original Message-----
From: Tejas Patil <te...@gmail.com>
Reply-to: user@nutch.apache.org
To: user@nutch.apache.org <us...@nutch.apache.org>
Subject: Re: Cannot run program "/bin/ls": java.io.IOException:
error=11, Resource temporarily unavailable
Date: Sat, 7 Dec 2013 18:19:44 -0800

"java.io.IOException: error=11, Resource temporarily unavailable" can be
attributed to two causes:

1. Too many processes already running
   (http://stackoverflow.com/questions/8384000/java-io-ioexception-error-11), or
2. Too many open files
   (http://stackoverflow.com/questions/15494749/java-lang-outofmemoryerror-unable-to-create-new-native-thread-for-big-data-set)
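
A quick way to see which limit you are near on Linux (a sketch; note
that on Linux the per-user process limit counts threads, and <pid> below
stands for one of the Hadoop child JVMs):

$ ulimit -u                               # max user processes (counts threads)
$ ps -L --no-headers -u "$USER" | wc -l   # threads this user currently owns
$ ulimit -n                               # max open files per process
$ ls /proc/<pid>/fd | wc -l               # open handles of one child JVM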

Regarding your load/environment: Nutch/Hadoop should not leave behind
stale processes or open files that pile up over several rounds; after
every round it is expected to clean up. To verify this, run a script
that periodically captures the number of live processes and open file
handles while Nutch is running.
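
A minimal sketch of such a script (assuming Linux;
'org.apache.hadoop.mapred.Child' is the Hadoop 1.x task JVM class seen
in the stack trace above, but adjust the pgrep pattern to your setup):

#!/bin/sh
# Every 10 seconds, log this user's thread count and the total number
# of file handles held by Hadoop child task JVMs.
while true; do
  threads=$(ps -L --no-headers -u "$USER" | wc -l)
  fds=0
  for pid in $(pgrep -f 'org.apache.hadoop.mapred.Child'); do
    n=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l)
    fds=$((fds + n))
  done
  echo "$(date '+%F %T') threads=$threads child_fds=$fds"
  sleep 10
done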

Thanks,
Tejas
