Posted to user@nutch.apache.org by yann <ya...@yahoo.com> on 2014/01/06 15:39:30 UTC

Cannot run program "chmod" : too many open files

Hi guys,

first, thanks for your help so far!

I have a Nutch server running in one Java JVM, starting a new thread for
each crawl. 

I ran into a new issue after about a week of continuously repeated crawls
(roughly 10 sites, each crawled about once an hour).

My hadoop.log said:

2014-01-04 19:15:42,229 WARN  mapred.LocalJobRunner - job_local_45262
java.io.IOException: Cannot run program "chmod": error=24, Too many open
files

later on, I get:

java.io.IOException: Cannot run program "bash": error=24, Too many open
files
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1041)
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:149)
	at org.apache.hadoop.util.Shell.run(Shell.java:134)
	at org.apache.hadoop.fs.DF.getAvailable(DF.java:73)
	at
org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:329)
	at
org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
	at
org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(MapOutputFile.java:107)
	at
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1221)
	at
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1129)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:359)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)

and eventually all crawls fail.

I'm wondering if there is any file descriptor leak that might be fixed in a
patch somewhere, or if there might be any other idea on how to fix this?
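
For reference, a minimal way to watch the descriptor count from inside the
JVM looks roughly like this (assuming a HotSpot/OpenJDK JVM on Linux, where
com.sun.management.UnixOperatingSystemMXBean is available; FdWatcher and
startCrawl() are just illustrative names, not Nutch code):

    import java.lang.management.ManagementFactory;
    import java.lang.management.OperatingSystemMXBean;
    import com.sun.management.UnixOperatingSystemMXBean;

    public class FdWatcher {
        // Print the process-wide open descriptor count and the limit; calling
        // this before and after each crawl shows whether the count keeps
        // climbing.
        public static void logOpenFds(String label) {
            OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
            if (os instanceof UnixOperatingSystemMXBean) {
                UnixOperatingSystemMXBean unixOs = (UnixOperatingSystemMXBean) os;
                System.out.println(label + ": " + unixOs.getOpenFileDescriptorCount()
                        + " of " + unixOs.getMaxFileDescriptorCount()
                        + " descriptors in use");
            }
        }

        public static void main(String[] args) {
            logOpenFds("before crawl");
            // startCrawl();  // hypothetical: trigger one crawl here
            logOpenFds("after crawl");
        }
    }

If the "in use" number grows by a fixed amount per crawl and never comes back
down, that would point at a leak rather than a limit that is simply set too
low.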

Thanks a lot,

Yann





RE: Cannot run program "chmod" : too many open files

Posted by yann <ya...@yahoo.com>.
Hi everybody,

reporting back on this issue.

It seems like for each crawl, I get one new line like the following in the
lsof output:

COMMAND   PID USER  FD   TYPE DEVICE SIZE/OFF     NODE NAME
java    12813 egov 720u  sock    0,5      0t0 46043464 can't identify protocol

I have a crawl server (a REST API on top of Nutch); each crawl is triggered
by an HTTP request, so the socket leak could be in the web server (I
originally used the HttpServer that comes with the JDK), in my own Jersey
code, or in Nutch itself.

HttpServer is not the culprit; the same issue happens with Tomcat. To rule
out my own code, I ran the exact same code path with the Nutch calls
commented out, and the socket leak disappeared. So the leak appears to be in
Nutch itself: one dangling socket per crawl.
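
For anyone who wants to repeat the measurement, a before/after snapshot can
be taken roughly like this (Linux-only, since it reads /proc/self/fd;
runOneCrawl() is just a placeholder for the actual call into Nutch, not a
real method):

    import java.io.File;
    import java.util.HashSet;
    import java.util.Set;

    public class FdDiff {
        // Snapshot of this process's open descriptor numbers via /proc/self/fd.
        static Set<String> snapshot() {
            Set<String> fds = new HashSet<String>();
            File[] entries = new File("/proc/self/fd").listFiles();
            if (entries != null) {
                for (File entry : entries) {
                    fds.add(entry.getName());
                }
            }
            return fds;
        }

        public static void main(String[] args) throws Exception {
            Set<String> before = snapshot();
            // runOneCrawl();  // placeholder: trigger a single crawl here
            Set<String> after = snapshot();
            after.removeAll(before);
            // Anything left over was opened during the crawl and never closed
            // (modulo the descriptor each listing briefly uses for the
            // directory itself).
            System.out.println("descriptors still open after the crawl: " + after);
        }
    }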

This page:
http://serverfault.com/questions/153983/sockets-found-by-lsof-but-not-by-netstat

suggests that this can happen when a socket has been created but never had a
connect() or bind() associated with it.
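
If I read that correctly, a descriptor in that state is easy to produce in
plain Java, for example (a standalone demo to illustrate the symptom, not
actual Nutch code):

    import java.nio.channels.SocketChannel;

    public class DanglingSocketDemo {
        public static void main(String[] args) throws Exception {
            // SocketChannel.open() allocates the underlying socket descriptor
            // immediately, without any connect() or bind().
            SocketChannel channel = SocketChannel.open();
            // channel.close() is deliberately omitted, so the descriptor leaks.
            // While this process is alive, lsof typically reports it as
            // "can't identify protocol", matching the lines above.
            Thread.sleep(60000);  // keep the process up long enough to run lsof
        }
    }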

Anything I can do on my side at this point?

Thanks,

Yannick






RE: Cannot run program "chmod" : too many open files

Posted by yann <ya...@yahoo.com>.
Hi Markus,

thanks for the suggestion. I'll monitor and report back. After a couple of
hours of crawling I don't see anything untoward: no giant list of Hadoop
files, and most of what is open is jars.

Yann




RE: Cannot run program "chmod" : too many open files

Posted by Markus Jelsma <ma...@openindex.io>.
Yes, there is clearly a leak. Can you use lsof to find out which files are open that should not be open?
 
-----Original message-----
> From:yann <ya...@yahoo.com>
> Sent: Monday 6th January 2014 15:40
> To: user@nutch.apache.org
> Subject: Cannot run program "chmod" : too many open files