Posted to dev@nutch.apache.org by ni...@gmail.com on 2005/10/19 03:28:49 UTC

No buffer space available

Hi,
I was trying to fetch the DMOZ open directory using the exact example from
the Nutch tutorial website. I ran the following steps:
mkdir db
mkdir segments
bin/nutch admin db -create
bin/nutch inject db -dmozfile ../nutch-0.7.1/content.rdf.u8 -subset 3000
bin/nutch generate db segments
s1=`ls -d segments/2* | tail -1`
echo $s1
bin/nutch fetch -showThreadID -noParsing -threads 50 $s1
bin/nutch updatedb db $s1
It starts fetching the pages, but after a couple of hundred pages it starts
throwing this exception:
"java.net.SocketException: No buffer space available"
Do you have any idea why this might happen? I know it is running out of
available buffer space for new sockets, but why are the old sockets not
closed? Even if a fetch fails, its socket should be closed and its buffer
should be freed!
I tried both 0.7 and 0.7.1.
Thanks. Nima

RE: No buffer space available

Posted by Fuad Efendi <fu...@efendi.ca>.
>It means the socket buffer is
>not global for all processes and VM assigns some certain portion of
>global buffer to each process.

Yes! By common sense the buffer should be shared by all IP packets: every
packet carries a port number that identifies the socket (strictly, the port
is in the TCP header rather than the IP header), all packets share the same
"buffer", and they are then passed up to a higher OSI layer (the transport
layer / TCP) according to the information found in the headers...
7 Layers of the OSI Model, http://www.webopedia.com/quick_ref/OSI_Layers.asp

Thanks


Re: No buffer space available

Posted by ni...@gmail.com.
I have almost the same setup. My Linux is CentOS; I tried all sorts
of TCP tuning, but it still gives me the same error.

I found something else: it works fine with 2 threads, and if I run
another instance of Nutch on the same machine with 200 threads, the
200-thread instance fails with the "No buffer space available" error
after a while, but the 2-thread one keeps working fine! This suggests
that the socket buffer is not global for all processes and that the VM
assigns a certain portion of the global buffer to each process.

I managed to get it working on another machine with about 200 threads
and the same connection speed; I could get almost 28 pages/sec with that
connection. Now I am pretty sure this is OS related and not a Nutch
problem.

Thanks for the help and tips.

Nima
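
One way to look into the per-process idea above is to compare the number of descriptors a single process holds with that process's own limit, since every socket uses one descriptor. This is only a sketch, assuming Linux with /proc mounted; the function name count_entries is made up for illustration, and a descriptor limit is just one possible cause of the error, not a confirmed one.

```shell
#!/bin/sh
# Count the entries in a directory such as /proc/<pid>/fd,
# which holds one entry per open descriptor of that process.
count_entries() {
    ls -A "$1" 2>/dev/null | wc -l | tr -d ' '
}

# Hypothetical usage: pass the PID of the fetcher JVM as the first argument.
# Defaults to this shell's own PID so the script also runs without arguments.
pid=${1:-$$}

echo "pid $pid: $(count_entries "/proc/$pid/fd") open descriptors"

# /proc/<pid>/limits lists the limits that apply to that process.
if [ -r "/proc/$pid/limits" ]; then
    grep 'Max open files' "/proc/$pid/limits"
fi
```

Running it against the 2-thread and the 200-thread instance while both are fetching would show whether the failing process is close to its own limit.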



RE: No buffer space available

Posted by Fuad Efendi <fu...@efendi.ca>.
Please check this:
net.ipv4.ip_local_port_range = 1024 65000


I don't know Linux in depth...

Funny, you have a very good Internet connection (100 Mbps, probably
symmetric), and such a problem may also happen if the hardware is not fast
enough... Some kind of handshake in TCP: after the handshake, when both sides
have settled on the speed of the hardware, the server probably sends you 20
IP packets at a time and waits for a single IP packet with the
acknowledgement. "Buffer overload" on your side... Something at the TCP
layer...


On my Linux, /etc/sysctl.conf (recommended by Oracle, should be higher than
that):

kernel.shmall = 2097152
kernel.shmmax = 2147483648
kernel.shmmni = 4096
kernel.sem = 250 32000 100 128
fs.file-max = 65536
net.ipv4.ip_local_port_range = 1024 65000
net.core.rmem_default = 262144
net.core.rmem_max = 262144
net.core.wmem_default = 262144
net.core.wmem_max = 262144
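
To check whether settings like the ones listed above are actually in effect on the running kernel, the values can be read back from /proc/sys. A minimal sketch, assuming Linux: the helper name sysctl_value and the PROC_SYS_ROOT variable are made up for illustration, and the fully qualified net.core.* names are an assumption about which keys the rmem/wmem entries refer to.

```shell
#!/bin/sh
# Root of the sysctl tree. Overridable so the helper can be tried on a copy.
PROC_SYS_ROOT=${PROC_SYS_ROOT:-/proc/sys}

# Print the running value of a sysctl key such as net.core.rmem_max,
# or "(not available)" if the file for that key cannot be read.
sysctl_value() {
    path="$PROC_SYS_ROOT/$(echo "$1" | tr . /)"
    if [ -r "$path" ]; then
        cat "$path"
    else
        echo "(not available)"
    fi
}

# Keys taken from the list above (net.core.* prefix is an assumption).
for key in fs.file-max net.ipv4.ip_local_port_range \
           net.core.rmem_default net.core.rmem_max \
           net.core.wmem_default net.core.wmem_max; do
    echo "$key = $(sysctl_value "$key")"
done
```

After editing /etc/sysctl.conf, running `sysctl -p` as root reloads the file; the loop above then shows whether the kernel accepted the values.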






Re: No buffer space available

Posted by ni...@gmail.com.
I've been searching on Google for two days. I retuned all the kernel
parameters. I made the following changes to the kernel settings:

  # increase TCP max buffer size
  net.core.rmem_max = 16777216
  net.core.wmem_max = 16777216

  # increase Linux autotuning TCP buffer limits
  # min, default, and max number of bytes to use
  net.ipv4.tcp_rmem = 4096 87380 16777216
  net.ipv4.tcp_wmem = 4096 65536 16777216

But I still have the same problem. It might be because of the maximum
number of TCP connections on Linux. Do you have any idea how I can
find out the maximum possible number of TCP connections on Red Hat?

Thanks
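
On the question of a maximum number of TCP connections: as far as I know there is no single setting for it on Linux. Each outgoing connection needs a file descriptor and a local port, so the practical ceiling comes from several limits together. The sketch below prints the ones mentioned in this thread, assuming Linux with /proc; the function name port_range_size is made up for illustration.

```shell
#!/bin/sh
# Number of local ports in an inclusive "low high" range, the format
# used by net.ipv4.ip_local_port_range (e.g. "1024 65000").
port_range_size() {
    set -- $1            # split the argument into low ($1) and high ($2)
    echo $(( $2 - $1 + 1 ))
}

# Per-process limit on open descriptors for this shell and its children.
echo "open files per process (ulimit -n): $(ulimit -n)"

# System-wide limit on open file handles.
if [ -r /proc/sys/fs/file-max ]; then
    echo "fs.file-max: $(cat /proc/sys/fs/file-max)"
fi

# Local ports available for outgoing connections.
range_file=/proc/sys/net/ipv4/ip_local_port_range
if [ -r "$range_file" ]; then
    range=$(cat "$range_file")
    echo "local port range: $range ($(port_range_size "$range") ports)"
fi
```

With the range 1024 65000 suggested elsewhere in this thread, port_range_size reports 63977 usable local ports.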



Re: No buffer space available

Posted by Paul Baclace <pe...@baclace.net>.
nimakh@gmail.com wrote:
> Thanks for the tips. But I have a monster computer, 12G RAM and dual
> 64 bits processors, my network connection is 100 MB/S! I guess Nutch
> doesn't close the opened sockets in the case of bad host! I am still
> strugelling with problem.

If the OS is using a default/generic configuration, you should look
into tuning the machine for a specific purpose.

Linux has an enormous number of settings for max sockets (with a
breakdown by various states like SYN or FIN, etc.) and various buffer
settings, including a max total buffer space for sockets which might
be your problem.  The socket maximums can be set at both the
system level and per-process.

Please report how many sockets are in the various states when
the problem occurs; that is needed in order to determine
whether there is a leak.  In any case, there are always opportunities
to tune the server and the software.

Paul
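
The per-state socket report asked for above could be collected roughly like this. It is a sketch, assuming the Linux `netstat -tan` output layout, where TCP lines start with "tcp" and the state is the last column; the function name count_tcp_states is made up for illustration.

```shell
#!/bin/sh
# Read `netstat -tan`-style output on stdin and print one
# "STATE count" line per TCP state, sorted by state name.
count_tcp_states() {
    awk '$1 ~ /^tcp/ { states[$NF]++ }
         END { for (s in states) printf "%s %d\n", s, states[s] }' | sort
}

# Live usage, only attempted if netstat is installed.
if command -v netstat >/dev/null 2>&1; then
    netstat -tan 2>/dev/null | count_tcp_states
fi
```

Run while the fetcher is failing, a large and growing CLOSE_WAIT count would point at sockets the application has not closed, whereas many TIME_WAIT entries are expected after closing a lot of short connections.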


RE: No buffer space available

Posted by Fuad Efendi <fu...@efendi.ca>.
I have 2x Opteron 252 Troy, 64-bit of course, 8 GB. And SUSE Linux; they were
the first with a 64-bit version for Opteron-based machines... Most Linux
flavours are 32-bit builds...

It's your OS: hardware + driver + Linux (is it really a native 64-bit
build, are you sure?)

"No buffer space available" - try searching Linux-related sites;
I am sure it is not Nutch, it's a message from the OS.

You will easily find a lot:
http://www.google.ca/search?hl=en&q=No+buffer+space+available+Linux&meta=







Re: No buffer space available

Posted by ni...@gmail.com.
Thanks for the tips. But I have a monster computer, 12 GB of RAM and dual
64-bit processors, and my network connection is 100 MB/s! I guess Nutch
doesn't close the opened sockets in the case of a bad host! I am still
struggling with the problem.

Any other idea?

Nima



RE: No buffer space available

Posted by Fuad Efendi <fu...@efendi.ca>.
I never had such a problem with default settings: SUSE Linux 9.3, J2SE 5
(default installation of Java 1.5.0_03 from the SUSE FTP site)...

> at java.net.PlainSocketImpl.socketConnect(Native Method)

- a native method... Seems like an OS error... Try some tuning (maybe not
enough memory? Maybe a bad NIC driver?)





RE: No buffer space available

Posted by Fuad Efendi <fu...@efendi.ca>.
For comparison (in order to locate the problem...) you may also try
http://htmlparser.sourceforge.net/

- it has a web-site crawler written in Java.

Also, some Linux-specific stuff: web-site crawlers written in C





Re: No buffer space available

Posted by ni...@gmail.com.
But I tried it on two different machines, one with CentOS Linux and
the other with Ubuntu Linux!

One example of the exception looks like this:

051018 153727 28 fetching http://perso.wanadoo.es/largo/
java.net.SocketException: No buffer space available
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
        at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
        at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:364)
        at java.net.Socket.connect(Socket.java:507)
        at java.net.Socket.connect(Socket.java:457)
        at java.net.Socket.<init>(Socket.java:365)
        at java.net.Socket.<init>(Socket.java:238)
        at org.apache.commons.httpclient.protocol.DefaultProtocolSocketFactory.createSocket(DefaultProtocolSocketFactory.java:79)
        at org.apache.commons.httpclient.protocol.ControllerThreadSocketFactory$1.doit(ControllerThreadSocketFactory.java:90)
        at org.apache.commons.httpclient.protocol.ControllerThreadSocketFactory$SocketTask.run(ControllerThreadSocketFactory.java:157)
        at java.lang.Thread.run(Thread.java:595)
Nima






RE: No buffer space available

Posted by Fuad Efendi <fu...@efendi.ca>.
java.net.SocketException - Thrown to indicate that there is an error in the
underlying protocol, such as a TCP error.

"No buffer space available" - message comes from underlying OS...

I think it's not Nutch or configuration of Nutch...

Maybe OS tuning? Maybe the JVM version/vendor?

I don't know UNIX in depth, but it has some protocol-specific
settings...


