Posted to user@nutch.apache.org by qu...@webmail.co.za on 2005/06/20 14:08:24 UTC

Nutch (Windows) ?

Hi everyone...

Anyone here running Nutch in a Windows environment using
cygwin? I was wondering if anyone could help. Nutch is
running wonderfully; however, I'd like to find out if it is
possible to change the system priority of Tomcat and/or
cygwin (fetching etc.). At the moment it's using only
about 5-10% of the CPU and I'd like to know if there's any way
to boost this in Windows.

Thanks
quo
_____________________________________________________________________
For super low premiums, click here http://www.dialdirect.co.za/quote

Re: PDFBox (Re: Nutch Lockup/Freeze (Fetcher) - HELP!!)

Posted by Juho Mäkinen <ju...@gmail.com>.
On 6/29/05, Andrzej Bialecki <ab...@getopt.org> wrote:
> Juho Mäkinen wrote:
> > I did some research and I traced the problem to be somewhere inside
> > HttpRequest of protocol-httpclient.
> 
> If you enabled the PDF parser, the version of PDFBox that is currently
> in SVN is known to be broken - for some PDFs a bug in CMap handling can
> ....

I'm not using the PDF parser, so that can't be the problem.

 - Juho Mäkinen, http://www.juhonkoti.net

PDFBox (Re: Nutch Lockup/Freeze (Fetcher) - HELP!!)

Posted by Andrzej Bialecki <ab...@getopt.org>.
Juho Mäkinen wrote:
> I did some research and I traced the problem to be somewhere inside
> HttpRequest of protocol-httpclient.

If you enabled the PDF parser, the version of PDFBox that is currently 
in SVN is known to be broken - for some PDFs a bug in CMap handling can 
cause an endless loop. Please download the latest binary from 
http://www.pdfbox.org/dist , and try again.

I didn't commit the latest PDFBox because it hasn't been released yet. As 
soon as there is a new release I'll update the one in our SVN. Until then 
you need to follow the above procedure.

I've also attached a simple tool to create fetchlists from a list of 
arbitrary URLs. This comes in handy if you want to test various parts of 
Nutch with arbitrary URLs that don't come from the DB.


-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


Re: Nutch Lockup/Freeze (Fetcher) - HELP!!

Posted by Juho Mäkinen <ju...@gmail.com>.
> > I did some research and I traced the problem to be somewhere inside
> > HttpRequest of protocol-httpclient.
> I had a similar report from someone else, and I'll try to find out what
> is happening. Thanks for this debugging output, it is helpful - if you
> find something else, please let me know.

It seems that, at least in most cases (I don't know if in every case),
inside HttpResponse, in the line
while ((bufferFilled = in.read(buffer, 0, buffer.length)) != -1 && tryAndRead > 0) {
read returns just one byte (bufferFilled == 1). Normally it returns
buffer.length, and it also returns full buffers from the same socket,
but for some reason it goes haywire and starts returning one byte
at a time.

I created an ugly workaround with a counter that starts at 10
and decreases every time bufferFilled == 1. Once the counter
reaches zero, it aborts the read by breaking out of the inner while
loop. This leaves the fetched page corrupted, but at least it won't
halt the whole fetch of thousands of pages.
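A minimal sketch of the counter workaround described above, written for a current JDK and assuming a plain java.io.InputStream instead of the real Nutch HttpResponse internals. The class OneByteReadGuard and its TrickleStream test stream are hypothetical names, not Nutch code. The sketch keeps a budget of one-byte reads and stops reading once that budget is used up, returning whatever was read so far (possibly a truncated page, as noted).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper (not Nutch code) illustrating the one-byte-read counter workaround.
public class OneByteReadGuard {

    /**
     * Reads the whole stream, but gives up after read() has returned exactly
     * one byte maxOneByteReads times. The result may then be truncated
     * ("corrupted" page), but the calling thread is no longer stuck.
     */
    public static byte[] readWithGuard(InputStream in, int bufferSize, int maxOneByteReads)
            throws IOException {
        byte[] buffer = new byte[bufferSize];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int remaining = maxOneByteReads; // the counter, e.g. starting at 10
        int bufferFilled;
        while ((bufferFilled = in.read(buffer, 0, buffer.length)) != -1) {
            out.write(buffer, 0, bufferFilled);
            // decrement only on one-byte reads; abort once the budget is gone
            if (bufferFilled == 1 && --remaining <= 0) {
                break;
            }
        }
        return out.toByteArray();
    }

    /** Test stream that, like the misbehaving socket, hands out one byte per read. */
    static class TrickleStream extends ByteArrayInputStream {
        TrickleStream(byte[] data) {
            super(data);
        }

        @Override
        public synchronized int read(byte[] b, int off, int len) {
            return super.read(b, off, Math.min(len, 1));
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] page = new byte[100];
        int normal = readWithGuard(new ByteArrayInputStream(page), 8192, 10).length;
        int trickled = readWithGuard(new TrickleStream(page), 8192, 10).length;
        System.out.println(normal + " " + trickled); // prints "100 10"
    }
}
```

The counter also counts legitimate one-byte reads (for example a final read that happens to return a single byte), so this is a heuristic that bounds the damage rather than a fix for the underlying socket behaviour.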

 - Juho Mäkinen, http://www.juhonkoti.net

Nutch Maximum urls per domain?

Posted by qu...@webmail.co.za.
Hi there

Is there any way to limit fetching per domain to a set
number? E.g. only 1000 URLs for each domain in the
fetchlist?

Any ideas?
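Whether Nutch 0.6 has a built-in setting for this is not answered here. As a hedged sketch only, a fetchlist could be capped by pre-processing a plain list of URLs before it is injected; the class PerHostLimiter below is hypothetical and not part of Nutch, and it is written for a current JDK. It keeps at most N URLs per host name, preserving input order and skipping entries that are not valid URIs.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical pre-processing step for a plain URL list; not part of Nutch.
public class PerHostLimiter {

    /** Keeps at most maxPerHost URLs for each host name, preserving input order. */
    public static List<String> limitPerHost(List<String> urls, int maxPerHost) {
        Map<String, Integer> counts = new HashMap<>();
        List<String> kept = new ArrayList<>();
        for (String u : urls) {
            String host;
            try {
                host = new URI(u).getHost();
            } catch (URISyntaxException e) {
                continue; // skip lines that are not valid URIs
            }
            if (host == null) {
                continue; // e.g. relative or mailto: entries have no host
            }
            host = host.toLowerCase(Locale.ROOT);
            int seen = counts.getOrDefault(host, 0);
            if (seen < maxPerHost) {
                counts.put(host, seen + 1);
                kept.add(u);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList(
                "http://a.example/1", "http://a.example/2", "http://a.example/3",
                "http://b.example/1", "not a url");
        // prints [http://a.example/1, http://a.example/2, http://b.example/1]
        System.out.println(limitPerHost(urls, 2));
    }
}
```

Note that this groups by host name, so www.example.com and shop.example.com are counted separately; grouping by registered domain would need extra logic.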

Re: Nutch Lockup/Freeze (Fetcher) - HELP!!

Posted by Andrzej Bialecki <ab...@getopt.org>.
Juho Mäkinen wrote:
> I did some research and I traced the problem to be somewhere inside
> HttpRequest of protocol-httpclient.

I had a similar report from someone else, and I'll try to find out what 
is happening. Thanks for this debugging output, it is helpful - if you 
find something else, please let me know.


-- 
Best regards,
Andrzej Bialecki     <><


Re: Nutch Lockup/Freeze (Fetcher) - HELP!!

Posted by Juho Mäkinen <ju...@gmail.com>.
I did some research and I traced the problem to be somewhere inside
HttpRequest of protocol-httpclient.

I added some System.err.println calls for debugging to the
HttpResponse constructor:
  public HttpResponse(String orig, URL url) throws IOException {
    System.err.println("started HttpResponse");

    origURL = url;
    origUrl = url.toString();
    // rewrites the host to 127.0.0.1 (same protocol and path, default port)
    url = new URL(url.getProtocol(), "127.0.0.1", url.getFile());
    orig = url.toString();

    this.orig = origUrl;
    this.base = origURL.toString();

    GetMethod get = new GetMethod(url.toString());
    get.setFollowRedirects(false);
    get.setStrictMode(false);
    get.setRequestHeader("User-Agent", Http.AGENT_STRING);
    get.setHttp11(false);
    get.setMethodRetryHandler(null);
    try {
      code = Http.getClient().executeMethod(get);
      System.err.println("6");

      // copy all response headers
      Header[] heads = get.getResponseHeaders();
      for (int i = 0; i < heads.length; i++) {
        headers.put(heads[i].getName(), heads[i].getValue());
      }
      System.err.println("7, " + code);
      if (code == 200) {
        System.err.println("8");
        InputStream in = get.getResponseBodyAsStream();
        byte[] buffer = new byte[Http.BUFFER_SIZE];
        System.err.println("9");
        int bufferFilled = 0;
        int totalRead = 0;
        System.err.println("10");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int tryAndRead = calculateTryToRead(totalRead);
        System.err.println("11");
        // read until EOF or until calculateTryToRead() returns <= 0
        while ((bufferFilled = in.read(buffer, 0, buffer.length)) != -1 && tryAndRead > 0) {
          System.err.println("12, " + bufferFilled);
          totalRead += bufferFilled;
          out.write(buffer, 0, bufferFilled);
          tryAndRead = calculateTryToRead(totalRead);
          System.err.println("12.2");
        }
        System.err.println("13");
        content = out.toByteArray();
        in.close();
        System.err.println("14");
      }
    } catch (org.apache.commons.httpclient.ProtocolException pe) {
      pe.printStackTrace();
      throw new IOException(pe.toString());
    } finally {
      get.releaseConnection();
    }
  }


And here is a snapshot of the output:
050627 141912 fetching http://xxx/yyy/zzz/errors_ids100.html
started HttpResponse
6
7, 200
8
9
10
11
12, 8192
12.2
12, 7880
12.2
050627 141912 Thread[fetcher0,5,fetcher]
050627 141912 Thread[MultiThreadedHttpConnectionManager cleanup,5,fetcher]
12, 8192
12.2
12, 8191
12.2
12, 8192
12.2
12, 8191
12.2
12, 8192
12.2
12, 8191
12.2
12, 8192
12.2
13
050627 141913 Thread[fetcher0,5,fetcher]
050627 141913 Thread[MultiThreadedHttpConnectionManager cleanup,5,fetcher]
050627 141914 Thread[fetcher0,5,fetcher]
050627 141914 Thread[MultiThreadedHttpConnectionManager cleanup,5,fetcher]
050627 141915 Thread[fetcher0,5,fetcher]
050627 141915 Thread[MultiThreadedHttpConnectionManager cleanup,5,fetcher]
050627 141916 Thread[fetcher0,5,fetcher]
050627 141916 Thread[MultiThreadedHttpConnectionManager cleanup,5,fetcher]
050627 141917 Thread[fetcher0,5,fetcher]
** and looping **



Re: Nutch Lockup/Freeze (Fetcher) - HELP!!

Posted by Juho Mäkinen <ju...@gmail.com>.
I turned on -logLevel finest with bin/nutch fetch and got these few debug
lines looping forever when the fetcher freezes; hope this helps:

050627 133307 Thread[MultiThreadedHttpConnectionManager cleanup,5,fetcher]
050627 133308 Thread[fetcher0,5,fetcher]
050627 133308 Thread[MultiThreadedHttpConnectionManager cleanup,5,fetcher]
050627 133309 Thread[fetcher0,5,fetcher]
050627 133309 Thread[MultiThreadedHttpConnectionManager cleanup,5,fetcher]
050627 133310 Thread[fetcher0,5,fetcher]


I'm using nutch-nightly (nutch-2005-06-19.tar.gz)

 - Juho Mäkinen, http://www.juhonkoti.net


Re: Nutch Lockup/Freeze (Fetcher) - HELP!!

Posted by Andy Liu <an...@gmail.com>.
If you have an older version of Nutch you may have the older version
of NekoHTML, which was causing fetcher threads to lock up.

http://issues.apache.org/jira/browse/NUTCH-17


Re: Nutch Lockup/Freeze (Fetcher) - HELP!!

Posted by qu...@webmail.co.za.
Hi Andrzej

Looks like using a newer version eliminates this issue;
I'll get back to you after it's completed a few fetches.




Re: Nutch Lockup/Freeze (Fetcher) - HELP!!

Posted by Andrzej Bialecki <ab...@getopt.org>.
quovadis@webmail.co.za wrote:

> (LOCKED UP - pressed control-c and got cygwin prompt)
> Administrator@MACHINE-C /nutch-0.6

LOCKED UP is a very subjective term ;-) Don't touch Ctrl-C; instead, 
please press Ctrl-Break for a full thread dump, copy it and send it here.
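If Ctrl-Break does not reach the JVM in a particular console setup, a similar dump can be produced from inside the program. A minimal sketch using the standard Thread.getAllStackTraces() and Thread.getState(); both require Java 5 or later, so they are not available on the 1.4.2 JVM mentioned elsewhere in this thread. The class name ThreadDumper is hypothetical, and how it would be triggered (for example from a watchdog thread) is left open.

```java
import java.util.Map;

// Hypothetical helper: builds a thread dump from inside the JVM (Java 5+).
public class ThreadDumper {

    /** Returns the name, state and current stack of every live thread. */
    public static String dump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" ")
              .append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.err.print(dump());
    }
}
```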

Also, the official 0.6 release is quite old; you should probably try a 
newer version (one of the nightly builds) and see if the problem persists.

-- 
Best regards,
Andrzej Bialecki     <><


Re: Nutch Lockup/Freeze (Fetcher) - HELP!!

Posted by qu...@webmail.co.za.
Windows 2000 Standard Server, 4 GB RAM
J2SE v1.4.2_08 (java_opts: -ms 512mb ram, xmx 1024mb ram)
Nutch v0.6  (All components standard incl. fetcher)

The fetcher does its usual thing; the interesting part is
that it usually locks up only at the end (or near the end)
of a segment being fetched, be it 100, 1000 or 10000
URLs. If I set the fetcher threads to 5 it works but is
obviously slow; if I set it to 100 it flies but locks
up randomly near the end of fetching a segment.

The last 3 lines before fetcher dies:
050623 114407 fetch of http://www.veza.co.za/ failed with: net.nutch.protocol.http.HttpException: java.net.SocketTimeoutException: Read timed out
050623 114410 fetch of http://www.visserinc.co.za/ failed with: net.nutch.protocol.http.HttpException: java.net.SocketTimeoutException: Read timed out
050623 114414 fetch of http://www.webdesigner.co.za/ failed with: net.nutch.protocol.http.HttpException: java.net.SocketTimeoutException: Read timed out
(LOCKED UP - pressed control-c and got cygwin prompt)
Administrator@MACHINE-C /nutch-0.6
$

Any other information you need?
Thanks in advance


Re: Nutch Lockup/Freeze (Fetcher) - HELP!!

Posted by Andrzej Bialecki <ab...@getopt.org>.
quovadis@webmail.co.za wrote:
> Most of the time the following error occurs near the end
> just before the fetcher freezes/locks up:
> java.net.SocketTimeoutException: Read timed out
> 
> Any1 have any ideas?

Please do a full thread dump (Ctrl-\ on Unix, Ctrl-Break on Windows), 
and also provide us with more details about your environment (Nutch 
version/revision, which HTTP plugin, OS and JDK version). If there are 
stack traces at the end, please send them too.


-- 
Best regards,
Andrzej Bialecki     <><


Re: Nutch Lockup/Freeze (Fetcher) - HELP!!

Posted by qu...@webmail.co.za.
Most of the time the following error occurs near the end,
just before the fetcher freezes/locks up:
java.net.SocketTimeoutException: Read timed out

Does anyone have any ideas?



Re: Nutch Lockup/Freeze (Fetcher)

Posted by Juho Mäkinen <ju...@gmail.com>.
I have also noticed similar problems here. I'm running only one fetching thread
and the fetching randomly stops for some reason. I once managed
to break the lock by restarting the Apache server that the fetcher
was crawling, but that was just once :(

I also don't see any problems with DNS queries, so your idea didn't help
here either. It's strange, because Nutch should have a socket
timeout, which works in most cases, but not in these freezes.
I'm still investigating what could cause this.
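One possible explanation, consistent with the one-byte reads reported later in this thread: a socket read timeout of the java.net.Socket.setSoTimeout kind limits how long a single read() may block, not the total time spent on a response, so a server that keeps trickling bytes never trips it. A hedged sketch of an additional overall deadline, assuming any InputStream and written for a current JDK; DeadlineReader and its parameters are hypothetical, and this is not how Nutch implements its timeouts.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical reader with a total time budget; not Nutch's implementation.
public class DeadlineReader {

    /**
     * Reads until EOF but stops once maxMillis have passed in total.
     * The deadline is only checked between read() calls, so a single read that
     * blocks forever still needs a per-read limit such as Socket.setSoTimeout().
     */
    public static byte[] readWithDeadline(InputStream in, long maxMillis) throws IOException {
        long deadline = System.nanoTime() + maxMillis * 1_000_000L;
        byte[] buffer = new byte[8192];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int n;
        while ((n = in.read(buffer, 0, buffer.length)) != -1) {
            out.write(buffer, 0, n);
            if (System.nanoTime() - deadline > 0) {
                break; // total budget used up: keep the partial content and stop
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] page = new byte[20000];
        // a fast stream finishes well within the budget: prints 20000
        System.out.println(readWithDeadline(new ByteArrayInputStream(page), 5000).length);
        // an already-expired budget stops after the first buffer: prints 8192
        System.out.println(readWithDeadline(new ByteArrayInputStream(page), -1).length);
    }
}
```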

 - Juho Mäkinen, http://www.juhonkoti.net


Re: Nutch Lockup/Freeze (Fetcher)

Posted by Sami Siren <s....@sonera.inet.fi>.
I have experienced similar random freezing in the fetcher, but after setting
up a local caching DNS these problems went away.

At least in my case the problem was due to connectivity to (some) remote 
name servers. You can verify whether this is your problem by running something 
like netstat -na|grep ":53 " while you suspect a frozen fetch, 
and looking for connections that do not go away.
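Besides watching port 53 with netstat, slow name resolution can be spotted from Java by timing lookups. A minimal sketch, assuming you have a list of host names taken from the fetchlist; the class DnsTimer and the threshold are hypothetical, and the code is written for a current JDK. Lookups run one after another, so a stuck name server shows up as one long entry. The JVM caches successful lookups (controlled by the networkaddress.cache.ttl security property), so repeated lookups of the same host may not reach the name server again.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical diagnostic: reports hosts whose DNS lookup is slow or fails.
public class DnsTimer {

    /** Returns one report line per host that failed or took longer than thresholdMillis. */
    public static List<String> slowLookups(List<String> hosts, long thresholdMillis) {
        List<String> report = new ArrayList<>();
        for (String host : hosts) {
            long start = System.nanoTime();
            boolean ok = true;
            try {
                InetAddress.getByName(host); // blocks until resolved or the resolver gives up
            } catch (UnknownHostException e) {
                ok = false;
            }
            long ms = (System.nanoTime() - start) / 1_000_000L;
            if (!ok) {
                report.add(host + ": FAILED after " + ms + " ms");
            } else if (ms > thresholdMillis) {
                report.add(host + ": " + ms + " ms");
            }
        }
        return report;
    }

    public static void main(String[] args) {
        List<String> hosts = args.length > 0 ? Arrays.asList(args) : Arrays.asList("localhost");
        for (String line : slowLookups(hosts, 1000)) {
            System.out.println(line);
        }
    }
}
```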

--
  Sami Siren



Nutch Lockup/Freeze (Fetcher)

Posted by qu...@webmail.co.za.
Anyone experiencing freezes when fetching with 50 threads?
If I use 5 threads everything is fine; if I raise it to 10
it freezes at random times when fetching a segment.

Any ideas?

Re: Nutch Lockup/Free after fetching (I think)

Posted by Sébastien LE CALLONNEC <sl...@yahoo.ie>.
Oh well... I certainly had the problem with the IBM JVM running on AIX. 
Reducing the number of threads did the trick, but it sure isn't a proper
answer to it.

Sorry for not being of any help there.

Regards,
Sebastien.


___________________________________________________________________________
FREE voice calls anywhere in the world with the new Yahoo! Messenger
Download this version at http://fr.messenger.yahoo.com

Re: Nutch Lockup/Free after fetching (I think)

Posted by qu...@webmail.co.za.
Hi Sebastien

It's Sun's JAVA 2 SE v1.4.2_08


RE: Nutch Lockup/Free after fetching (I think)

Posted by Sébastien LE CALLONNEC <sl...@yahoo.ie>.
Hi quovadis, 


I am sorry, I don't have the solution to your problem, but I have
encountered it before (and never got around it). Tell me: which JVM
do you use? Wouldn't that be IBM's?

Regards,
Sebastien.



Nutch Lockup/Free after fetching (I think)

Posted by qu...@webmail.co.za.
Hi again

I have a question regarding Nutch when fetching a
segment. It works fine when I have 5 threads configured;
however, as soon as I go above this, when fetching a segment
of 10000 URLs it just pauses and "locks up" as it reaches
the end. The processor usage is 0% once this happens. If I
press CTRL-C I can return to the prompt, obviously.

Any ideas?


Re: Nutch (Windows) - Out of Memory

Posted by qu...@webmail.co.za.
I changed the environment variables and it works like a
bomb! Thanks

On Tue, 21 Jun 2005 14:41:10 +0200
 "Andre Schild" <a....@aarboard.ch> wrote:
> > Hi Andre
> > 
> > Where do I specify this?
> 
> What servlet engine (and version) do you run ?
> 
> With tomcat 5.5.x it's for example the environment JAVA_OPTS
> where you must specify the -Xms= and -Xmx
> 
> André

RE: Nutch (Windows) - Out of Memory

Posted by Andre Schild <a....@aarboard.ch>.
> Hi Andre
> 
> Where do I specify this?

Which servlet engine (and version) are you running?

With Tomcat 5.5.x, for example, you specify the -Xms and -Xmx
options in the JAVA_OPTS environment variable.

André
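A quick way to confirm that the -Xms/-Xmx values actually reached a JVM is to ask the runtime for its limits. The sketch below is a standalone Java class; the class name HeapReport and the example option values in its comment are illustrative assumptions, not part of Tomcat or Nutch. Run with the same options you put in JAVA_OPTS, it should report a maximum heap roughly equal to the -Xmx value; the same Runtime calls could be placed in a JSP to check the JVM that serves the search pages.

```java
// HeapReport.java: hypothetical helper (not part of Tomcat or Nutch) that prints
// the heap limits of the JVM it runs in. Example invocation with illustrative values:
//   java -Xms128m -Xmx512m HeapReport
public class HeapReport {

    private static final long MB = 1024L * 1024L;

    /** Converts a byte count to whole megabytes, rounding down. */
    static long toMegabytes(long bytes) {
        return bytes / MB;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();     // ceiling the heap may grow to (roughly the -Xmx value)
        long total = rt.totalMemory(); // heap currently reserved (initially close to -Xms)
        long free = rt.freeMemory();   // unused part of the reserved heap

        System.out.println("max heap:   " + toMegabytes(max) + " MB");
        System.out.println("total heap: " + toMegabytes(total) + " MB");
        System.out.println("used heap:  " + toMegabytes(total - free) + " MB");
    }
}
```

If the reported maximum inside Tomcat still sits near the default, the options were probably not picked up by the startup script.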

Re: Nutch (Windows) - Out of Memory

Posted by qu...@webmail.co.za.
Hi Andre

Where do I specify this?

Thanks

On Tue, 21 Jun 2005 14:33:30 +0200
 "Andre Schild" <a....@aarboard.ch> wrote:
> Did you specify more memory for the JVM where the servlet
> engine runs ?
> 
> Default JVM is arround 64MB
> 

RE: Nutch (Windows) - Out of Memory

Posted by Andre Schild <a....@aarboard.ch>.
Did you give more memory to the JVM that the servlet engine runs in?

The default maximum JVM heap is around 64MB.

André


Nutch (Windows) - Out of Memory

Posted by qu...@webmail.co.za.
I'm currently running everything on one Windows box, using
Cygwin etc. as per the installation instructions. I've got
roughly 1,000,000 pages indexed, and my problem is this:
every now and again when I do a search I get an "Out of
memory" error (or the results page is blank, with no text).
This usually happens while the indexer is in the middle of a
long-running job. I have plenty of memory available to the
OS; do I perhaps have to change a setting somewhere?

Thanks


Re: Nutch (Windows) ?

Posted by Andy Liu <an...@gmail.com>.
Most Nutch operations are I/O-bound, so raising CPU utilization
wouldn't help overall performance. For fetching, you can configure the
number of fetcher threads to use, which may help, but you'll probably
run out of bandwidth before you max out your CPU.
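Why extra threads help an I/O-bound fetcher while leaving the CPU mostly idle can be shown with a small simulation. In the hypothetical sketch below (plain java.util.concurrent, not Nutch's Fetcher), each "fetch" just sleeps to stand in for network latency, so wall-clock time drops roughly in proportion to the thread count until some other limit, such as bandwidth, takes over. In Nutch's own configuration the thread count appears to be the fetcher.threads.fetch property in nutch-default.xml; verify the name against your copy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// FetchSimulation.java: hypothetical illustration, not Nutch code.
// Each task sleeps to mimic waiting on the network, so the work is I/O-bound.
public class FetchSimulation {

    /** Runs `pages` simulated fetches of `latencyMs` each on `threads` threads; returns elapsed ms. */
    static long simulate(int pages, int threads, long latencyMs) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        List<Future<?>> results = new ArrayList<>();
        for (int i = 0; i < pages; i++) {
            // Thread.sleep stands in for a blocking HTTP request; it uses almost no CPU.
            results.add(pool.submit(() -> {
                Thread.sleep(latencyMs);
                return null;
            }));
        }
        for (Future<?> f : results) {
            f.get(); // wait for every simulated fetch to finish
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) throws Exception {
        for (int threads : new int[] {1, 5, 20}) {
            long ms = simulate(40, threads, 50);
            System.out.println(threads + " thread(s): " + ms + " ms for 40 pages");
        }
    }
}
```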

Since indexing is single-threaded, I've been able to run multiple index
processes concurrently, which speeds things up a bit.
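Running several single-threaded index jobs side by side can be scripted. The sketch below starts one external process per segment with ProcessBuilder and waits for all of them. The class name ParallelIndexer and the "bin/nutch index <segment>" command line are assumptions for illustration; check the indexing command syntax of your Nutch version before relying on it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// ParallelIndexer.java: hypothetical launcher, not part of Nutch.
// Starts one external process per command and waits for all of them to finish.
public class ParallelIndexer {

    /** Starts every command at once, then waits and returns the exit codes in order. */
    static List<Integer> runConcurrently(List<List<String>> commands) throws Exception {
        List<Process> running = new ArrayList<>();
        for (List<String> cmd : commands) {
            ProcessBuilder pb = new ProcessBuilder(cmd);
            pb.inheritIO(); // send each child's output to this console
            running.add(pb.start());
        }
        List<Integer> exitCodes = new ArrayList<>();
        for (Process p : running) {
            exitCodes.add(p.waitFor()); // blocks until that process exits
        }
        return exitCodes;
    }

    public static void main(String[] args) throws Exception {
        // Assumption: each argument is a segment directory and "bin/nutch index <segment>"
        // is the indexing command for your Nutch version. bin/nutch is a shell script,
        // so it is run through bash (Cygwin's bash on Windows).
        List<List<String>> commands = new ArrayList<>();
        for (String segment : args) {
            commands.add(Arrays.asList("bash", "bin/nutch", "index", segment));
        }
        System.out.println("exit codes: " + runConcurrently(commands));
    }
}
```

Separate processes rather than threads keep each index job in its own JVM, matching the "multiple index processes" approach described above.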

Andy

On 6/20/05, quovadis@webmail.co.za <qu...@webmail.co.za> wrote:
> Hi everyone...
> 
> Anyone here running Nutch in a windows environment using
> cygwin ? I was wondering if anyone could help. Nutch is
> running wonderfully however I'd like to find out if it is
> possible to change the system priority of tomcat and/or
> cygwin (fetching etc etc). At the moment its using only
> about 5-10% of CPU and would like to know if there's anyway
> to boost this in windows?
> 
> Thanks
> quo