Posted to user@nutch.apache.org by Rod Taylor <rb...@sitesell.com> on 2005/10/03 21:59:55 UTC

mapred Sort Progress Reports

I have high load, but it seems that the percentage progress during the
reduce > sort phase of fetch (parse?) is not increasing, which makes it
appear as if nothing is happening (stuck at 0.5, i.e. 50%).

Full logs available upon request.
-- 
Rod Taylor <rb...@sitesell.com>


Re: mapred Sort Progress Reports

Posted by Rod Taylor <rb...@sitesell.com>.
For comparison, here are some IO numbers while reduce > reduce is
running:

        avg-cpu:  %user   %nice    %sys %iowait   %idle
                  72.22    0.00    5.67   14.85    7.25

        Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
        sda             473.77     12363.16     13399.40     123508     133860

        avg-cpu:  %user   %nice    %sys %iowait   %idle
                  73.72    0.00   10.73    9.60    5.95

        Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
        sda             445.11      3380.44     23845.91      33872     238936


On Mon, 2005-10-03 at 14:11 -0700, Doug Cutting wrote:
> Rod Taylor wrote:
> > I see. Is there any way to speed up this phase? It seems to be taking as
> > long to run the sort phase as it did to download the data.
> > 
> > It would appear that nearly 30% of the time for the nutch fetch segment
> > is spent doing the sorts, so I'm well off the 20% overhead number you
> > seem to be able to achieve for a full cycle.
> > 
> > 5 machines (4 CPUs each) running Redhat, each with 8 tasks; the load
> > average is about 5. Context switches are low (under 1500/second). There
> > is virtually no IO (the boxes have plenty of RAM), but the kernel is
> > doing a bunch of work, as 50% of CPU time is in system (unsure what
> > exactly; I'm not familiar with the Linux equivalents of DTrace-type tools).
> 
> Sorting is usually i/o bound on mapred.local.dir.  When eight tasks are 
> using the same device this could become a bottleneck.  Use iostat or sar 
> to view disk i/o statistics.
> 
> My plan is to permit one to specify a list of directories for 
> mapred.local.dir and have the sorting (and everything else) select 
> randomly among these for temporary local files.  That way all devices 
> can be used in parallel.
> 
> As a workaround you could try starting eight tasktrackers, each 
> configured with a different device for mapred.local.dir.  Yes, that's a 
> pain, but it would give us an idea of whether my analysis is correct.
> 
> Doug
> 
-- 
Rod Taylor <rb...@sitesell.com>


Re: mapred Sort Progress Reports

Posted by Doug Cutting <cu...@nutch.org>.
Rod Taylor wrote:
> Thanks for your patience. Is there a "best practices" page in the Wiki
> that I could contribute this insight to?

If you can't find one, please add one.  Your installation is somewhat 
atypical: 4-CPU nodes with NAS.  More typical are one- or two-CPU nodes 
with between one and four IDE drives: cheap PCs.

Doug

Re: mapred Sort Progress Reports

Posted by Rod Taylor <rb...@sitesell.com>.
On Mon, 2005-10-03 at 15:33 -0700, Doug Cutting wrote:
> Rod Taylor wrote:
> > Virtually no IO reported at all.  Reads average about 200kB/sec and
> > writes are usually 0, but they burst to 120MB/sec for under 1 second
> > once every 30 seconds or so.
> 
> That's strange.  I wonder what it's doing.  Can you use 'kill -QUIT' to 
> get a thread dump?  Try a few of these to sample the stack and see where 
> it seems to be spending time.

The stack trace shows FileOutputStream.writeBytes running inside
LocalFileSystem.write, so it is definitely IO bound.

I had the admins change a number of settings on a couple of machines
(ext3 remounted with data=writeback journaling, noatime, etc.) and those
machines have completed their work in a much more reasonable timeframe. I
guess we can chalk it up to a lot of head movement.
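
For anyone repeating this, the fstab entry would look roughly like the
following (the device and mount point here are made up; data=writeback
relaxes the journaling mode and noatime drops the access-time writes):

        /dev/sda3   /mapred-local   ext3   noatime,data=writeback   0 0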

I guess bumping the access times on all of the .out files had a large
impact.


Thanks for your patience. Is there a "best practices" page in the Wiki
that I could contribute this insight to?

-- 
Rod Taylor <rb...@sitesell.com>


Re: mapred Sort Progress Reports

Posted by Doug Cutting <cu...@nutch.org>.
Rod Taylor wrote:
> Virtually no IO reported at all.  Averages about 200kB/sec read and
> writes are usually 0, but burst to 120MB/sec for under 1 second once
> every 30 seconds or so.

That's strange.  I wonder what it's doing.  Can you use 'kill -QUIT' to 
get a thread dump?  Try a few of these to sample the stack and see where 
it seems to be spending time.
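
Something like the following, a few times in a row (the pid placeholder
is the tasktracker's java process; note the JVM writes the dump to its
own stdout, so it lands in the tasktracker's stdout log rather than on
your terminal):

        kill -QUIT <pid>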

Doug

Re: mapred Sort Progress Reports

Posted by Rod Taylor <rb...@sitesell.com>.
On Mon, 2005-10-03 at 15:57 -0700, Doug Cutting wrote:
> Try the following on your system:
> 
> bin/nutch org.apache.nutch.io.TestSequenceFile -fast -count 20000000 -megabytes 100 foo
> 
> Tell me how it behaves during the sort phase.

We changed interpreters on some machines to see if that would help.

Sun 1.5.0:
        3MB to 5MB/sec steady writes. Takes 9.5 minutes with a 100MB
        file and 8.5 minutes with a 500MB file.

The IBM 1.4.2 interpreter I was using initially receives these errors
repeatably, although I cannot find them in the logs. Perhaps Nutch
catches them and retries?

051004 105012 running merge pass=1
Exception in thread "main" java.io.IOException: Cannot seek after EOF
        at org.apache.nutch.ndfs.NDFSClient$NDFSInputStream.seek(NDFSClient.java:445)
        at org.apache.nutch.fs.NFSDataInputStream$PositionCache.seek(NFSDataInputStream.java:43)
        at org.apache.nutch.fs.NFSDataInputStream$Buffer.seek(NFSDataInputStream.java:68)
        at org.apache.nutch.fs.NFSDataInputStream.seek(NFSDataInputStream.java:95)
        at org.apache.nutch.io.SequenceFile$Sorter$MergePass.run(SequenceFile.java:720)
        at org.apache.nutch.io.SequenceFile$Sorter.mergePass(SequenceFile.java:664)
        at org.apache.nutch.io.SequenceFile$Sorter.sort(SequenceFile.java:490)
        at org.apache.nutch.io.TestSequenceFile.sortTest(TestSequenceFile.java:112)
        at org.apache.nutch.io.TestSequenceFile.main(TestSequenceFile.java:272)


-- 
Rod Taylor <rb...@sitesell.com>


Re: mapred Sort Progress Reports

Posted by Rod Taylor <rb...@sitesell.com>.
On Tue, 2005-10-04 at 09:52 -0700, Doug Cutting wrote:
> Rod Taylor wrote:
> >>Tell me how it behaves during the sort phase.
> > 
> > I ran 8 jobs simultaneously. Very high await times (around 1200 ms) and
> > about 22MB/sec of data writes. Nearly 0 reads from disk (everything
> > would be cached in memory).
> 
> This is during the sort part?  This first writes a big file, then reads 
> it, then sorts it.  With 20M records I think the file is around 2.5GB, 
> so eight of these would be 20GB.  Do you have 20GB of RAM?

I split the 20M record chunk into 8 jobs of 2.5M records each (this
better replicates what I had earlier -- a 10M page segment with up to 8
tasks on the machine).

The boxes we're using have 4GB of RAM and the NAS has 2GB of RAM.

Yes, if I tried dealing with 160M records at a time I would most
certainly run out of IO, as there is only a single SCSI drive.

-- 
Rod Taylor <rb...@sitesell.com>


Re: mapred Sort Progress Reports

Posted by Rod Taylor <rb...@sitesell.com>.
On Tue, 2005-10-04 at 09:52 -0700, Doug Cutting wrote:
> Rod Taylor wrote:
> >>Tell me how it behaves during the sort phase.
> > 
> > I ran 8 jobs simultaneously. Very high await times (around 1200 ms) and
> > about 22MB/sec of data writes. Nearly 0 reads from disk (everything
> > would be cached in memory).
> 
> This is during the sort part?  This first writes a big file, then reads 
> it, then sorts it.  With 20M records I think the file is around 2.5GB, 
> so eight of these would be 20GB.  Do you have 20GB of RAM?

For the archives:

The poor performance was caused by the IBM 1.4.2 JDK. Switching to Sun's
JDK (1.4 or 1.5) seems to give superior performance when dealing with
disk IO.

-- 
Rod Taylor <rb...@sitesell.com>


Re: mapred Sort Progress Reports

Posted by Doug Cutting <cu...@nutch.org>.
Rod Taylor wrote:
>>Tell me how it behaves during the sort phase.
> 
> I ran 8 jobs simultaneously. Very high await times (around 1200 ms) and
> about 22MB/sec of data writes. Nearly 0 reads from disk (everything
> would be cached in memory).

This is during the sort part?  This first writes a big file, then reads 
it, then sorts it.  With 20M records I think the file is around 2.5GB, 
so eight of these would be 20GB.  Do you have 20GB of RAM?
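
(For scale: 2.5GB / 20M records works out to roughly 125 bytes per
record, and 8 x 2.5GB = 20GB of temporary data.)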

Doug

Re: mapred Sort Progress Reports

Posted by Rod Taylor <rb...@sitesell.com>.
On Mon, 2005-10-03 at 15:57 -0700, Doug Cutting wrote:
> Try the following on your system:
> 
> bin/nutch org.apache.nutch.io.TestSequenceFile -fast -count 20000000 -megabytes 100 foo
> 
> Tell me how it behaves during the sort phase.

I ran 8 jobs simultaneously. Very high await times (around 1200 ms) and
about 22MB/sec of data writes. Nearly 0 reads from disk (everything
would be cached in memory).


I'll answer your request for SIGQUIT output as soon as another round of
fetches reaches the sort phase.

-- 
Rod Taylor <rb...@sitesell.com>


Re: mapred Sort Progress Reports

Posted by Doug Cutting <cu...@nutch.org>.
Try the following on your system:

bin/nutch org.apache.nutch.io.TestSequenceFile -fast -count 20000000 -megabytes 100 foo

Tell me how it behaves during the sort phase.

Thanks,

Doug

Re: mapred Sort Progress Reports

Posted by Rod Taylor <rb...@sitesell.com>.
On Mon, 2005-10-03 at 14:11 -0700, Doug Cutting wrote:
> Rod Taylor wrote:
> > I see. Is there any way to speed up this phase? It seems to be taking as
> > long to run the sort phase as it did to download the data.
> > 
> > It would appear that nearly 30% of the time for the nutch fetch segment
> > is spent doing the sorts, so I'm well off the 20% overhead number you
> > seem to be able to achieve for a full cycle.
> > 
> > 5 machines (4 CPUs each) running Redhat, each with 8 tasks; the load
> > average is about 5. Context switches are low (under 1500/second). There
> > is virtually no IO (the boxes have plenty of RAM), but the kernel is
> > doing a bunch of work, as 50% of CPU time is in system (unsure what
> > exactly; I'm not familiar with the Linux equivalents of DTrace-type tools).
> 
> Sorting is usually i/o bound on mapred.local.dir.  When eight tasks are 
> using the same device this could become a bottleneck.  Use iostat or sar 
> to view disk i/o statistics.

Virtually no IO reported at all.  Reads average about 200kB/sec and
writes are usually 0, but they burst to 120MB/sec for under 1 second
once every 30 seconds or so.

The IO used is not all that high (lots of RAM) and the wait time isn't
too bad for a single drive. I presume Nutch caches enough in memory that
it can read/write large blocks of data, allowing the hard drive to do
mostly sequential IO.

        avg-cpu:  %user   %nice    %sys %iowait   %idle
                  60.15    0.00   39.07    0.10    0.68

        Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s   rkB/s    wkB/s avgrq-sz avgqu-sz  await  svctm  %util
        sda          0.00 246.35  1.30 12.09  307.69 2067.53  153.85  1033.77   177.43     0.58  43.54   5.21   6.97

        avg-cpu:  %user   %nice    %sys %iowait   %idle
                  61.86    0.00   37.29    0.12    0.72

        Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s   rkB/s    wkB/s avgrq-sz avgqu-sz  await  svctm  %util
        sda          0.10 271.67  1.60 12.01  410.81 2269.47  205.41  1134.73   196.88     0.59  43.45   4.82   6.57

Well under what the drive can burst to:
        [rbt@sbider1 ~]$ time dd if=/dev/zero of=/home/rbt/here bs=4K count=25600
        25600+0 records in
        25600+0 records out

        real    0m0.488s
        user    0m0.003s
        sys     0m0.474s

        [rbt@sbider1 ~]$ ls -lah here
        -rw-rw-r--  1 rbt rbt 100M Oct  3 17:35 here
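
Granted, dd returns as soon as the data lands in the page cache, so the
half-second figure is cache speed rather than disk speed; a version that
includes the flush would be:

        time sh -c 'dd if=/dev/zero of=/home/rbt/here bs=4K count=25600 && sync'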

Does Nutch fsync the data, or is it pessimistic in other ways?

-- 
Rod Taylor <rb...@sitesell.com>


Re: mapred Sort Progress Reports

Posted by Doug Cutting <cu...@nutch.org>.
Rod Taylor wrote:
> I see. Is there any way to speed up this phase? It seems to be taking as
> long to run the sort phase as it did to download the data.
> 
> It would appear that nearly 30% of the time for the nutch fetch segment
> is spent doing the sorts, so I'm well off the 20% overhead number you
> seem to be able to achieve for a full cycle.
> 
> 5 machines (4 CPUs each) running Redhat, each with 8 tasks; the load
> average is about 5. Context switches are low (under 1500/second). There
> is virtually no IO (the boxes have plenty of RAM), but the kernel is
> doing a bunch of work, as 50% of CPU time is in system (unsure what
> exactly; I'm not familiar with the Linux equivalents of DTrace-type tools).

Sorting is usually i/o bound on mapred.local.dir.  When eight tasks are 
using the same device this could become a bottleneck.  Use iostat or sar 
to view disk i/o statistics.
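
For example (both tools are in the sysstat package; the intervals are 
arbitrary):

        iostat -dx 5     # extended per-device stats every 5 seconds
        sar -d 5 10      # ten 5-second samples of block-device activity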

My plan is to permit one to specify a list of directories for 
mapred.local.dir and have the sorting (and everything else) select 
randomly among these for temporary local files.  That way all devices 
can be used in parallel.
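
In the config that would look something like this (the comma-separated 
form is just a sketch of what I have in mind; today only a single 
directory is supported):

        <property>
          <name>mapred.local.dir</name>
          <value>/d1/tmp/mapred,/d2/tmp/mapred,/d3/tmp/mapred</value>
        </property>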

As a workaround you could try starting eight tasktrackers, each 
configured with a different device for mapred.local.dir.  Yes, that's a 
pain, but it would give us an idea of whether my analysis is correct.
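
Roughly like this (the NUTCH_CONF_DIR convention and the paths are only 
a sketch; each conf dir would point mapred.local.dir at a different 
drive):

        for i in 1 2 3 4 5 6 7 8; do
          NUTCH_CONF_DIR=/etc/nutch/tt$i bin/nutch-daemon.sh start tasktracker
        done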

Doug

Re: mapred Sort Progress Reports

Posted by Rod Taylor <rb...@sitesell.com>.
On Mon, 2005-10-03 at 13:12 -0700, Doug Cutting wrote:
> Rod Taylor wrote:
> > I have high load, but it seems that the percentage progress during the
> > reduce > sort phase of fetch (parse?) is not increasing, which makes it
> > appear as if nothing is happening (stuck at 0.5, i.e. 50%).
> 
> That's correct.  There are currently no progress reports during sorting.
> Reduce progress sticks at 50% during sorting and jumps to 75% on
> completion of the sort phase.

I see. Is there any way to speed up this phase? It seems to be taking as
long to run the sort phase as it did to download the data.

It would appear that nearly 30% of the time for the nutch fetch segment
is spent doing the sorts, so I'm well off the 20% overhead number you
seem to be able to achieve for a full cycle.

5 machines (4 CPUs each) running Redhat, each with 8 tasks; the load
average is about 5. Context switches are low (under 1500/second). There
is virtually no IO (the boxes have plenty of RAM), but the kernel is
doing a bunch of work, as 50% of CPU time is in system (unsure what
exactly; I'm not familiar with the Linux equivalents of DTrace-type tools).

I generated the segment for the top 10 million pages, with 10 pages per
host.  map.tasks=383, reduce.tasks=43.
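
In nutch-site.xml terms (assuming the mapred.* property names) that is:

        <property>
          <name>mapred.map.tasks</name>
          <value>383</value>
        </property>
        <property>
          <name>mapred.reduce.tasks</name>
          <value>43</value>
        </property>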

-- 
Rod Taylor <rb...@sitesell.com>


Re: mapred Sort Progress Reports

Posted by Doug Cutting <cu...@nutch.org>.
Rod Taylor wrote:
> I have high load, but it seems that the percentage progress during the
> reduce > sort phase of fetch (parse?) is not increasing, which makes it
> appear as if nothing is happening (stuck at 0.5, i.e. 50%).

That's correct.  There are currently no progress reports during sorting.
Reduce progress sticks at 50% during sorting and jumps to 75% on
completion of the sort phase.

Doug