Posted to user@nutch.apache.org by Rod Taylor <rb...@sitesell.com> on 2005/10/03 21:59:55 UTC
mapred Sort Progress Reports
I have high load, but it seems that the percentage progress during the
reduce > sort phase of fetch (parse?) is not increasing, which makes it
appear as if nothing is happening (stuck at 0.5, or 50%).
Full logs available upon request.
--
Rod Taylor <rb...@sitesell.com>
Re: mapred Sort Progress Reports
Posted by Rod Taylor <rb...@sitesell.com>.
For comparison, here are some IO numbers while reduce > reduce is
running:
avg-cpu:  %user  %nice  %sys  %iowait  %idle
          72.22   0.00  5.67    14.85   7.25

Device:   tps     kB_read/s  kB_wrtn/s  kB_read  kB_wrtn
sda       473.77  12363.16   13399.40   123508   133860

avg-cpu:  %user  %nice   %sys  %iowait  %idle
          73.72   0.00  10.73     9.60   5.95

Device:   tps     kB_read/s  kB_wrtn/s  kB_read  kB_wrtn
sda       445.11  3380.44    23845.91   33872    238936
On Mon, 2005-10-03 at 14:11 -0700, Doug Cutting wrote:
> Rod Taylor wrote:
> > I see. Is there any way to speed up this phase? It seems to be taking as
> > long to run the sort phase as it did to download the data.
> >
> > It would appear that nearly 30% of the time for the nutch fetch segment
> > is spent doing the sorts, so I'm well off the 20% overhead number you
> > seem to be able to achieve for a full cycle.
> >
> > 5 machines (4 CPUs each), running 8 tasks apiece with a load average of
> > about 5; they run Red Hat. Context switches are low (under 1500/second).
> > There is virtually no IO (the boxes have plenty of RAM), but the kernel is
> > doing a lot of work, as 50% of CPU time is spent in system (unsure what;
> > I'm not familiar with the Linux DTrace-type tools).
>
> Sorting is usually i/o bound on mapred.local.dir. When eight tasks are
> using the same device this could become a bottleneck. Use iostat or sar
> to view disk i/o statistics.
>
> My plan is to permit one to specify a list of directories for
> mapred.local.dir and have the sorting (and everything else) select
> randomly among these for temporary local files. That way all devices
> can be used in parallel.
>
> As a workaround you could try starting eight tasktrackers, each
> configured with a different device for mapred.local.dir. Yes, that's a
> pain, but it would give us an idea of whether my analysis is correct.
>
> Doug
>
--
Rod Taylor <rb...@sitesell.com>
Re: mapred Sort Progress Reports
Posted by Doug Cutting <cu...@nutch.org>.
Rod Taylor wrote:
> Thanks for your patience. Is there a "best practices" page in the Wiki
> that I could contribute this insight to?
If you can't find one, please add one. Your installation is somewhat
atypical: 4-CPU nodes with NAS. More typical are one- or two-CPU nodes
with between one and four IDE drives: cheap PCs.
Doug
Re: mapred Sort Progress Reports
Posted by Rod Taylor <rb...@sitesell.com>.
On Mon, 2005-10-03 at 15:33 -0700, Doug Cutting wrote:
> Rod Taylor wrote:
> > Virtually no IO reported at all. Averages about 200kB/sec read and
> > writes are usually 0, but burst to 120MB/sec for under 1 second once
> > every 30 seconds or so.
>
> That's strange. I wonder what it's doing. Can you use 'kill -QUIT' to
> get a thread dump? Try a few of these to sample the stack and see where
> it seems to be spending time.
The stack trace shows FileOutputStream.writeBytes running inside the
LocalFileSystem.write function, so it is definitely IO bound.
I had the admins change a number of settings on a couple of machines
(ext3 fs to data=writeback journaling, noatime, etc.) and those machines
have completed their work in a much more reasonable timeframe. I guess we
can chalk it up to a lot of head movement.
I guess updating access times and the like for all of the .out files had a
large impact.
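For reference, the mount-side part of those changes looks roughly like this (the device and mount point are illustrative, not our actual layout):

```shell
# Remount the filesystem holding mapred.local.dir without atime updates,
# so reads of the many .out files stop generating inode writes:
mount -o remount,noatime /data
# Persistent version: the matching /etc/fstab line would look like
#   /dev/sda3  /data  ext3  noatime,data=writeback  1 2
# (data=writeback selects the weaker but faster ext3 journaling mode)
```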
Thanks for your patience. Is there a "best practices" page in the Wiki
that I could contribute this insight to?
--
Rod Taylor <rb...@sitesell.com>
Re: mapred Sort Progress Reports
Posted by Doug Cutting <cu...@nutch.org>.
Rod Taylor wrote:
> Virtually no IO reported at all. Averages about 200kB/sec read and
> writes are usually 0, but burst to 120MB/sec for under 1 second once
> every 30 seconds or so.
That's strange. I wonder what it's doing. Can you use 'kill -QUIT' to
get a thread dump? Try a few of these to sample the stack and see where
it seems to be spending time.
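A sampling loop along these lines does the job (the '[T]askTracker' process-name pattern is an assumption; match whatever your ps output actually shows for the task JVM):

```shell
# Send SIGQUIT to the task JVM a few times, ~5s apart; each signal makes
# the JVM append a full thread dump to its stdout log without killing it.
# The '[T]askTracker' pattern matches the java process but not this grep.
pid=$(ps ax | grep '[T]askTracker' | awk '{print $1}' | head -n 1)
if [ -n "$pid" ]; then
    for i in 1 2 3; do
        kill -QUIT "$pid"
        sleep 5
    done
fi
```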
Doug
Re: mapred Sort Progress Reports
Posted by Rod Taylor <rb...@sitesell.com>.
On Mon, 2005-10-03 at 15:57 -0700, Doug Cutting wrote:
> Try the following on your system:
>
> bin/nutch org.apache.nutch.io.TestSequenceFile -fast -count 20000000
> -megabytes 100 foo
>
> Tell me how it behaves during the sort phase.
We changed interpreters on some machines to see if that would help.
Sun 1.5.0:
3MB to 5MB/sec steady writes. Takes 9.5 minutes with a 100MB
file and 8.5 minutes with a 500MB file.
The IBM 1.4.2 interpreter I was using initially repeatedly receives these
errors, although I cannot find them in the logs. Perhaps Nutch catches
them and retries?
051004 105012 running merge pass=1
Exception in thread "main" java.io.IOException: Cannot seek after EOF
        at org.apache.nutch.ndfs.NDFSClient$NDFSInputStream.seek(NDFSClient.java:445)
        at org.apache.nutch.fs.NFSDataInputStream$PositionCache.seek(NFSDataInputStream.java:43)
        at org.apache.nutch.fs.NFSDataInputStream$Buffer.seek(NFSDataInputStream.java:68)
        at org.apache.nutch.fs.NFSDataInputStream.seek(NFSDataInputStream.java:95)
        at org.apache.nutch.io.SequenceFile$Sorter$MergePass.run(SequenceFile.java:720)
        at org.apache.nutch.io.SequenceFile$Sorter.mergePass(SequenceFile.java:664)
        at org.apache.nutch.io.SequenceFile$Sorter.sort(SequenceFile.java:490)
        at org.apache.nutch.io.TestSequenceFile.sortTest(TestSequenceFile.java:112)
        at org.apache.nutch.io.TestSequenceFile.main(TestSequenceFile.java:272)
--
Rod Taylor <rb...@sitesell.com>
Re: mapred Sort Progress Reports
Posted by Rod Taylor <rb...@sitesell.com>.
On Tue, 2005-10-04 at 09:52 -0700, Doug Cutting wrote:
> Rod Taylor wrote:
> >>Tell me how it behaves during the sort phase.
> >
> > I ran 8 jobs simultaneously. Very high await time (1200) and it was
> > doing about 22MB/sec data writes. Nearly 0 reads from disk (everything
> > would be cached in memory).
>
> This is during the sort part? This first writes a big file, then reads
> it, then sorts it. With 20M records I think the file is around 2.5GB,
> so eight of these would be 20GB. Do you have 20GB of RAM?
I split the 20M record chunk into 8 jobs of 2.5M records each (this better
replicates what I had earlier -- a 10M page segment with up to 8 tasks
on the machine).
The boxes we're using have 4GB of RAM and the NAS has 2GB.
Yes, if I tried dealing with 160M records at a time I would most certainly
run out of IO, as there is only a single SCSI drive.
--
Rod Taylor <rb...@sitesell.com>
Re: mapred Sort Progress Reports
Posted by Rod Taylor <rb...@sitesell.com>.
On Tue, 2005-10-04 at 09:52 -0700, Doug Cutting wrote:
> Rod Taylor wrote:
> >>Tell me how it behaves during the sort phase.
> >
> > I ran 8 jobs simultaneously. Very high await time (1200) and it was
> > doing about 22MB/sec data writes. Nearly 0 reads from disk (everything
> > would be cached in memory).
>
> This is during the sort part? This first writes a big file, then reads
> it, then sorts it. With 20M records I think the file is around 2.5GB,
> so eight of these would be 20GB. Do you have 20GB of RAM?
For the archives:
The poor performance was caused by the IBM 1.4.2 JDK. Changing to Sun's
JDK (1.4 or 1.5) gives noticeably better performance when dealing with
disk IO.
--
Rod Taylor <rb...@sitesell.com>
Re: mapred Sort Progress Reports
Posted by Doug Cutting <cu...@nutch.org>.
Rod Taylor wrote:
>>Tell me how it behaves during the sort phase.
>
> I ran 8 jobs simultaneously. Very high await time (1200) and it was
> doing about 22MB/sec data writes. Nearly 0 reads from disk (everything
> would be cached in memory).
This is during the sort part? This first writes a big file, then reads
it, then sorts it. With 20M records I think the file is around 2.5GB,
so eight of these would be 20GB. Do you have 20GB of RAM?
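For reference, the arithmetic behind those figures (the ~128 bytes/record is inferred from the quoted 2.5GB, not measured):

```shell
# 20M records at roughly 128 bytes each, per test file and for eight
# concurrent runs (decimal GB):
awk 'BEGIN { per = 20e6 * 128 / 1e9
             printf "%.1f GB per file, %.0f GB for eight\n", per, 8 * per }'
# → 2.6 GB per file, 20 GB for eight
```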
Doug
Re: mapred Sort Progress Reports
Posted by Rod Taylor <rb...@sitesell.com>.
On Mon, 2005-10-03 at 15:57 -0700, Doug Cutting wrote:
> Try the following on your system:
>
> bin/nutch org.apache.nutch.io.TestSequenceFile -fast -count 20000000
> -megabytes 100 foo
>
> Tell me how it behaves during the sort phase.
I ran 8 jobs simultaneously. Very high await time (1200) and it was
doing about 22MB/sec data writes. Nearly 0 reads from disk (everything
would be cached in memory).
I'll answer your request for SIGQUIT output as soon as another round of
fetches reaches the sort phase.
--
Rod Taylor <rb...@sitesell.com>
Re: mapred Sort Progress Reports
Posted by Doug Cutting <cu...@nutch.org>.
Try the following on your system:
bin/nutch org.apache.nutch.io.TestSequenceFile -fast -count 20000000
-megabytes 100 foo
Tell me how it behaves during the sort phase.
Thanks,
Doug
Re: mapred Sort Progress Reports
Posted by Rod Taylor <rb...@sitesell.com>.
On Mon, 2005-10-03 at 14:11 -0700, Doug Cutting wrote:
> Rod Taylor wrote:
> > I see. Is there any way to speed up this phase? It seems to be taking as
> > long to run the sort phase as it did to download the data.
> >
> > It would appear that nearly 30% of the time for the nutch fetch segment
> > is spent doing the sorts, so I'm well off the 20% overhead number you
> > seem to be able to achieve for a full cycle.
> >
> > 5 machines (4 CPUs each), running 8 tasks apiece with a load average of
> > about 5; they run Red Hat. Context switches are low (under 1500/second).
> > There is virtually no IO (the boxes have plenty of RAM), but the kernel is
> > doing a lot of work, as 50% of CPU time is spent in system (unsure what;
> > I'm not familiar with the Linux DTrace-type tools).
>
> Sorting is usually i/o bound on mapred.local.dir. When eight tasks are
> using the same device this could become a bottleneck. Use iostat or sar
> to view disk i/o statistics.
Virtually no IO reported at all. Averages about 200kB/sec read and
writes are usually 0, but burst to 120MB/sec for under 1 second once
every 30 seconds or so.
The IO in use is not all that high (lots of RAM) and the wait time isn't
too bad for a single drive. I presume Nutch caches enough in memory that
it can read/write large blocks of data, allowing the hard drive to do
mostly sequential IO.
avg-cpu:  %user  %nice   %sys  %iowait  %idle
          60.15   0.00  39.07     0.10   0.68

Device:  rrqm/s  wrqm/s  r/s   w/s    rsec/s  wsec/s   rkB/s   wkB/s    avgrq-sz  avgqu-sz  await  svctm  %util
sda        0.00  246.35  1.30  12.09  307.69  2067.53  153.85  1033.77    177.43      0.58  43.54   5.21   6.97

avg-cpu:  %user  %nice   %sys  %iowait  %idle
          61.86   0.00  37.29     0.12   0.72

Device:  rrqm/s  wrqm/s  r/s   w/s    rsec/s  wsec/s   rkB/s   wkB/s    avgrq-sz  avgqu-sz  await  svctm  %util
sda        0.10  271.67  1.60  12.01  410.81  2269.47  205.41  1134.73    196.88      0.59  43.45   4.82   6.57
Well under what can be bursted:
[rbt@sbider1 ~]$ time dd if=/dev/zero of=/home/rbt/here bs=4K count=25600
25600+0 records in
25600+0 records out
real 0m0.488s
user 0m0.003s
sys 0m0.474s
[rbt@sbider1 ~]$ ls -lah here
-rw-rw-r-- 1 rbt rbt 100M Oct 3 17:35 here
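One caveat on that dd number: it likely measures the page cache rather than the disk, since dd returns as soon as the data is buffered. A variant that times the flush as well (file name illustrative):

```shell
# Include the sync in the timing so the figure reflects sustained disk
# writes, not memory speed:
time sh -c 'dd if=/dev/zero of=ddtest.bin bs=4K count=25600 2>/dev/null && sync'
rm -f ddtest.bin
```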
Does Nutch fsync the data, or is it pessimistic in other ways?
--
Rod Taylor <rb...@sitesell.com>
Re: mapred Sort Progress Reports
Posted by Doug Cutting <cu...@nutch.org>.
Rod Taylor wrote:
> I see. Is there any way to speed up this phase? It seems to be taking as
> long to run the sort phase as it did to download the data.
>
> It would appear that nearly 30% of the time for the nutch fetch segment
> is spent doing the sorts, so I'm well off the 20% overhead number you
> seem to be able to achieve for a full cycle.
>
> 5 machines (4 CPUs each), running 8 tasks apiece with a load average of
> about 5; they run Red Hat. Context switches are low (under 1500/second).
> There is virtually no IO (the boxes have plenty of RAM), but the kernel is
> doing a lot of work, as 50% of CPU time is spent in system (unsure what;
> I'm not familiar with the Linux DTrace-type tools).
Sorting is usually i/o bound on mapred.local.dir. When eight tasks are
using the same device this could become a bottleneck. Use iostat or sar
to view disk i/o statistics.
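Something along these lines shows the relevant numbers (assumes the sysstat package is installed; sda and the 5-second interval are illustrative):

```shell
# Two extended-stats reports, 5 seconds apart; watch the await and %util
# columns for the device holding mapred.local.dir. The awk filter keeps
# only the sda lines.
if command -v iostat >/dev/null; then
    iostat -x 5 2 | awk '/^sda/ { print }'
fi
# sar equivalent (also sysstat): per-device activity at the same interval
# sar -d 5 2
```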
My plan is to permit one to specify a list of directories for
mapred.local.dir and have the sorting (and everything else) select
randomly among these for temporary local files. That way all devices
can be used in parallel.
As a workaround you could try starting eight tasktrackers, each
configured with a different device for mapred.local.dir. Yes, that's a
pain, but it would give us an idea of whether my analysis is correct.
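A dry-run sketch of that workaround (the tasktracker class name, the NUTCH_CONF_DIR variable, and the conf.diskN layout are all assumptions, not a documented interface):

```shell
# Print the launch command for one tasktracker per disk, each with its own
# conf dir whose mapred.local.dir points at that disk. Drop the echo to
# actually start them in the background.
for i in 1 2 3 4 5 6 7 8; do
    echo "NUTCH_CONF_DIR=conf.disk$i bin/nutch org.apache.nutch.mapred.TaskTracker"
done
```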
Doug
Re: mapred Sort Progress Reports
Posted by Rod Taylor <rb...@sitesell.com>.
On Mon, 2005-10-03 at 13:12 -0700, Doug Cutting wrote:
> Rod Taylor wrote:
> > I have high load, but it seems that the percentage progress during the
> > reduce > sort phase of fetch (parse?) is not increasing, which makes it
> > appear as if nothing is happening (stuck at 0.5, or 50%).
>
> That's correct. There are currently no progress reports during sorting.
> Reduce progress sticks at 50% during sorting, and jumps to 75% on
> completion of the sort phase.
I see. Is there any way to speed up this phase? It seems to be taking as
long to run the sort phase as it did to download the data.
It would appear that nearly 30% of the time for the nutch fetch segment
is spent doing the sorts, so I'm well off the 20% overhead number you
seem to be able to achieve for a full cycle.
5 machines (4 CPUs each), running 8 tasks apiece with a load average of
about 5; they run Red Hat. Context switches are low (under 1500/second).
There is virtually no IO (the boxes have plenty of RAM), but the kernel is
doing a lot of work, as 50% of CPU time is spent in system (unsure what;
I'm not familiar with the Linux DTrace-type tools).
I generated the segment for the top 10 million pages, with 10 pages per
host. map.tasks=383, reduce.tasks=43
--
Rod Taylor <rb...@sitesell.com>
Re: mapred Sort Progress Reports
Posted by Doug Cutting <cu...@nutch.org>.
Rod Taylor wrote:
> I have high load, but it seems that the percentage progress during the
> reduce > sort phase of fetch (parse?) is not increasing, which makes it
> appear as if nothing is happening (stuck at 0.5, or 50%).
That's correct. There are currently no progress reports during sorting.
Reduce progress sticks at 50% during sorting, and jumps to 75% on
completion of the sort phase.
Doug