Posted to dev@nutch.apache.org by Rod Taylor <rb...@sitesell.com> on 2005/11/03 21:32:08 UTC

mapred bug -- bad part calculation?

Sources are from October 31st. Sun Standard Edition 1.5.0_02-b09 for
amd64

Every segment that I fetch seems to be missing a part when stored on the
filesystem. The stranger thing is it is always the same part (very
reproducible).

If I have mapred.reduce.tasks set to 20, the hole is at part 13. That
is, the part-00013 directory is empty while the remainder (0 through 12,
14 through 19) all have data.

If I have mapred.reduce.tasks set to 19, the hole is at part 11.
content/part-00011 is empty.
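
For readers unfamiliar with how reduce output is split into parts, here is a minimal sketch, assuming the common hash-modulo scheme. The thread does not show the actual Nutch partitioner, so the class and method names below are hypothetical. Under that assumption every key lands in exactly one of the N parts, and a large segment would be expected to populate all of them.

```java
public class PartSketch {
    // Hypothetical helper: pick a reduce partition for a key by hash modulo.
    static int partitionFor(Object key, int numReduceTasks) {
        // Mask the sign bit so the result is never negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Directory name for a partition index, e.g. 13 -> "part-00013".
    static String partName(int partition) {
        return String.format("part-%05d", partition);
    }

    public static void main(String[] args) {
        int reduceTasks = 20; // the mapred.reduce.tasks value from the report
        String key = "http://example.com/";
        System.out.println(key + " -> " + partName(partitionFor(key, reduceTasks)));
    }
}
```

If the scheme is as assumed, a consistently empty part points at the reduce task for that index failing rather than at the index calculation itself.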

Attached are my site configuration (reduce.tasks is 19), task log for a
failing task and the output from the job tracker.

Below is a snippet from the datanode log (the only errors that exist are
related to this task or others which process the above part #) and below
that the output from localhost:7845 on the jobtracker machine for the
job.

java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:129)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:313)
        at java.io.DataInputStream.read(DataInputStream.java:134)
        at org.apache.nutch.ndfs.DataNode$DataXceiver.run(DataNode.java:369)
        at java.lang.Thread.run(Thread.java:595)
java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:129)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:313)
        at java.io.DataInputStream.read(DataInputStream.java:134)
        at org.apache.nutch.ndfs.DataNode$DataXceiver.run(DataNode.java:369)
        at java.lang.Thread.run(Thread.java:595)


                                                Job 'job_k1p80p'

   Job File: /home/sitesell/system/submit_2pgex8/job.xml
   Start time: Thu Nov 03 12:04:43 EST 2005
   The job failed at: Thu Nov 03 16:00:42 EST 2005

__________________________________________________________________________________________________

Map Tasks

        Map Task Id  Pct Complete State Diagnostic Text
       task_m_2m1twe 1.0          103189 pages, 5045 errors, 13.1 pages/s, 1000 kb/s,
       task_m_4nzguk 1.0          103141 pages, 5193 errors, 12.9 pages/s, 988 kb/s,
       task_m_5aprs2 1.0          103427 pages, 4756 errors, 13.4 pages/s, 1027 kb/s,
       task_m_6pd5q7 1.0          102650 pages, 5081 errors, 12.6 pages/s, 962 kb/s,
       task_m_8qzj8p 1.0          103610 pages, 4539 errors, 13.6 pages/s, 1039 kb/s,
       task_m_aev1di 1.0          102666 pages, 4997 errors, 13.2 pages/s, 1007 kb/s,
       task_m_f2zfyw 1.0          103235 pages, 4662 errors, 13.6 pages/s, 1045 kb/s,
       task_m_f84hfi 1.0          103746 pages, 4657 errors, 13.0 pages/s, 991 kb/s,
       task_m_hhv9b9 1.0          102909 pages, 4972 errors, 13.5 pages/s, 1026 kb/s,
       task_m_kijqqx 1.0          103439 pages, 4858 errors, 13.4 pages/s, 1024 kb/s,
       task_m_n5mxax 1.0          102894 pages, 4953 errors, 13.3 pages/s, 1017 kb/s,
       task_m_p45m8c 1.0          103705 pages, 4969 errors, 13.1 pages/s, 1007 kb/s,
       task_m_qfevss 1.0          102640 pages, 5006 errors, 13.2 pages/s, 1011 kb/s,
       task_m_qg3816 1.0          103658 pages, 5039 errors, 13.3 pages/s, 1014 kb/s,
       task_m_rlxmuw 1.0          103609 pages, 4491 errors, 13.6 pages/s, 1038 kb/s,
       task_m_t9ksdc 1.0          103053 pages, 5287 errors, 12.9 pages/s, 994 kb/s,
       task_m_wt3oyf 1.0          103006 pages, 5168 errors, 13.3 pages/s, 1014 kb/s,
       task_m_xk3gxz 1.0          103294 pages, 5216 errors, 13.0 pages/s, 996 kb/s,
       task_m_yjrejy 1.0          103158 pages, 4787 errors, 13.5 pages/s, 1038 kb/s,

__________________________________________________________________________________________________

   Reduce Task Id Pct Complete State Diagnostic Text
   task_r_2ktith 1.0 reduce > reduce
   task_r_6hwvi0 1.0 reduce > reduce
   task_r_8bi6h5 1.0 reduce > reduce
   task_r_bpisbi 1.0 reduce > reduce
   task_r_cfoo7z 1.0 reduce > reduce
   task_r_cmy1r3 1.0 reduce > reduce
   task_r_efnd4k 1.0 reduce > reduce
   task_r_ervlp5 1.0 reduce > reduce
   task_r_kvmno7 1.0 reduce > reduce
   task_r_n4q36e 1.0 reduce > reduce
   task_r_o4st5w 1.0 reduce > reduce
   task_r_ow0sul 1.0 reduce > reduce
   task_r_r7u152 1.0 reduce > reduce
   task_r_ra99xx 1.0 reduce > reduce
   task_r_ush85v 1.0 reduce > reduce
   task_r_vbmkfw 1.0 reduce > reduce
   task_r_wbirax 1.0 reduce > reduce
   task_r_z17yss 1.0 reduce > reduce
   task_r_o9mv91 0.9153447 reduce > reduce
      Timed out.
      java.io.IOException: Task process exit with nonzero status.
         at org.apache.nutch.mapred.TaskRunner.runChild(TaskRunner.java:139)
         at org.apache.nutch.mapred.TaskRunner.run(TaskRunner.java:92)
      Timed out.
      java.io.IOException: Task process exit with nonzero status.
         at org.apache.nutch.mapred.TaskRunner.runChild(TaskRunner.java:139)
         at org.apache.nutch.mapred.TaskRunner.run(TaskRunner.java:92)
      Timed out.
      java.io.IOException: Task process exit with nonzero status.
         at org.apache.nutch.mapred.TaskRunner.runChild(TaskRunner.java:139)
         at org.apache.nutch.mapred.TaskRunner.run(TaskRunner.java:92)
      Timed out.
      java.io.IOException: Task process exit with nonzero status.
         at org.apache.nutch.mapred.TaskRunner.runChild(TaskRunner.java:139)
         at org.apache.nutch.mapred.TaskRunner.run(TaskRunner.java:92)


-- 
Rod Taylor <rb...@sitesell.com>

Re: mapred bug -- bad part calculation?

Posted by Rod Taylor <rb...@sitesell.com>.
On Fri, 2005-11-04 at 13:43 -0800, Doug Cutting wrote:
> Rod Taylor wrote:
> > Every segment that I fetch seems to be missing a part when stored on the
> > filesystem. The stranger thing is it is always the same part (very
> > reproducible).
> 
> This sounds strange.  Are the datanode errors always on the same host? 
> How many hosts are you running this on?

It also seems to be limited to large segments. Using -topN 1000000
executes without any problems. 3 and 7 million both had difficulties.

-- 
Rod Taylor <rb...@sitesell.com>


Re: mapred bug -- bad part calculation?

Posted by Rod Taylor <rb...@sitesell.com>.
On Fri, 2005-11-04 at 13:43 -0800, Doug Cutting wrote:
> Rod Taylor wrote:
> > Every segment that I fetch seems to be missing a part when stored on the
> > filesystem. The stranger thing is it is always the same part (very
> > reproducible).
> 
> This sounds strange.  Are the datanode errors always on the same host? 
> How many hosts are you running this on?

I lied earlier. It still happens with smaller segments, just not as
frequently.

Found this in the namenode log file:

051104 200412 Server connection on port 5466 from 192.168.100.11: exiting
051104 200438 Server connection on port 5466 from 192.168.100.11: starting
051104 200438 Cannot start file because pendingCreates is non-null
051104 200438 Server handler on 5466 call error: java.io.IOException: Cannot create file /opt/sitesell/sbider_data/nutch/segments/20051104185259/20051104185300/crawl_fetch/part-00011/data
java.io.IOException: Cannot create file /opt/sitesell/sbider_data/nutch/segments/20051104185259/20051104185300/crawl_fetch/part-00011/data
        at org.apache.nutch.ndfs.NameNode.create(NameNode.java:98)
        at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.nutch.ipc.RPC$1.call(RPC.java:186)
        at org.apache.nutch.ipc.Server$Handler.run(Server.java:198)
051104 200440 Server connection on port 5466 from 192.168.100.11: exiting
051104 200504 Server connection on port 5466 from 192.168.100.11: starting
051104 200504 Cannot start file because pendingCreates is non-null
051104 200504 Server handler on 5466 call error: java.io.IOException: Cannot create file /opt/sitesell/sbider_data/nutch/segments/20051104185259/20051104185300/crawl_fetch/part-00011/data
java.io.IOException: Cannot create file /opt/sitesell/sbider_data/nutch/segments/20051104185259/20051104185300/crawl_fetch/part-00011/data
        at org.apache.nutch.ndfs.NameNode.create(NameNode.java:98)
        at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.nutch.ipc.RPC$1.call(RPC.java:186)
        at org.apache.nutch.ipc.Server$Handler.run(Server.java:198)
051104 200505 Server connection on port 5466 from 192.168.100.11: exiting
051104 200506 Removing lease [Lease.  Holder: NDFSClient_1755346663, heldlocks: 0, pendingcreates: 0], leases remaining: 1
051104 200529 Server connection on port 5466 from 192.168.100.11: starting
051104 201807 Server connection on port 5466 from 192.168.100.11: exiting
051104 201812 Server connection on port 5466 from 192.168.100.15: exiting
051104 201823 Server connection on port 5466 from 192.168.100.15: starting
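
The "Cannot start file because pendingCreates is non-null" line reads as though the namenode refuses a create while an earlier create of the same file is still outstanding, which is what a retried reduce task would hit if the first attempt never released the file. The following is only a sketch of that kind of guard with hypothetical names; it is not taken from the NameNode source.

```java
import java.util.HashMap;
import java.util.Map;

public class PendingCreateSketch {
    // Files whose creation has started but not finished, mapped to their holder.
    private final Map<String, String> pendingCreates = new HashMap<>();

    // Try to start creating a file. Returns false if another create
    // of the same path is still pending.
    public synchronized boolean startFile(String path, String holder) {
        if (pendingCreates.containsKey(path)) {
            return false;
        }
        pendingCreates.put(path, holder);
        return true;
    }

    // Called when the create completes or the holder's lease is removed.
    public synchronized void release(String path) {
        pendingCreates.remove(path);
    }

    public static void main(String[] args) {
        PendingCreateSketch names = new PendingCreateSketch();
        String path = "crawl_fetch/part-00011/data";
        System.out.println(names.startFile(path, "attempt-1")); // first create is accepted
        System.out.println(names.startFile(path, "attempt-2")); // retry is refused while pending
        names.release(path);
        System.out.println(names.startFile(path, "attempt-2")); // accepted after release
    }
}
```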



-- 
Rod Taylor <rb...@sitesell.com>


Re: mapred bug -- bad part calculation?

Posted by Stefan Groschupf <sg...@media-style.com>.
>
> I tried running one datanode per machine connecting back to the  
> same SAN
> but it seemed pretty clunky.

SAN in general is a bad idea. A SAN is too slow for a serious setup.
... and it is the single point of failure...
Better use many local hdd.

Stefan 

Re: mapred bug -- bad part calculation?

Posted by Rod Taylor <rb...@sitesell.com>.
On Mon, 2005-11-07 at 18:12 -0800, Paul Baclace wrote:
> Rod Taylor wrote:
> > NDFS accomplishes the above path finding by auto-prefixing any path not
> > beginning with / with a /user/$USER. I didn't think it was appropriate
> > for LocalFileSystem.java to be mucking around trying to automatically
> > adjust paths to what the user may have intended.
> > 
> 
> Grep-ing for /user, NDFSFileSystem has:
>     "/user/" + System.getProperty("user.name") + "/"
> 
> I would think this is not consistent with the idea that properties
> and filenames working identically on all machines.  Perhaps this
> NDFSFileSystem line should use mapred.system.dir

Quite possibly.  Either way I was just trying to demonstrate that
multiple tasktrackers, regardless of whether it is NDFS or local
filesystem, requires the path expansion.

I don't think it should be a filesystem level item at all and should be
up to the code requesting the job to be done.

-- 
Rod Taylor <rb...@sitesell.com>


Re: mapred bug -- bad part calculation?

Posted by Paul Baclace <pe...@baclace.net>.
Rod Taylor wrote:
> NDFS accomplishes the above path finding by auto-prefixing any path not
> beginning with / with a /user/$USER. I didn't think it was appropriate
> for LocalFileSystem.java to be mucking around trying to automatically
> adjust paths to what the user may have intended.
> 

Grep-ing for /user, NDFSFileSystem has:
    "/user/" + System.getProperty("user.name") + "/"

I would think this is not consistent with the idea that properties
and filenames working identically on all machines.  Perhaps this
NDFSFileSystem line should use mapred.system.dir

Paul

Re: mapred bug -- bad part calculation?

Posted by Rod Taylor <rb...@sitesell.com>.
On Mon, 2005-11-07 at 17:26 -0800, Paul Baclace wrote:
> Rod Taylor wrote:
> > The attached patches for Generator.java and Injector.java allow a
> > specific temporary directory to be specified. This gives Nutch the full
> > path to these temporary directories and seems to fix the "No input
> > directories" issue when using a local filesystem with multiple task
> > trackers.
> 
> Is your patch, with the new property mapred.temp.dir, meant to help
> find files that should not be separate between different
> processes on the same host?  Is the user id different?

Generate and Inject both issue 2 jobs. In order for the second job to
find the files, the first job needs to write them in a predictable and
common location. The current path doesn't seem to be enough even if all
daemons are started within it. I believe it needs to be a common path
for all hosts like mapred.system.dir which I considered using instead.

NDFS accomplishes the above path finding by auto-prefixing any path not
beginning with / with a /user/$USER. I didn't think it was appropriate
for LocalFileSystem.java to be mucking around trying to automatically
adjust paths to what the user may have intended.
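
To make the prefixing rule concrete, here is a small sketch of it as described above: absolute paths are left alone and anything else is placed under /user/$USER. The class and method names are hypothetical and this is not the NDFSFileSystem code.

```java
public class PathPrefixSketch {
    // Absolute paths are returned unchanged; relative paths are
    // resolved under /user/<user>/.
    static String resolve(String path, String user) {
        if (path.startsWith("/")) {
            return path;
        }
        return "/user/" + user + "/" + path;
    }

    public static void main(String[] args) {
        String user = System.getProperty("user.name");
        System.out.println(resolve("generate-temp-908680235", user));
        System.out.println(resolve("/opt/sitesell/sbider_data/test2/segments", user));
    }
}
```

With a rule like this every host resolves a relative name to the same location, which is what the local filesystem case lacks.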

-- 
Rod Taylor <rb...@sitesell.com>


Re: mapred bug -- bad part calculation?

Posted by Paul Baclace <pe...@baclace.net>.
Rod Taylor wrote:
> The attached patches for Generator.java and Injector.java allow a
> specific temporary directory to be specified. This gives Nutch the full
> path to these temporary directories and seems to fix the "No input
> directories" issue when using a local filesystem with multiple task
> trackers.

Is your patch, with the new property mapred.temp.dir, meant to help
find files that should not be separate between different
processes on the same host?  Is the user id different?


Paul


Re: mapred bug -- bad part calculation?

Posted by Doug Cutting <cu...@nutch.org>.
Rod Taylor wrote:
> The attached patches for Generator.java and Injector.java allow a
> specific temporary directory to be specified. This gives Nutch the full
> path to these temporary directories and seems to fix the "No input
> directories" issue when using a local filesystem with multiple task
> trackers.

This looks like a good patch.  I've committed it.

This is a recent bug.  The nutch-daemon.sh script connects all daemons 
to the Nutch root, so that relative paths are consistent.  And, 
previously, child processes were always connected to the same place as 
the parent process.  But I changed that recently so that child processes 
are now connected to the directory where their job's jar (if any) is 
unpacked.  This was so that if the jar contains scripts (e.g., a 
parse-ext plugin script) then these scripts are easy to run.

In NDFS the current working directory is always /user/$USER.  On the 
local filesystem with a local jobtracker, paths are relative to the 
current working directory of the process (since there's only one 
process).  The problematic case is when the local filesystem is used 
with multiple processes.  The prior convention of making paths relative 
to the nutch root was fragile.  Better to supply absolute paths, as your 
patch does.
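
As an illustration of the absolute-path approach, here is a rough sketch of building a temporary directory name under a configured base. The property name mapred.temp.dir comes from this thread; the "." default, the use of java.util.Properties in place of NutchConf, and the helper itself are assumptions, and this is not the committed patch.

```java
import java.io.File;
import java.util.Properties;
import java.util.Random;

public class TempDirSketch {
    // Build an absolute temp directory path under the configured base so that
    // a later job can find it regardless of each process's working directory.
    static File tempDir(Properties conf, String prefix, Random random) {
        String base = conf.getProperty("mapred.temp.dir", "."); // default is an assumption
        String name = prefix + "-temp-" + random.nextInt(Integer.MAX_VALUE);
        return new File(base, name).getAbsoluteFile();
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("mapred.temp.dir", "/shared/nutch/tmp"); // hypothetical shared path
        System.out.println(tempDir(conf, "generate", new Random()));
    }
}
```

The first job would write to this path and pass the same absolute string to the second job as its input directory.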

Doug

Re: mapred bug -- bad part calculation?

Posted by Rod Taylor <rb...@sitesell.com>.
The attached patches for Generator.java and Injector.java allow a
specific temporary directory to be specified. This gives Nutch the full
path to these temporary directories and seems to fix the "No input
directories" issue when using a local filesystem with multiple task
trackers.

On Mon, 2005-11-07 at 09:57 -0500, Rod Taylor wrote:
> On Fri, 2005-11-04 at 20:41 -0800, Doug Cutting wrote:
> > Rod Taylor wrote:
> > > Here you go. local filesystem and a single job tracker on another
> > > machine. When the tasktracker and jobtracker are on the same box there
> > > isn't a problem. When they are on different machines it runs into
> > > issues.
> > > 
> > > This is using mapred.local.dir on the local machine (not sharedd between
> > > sbider4 and sbider5):
> > 
> > >         parsing /home/sitesell/localt/taskTracker/task_m_o59djj/job.xml
> > >         [Fatal Error] :-1:-1: Premature end of file.
> > 
> > What is mapred.system.dir?  That must be shared.  Also, filenames you 
> > pass to commands must be pathnames that work on all hosts.
> 
> I managed to get past all of the initial injection problems by running a
> local crawl (no jobtracker) which created the crawldb/current/part-00000
> files. So I was able to do a real inject, with jobtracker, for all of
> the urls system wide without any complaints about files or directories
> not existing.
> 
> Now, when trying to run a generate with a jobtracker it seems to have a
> hard time finding the temporary working areas from one job to the next.
> I cannot figure out where it is creating generate-temp-908680235. With
> NDFS it would be /user/$USER/
> 
> <-- nutch generate -->
> 051107 091256 topN: 10000
> 051107 091256 Generator: starting
> 051107 091256 Generator:
> segment: /opt/sitesell/sbider_data/test2/segments/20051107091256
> 051107 091256 Generator: Selecting most-linked urls due for fetch.
> 051107 091256 parsing file:/opt/nutch-0.8_7/conf/nutch-default.xml
> 051107 091256 parsing file:/opt/nutch-0.8_7/conf/mapred-default.xml
> 051107 091256 parsing file:/opt/nutch-0.8_7/conf/nutch-site.xml
> 051107 091256 parsing file:/opt/nutch-0.8_7/conf/nutch-default.xml
> 051107 091256 parsing file:/opt/nutch-0.8_7/conf/nutch-site.xml
> 051107 091256 Client connection to 192.168.100.14:5464: starting
> 051107 091256 Running job: job_xhvq9b
> 051107 091258  map 0%
> 051107 091300  map 5%
> 051107 091303  map 16%
> 051107 091305  map 21%
> 051107 091306  map 26%
> 051107 091308  map 32%
> 051107 091309  map 37%
> 051107 091312  map 47%
> 051107 091315  map 58%
> 051107 091318  map 68%
> 051107 091320  map 74%
> 051107 091321  map 79%
> 051107 091324  map 89%
> 051107 091327  map 100%
> 051107 091330  reduce 5%
> 051107 091332  reduce 11%
> 051107 091333  reduce 16%
> 051107 091335  reduce 21%
> 051107 091337  reduce 26%
> 051107 091339  reduce 37%
> 051107 091342  reduce 47%
> 051107 091344  reduce 53%
> 051107 091345  reduce 58%
> 051107 091347  reduce 63%
> 051107 091348  reduce 68%
> 051107 091351  reduce 79%
> 051107 091354  reduce 89%
> 051107 091357  reduce 100%
> 051107 091359 Job complete: job_xhvq9b
> 051107 091359 Generator: Partitioning selected urls by host, for
> politeness.
> 051107 091359 parsing file:/opt/nutch-0.8_7/conf/nutch-default.xml
> 051107 091359 parsing file:/opt/nutch-0.8_7/conf/mapred-default.xml
> 051107 091359 parsing file:/opt/nutch-0.8_7/conf/nutch-site.xml
> Exception in thread "main" java.io.IOException: No input directories specified in: NutchConf: nutch-default.xml , mapred-default.xml , /home/sitesell/local/jobTracker/job_h22fvi.xml , nutch-site.xml
>         at org.apache.nutch.ipc.Client.call(Client.java:294)
>         at org.apache.nutch.ipc.RPC$Invoker.invoke(RPC.java:127)
>         at $Proxy0.submitJob(Unknown Source)
>         at org.apache.nutch.mapred.JobClient.submitJob(JobClient.java:259)
>         at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:288)
>         at org.apache.nutch.crawl.Generator.generate(Generator.java:213)
>         at org.apache.nutch.crawl.Generator.main(Generator.java:258)
> 
> [sitesell@sbider5 sbider_data]$
> cat /home/sitesell/local/jobTracker/job_h22fvi.xml | grep input
> <property><name>mapred.input.format.class</name><value>org.apache.nutch.mapred.SequenceFileInputFormat</value></property>
> <property><name>mapred.input.dir</name><value>generate-temp-908680235</value></property>
> <property><name>mapred.input.value.class</name><value>org.apache.nutch.io.UTF8</value></property>
> <property><name>mapred.input.key.class</name><value>org.apache.nutch.crawl.CrawlDatum</value></property>
> 
> -- 
> Rod Taylor <rb...@sitesell.com>
> 
> 
-- 
Rod Taylor <rb...@sitesell.com>

Re: mapred bug -- bad part calculation?

Posted by Massimo Miccoli <mm...@iltrovatore.it>.
Hello Nutch devs,

I have the same problem.  I have 10 hosts and one master.  Each host runs
a datanode and a tasktracker.
My mapred configuration is 100 maps and 25 reducers. The logs with the errors are below.

Thanks

051107 144101 task_r_pd3ybk 0.224% reduce > copy >
051107 144102 Moving bad file /tmp/nutch/mapred/local/task_m_mmdwzs/part-18.out to /tmp/bad_files/part-18.out.-1505193967
051107 144102 Server handler on 48724 caught: java.io.IOException: Checksum error: /tmp/nutch/mapred/local/task_m_mmdwzs/part-18.out
java.io.IOException: Checksum error: /tmp/nutch/mapred/local/task_m_mmdwzs/part-18.out
        at org.apache.nutch.fs.NFSDataInputStream$Checker.verifySum(NFSDataInputStream.java:115)
        at org.apache.nutch.fs.NFSDataInputStream$Checker.read(NFSDataInputStream.java:95)
        at org.apache.nutch.fs.NFSDataInputStream$PositionCache.read(NFSDataInputStream.java:152)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:313)
        at java.io.DataInputStream.read(DataInputStream.java:80)
        at org.apache.nutch.mapred.MapOutputFile.write(MapOutputFile.java:95)
        at org.apache.nutch.io.ObjectWritable.writeObject(ObjectWritable.java:117)
        at org.apache.nutch.io.ObjectWritable.write(ObjectWritable.java:64)
        at org.apache.nutch.ipc.Server$Handler.run(Server.java:213)
051107 144103 task_r_pd3ybk 0.24400002% reduce > copy >
051107 144103 parsing file:/d1/mapred/conf/nutch-default.xml
051107 144103 parsing file:/d1/mapred/conf/mapred-default.xml
051107 144103 parsing 
/tmp/nutch/mapred/local/taskTracker/task_r_pd3ybk/job.xml
051107 144103 parsing file:/d1/mapred/conf/nutch-site.xml
051107 144104 task_r_pd3ybk  parsing file:/d1/mapred/conf/nutch-default.xml
051107 144104 task_r_pd3ybk  parsing file:/d1/mapred/conf/nutch-site.xml
051107 144104 task_r_pd3ybk  Child starting
051107 144104 task_r_pd3ybk  Client connection to 0.0.0.0:33273: starting
051107 144104 Server connection on port 33273 from 127.0.0.1: starting
051107 144104 task_r_pd3ybk  parsing file:/d1/mapred/conf/nutch-default.xml
051107 144104 task_r_pd3ybk  parsing file:/d1/mapred/conf/mapred-default.xml
051107 144104 task_r_pd3ybk  parsing 
/tmp/nutch/mapred/local/taskTracker/task_r_pd3ybk/job.xml
051107 144104 task_r_pd3ybk  parsing file:/d1/mapred/conf/nutch-site.xml
051107 144104 task_r_pd3ybk  parsing file:/d1/mapred/conf/nutch-default.xml
051107 144104 task_r_pd3ybk  parsing 
/tmp/nutch/mapred/local/taskTracker/task_r_pd3ybk/job.xml
051107 144104 task_r_pd3ybk  parsing file:/d1/mapred/conf/nutch-site.xml
051107 144105 task_r_pd3ybk 0.25640127% reduce > append > 
/tmp/nutch/mapred/local/task_r_pd3ybk/task_m_9b2agp.out
051107 144106 task_r_pd3ybk 0.26105025% reduce > append > 
/tmp/nutch/mapred/local/task_r_pd3ybk/task_m_iwbx48.out
051107 144107 task_r_pd3ybk 0.30607307% reduce > append > 
/tmp/nutch/mapred/local/task_r_pd3ybk/task_m_cphmud.out
051107 144108 task_r_pd3ybk 0.30645084% reduce > append > 
/tmp/nutch/mapred/local/task_r_pd3ybk/task_m_cphmud.out
051107 144109 task_r_pd3ybk 0.30679235% reduce > append > 
/tmp/nutch/mapred/local/task_r_pd3ybk/task_m_cphmud.out
051107 144110 task_r_pd3ybk 0.30714962% reduce > append > 
/tmp/nutch/mapred/local/task_r_pd3ybk/task_m_cphmud.out
051107 144111 task_r_pd3ybk 0.30751395% reduce > append > 
/tmp/nutch/mapred/local/task_r_pd3ybk/task_m_cphmud.out
051107 144112 task_r_pd3ybk 0.3078882% reduce > append > 
/tmp/nutch/mapred/local/task_r_pd3ybk/task_m_cphmud.out
051107 144113 task_r_pd3ybk 0.3246999% reduce > append > 
/tmp/nutch/mapred/local/task_r_pd3ybk/task_m_ahej3w.out
051107 144114 task_r_pd3ybk 0.33490744% reduce > append > 
/tmp/nutch/mapred/local/task_r_pd3ybk/task_m_rebwrf.out
051107 144115 task_r_pd3ybk 0.3441058% reduce > append > 
/tmp/nutch/mapred/local/task_r_pd3ybk/task_m_atf6cb.out
051107 144116 task_r_pd3ybk 0.3537717% reduce > append > 
/tmp/nutch/mapred/local/task_r_pd3ybk/task_m_objo5q.out
051107 144117 task_r_pd3ybk 0.35881257% reduce > append > 
/tmp/nutch/mapred/local/task_r_pd3ybk/task_m_ybv2xw.out
051107 144118 task_r_pd3ybk 0.36855537% reduce > append > 
/tmp/nutch/mapred/local/task_r_pd3ybk/task_m_pv6b9d.out
051107 144119 task_r_pd3ybk 0.37860525% reduce > append > 
/tmp/nutch/mapred/local/task_r_pd3ybk/task_m_lj8ljn.out
051107 144120 task_r_pd3ybk 0.3887727% reduce > append > 
/tmp/nutch/mapred/local/task_r_pd3ybk/task_m_5jjyb8.out
051107 144121 task_r_pd3ybk 0.39831316% reduce > append > 
/tmp/nutch/mapred/local/task_r_pd3ybk/task_m_q24lb2.out
051107 144122 task_r_pd3ybk 0.44835892% reduce > append > 
/tmp/nutch/mapred/local/task_r_pd3ybk/task_m_9yx6r2.out
051107 144123 task_r_pd3ybk 0.4488136% reduce > append > 
/tmp/nutch/mapred/local/task_r_pd3ybk/task_m_9yx6r2.out
051107 144124 task_r_pd3ybk 0.4492674% reduce > append > 
/tmp/nutch/mapred/local/task_r_pd3ybk/task_m_9yx6r2.out
051107 144125 task_r_pd3ybk 0.44971693% reduce > append > 
/tmp/nutch/mapred/local/task_r_pd3ybk/task_m_9yx6r2.out
051107 144126 task_r_pd3ybk 0.48041725% reduce > append > 
/tmp/nutch/mapred/local/task_r_pd3ybk/task_m_xnbtvi.out
051107 144128 task_r_pd3ybk 0.5% reduce > sort
051107 144129 task_r_pd3ybk 0.5% reduce > sort
051107 144130 task_r_pd3ybk 0.5% reduce > sort
051107 144131 task_r_pd3ybk 0.5% reduce > sort
051107 144132 task_r_pd3ybk 0.5% reduce > sort
051107 144133 task_r_pd3ybk 0.5% reduce > sort
051107 144134 task_r_pd3ybk 0.5% reduce > sort
051107 144135 task_r_pd3ybk 0.5% reduce > sort
051107 144136 task_r_pd3ybk 0.5% reduce > sort
051107 144137 task_r_pd3ybk 0.5% reduce > sort
051107 144138 task_r_pd3ybk 0.5% reduce > sort
051107 144139 task_r_pd3ybk 0.5% reduce > sort
051107 144140 task_r_pd3ybk 0.5% reduce > sort
051107 144141 task_r_pd3ybk 0.5% reduce > sort
051107 144142 task_r_pd3ybk 0.5% reduce > sort
051107 144144 task_r_pd3ybk 0.5% reduce > sort
051107 144145 task_r_pd3ybk 0.5% reduce > sort
051107 144146 task_r_pd3ybk 0.5% reduce > sort
051107 144147 task_r_pd3ybk 0.5% reduce > sort
051107 144148 task_r_pd3ybk 0.5% reduce > sort
051107 144149 task_r_pd3ybk 0.5% reduce > sort
051107 144150 task_r_pd3ybk 0.5% reduce > sort
051107 144151 task_r_pd3ybk 0.5% reduce > sort
051107 144151 task_r_pd3ybk  Client connection to 10.2.0.11:7000: starting
051107 144152 task_r_pd3ybk 0.75141895% reduce > reduce
051107 144153 task_r_pd3ybk 0.75535446% reduce > reduce
051107 144154 task_r_pd3ybk 0.7593212% reduce > reduce
051107 144155 task_r_pd3ybk 0.7630673% reduce > reduce
051107 144156 task_r_pd3ybk 0.7669503% reduce > reduce
051107 144157 task_r_pd3ybk 0.770851% reduce > reduce
051107 144158 task_r_pd3ybk 0.774693% reduce > reduce
051107 144159 task_r_pd3ybk 0.77830505% reduce > reduce
051107 144200 task_r_pd3ybk 0.78223264% reduce > reduce
051107 144201 task_r_pd3ybk 0.7861667% reduce > reduce
051107 144202 task_r_pd3ybk 0.7900911% reduce > reduce
051107 144203 Server connection on port 48724 from 10.2.0.9: exiting
051107 144203 task_r_pd3ybk 0.79412013% reduce > reduce
051107 144203 Server connection on port 48724 from 10.2.0.9: starting
051107 144203 Server handler on 48724 caught: java.io.FileNotFoundException: /tmp/nutch/mapred/local/task_m_mmdwzs/part-18.out
java.io.FileNotFoundException: /tmp/nutch/mapred/local/task_m_mmdwzs/part-18.out
        at org.apache.nutch.fs.LocalFileSystem.openRaw(LocalFileSystem.java:106)
        at org.apache.nutch.fs.NFSDataInputStream$Checker.<init>(NFSDataInputStream.java:45)
        at org.apache.nutch.fs.NFSDataInputStream.<init>(NFSDataInputStream.java:217)
        at org.apache.nutch.fs.NutchFileSystem.open(NutchFileSystem.java:143)
        at org.apache.nutch.fs.NutchFileSystem.open(NutchFileSystem.java:132)
        at org.apache.nutch.mapred.MapOutputFile.write(MapOutputFile.java:91)
        at org.apache.nutch.io.ObjectWritable.writeObject(ObjectWritable.java:117)
        at org.apache.nutch.io.ObjectWritable.write(ObjectWritable.java:64)
        at org.apache.nutch.ipc.Server$Handler.run(Server.java:213)
051107 144204 task_r_pd3ybk 0.79818034% reduce > reduce
051107 144205 task_r_pd3ybk 0.80157274% reduce > reduce
051107 144206 task_r_pd3ybk 0.8053863% reduce > reduce
051107 144207 task_r_pd3ybk 0.8092159% reduce > reduce
....
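
The "Checksum error" in the log above is raised when data read back from a map output file no longer matches a checksum stored for it. As a rough illustration only, assuming a CRC32 over the whole buffer (the thread does not say which checksum or chunk size NFSDataInputStream uses), the check could look like this; all names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class ChecksumSketch {
    // Compute a CRC32 checksum over the given bytes.
    static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    // True when the data still matches the checksum recorded at write time.
    static boolean verify(byte[] data, long expected) {
        return checksum(data) == expected;
    }

    public static void main(String[] args) {
        byte[] original = "map output".getBytes(StandardCharsets.UTF_8);
        long stored = checksum(original);

        byte[] corrupted = original.clone();
        corrupted[0] ^= 0x01; // flip one bit to simulate on-disk corruption

        System.out.println("original ok:  " + verify(original, stored));
        System.out.println("corrupted ok: " + verify(corrupted, stored));
    }
}
```

In the log, the file that fails this kind of check is moved to /tmp/bad_files, which would explain the FileNotFoundException when the same map output is requested again a minute later.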



Rod Taylor wrote:

>On Fri, 2005-11-04 at 20:41 -0800, Doug Cutting wrote:
>  
>
>>Rod Taylor wrote:
>>    
>>
>>>Here you go. local filesystem and a single job tracker on another
>>>machine. When the tasktracker and jobtracker are on the same box there
>>>isn't a problem. When they are on different machines it runs into
>>>issues.
>>>
>>>This is using mapred.local.dir on the local machine (not sharedd between
>>>sbider4 and sbider5):
>>>      
>>>
>>>        parsing /home/sitesell/localt/taskTracker/task_m_o59djj/job.xml
>>>        [Fatal Error] :-1:-1: Premature end of file.
>>>      
>>>
>>What is mapred.system.dir?  That must be shared.  Also, filenames you 
>>pass to commands must be pathnames that work on all hosts.
>>    
>>
>
>I managed to get past all of the initial injection problems by running a
>local crawl (no jobtracker) which created the crawldb/current/part-00000
>files. So I was able to do a real inject, with jobtracker, for all of
>the urls system wide without any complaints about files or directories
>not existing.
>
>Now, when trying to run a generate with a jobtracker it seems to have a
>hard time finding the temporary working areas from one job to the next.
>I cannot figure out where it is creating generate-temp-908680235. With
>NDFS it would be /user/$USER/
>
><-- nutch generate -->
>051107 091256 topN: 10000
>051107 091256 Generator: starting
>051107 091256 Generator:
>segment: /opt/sitesell/sbider_data/test2/segments/20051107091256
>051107 091256 Generator: Selecting most-linked urls due for fetch.
>051107 091256 parsing file:/opt/nutch-0.8_7/conf/nutch-default.xml
>051107 091256 parsing file:/opt/nutch-0.8_7/conf/mapred-default.xml
>051107 091256 parsing file:/opt/nutch-0.8_7/conf/nutch-site.xml
>051107 091256 parsing file:/opt/nutch-0.8_7/conf/nutch-default.xml
>051107 091256 parsing file:/opt/nutch-0.8_7/conf/nutch-site.xml
>051107 091256 Client connection to 192.168.100.14:5464: starting
>051107 091256 Running job: job_xhvq9b
>051107 091258  map 0%
>051107 091300  map 5%
>051107 091303  map 16%
>051107 091305  map 21%
>051107 091306  map 26%
>051107 091308  map 32%
>051107 091309  map 37%
>051107 091312  map 47%
>051107 091315  map 58%
>051107 091318  map 68%
>051107 091320  map 74%
>051107 091321  map 79%
>051107 091324  map 89%
>051107 091327  map 100%
>051107 091330  reduce 5%
>051107 091332  reduce 11%
>051107 091333  reduce 16%
>051107 091335  reduce 21%
>051107 091337  reduce 26%
>051107 091339  reduce 37%
>051107 091342  reduce 47%
>051107 091344  reduce 53%
>051107 091345  reduce 58%
>051107 091347  reduce 63%
>051107 091348  reduce 68%
>051107 091351  reduce 79%
>051107 091354  reduce 89%
>051107 091357  reduce 100%
>051107 091359 Job complete: job_xhvq9b
>051107 091359 Generator: Partitioning selected urls by host, for
>politeness.
>051107 091359 parsing file:/opt/nutch-0.8_7/conf/nutch-default.xml
>051107 091359 parsing file:/opt/nutch-0.8_7/conf/mapred-default.xml
>051107 091359 parsing file:/opt/nutch-0.8_7/conf/nutch-site.xml
>Exception in thread "main" java.io.IOException: No input directories specified in: NutchConf: nutch-default.xml , mapred-default.xml , /home/sitesell/local/jobTracker/job_h22fvi.xml , nutch-site.xml
>        at org.apache.nutch.ipc.Client.call(Client.java:294)
>        at org.apache.nutch.ipc.RPC$Invoker.invoke(RPC.java:127)
>        at $Proxy0.submitJob(Unknown Source)
>        at org.apache.nutch.mapred.JobClient.submitJob(JobClient.java:259)
>        at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:288)
>        at org.apache.nutch.crawl.Generator.generate(Generator.java:213)
>        at org.apache.nutch.crawl.Generator.main(Generator.java:258)
>
>[sitesell@sbider5 sbider_data]$
>cat /home/sitesell/local/jobTracker/job_h22fvi.xml | grep input
><property><name>mapred.input.format.class</name><value>org.apache.nutch.mapred.SequenceFileInputFormat</value></property>
><property><name>mapred.input.dir</name><value>generate-temp-908680235</value></property>
><property><name>mapred.input.value.class</name><value>org.apache.nutch.io.UTF8</value></property>
><property><name>mapred.input.key.class</name><value>org.apache.nutch.crawl.CrawlDatum</value></property>
>
>  
>

Re: mapred bug -- bad part calculation?

Posted by Rod Taylor <rb...@sitesell.com>.
On Fri, 2005-11-04 at 20:41 -0800, Doug Cutting wrote:
> Rod Taylor wrote:
> > Here you go. local filesystem and a single job tracker on another
> > machine. When the tasktracker and jobtracker are on the same box there
> > isn't a problem. When they are on different machines it runs into
> > issues.
> > 
> > This is using mapred.local.dir on the local machine (not shared between
> > sbider4 and sbider5):
> 
> >         parsing /home/sitesell/localt/taskTracker/task_m_o59djj/job.xml
> >         [Fatal Error] :-1:-1: Premature end of file.
> 
> What is mapred.system.dir?  That must be shared.  Also, filenames you 
> pass to commands must be pathnames that work on all hosts.

I managed to get past all of the initial injection problems by running a
local crawl (no jobtracker) which created the crawldb/current/part-00000
files. So I was able to do a real inject, with jobtracker, for all of
the urls system wide without any complaints about files or directories
not existing.

Now, when trying to run a generate with a jobtracker it seems to have a
hard time finding the temporary working areas from one job to the next.
I cannot figure out where it is creating generate-temp-908680235. With
NDFS it would be /user/$USER/

<-- nutch generate -->
051107 091256 topN: 10000
051107 091256 Generator: starting
051107 091256 Generator: segment: /opt/sitesell/sbider_data/test2/segments/20051107091256
051107 091256 Generator: Selecting most-linked urls due for fetch.
051107 091256 parsing file:/opt/nutch-0.8_7/conf/nutch-default.xml
051107 091256 parsing file:/opt/nutch-0.8_7/conf/mapred-default.xml
051107 091256 parsing file:/opt/nutch-0.8_7/conf/nutch-site.xml
051107 091256 parsing file:/opt/nutch-0.8_7/conf/nutch-default.xml
051107 091256 parsing file:/opt/nutch-0.8_7/conf/nutch-site.xml
051107 091256 Client connection to 192.168.100.14:5464: starting
051107 091256 Running job: job_xhvq9b
051107 091258  map 0%
051107 091300  map 5%
051107 091303  map 16%
051107 091305  map 21%
051107 091306  map 26%
051107 091308  map 32%
051107 091309  map 37%
051107 091312  map 47%
051107 091315  map 58%
051107 091318  map 68%
051107 091320  map 74%
051107 091321  map 79%
051107 091324  map 89%
051107 091327  map 100%
051107 091330  reduce 5%
051107 091332  reduce 11%
051107 091333  reduce 16%
051107 091335  reduce 21%
051107 091337  reduce 26%
051107 091339  reduce 37%
051107 091342  reduce 47%
051107 091344  reduce 53%
051107 091345  reduce 58%
051107 091347  reduce 63%
051107 091348  reduce 68%
051107 091351  reduce 79%
051107 091354  reduce 89%
051107 091357  reduce 100%
051107 091359 Job complete: job_xhvq9b
051107 091359 Generator: Partitioning selected urls by host, for politeness.
051107 091359 parsing file:/opt/nutch-0.8_7/conf/nutch-default.xml
051107 091359 parsing file:/opt/nutch-0.8_7/conf/mapred-default.xml
051107 091359 parsing file:/opt/nutch-0.8_7/conf/nutch-site.xml
Exception in thread "main" java.io.IOException: No input directories specified in: NutchConf: nutch-default.xml , mapred-default.xml , /home/sitesell/local/jobTracker/job_h22fvi.xml , nutch-site.xml
        at org.apache.nutch.ipc.Client.call(Client.java:294)
        at org.apache.nutch.ipc.RPC$Invoker.invoke(RPC.java:127)
        at $Proxy0.submitJob(Unknown Source)
        at org.apache.nutch.mapred.JobClient.submitJob(JobClient.java:259)
        at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:288)
        at org.apache.nutch.crawl.Generator.generate(Generator.java:213)
        at org.apache.nutch.crawl.Generator.main(Generator.java:258)

[sitesell@sbider5 sbider_data]$ cat /home/sitesell/local/jobTracker/job_h22fvi.xml | grep input
<property><name>mapred.input.format.class</name><value>org.apache.nutch.mapred.SequenceFileInputFormat</value></property>
<property><name>mapred.input.dir</name><value>generate-temp-908680235</value></property>
<property><name>mapred.input.value.class</name><value>org.apache.nutch.io.UTF8</value></property>
<property><name>mapred.input.key.class</name><value>org.apache.nutch.crawl.CrawlDatum</value></property>

-- 
Rod Taylor <rb...@sitesell.com>
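One detail in the grep output above is that mapred.input.dir holds a bare name (generate-temp-908680235) rather than an absolute path, which fits the earlier advice that pathnames must work on all hosts. Whether that is the actual cause is not established in this thread. The following is a minimal sketch, not part of Nutch, for spotting such values in a job file. The set of properties treated as paths, the `<conf>` wrapper element in the sample and the function name are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Property names that appear in this thread. Treating exactly these as
# filesystem paths is an assumption for the sake of the example.
PATH_PROPERTIES = {"mapred.input.dir", "mapred.system.dir", "mapred.local.dir"}


def relative_path_properties(xml_text, names=PATH_PROPERTIES):
    """Return {name: value} for path-like properties whose value is not absolute."""
    root = ET.fromstring(xml_text)
    flagged = {}
    # iter() finds <property> elements at any depth, so the root tag name
    # does not matter here.
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = (prop.findtext("value") or "").strip()
        if name in names and value and not value.startswith("/"):
            flagged[name] = value
    return flagged


# Two property lines copied from the grep output; the <conf> root element
# is a hypothetical wrapper so that the sample is well-formed XML.
sample = """<conf>
<property><name>mapred.input.dir</name><value>generate-temp-908680235</value></property>
<property><name>mapred.input.key.class</name><value>org.apache.nutch.crawl.CrawlDatum</value></property>
</conf>"""

print(relative_path_properties(sample))
# prints {'mapred.input.dir': 'generate-temp-908680235'}
```

The check is deliberately crude: any value that does not begin with "/" is reported, so a value written as a URI would also be flagged.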


Re: mapred bug -- bad part calculation?

Posted by Rod Taylor <rb...@sitesell.com>.
On Fri, 2005-11-04 at 20:41 -0800, Doug Cutting wrote:
> Rod Taylor wrote:
> > Here you go. local filesystem and a single job tracker on another
> > machine. When the tasktracker and jobtracker are on the same box there
> > isn't a problem. When they are on different machines it runs into
> > issues.
> > 
> > This is using mapred.local.dir on the local machine (not shared between
> > sbider4 and sbider5):
> 
> >         parsing /home/sitesell/localt/taskTracker/task_m_o59djj/job.xml
> >         [Fatal Error] :-1:-1: Premature end of file.
> 
> What is mapred.system.dir?  That must be shared.  Also, filenames you 
> pass to commands must be pathnames that work on all hosts.

I had the rest, but failed to override mapred.system.dir (its description
is "local directory", which isn't really true if it must be shared).

That worked through the map but failed at the reduce. Both the remote
task tracker and the task tracker on the same physical machine as the
job tracker failed.

Both had similar errors logged:

051104 235758 task_m_r2dcvc 0.6336343% /opt/sitesell/sbider_data/test/urls/list-oct31:167034415+1758257
051104 235758 Server connection on port 45644 from 192.168.100.13: exiting
051104 235759 task_m_r2dcvc 0.7225661% /opt/sitesell/sbider_data/test/urls/list-oct31:167034415+1758257
051104 235800 task_m_r2dcvc 0.8255505% /opt/sitesell/sbider_data/test/urls/list-oct31:167034415+1758257
051104 235801 task_m_r2dcvc 0.9183419% /opt/sitesell/sbider_data/test/urls/list-oct31:167034415+1758257
051104 235802 task_m_r2dcvc 1.0% /opt/sitesell/sbider_data/test/urls/list-oct31:167034415+1758257
051104 235802 Task task_m_r2dcvc is done.
051104 235802 Server connection on port 45644 from 192.168.100.13: exiting
java.io.FileNotFoundException: /opt/sitesell/sbider_data/test/system/submit_fubqfe/job.xml (No such file or directory)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:106)
        at org.apache.nutch.fs.LocalFileSystem$LocalNFSFileInputStream.<init>(LocalFileSystem.java:64)
        at org.apache.nutch.fs.LocalFileSystem.openRaw(LocalFileSystem.java:108)
        at org.apache.nutch.fs.FileUtil.copyContents(FileUtil.java:57)
        at org.apache.nutch.fs.LocalFileSystem.copyToLocalFile(LocalFileSystem.java:297)
        at org.apache.nutch.mapred.TaskTracker$TaskInProgress.localizeTask(TaskTracker.java:328)
        at org.apache.nutch.mapred.TaskTracker$TaskInProgress.<init>(TaskTracker.java:314)
        at org.apache.nutch.mapred.TaskTracker.offerService(TaskTracker.java:214)
        at org.apache.nutch.mapred.TaskTracker.run(TaskTracker.java:268)
        at org.apache.nutch.mapred.TaskTracker.main(TaskTracker.java:633)
051104 235806 Lost connection to JobTracker [sbider5.sitebuildit.com/192.168.100.14:5464].  Retrying...
051104 235811 parsing file:/opt/nutch-0.8_7/conf/nutch-default.xml
051104 235811 parsing file:/opt/nutch-0.8_7/conf/mapred-default.xml
051104 235811 parsing /home/sitesell/local/taskTracker/task_r_mdnul7/job.xml
[Fatal Error] :-1:-1: Premature end of file.
051104 235811 SEVERE error parsing conf file: org.xml.sax.SAXParseException: Premature end of file.
java.lang.RuntimeException: org.xml.sax.SAXParseException: Premature end of file.
        at org.apache.nutch.util.NutchConf.loadResource(NutchConf.java:358)
        at org.apache.nutch.util.NutchConf.getProps(NutchConf.java:293)
        at org.apache.nutch.util.NutchConf.get(NutchConf.java:94)
        at org.apache.nutch.mapred.JobConf.getJar(JobConf.java:81)
        at org.apache.nutch.mapred.TaskTracker$TaskInProgress.localizeTask(TaskTracker.java:332)
        at org.apache.nutch.mapred.TaskTracker$TaskInProgress.<init>(TaskTracker.java:314)
        at org.apache.nutch.mapred.TaskTracker.offerService(TaskTracker.java:214)
        at org.apache.nutch.mapred.TaskTracker.run(TaskTracker.java:268)
        at org.apache.nutch.mapred.TaskTracker.main(TaskTracker.java:633)
Caused by: org.xml.sax.SAXParseException: Premature end of file.
        at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
        at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:172)
        at org.apache.nutch.util.NutchConf.loadResource(NutchConf.java:318)
        ... 8 more
051104 235811 Lost connection to JobTracker [sbider5.sitebuildit.com/192.168.100.14:5464].  Retrying...

-- 
Rod Taylor <rb...@sitesell.com>
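In the log above, the FileNotFoundException for the job.xml under the system directory is followed by "Premature end of file" when the task's local job.xml is parsed. That message is commonly what an XML parser reports for an empty or truncated file, so one plausible reading (an inference, not something confirmed in this thread) is that the copy failed and left an empty local file behind. A minimal sketch for telling these cases apart on a given host might look like the following; the function name and the four result labels are hypothetical.

```python
import os
import xml.etree.ElementTree as ET


def diagnose_job_xml(path):
    """Classify a job.xml as 'missing', 'empty', 'malformed' or 'ok'."""
    if not os.path.exists(path):
        return "missing"    # e.g. the shared directory is not visible on this host
    if os.path.getsize(path) == 0:
        return "empty"      # zero bytes: a parser would hit end of file immediately
    try:
        ET.parse(path)
    except ET.ParseError:
        return "malformed"  # has content but is not well-formed (e.g. truncated)
    return "ok"


if __name__ == "__main__":
    import sys

    # Usage: pass one or more job.xml paths; run it on each host in turn.
    for candidate in sys.argv[1:]:
        print(candidate, diagnose_job_xml(candidate))
```

Running it against both the shared copy and the task tracker's local copy on each machine would show whether the file is absent at the source or only damaged after the copy.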


Re: mapred bug -- bad part calculation?

Posted by Doug Cutting <cu...@nutch.org>.
Rod Taylor wrote:
> Here you go. local filesystem and a single job tracker on another
> machine. When the tasktracker and jobtracker are on the same box there
> isn't a problem. When they are on different machines it runs into
> issues.
> 
> This is using mapred.local.dir on the local machine (not shared between
> sbider4 and sbider5):

>         parsing /home/sitesell/localt/taskTracker/task_m_o59djj/job.xml
>         [Fatal Error] :-1:-1: Premature end of file.

What is mapred.system.dir?  That must be shared.  Also, filenames you 
pass to commands must be pathnames that work on all hosts.

Doug

Re: mapred bug -- bad part calculation?

Posted by Rod Taylor <rb...@sitesell.com>.
On Fri, 2005-11-04 at 22:57 -0500, Rod Taylor wrote:
> On Fri, 2005-11-04 at 19:43 -0800, Doug Cutting wrote:
> > Rod Taylor wrote:
> > > I tried running one datanode per machine connecting back to the same SAN
> > > but it seemed pretty clunky.  A crash of any datanode would take down
> > > the entire system (no data replication since it's a common data-store in
> > > the end). Reducing it to a single datanode did not have this impact.
> > 
> > Why use NDFS at all?  Why not just mount the SAN on all hosts?  You're 
> > not using NDFS as a distributed file system, but rather as a centralized 
> > file system.
> 
> I was unable to make the mapred branch work by using 'local' as the
> filesystem and having more than one tasktracker. Tasktrackers were
> unable to complete any work, although it was quite a while ago when I
> last tried (September).

Here you go. local filesystem and a single job tracker on another
machine. When the tasktracker and jobtracker are on the same box there
isn't a problem. When they are on different machines it runs into
issues.

This is using mapred.local.dir on the local machine (not shared between
sbider4 and sbider5):

        051104 230802 parsing
        file:/opt/nutch-0.8_7/conf/nutch-default.xml
        051104 230802 parsing
        file:/opt/nutch-0.8_7/conf/mapred-default.xml
        051104 230802
        parsing /home/sitesell/localt/taskTracker/task_m_o59djj/job.xml
        [Fatal Error] :-1:-1: Premature end of file.
        051104 230802 SEVERE error parsing conf file:
        org.xml.sax.SAXParseException: Premature end of file.
        java.lang.RuntimeException: org.xml.sax.SAXParseException:
        Premature end of file.
                at
        org.apache.nutch.util.NutchConf.loadResource(NutchConf.java:358)
                at
        org.apache.nutch.util.NutchConf.getProps(NutchConf.java:293)
                at
        org.apache.nutch.util.NutchConf.get(NutchConf.java:94)
                at
        org.apache.nutch.mapred.JobConf.getJar(JobConf.java:81)
                at org.apache.nutch.mapred.TaskTracker
        $TaskInProgress.localizeTask(TaskTracker.java:332)
                at org.apache.nutch.mapred.TaskTracker
        $TaskInProgress.<init>(TaskTracker.java:314)
                at
        org.apache.nutch.mapred.TaskTracker.offerService(TaskTracker.java:214)
                at
        org.apache.nutch.mapred.TaskTracker.run(TaskTracker.java:268)
                at
        org.apache.nutch.mapred.TaskTracker.main(TaskTracker.java:633)
        Caused by: org.xml.sax.SAXParseException: Premature end of file.
                at org.apache.xerces.parsers.DOMParser.parse(Unknown
        Source)
                at
        org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
                at
        javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:172)
                at
        org.apache.nutch.util.NutchConf.loadResource(NutchConf.java:318)
                ... 8 more
        051104 230802 Lost connection to JobTracker
        [sbider5.sitebuildit.com/192.168.100.14:5464].  Retrying...
        
This is using a shared mapred.local.dir on the SAN:

        051104 232115 parsing
        file:/opt/nutch-0.8_7/conf/nutch-default.xml
        051104 232115 parsing
        file:/opt/nutch-0.8_7/conf/mapred-default.xml
        051104 232115
        parsing /opt/sitesell/sbider_data/test/local/taskTracker/task_m_l86ntl/job.xml
        [Fatal Error] :-1:-1: Premature end of file.
        051104 232116 SEVERE error parsing conf file:
        org.xml.sax.SAXParseException: Premature end of file.
        java.lang.RuntimeException: org.xml.sax.SAXParseException:
        Premature end of file.
                at
        org.apache.nutch.util.NutchConf.loadResource(NutchConf.java:358)
                at
        org.apache.nutch.util.NutchConf.getProps(NutchConf.java:293)
                at
        org.apache.nutch.util.NutchConf.get(NutchConf.java:94)
                at
        org.apache.nutch.mapred.JobConf.getJar(JobConf.java:81)
                at org.apache.nutch.mapred.TaskTracker
        $TaskInProgress.localizeTask(TaskTracker.java:332)
                at org.apache.nutch.mapred.TaskTracker
        $TaskInProgress.<init>(TaskTracker.java:314)
                at
        org.apache.nutch.mapred.TaskTracker.offerService(TaskTracker.java:214)
                at
        org.apache.nutch.mapred.TaskTracker.run(TaskTracker.java:268)
                at
        org.apache.nutch.mapred.TaskTracker.main(TaskTracker.java:633)
        Caused by: org.xml.sax.SAXParseException: Premature end of file.
                at org.apache.xerces.parsers.DOMParser.parse(Unknown
        Source)
                at
        org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
                at
        javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:172)
                at
        org.apache.nutch.util.NutchConf.loadResource(NutchConf.java:318)
                ... 8 more
        051104 232116 Lost connection to JobTracker
        [sbider5.sitebuildit.com/192.168.100.14:5464].  Retrying...




-- 
Rod Taylor <rb...@sitesell.com>


Re: mapred bug -- bad part calculation?

Posted by Rod Taylor <rb...@sitesell.com>.
On Fri, 2005-11-04 at 19:43 -0800, Doug Cutting wrote:
> Rod Taylor wrote:
> > I tried running one datanode per machine connecting back to the same SAN
> > but it seemed pretty clunky.  A crash of any datanode would take down
> > the entire system (no data replication since it's a common data-store in
> > the end). Reducing it to a single datanode did not have this impact.
> 
> Why use NDFS at all?  Why not just mount the SAN on all hosts?  You're 
> not using NDFS as a distributed file system, but rather as a centralized 
> file system.

I was unable to make the mapred branch work by using 'local' as the
filesystem and having more than one tasktracker. Tasktrackers were
unable to complete any work, although it was quite a while ago when I
last tried (September).

-- 
Rod Taylor <rb...@sitesell.com>


Re: mapred bug -- bad part calculation?

Posted by Doug Cutting <cu...@nutch.org>.
Rod Taylor wrote:
> I tried running one datanode per machine connecting back to the same SAN
> but it seemed pretty clunky.  A crash of any datanode would take down
> the entire system (no data replication since it's a common data-store in
> the end). Reducing it to a single datanode did not have this impact.

Why use NDFS at all?  Why not just mount the SAN on all hosts?  You're 
not using NDFS as a distributed file system, but rather as a centralized 
file system.

Doug

Re: mapred bug -- bad part calculation?

Posted by Rod Taylor <rb...@sitesell.com>.
On Fri, 2005-11-04 at 19:15 -0800, Doug Cutting wrote:
> Rod Taylor wrote:
> > There is only a single datanode and there are 20 hosts.
> 
> That's a lot of load on one datanode.  I typically run a datanode on 
> every host, accessing the local drives on that host.

I tried running one datanode per machine connecting back to the same SAN
but it seemed pretty clunky.  A crash of any datanode would take down
the entire system (no data replication since it's a common data-store in
the end). Reducing it to a single datanode did not have this impact.

The boxes themselves don't have much for local drives aside from a bit
of temp space.

Recently we moved the datanode, namenode and jobtracker to their own
machine per your earlier suggestion and upgraded Nutch sources to Nov
1st from about October 20th. This is when the difficulties started.

Earlier with the single datanode, namenode and jobtracker on an
overloaded worker machine (load average was around 20 normally) things
worked without errors, but slowly.

-- 
Rod Taylor <rb...@sitesell.com>


Re: mapred bug -- bad part calculation?

Posted by Doug Cutting <cu...@nutch.org>.
Rod Taylor wrote:
> There is only a single datanode and there are 20 hosts.

That's a lot of load on one datanode.  I typically run a datanode on 
every host, accessing the local drives on that host.

Doug

Re: mapred bug -- bad part calculation?

Posted by Rod Taylor <rb...@sitesell.com>.
On Fri, 2005-11-04 at 13:43 -0800, Doug Cutting wrote:
> Rod Taylor wrote:
> > Every segment that I fetch seems to be missing a part when stored on the
> > filesystem. The stranger thing is it is always the same part (very
> > reproducible).
> 
> This sounds strange.  Are the datanode errors always on the same host? 
> How many hosts are you running this on?

There is only a single datanode and there are 20 hosts.

-- 
Rod Taylor <rb...@sitesell.com>


Re: mapred bug -- bad part calculation?

Posted by Doug Cutting <cu...@nutch.org>.
Rod Taylor wrote:
> Every segment that I fetch seems to be missing a part when stored on the
> filesystem. The stranger thing is it is always the same part (very
> reproducible).

This sounds strange.  Are the datanode errors always on the same host? 
How many hosts are you running this on?

Doug

Re: mapred bug -- bad part calculation?

Posted by Rod Taylor <rb...@sitesell.com>.
I forgot to provide this earlier.  Here is nutch ndfs -ls output for the
directory structure of a segment with a failed part-00013.

[rbt@sbider5 ~]$ /opt/nutch/bin/nutch ndfs
-ls /opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133
051103 162002 parsing file:/opt/nutch-0.8_7/conf/nutch-default.xml
051103 162003 parsing file:/opt/nutch-0.8_7/conf/nutch-site.xml
051103 162003 No FS indicated, using
default:master1.sitebuildit.com:5466
051103 162003 Client connection to 192.168.100.15:5466: starting
Found 6 items
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/content  <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_fetch      <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_generate   <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_parse      <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_data       <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_text       <dir>
[rbt@sbider5 ~]$ /opt/nutch/bin/nutch ndfs
-ls /opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/content
051103 162010 parsing file:/opt/nutch-0.8_7/conf/nutch-default.xml
051103 162011 parsing file:/opt/nutch-0.8_7/conf/nutch-site.xml
051103 162011 No FS indicated, using
default:master1.sitebuildit.com:5466
051103 162011 Client connection to 192.168.100.15:5466: starting
Found 20 items
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/content/part-00000       <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/content/part-00001       <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/content/part-00002       <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/content/part-00003       <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/content/part-00004       <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/content/part-00005       <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/content/part-00006       <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/content/part-00007       <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/content/part-00008       <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/content/part-00009       <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/content/part-00010       <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/content/part-00011       <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/content/part-00012       <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/content/part-00013       <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/content/part-00014       <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/content/part-00015       <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/content/part-00016       <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/content/part-00017       <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/content/part-00018       <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/content/part-00019       <dir>
[rbt@sbider5 ~]$ /opt/nutch/bin/nutch ndfs
-ls /opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/content/part-00012
051103 162017 parsing file:/opt/nutch-0.8_7/conf/nutch-default.xml
051103 162017 parsing file:/opt/nutch-0.8_7/conf/nutch-site.xml
051103 162017 No FS indicated, using
default:master1.sitebuildit.com:5466
051103 162017 Client connection to 192.168.100.15:5466: starting
Found 2 items
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/content/part-00012/data  439524693
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/content/part-00012/index 56208
[rbt@sbider5 ~]$ /opt/nutch/bin/nutch ndfs
-ls /opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/content/part-00013
051103 162019 parsing file:/opt/nutch-0.8_7/conf/nutch-default.xml
051103 162019 parsing file:/opt/nutch-0.8_7/conf/nutch-site.xml
051103 162019 No FS indicated, using
default:master1.sitebuildit.com:5466
051103 162020 Client connection to 192.168.100.15:5466: starting
Found 0 items
[rbt@sbider5 ~]$ /opt/nutch/bin/nutch ndfs
-ls /opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/content/part-00014
051103 162021 parsing file:/opt/nutch-0.8_7/conf/nutch-default.xml
051103 162022 parsing file:/opt/nutch-0.8_7/conf/nutch-site.xml
051103 162022 No FS indicated, using
default:master1.sitebuildit.com:5466
051103 162022 Client connection to 192.168.100.15:5466: starting
Found 2 items
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/content/part-00014/data  440339945
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/content/part-00014/index 56183
[rbt@sbider5 ~]$ /opt/nutch/bin/nutch ndfs
-ls /opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_fetch
051103 162033 parsing file:/opt/nutch-0.8_7/conf/nutch-default.xml
051103 162034 parsing file:/opt/nutch-0.8_7/conf/nutch-site.xml
051103 162034 No FS indicated, using
default:master1.sitebuildit.com:5466
051103 162034 Client connection to 192.168.100.15:5466: starting
Found 20 items
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_fetch/part-00000   <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_fetch/part-00001   <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_fetch/part-00002   <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_fetch/part-00003   <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_fetch/part-00004   <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_fetch/part-00005   <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_fetch/part-00006   <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_fetch/part-00007   <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_fetch/part-00008   <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_fetch/part-00009   <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_fetch/part-00010   <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_fetch/part-00011   <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_fetch/part-00012   <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_fetch/part-00013   <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_fetch/part-00014   <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_fetch/part-00015   <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_fetch/part-00016   <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_fetch/part-00017   <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_fetch/part-00018   <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_fetch/part-00019   <dir>
[rbt@sbider5 ~]$ /opt/nutch/bin/nutch ndfs
-ls /opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_fetch/part-00013
051103 162039 parsing file:/opt/nutch-0.8_7/conf/nutch-default.xml
051103 162039 parsing file:/opt/nutch-0.8_7/conf/nutch-site.xml
051103 162039 No FS indicated, using
default:master1.sitebuildit.com:5466
051103 162039 Client connection to 192.168.100.15:5466: starting
Found 0 items
[rbt@sbider5 ~]$ /opt/nutch/bin/nutch ndfs
-ls /opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_fetch/part-00012
051103 162041 parsing file:/opt/nutch-0.8_7/conf/nutch-default.xml
051103 162041 parsing file:/opt/nutch-0.8_7/conf/nutch-site.xml
051103 162042 No FS indicated, using
default:master1.sitebuildit.com:5466
051103 162042 Client connection to 192.168.100.15:5466: starting
Found 2 items
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_fetch/part-00012/data      8784520
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_fetch/part-00012/index     56208
[rbt@sbider5 ~]$ /opt/nutch/bin/nutch ndfs
-ls /opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_fetch/part-00014
051103 162043 parsing file:/opt/nutch-0.8_7/conf/nutch-default.xml
051103 162043 parsing file:/opt/nutch-0.8_7/conf/nutch-site.xml
051103 162044 No FS indicated, using
default:master1.sitebuildit.com:5466
051103 162044 Client connection to 192.168.100.15:5466: starting
Found 2 items
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_fetch/part-00014/data      8788470
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_fetch/part-00014/index     56183
[rbt@sbider5 ~]$ /opt/nutch/bin/nutch ndfs
-ls /opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_generate
051103 162055 parsing file:/opt/nutch-0.8_7/conf/nutch-default.xml
051103 162055 parsing file:/opt/nutch-0.8_7/conf/nutch-site.xml
051103 162055 No FS indicated, using
default:master1.sitebuildit.com:5466
051103 162055 Client connection to 192.168.100.15:5466: starting
Found 20 items
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_generate/part-00000        9531698
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_generate/part-00001        9684746
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_generate/part-00002        9762019
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_generate/part-00003        9715727
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_generate/part-00004        9518134
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_generate/part-00005        9676499
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_generate/part-00006        9722801
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_generate/part-00007        9715404
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_generate/part-00008        9514007
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_generate/part-00009        9668149
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_generate/part-00010        9649085
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_generate/part-00011        9726466
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_generate/part-00012        9534012
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_generate/part-00013        9744911
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_generate/part-00014        9694646
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_generate/part-00015        9652845
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_generate/part-00016        9505674
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_generate/part-00017        9700052
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_generate/part-00018        9714650
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_generate/part-00019        9714743
[rbt@sbider5 ~]$ /opt/nutch/bin/nutch ndfs
-ls /opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_parse
051103 162108 parsing file:/opt/nutch-0.8_7/conf/nutch-default.xml
051103 162109 parsing file:/opt/nutch-0.8_7/conf/nutch-site.xml
051103 162109 No FS indicated, using
default:master1.sitebuildit.com:5466
051103 162109 Client connection to 192.168.100.15:5466: starting
Found 19 items
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_parse/part-00000   155306656
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_parse/part-00001   163093258
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_parse/part-00002   155290671
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_parse/part-00003   163551019
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_parse/part-00004   156198582
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_parse/part-00005   163963632
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_parse/part-00006   155873286
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_parse/part-00007   162752185
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_parse/part-00008   155215446
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_parse/part-00009   163084991
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_parse/part-00010   154982905
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_parse/part-00011   164212118
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_parse/part-00012   154450623
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_parse/part-00014   155279291
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_parse/part-00015   163724449
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_parse/part-00016   154542758
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_parse/part-00017   162865027
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_parse/part-00018   154375952
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/crawl_parse/part-00019   162991584
[rbt@sbider5 ~]$ /opt/nutch/bin/nutch ndfs -ls /opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_data
051103 162121 parsing file:/opt/nutch-0.8_7/conf/nutch-default.xml
051103 162122 parsing file:/opt/nutch-0.8_7/conf/nutch-site.xml
051103 162122 No FS indicated, using default:master1.sitebuildit.com:5466
051103 162122 Client connection to 192.168.100.15:5466: starting
Found 20 items
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_data/part-00000    <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_data/part-00001    <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_data/part-00002    <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_data/part-00003    <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_data/part-00004    <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_data/part-00005    <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_data/part-00006    <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_data/part-00007    <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_data/part-00008    <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_data/part-00009    <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_data/part-00010    <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_data/part-00011    <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_data/part-00012    <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_data/part-00013    <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_data/part-00014    <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_data/part-00015    <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_data/part-00016    <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_data/part-00017    <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_data/part-00018    <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_data/part-00019    <dir>
[rbt@sbider5 ~]$ /opt/nutch/bin/nutch ndfs -ls /opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_data/part-00012
051103 162127 parsing file:/opt/nutch-0.8_7/conf/nutch-default.xml
051103 162127 parsing file:/opt/nutch-0.8_7/conf/nutch-site.xml
051103 162127 No FS indicated, using default:master1.sitebuildit.com:5466
051103 162127 Client connection to 192.168.100.15:5466: starting
Found 2 items
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_data/part-00012/data       128385655
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_data/part-00012/index      56509
[rbt@sbider5 ~]$ /opt/nutch/bin/nutch ndfs -ls /opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_data/part-00013
051103 162129 parsing file:/opt/nutch-0.8_7/conf/nutch-default.xml
051103 162129 parsing file:/opt/nutch-0.8_7/conf/nutch-site.xml
051103 162129 No FS indicated, using default:master1.sitebuildit.com:5466
051103 162129 Client connection to 192.168.100.15:5466: starting
Found 0 items
[rbt@sbider5 ~]$ /opt/nutch/bin/nutch ndfs -ls /opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_data/part-00014
051103 162131 parsing file:/opt/nutch-0.8_7/conf/nutch-default.xml
051103 162131 parsing file:/opt/nutch-0.8_7/conf/nutch-site.xml
051103 162131 No FS indicated, using default:master1.sitebuildit.com:5466
051103 162131 Client connection to 192.168.100.15:5466: starting
Found 2 items
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_data/part-00014/data       128731018
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_data/part-00014/index      55566
[rbt@sbider5 ~]$ /opt/nutch/bin/nutch ndfs -ls /opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_text
051103 162139 parsing file:/opt/nutch-0.8_7/conf/nutch-default.xml
051103 162140 parsing file:/opt/nutch-0.8_7/conf/nutch-site.xml
051103 162140 No FS indicated, using default:master1.sitebuildit.com:5466
051103 162140 Client connection to 192.168.100.15:5466: starting
Found 20 items
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_text/part-00000    <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_text/part-00001    <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_text/part-00002    <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_text/part-00003    <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_text/part-00004    <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_text/part-00005    <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_text/part-00006    <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_text/part-00007    <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_text/part-00008    <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_text/part-00009    <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_text/part-00010    <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_text/part-00011    <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_text/part-00012    <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_text/part-00013    <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_text/part-00014    <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_text/part-00015    <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_text/part-00016    <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_text/part-00017    <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_text/part-00018    <dir>
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_text/part-00019    <dir>
[rbt@sbider5 ~]$ /opt/nutch/bin/nutch ndfs -ls /opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_text/part-00012
051103 162145 parsing file:/opt/nutch-0.8_7/conf/nutch-default.xml
051103 162145 parsing file:/opt/nutch-0.8_7/conf/nutch-site.xml
051103 162145 No FS indicated, using default:master1.sitebuildit.com:5466
051103 162145 Client connection to 192.168.100.15:5466: starting
Found 2 items
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_text/part-00012/data       111853821
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_text/part-00012/index      56509
[rbt@sbider5 ~]$ /opt/nutch/bin/nutch ndfs -ls /opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_text/part-00013
051103 162147 parsing file:/opt/nutch-0.8_7/conf/nutch-default.xml
051103 162147 parsing file:/opt/nutch-0.8_7/conf/nutch-site.xml
051103 162147 No FS indicated, using default:master1.sitebuildit.com:5466
051103 162147 Client connection to 192.168.100.15:5466: starting
Found 0 items
[rbt@sbider5 ~]$ /opt/nutch/bin/nutch ndfs -ls /opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_text/part-00014
051103 162149 parsing file:/opt/nutch-0.8_7/conf/nutch-default.xml
051103 162149 parsing file:/opt/nutch-0.8_7/conf/nutch-site.xml
051103 162149 No FS indicated, using default:master1.sitebuildit.com:5466
051103 162149 Client connection to 192.168.100.15:5466: starting
Found 2 items
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_text/part-00014/data       111121278
/opt/sitesell/sbider_data/nutch/segments/20051102031132/20051102031133/parse_text/part-00014/index      55566
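
The hole in the crawl_parse listing above (19 items, no part-00013) can
also be found mechanically rather than by eye. What follows is only a
minimal sketch, assuming the text printed by "nutch ndfs -ls" has been
captured into a string; the function name missing_parts and the sample
path are hypothetical and are not part of Nutch. It reports part numbers
that are absent from a listing. It does not detect a part directory that
is listed but empty, as parse_data/part-00013 is above.

```python
import re
from typing import List

# Matches the five-digit part number in names such as ".../part-00013".
PART_RE = re.compile(r"/part-(\d{5})\b")


def missing_parts(listing: str, expected: int) -> List[int]:
    """Return the part numbers in range(expected) that never appear in listing."""
    seen = {int(m.group(1)) for m in PART_RE.finditer(listing)}
    return [n for n in range(expected) if n not in seen]


if __name__ == "__main__":
    # Synthetic listing (hypothetical path and sizes): 20 parts expected,
    # part 13 left out on purpose.
    base = "/segments/example/crawl_parse"
    listing = "\n".join(
        f"{base}/part-{n:05d}   0" for n in range(20) if n != 13
    )
    print(missing_parts(listing, 20))  # prints [13]
```

The expected count would be the value of mapred.reduce.tasks used for
the job.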



On Thu, 2005-11-03 at 15:32 -0500, Rod Taylor wrote:
> Sources are from October 31st. Sun Standard Edition 1.5.0_02-b09 for
> amd64
> 
> Every segment that I fetch seems to be missing a part when stored on the
> filesystem. The stranger thing is it is always the same part (very
> reproducible).
> 
> If I have mapred.reduce.tasks set to 20, the hole is at part 13. That
> is, the part-00013 directory is empty while the remainder (0 through 12,
> 14 through 19) all have data.
> 
> If I have mapred.reduce.tasks set to 19, the hole is at part 11.
> content/part-00011 is empty.
> 
> Attached are my site configuration (reduce.tasks is 19), task log for a
> failing task and the output from the job tracker.
> 
> Below is a snippet from the datanode log (the only errors that exist are
> related to this task or others which process the above part #) and below
> that the output from localhost:7845 on the jobtracker machine for the
> job.
> 
> java.net.SocketTimeoutException: Read timed out
>         at java.net.SocketInputStream.socketRead0(Native Method)
>         at java.net.SocketInputStream.read(SocketInputStream.java:129)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:313)
>         at java.io.DataInputStream.read(DataInputStream.java:134)
>         at org.apache.nutch.ndfs.DataNode$DataXceiver.run(DataNode.java:369)
>         at java.lang.Thread.run(Thread.java:595)
> java.net.SocketTimeoutException: Read timed out
>         at java.net.SocketInputStream.socketRead0(Native Method)
>         at java.net.SocketInputStream.read(SocketInputStream.java:129)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:313)
>         at java.io.DataInputStream.read(DataInputStream.java:134)
>         at org.apache.nutch.ndfs.DataNode$DataXceiver.run(DataNode.java:369)
>         at java.lang.Thread.run(Thread.java:595)
> 
> 
>                                                 Job 'job_k1p80p'
> 
>    Job File: /home/sitesell/system/submit_2pgex8/job.xml
>    Start time: Thu Nov 03 12:04:43 EST 2005
>    The job failed at: Thu Nov 03 16:00:42 EST 2005
> 
> __________________________________________________________________________________________________
> 
> Map Tasks
> 
>         Map Task Id  Pct Complete State Diagnostic Text
>        task_m_2m1twe 1.0          103189 pages, 5045 errors, 13.1 pages/s, 1000 kb/s,
>        task_m_4nzguk 1.0          103141 pages, 5193 errors, 12.9 pages/s, 988 kb/s,
>        task_m_5aprs2 1.0          103427 pages, 4756 errors, 13.4 pages/s, 1027 kb/s,
>        task_m_6pd5q7 1.0          102650 pages, 5081 errors, 12.6 pages/s, 962 kb/s,
>        task_m_8qzj8p 1.0          103610 pages, 4539 errors, 13.6 pages/s, 1039 kb/s,
>        task_m_aev1di 1.0          102666 pages, 4997 errors, 13.2 pages/s, 1007 kb/s,
>        task_m_f2zfyw 1.0          103235 pages, 4662 errors, 13.6 pages/s, 1045 kb/s,
>        task_m_f84hfi 1.0          103746 pages, 4657 errors, 13.0 pages/s, 991 kb/s,
>        task_m_hhv9b9 1.0          102909 pages, 4972 errors, 13.5 pages/s, 1026 kb/s,
>        task_m_kijqqx 1.0          103439 pages, 4858 errors, 13.4 pages/s, 1024 kb/s,
>        task_m_n5mxax 1.0          102894 pages, 4953 errors, 13.3 pages/s, 1017 kb/s,
>        task_m_p45m8c 1.0          103705 pages, 4969 errors, 13.1 pages/s, 1007 kb/s,
>        task_m_qfevss 1.0          102640 pages, 5006 errors, 13.2 pages/s, 1011 kb/s,
>        task_m_qg3816 1.0          103658 pages, 5039 errors, 13.3 pages/s, 1014 kb/s,
>        task_m_rlxmuw 1.0          103609 pages, 4491 errors, 13.6 pages/s, 1038 kb/s,
>        task_m_t9ksdc 1.0          103053 pages, 5287 errors, 12.9 pages/s, 994 kb/s,
>        task_m_wt3oyf 1.0          103006 pages, 5168 errors, 13.3 pages/s, 1014 kb/s,
>        task_m_xk3gxz 1.0          103294 pages, 5216 errors, 13.0 pages/s, 996 kb/s,
>        task_m_yjrejy 1.0          103158 pages, 4787 errors, 13.5 pages/s, 1038 kb/s,
> 
> __________________________________________________________________________________________________
> 
>    Reduce Task Id Pct Complete State Diagnostic Text
>    task_r_2ktith 1.0 reduce > reduce
>    task_r_6hwvi0 1.0 reduce > reduce
>    task_r_8bi6h5 1.0 reduce > reduce
>    task_r_bpisbi 1.0 reduce > reduce
>    task_r_cfoo7z 1.0 reduce > reduce
>    task_r_cmy1r3 1.0 reduce > reduce
>    task_r_efnd4k 1.0 reduce > reduce
>    task_r_ervlp5 1.0 reduce > reduce
>    task_r_kvmno7 1.0 reduce > reduce
>    task_r_n4q36e 1.0 reduce > reduce
>    task_r_o4st5w 1.0 reduce > reduce
>    task_r_ow0sul 1.0 reduce > reduce
>    task_r_r7u152 1.0 reduce > reduce
>    task_r_ra99xx 1.0 reduce > reduce
>    task_r_ush85v 1.0 reduce > reduce
>    task_r_vbmkfw 1.0 reduce > reduce
>    task_r_wbirax 1.0 reduce > reduce
>    task_r_z17yss 1.0 reduce > reduce
>    task_r_o9mv91 0.9153447 reduce > reduce
>        Timed out.java.io.IOException: Task process exit with nonzero status.
>            at org.apache.nutch.mapred.TaskRunner.runChild(TaskRunner.java:139)
>            at org.apache.nutch.mapred.TaskRunner.run(TaskRunner.java:92)
>        Timed out.java.io.IOException: Task process exit with nonzero status.
>            at org.apache.nutch.mapred.TaskRunner.runChild(TaskRunner.java:139)
>            at org.apache.nutch.mapred.TaskRunner.run(TaskRunner.java:92)
>        Timed out.java.io.IOException: Task process exit with nonzero status.
>            at org.apache.nutch.mapred.TaskRunner.runChild(TaskRunner.java:139)
>            at org.apache.nutch.mapred.TaskRunner.run(TaskRunner.java:92)
>        Timed out.java.io.IOException: Task process exit with nonzero status.
>            at org.apache.nutch.mapred.TaskRunner.runChild(TaskRunner.java:139)
>            at org.apache.nutch.mapred.TaskRunner.run(TaskRunner.java:92)
> 
> 
-- 
Rod Taylor <rb...@sitesell.com>