Posted to dev@nutch.apache.org by "Sebastian Nagel (JIRA)" <ji...@apache.org> on 2014/05/04 22:21:16 UTC

[jira] [Resolved] (NUTCH-1182) fetcher to log hung threads

     [ https://issues.apache.org/jira/browse/NUTCH-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Nagel resolved NUTCH-1182.
------------------------------------

       Resolution: Fixed
    Fix Version/s:     (was: 2.4)
                   2.3

Committed to trunk r1592415 and 2.x r1592414.

> fetcher to log hung threads
> ---------------------------
>
>                 Key: NUTCH-1182
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1182
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 1.3, 1.4
>         Environment: Linux, local job runner
>            Reporter: Sebastian Nagel
>            Assignee: Sebastian Nagel
>            Priority: Minor
>             Fix For: 2.3, 1.9
>
>         Attachments: NUTCH-1182-2x.patch, NUTCH-1182-trunk-v1.patch, NUTCH-1182-v2.patch
>
>
> While crawling a slow server hosting a couple of very large PDF documents (30 MB),
> the fetcher stops after some time, and after a bulk of successfully fetched documents,
> with the message: ??Aborting with 10 hung threads.??
> From then on, every cycle ends with hung threads and almost no documents are fetched
> successfully. In addition, strange Hadoop errors are logged:
> {noformat}
>    fetch of http://.../xyz.pdf failed with: java.lang.NullPointerException
>     at java.lang.System.arraycopy(Native Method)
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1108)
>     ...
> {noformat}
> or
> {noformat}
>    Exception in thread "QueueFeeder" java.lang.NullPointerException
>          at org.apache.hadoop.fs.BufferedFSInputStream.getPos(BufferedFSInputStream.java:48)
>          at org.apache.hadoop.fs.FSDataInputStream.getPos(FSDataInputStream.java:41)
>          at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.readChunk(ChecksumFileSystem.java:214)
> {noformat}
> I've run the debugger and found:
> # After the "hung threads" are reported, the fetcher stops, but the threads are still alive and continue fetching a document. As a consequence, this will
> #* further limit the already small bandwidth of the network/server
> #* after the document is fetched, the thread tries to write the content via {{output.collect()}}, which must fail because the fetcher map job has already finished and the associated temporary mapred directory has been deleted. The error message may get mixed with the progress output of the next fetch cycle, causing additional confusion.
> # Documents/URLs causing a hung thread are never reported nor stored. That is, they are hard to track down, and they will cause a hung thread again and again.
> The problem is reproducible by fetching larger documents while setting {{mapred.task.timeout}} to a low value (this reliably causes hung threads).
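
This is not the actual patch committed above; it is a minimal sketch, assuming a list of still-running worker threads is available to the fetcher, of what logging hung threads with their stack traces might look like (the class and method names here are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not the NUTCH-1182 patch): when the fetcher
// aborts with hung threads, dump a stack trace for every thread that
// is still alive so the hung URL/thread can be identified in the log.
public class HungThreadReporter {

    // Build a printable report for all still-alive worker threads.
    static String report(List<Thread> workers) {
        StringBuilder sb = new StringBuilder();
        for (Thread t : workers) {
            if (!t.isAlive()) continue;
            sb.append("Thread #").append(t.getId())
              .append(" hung while fetching (state ")
              .append(t.getState()).append(")\n");
            for (StackTraceElement frame : t.getStackTrace()) {
                sb.append("  at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulate one worker stuck in a long-running "fetch".
        Thread stuck = new Thread(() -> {
            try { Thread.sleep(60_000); } catch (InterruptedException ignored) {}
        }, "FetcherThread-0");
        stuck.setDaemon(true);
        stuck.start();
        Thread.sleep(100);  // give it time to block in sleep()

        List<Thread> workers = new ArrayList<>();
        workers.add(stuck);
        System.out.println(report(workers));
    }
}
```

Logging the stack traces at abort time addresses the second point above: the offending URL typically appears in the frames of the protocol implementation, so the document causing the hang can be tracked down.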

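As an illustration of the reproduction step, the Hadoop task timeout can be lowered in the job configuration (e.g. mapred-site.xml); the 30000 ms value below is an arbitrary example, chosen only to be shorter than the time needed to fetch a large document:

```xml
<!-- Example only: lower the Hadoop task timeout (milliseconds) so that
     fetches of large documents exceed it and threads are reported hung. -->
<property>
  <name>mapred.task.timeout</name>
  <value>30000</value>
</property>
```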


--
This message was sent by Atlassian JIRA
(v6.2#6252)