Posted to dev@nutch.apache.org by "Paul Baclace (JIRA)" <ji...@apache.org> on 2005/12/27 03:52:30 UTC

[jira] Created: (NUTCH-152) TaskRunner io pipes are not setDaemon(true), cleanup and exception errors are incomplete, max heap too small

TaskRunner io pipes are not setDaemon(true), cleanup and exception errors are incomplete, max heap too small
------------------------------------------------------------------------------------------------------------

         Key: NUTCH-152
         URL: http://issues.apache.org/jira/browse/NUTCH-152
     Project: Nutch
        Type: Bug
  Components: fetcher  
    Versions: 0.8-dev    
 Environment: all
    Reporter: Paul Baclace


1. The io pipe threads should be setDaemon(true) so that the process cannot hang.
2. Error messages for exceptions are incomplete because e.getMessage() is used, and it can be null or empty (a NullPointerException typically has no message). Change this to e.toString(), which always includes at least the exception class name.
3. The subprocess stdout pipe is not read on a separate thread, but it must be a separate thread for setDaemon(true) to apply to it.
4. TaskRunner.kill() does not stop the io pipe threads, but it should.
5. When an InterruptedException occurs, it is assumed to be for the current (main) thread, but this should be checked with Thread.interrupted(); otherwise, spurious thread interrupts will be rethrown as IOExceptions.
6. A recent run had some TaskTracker child processes that ran out of heap. The default maximum heap size should be larger.
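Item 2 can be illustrated with a small sketch. The class and helper names here are hypothetical, not taken from TaskRunner.java: an explicitly constructed NullPointerException has a null detail message, while Throwable.toString() always returns at least the exception's class name.

```java
// Sketch for item 2: why e.toString() beats e.getMessage() in error output.
// ExceptionMessageDemo and describe() are hypothetical names for illustration.
public class ExceptionMessageDemo {

    // Formats an exception for an error message.
    static String describe(Throwable e) {
        // Throwable.toString() returns the class name, followed by ": " and
        // the message only when a message exists, so it is never null.
        return e.toString();
    }

    public static void main(String[] args) {
        // An explicitly constructed NullPointerException carries no detail message.
        Exception npe = new NullPointerException();
        System.out.println("getMessage(): " + npe.getMessage()); // prints "getMessage(): null"
        System.out.println("toString():   " + describe(npe));    // prints "toString():   java.lang.NullPointerException"

        Exception io = new java.io.IOException("pipe closed");
        System.out.println("toString():   " + describe(io));     // prints "toString():   java.io.IOException: pipe closed"
    }
}
```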


-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Commented: (NUTCH-152) TaskRunner io pipes are not setDaemon(true), cleanup and exception errors are incomplete, max heap too small

Posted by "Paul Baclace (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/NUTCH-152?page=comments#action_12362043 ] 

Paul Baclace commented on NUTCH-152:
------------------------------------

>re 3: Why is a separate thread needed for stdout? 

It certainly makes the code easier to read.  Using the main thread to read the subprocess stdout is a clever deviation from the usual idiom of using a separate thread.  

As a matter of defensive programming, being able to setDaemon(true) and interrupt() a separate thread eliminates the possibility that external, unexpected problems (bugs) will cause a hang or resource leak.
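A minimal sketch of such a separate pipe thread, assuming a hypothetical PipeThread class (this is not the code in the attached patch). It copies a subprocess stream to an output stream, is marked daemon so it cannot keep the JVM alive, and exits on EOF or after interrupt(). The demo feeds it an in-memory stream in place of a real subprocess stdout. Note that a thread blocked in a native read on a real process pipe may not respond to interrupt() at all; daemon status is the backstop in that case.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InterruptedIOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical pipe thread: copies a subprocess stream (e.g. its stdout)
// so the thread that launched the subprocess never blocks reading it.
public class PipeThread extends Thread {
    private final InputStream in;
    private final OutputStream out;

    public PipeThread(String name, InputStream in, OutputStream out) {
        super(name);
        this.in = in;
        this.out = out;
        // Daemon: if the pipe never reaches EOF, this thread cannot keep the JVM alive.
        setDaemon(true);
    }

    @Override
    public void run() {
        byte[] buf = new byte[4096];
        try {
            int n;
            // Stop at EOF, or once another thread has called interrupt().
            while (!isInterrupted() && (n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            out.flush();
        } catch (InterruptedIOException e) {
            // Interrupted while blocked in an interruptible read: exit quietly.
        } catch (IOException e) {
            // e.toString() always names the exception class.
            System.err.println(getName() + ": " + e);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // An in-memory stream stands in for Process.getInputStream().
        InputStream fakeStdout = new ByteArrayInputStream("hello\n".getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream captured = new ByteArrayOutputStream();
        PipeThread pipe = new PipeThread("stdout-pipe", fakeStdout, captured);
        pipe.start();
        pipe.join(5000);
        System.out.print(new String(captured.toByteArray(), StandardCharsets.UTF_8)); // prints "hello"
    }
}
```

The launching thread is then free to wait on the subprocess while the pipe threads drain its output.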

>re 4: I'd expect the io pipes to get EOF when the process is killed. 

If the subprocess is hanging in a device driver, it might not be killed in a timely fashion, so the EOF might not arrive immediately.  Rare, but not impossible.  
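A sketch of a kill() that does not wait for EOF, under similar assumptions: pipe() and kill() are hypothetical helpers, not the patched TaskRunner.kill(). It destroys the child, interrupts each pipe thread, and joins with a bounded timeout. The demo uses a PipedInputStream whose writer never closes, so the reader never sees EOF; PipedInputStream reads happen to respond to interrupt() by throwing InterruptedIOException, which reads on real process pipes may not do.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InterruptedIOException;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.util.List;

// Sketch for item 4: a kill() that stops the pipe threads instead of waiting for EOF.
// pipe() and kill() are hypothetical helpers, not the patched TaskRunner methods.
public class KillSketch {

    // Daemon thread copying in -> out until EOF or interrupt.
    static Thread pipe(String name, InputStream in, OutputStream out) {
        Thread t = new Thread(() -> {
            byte[] buf = new byte[4096];
            try {
                int n;
                while (!Thread.currentThread().isInterrupted() && (n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
            } catch (InterruptedIOException e) {
                // Interrupted while blocked in an interruptible read: exit quietly.
            } catch (IOException e) {
                System.err.println(Thread.currentThread().getName() + ": " + e);
            }
        }, name);
        t.setDaemon(true);
        return t;
    }

    // Destroy the child (if any), then interrupt the pipe threads rather than
    // relying on an EOF that may arrive late. Joins are bounded; a thread stuck
    // in an uninterruptible native read is left behind, and as a daemon it
    // cannot keep the JVM alive.
    static void kill(Process process, List<Thread> pipes, long joinMillis) throws InterruptedException {
        if (process != null) {
            process.destroy();
        }
        for (Thread t : pipes) {
            t.interrupt();
        }
        for (Thread t : pipes) {
            t.join(joinMillis);
        }
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // The writer never writes or closes, so the reader never sees EOF:
        // a stand-in for a child process that is slow to die.
        PipedOutputStream neverClosed = new PipedOutputStream();
        PipedInputStream stuck = new PipedInputStream(neverClosed);
        Thread t = pipe("stdout-pipe", stuck, new ByteArrayOutputStream());
        t.start();
        kill(null, List.of(t), 5000);
        System.out.println("pipe thread alive after kill: " + t.isAlive());
    }
}
```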


[jira] Updated: (NUTCH-152) TaskRunner io pipes are not setDaemon(true), cleanup and exception errors are incomplete, max heap too small

Posted by "Paul Baclace (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/NUTCH-152?page=all ]

Paul Baclace updated NUTCH-152:
-------------------------------

    Attachment: TaskRunner.java.patch

The patch addresses each issue listed in the detailed description of this bug.  The detailed description is suitable as a source change comment.


[jira] Commented: (NUTCH-152) TaskRunner io pipes are not setDaemon(true), cleanup and exception errors are incomplete, max heap too small

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/NUTCH-152?page=comments#action_12362004 ] 

Doug Cutting commented on NUTCH-152:
------------------------------------

re 1,2,5: sounds good.
re 3: Why is a separate thread needed for stdout?  Can you please elaborate on how this causes problems?
re 4: I'd expect the io pipes to get EOF when the process is killed.  Is that not the case?
re 6: this is now in nutch-default.xml; tasks can override it, or it can be set in nutch-site.xml, so the value in this file has little importance.
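To illustrate why a hard-coded value matters little once the setting lives in configuration, here is a sketch of layered lookup using java.util.Properties defaults. The property name child.heap.size and all sizes are hypothetical placeholders, not Nutch's actual configuration keys or values: a value from the defaults layer (standing in for nutch-default.xml) wins over the fallback compiled into the runner, and an override layer wins over both.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Sketch for "re 6": layered lookup of the child JVM heap limit.
// The key and all sizes below are hypothetical placeholders.
public class ChildHeapConfig {
    static final String HEAP_KEY = "child.heap.size"; // hypothetical key
    static final String FALLBACK = "200m";            // hypothetical value hard-coded in the runner

    // Builds the child JVM's heap argument; the fallback applies only
    // when no configuration layer defines the key.
    static List<String> childJvmArgs(Properties conf) {
        List<String> jvmArgs = new ArrayList<>();
        jvmArgs.add("-Xmx" + conf.getProperty(HEAP_KEY, FALLBACK));
        return jvmArgs;
    }

    public static void main(String[] args) {
        Properties defaults = new Properties();          // stands in for nutch-default.xml
        defaults.setProperty(HEAP_KEY, "512m");
        Properties overrides = new Properties(defaults); // stands in for site or per-task overrides
        System.out.println(childJvmArgs(overrides));     // prints [-Xmx512m]
        overrides.setProperty(HEAP_KEY, "1024m");
        System.out.println(childJvmArgs(overrides));     // prints [-Xmx1024m]
        System.out.println(childJvmArgs(new Properties())); // prints [-Xmx200m]
    }
}
```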

