Posted to common-dev@hadoop.apache.org by "Yoram Arnon (JIRA)" <ji...@apache.org> on 2006/03/21 18:10:02 UTC

[jira] Created: (HADOOP-96) name server should log decisions that affect data: block creation, removal, replication

name server should log decisions that affect data: block creation, removal, replication
---------------------------------------------------------------------------------------

         Key: HADOOP-96
         URL: http://issues.apache.org/jira/browse/HADOOP-96
     Project: Hadoop
        Type: Improvement
  Components: dfs  
    Versions: 0.1    
    Reporter: Yoram Arnon
    Priority: Critical


currently, there's no way to analyze and debug DFS errors where blocks disappear.
name server should log its decisions that affect data, including block creation, removal, replication:
- block <b> created, assigned to datanodes A, B, ...
- datanode A dead, block <b> underreplicated(1), replicating to datanode C
- datanode B dead, block <b> underreplicated(2), replicating to datanode D
- datanode A alive, block <b> overreplicated, removing from datanode D
- block <b> removed from datanodes C, D, ...

that will enable me to track down, two weeks later, a block that's missing from a file, and to debug the name server.

extra credit:
- rotate log file, as it might grow large
- make this behaviour optional/configurable

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

[jira] Commented: (HADOOP-96) name server should log decisions that affect data: block creation, removal, replication

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.

    [ http://issues.apache.org/jira/browse/HADOOP-96?page=comments#action_12372591 ] 

Doug Cutting commented on HADOOP-96:
------------------------------------

This sounds like a great plan!

Hadoop's current log formatter has an option to log thread IDs.  The namenode code can simply turn this on.  Also, one can configure the JVM to rotate logs using:

http://java.sun.com/j2se/1.4.2/docs/api/java/util/logging/FileHandler.html

In any case, we should add a timestamp to the log file names generated by bin/hadoop-daemon.sh, where standard out and error are logged, regardless of JVM log configuration.
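
As a rough illustration of the FileHandler rotation mentioned above, here is a minimal sketch using only java.util.logging. The class name, logger name, file pattern, and size/count values are made up for the example and are not taken from the Hadoop source.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.logging.FileHandler;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

// Hypothetical helper, not part of Hadoop.
class RotatingLogSketch {
    /** Attach a size-capped, rotating file handler to the given logger. */
    static FileHandler attachRotatingHandler(Logger logger, String pattern,
                                             int maxBytes, int generations) {
        try {
            // "%g" in the pattern is replaced by the generation number
            // (0 is the file currently being written).
            FileHandler handler = new FileHandler(pattern, maxBytes, generations, true);
            handler.setFormatter(new SimpleFormatter());
            logger.addHandler(handler);
            return handler;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Logger log = Logger.getLogger("example.namenode"); // assumed logger name
        String pattern = System.getProperty("java.io.tmpdir") + "/namenode-%g.log";
        // Keep at most 5 files of roughly 1 MB each (illustrative values).
        FileHandler handler = attachRotatingHandler(log, pattern, 1_000_000, 5);
        log.info("block blk_1 created, assigned to datanodes A, B");
        handler.close();
    }
}
```

Once a file reaches the size limit, FileHandler shifts the existing generations up by one and starts a fresh generation 0, so total disk use stays bounded by roughly limit times count.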

> name server should log decisions that affect data: block creation, removal, replication
> ---------------------------------------------------------------------------------------
>
>          Key: HADOOP-96
>          URL: http://issues.apache.org/jira/browse/HADOOP-96
>      Project: Hadoop
>         Type: Improvement
>   Components: dfs
>     Versions: 0.1
>     Reporter: Yoram Arnon
>     Priority: Critical


[jira] Updated: (HADOOP-96) name server should log decisions that affect data: block creation, removal, replication

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.

     [ http://issues.apache.org/jira/browse/HADOOP-96?page=all ]

Doug Cutting updated HADOOP-96:
-------------------------------

    Fix Version: 0.2

> name server should log decisions that affect data: block creation, removal, replication
> ---------------------------------------------------------------------------------------
>
>          Key: HADOOP-96
>          URL: http://issues.apache.org/jira/browse/HADOOP-96
>      Project: Hadoop
>         Type: Improvement

>   Components: dfs
>     Versions: 0.1.0
>     Reporter: Yoram Arnon
>     Assignee: Hairong Kuang
>     Priority: Critical
>      Fix For: 0.2


[jira] Commented: (HADOOP-96) name server should log decisions that affect data: block creation, removal, replication

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.

    [ http://issues.apache.org/jira/browse/HADOOP-96?page=comments#action_12377261 ] 

Hairong Kuang commented on HADOOP-96:
-------------------------------------

Yes, it makes sense to log the log file name. I will make the change. If HADOOP_LOG_DIR is set, the log directory is HADOOP_LOG_DIR. Otherwise, if HADOOP_HOME is set, the log directory is HADOOP_HOME/logs. Otherwise, it is the user's home directory. I do not know why you saw log files in HADOOP_HOME. I will take a look.
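
The fallback order described above can be sketched as follows. This is only an illustration: the thread does not say whether the patch does this in Java or in the shell scripts, and the class and method names here are hypothetical.

```java
import java.io.File;
import java.util.Map;

// Hypothetical helper, not part of Hadoop.
class LogDirSketch {
    /** Resolve the log directory: HADOOP_LOG_DIR, else HADOOP_HOME/logs, else the user's home. */
    static File resolveLogDir(Map<String, String> env, String userHome) {
        String logDir = env.get("HADOOP_LOG_DIR");
        if (logDir != null && !logDir.isEmpty()) {
            return new File(logDir);
        }
        String hadoopHome = env.get("HADOOP_HOME");
        if (hadoopHome != null && !hadoopHome.isEmpty()) {
            return new File(hadoopHome, "logs");
        }
        return new File(userHome);
    }

    public static void main(String[] args) {
        File dir = resolveLogDir(System.getenv(), System.getProperty("user.home"));
        // Printing the absolute path shows where files will really land,
        // even if the configured path is relative to the working directory.
        System.out.println("logging to " + dir.getAbsolutePath());
    }
}
```

Passing the environment in as a map, rather than reading it inside the method, keeps the resolution order easy to check in a unit test.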

> name server should log decisions that affect data: block creation, removal, replication
> ---------------------------------------------------------------------------------------
>
>          Key: HADOOP-96
>          URL: http://issues.apache.org/jira/browse/HADOOP-96
>      Project: Hadoop
>         Type: Improvement

>   Components: dfs
>     Versions: 0.1.0
>     Reporter: Yoram Arnon
>     Assignee: Hairong Kuang
>     Priority: Critical
>      Fix For: 0.2
>  Attachments: namenodeLogging.patch

[jira] Commented: (HADOOP-96) name server should log decisions that affect data: block creation, removal, replication

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.

    [ http://issues.apache.org/jira/browse/HADOOP-96?page=comments#action_12378116 ] 

Doug Cutting commented on HADOOP-96:
------------------------------------

I figured out the problem under Cygwin & fixed it.  HADOOP_HOME wasn't quoted in hadoop-daemon.sh, so things failed when the path contained a space (as my HADOOP_HOME does under Cygwin).  When testing under Cygwin, one should always install in a path that contains a space to test these cases.

> name server should log decisions that affect data: block creation, removal, replication
> ---------------------------------------------------------------------------------------
>
>          Key: HADOOP-96
>          URL: http://issues.apache.org/jira/browse/HADOOP-96
>      Project: Hadoop
>         Type: Improvement

>   Components: dfs
>     Versions: 0.1.0
>     Reporter: Yoram Arnon
>     Assignee: Hairong Kuang
>     Priority: Critical
>      Fix For: 0.2
>  Attachments: namenodeLogging.patch, namenodeLogging.patch, namenodeLogging.patch

[jira] Updated: (HADOOP-96) name server should log decisions that affect data: block creation, removal, replication

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.

     [ http://issues.apache.org/jira/browse/HADOOP-96?page=all ]

Hairong Kuang updated HADOOP-96:
--------------------------------

    Attachment: namenodeLogging.patch

The Cygwin problem is solved.

> name server should log decisions that affect data: block creation, removal, replication
> ---------------------------------------------------------------------------------------
>
>          Key: HADOOP-96
>          URL: http://issues.apache.org/jira/browse/HADOOP-96
>      Project: Hadoop
>         Type: Improvement

>   Components: dfs
>     Versions: 0.1.0
>     Reporter: Yoram Arnon
>     Assignee: Hairong Kuang
>     Priority: Critical
>      Fix For: 0.2
>  Attachments: namenodeLogging.patch, namenodeLogging.patch, namenodeLogging.patch

[jira] Updated: (HADOOP-96) name server should log decisions that affect data: block creation, removal, replication

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.

     [ http://issues.apache.org/jira/browse/HADOOP-96?page=all ]

Hairong Kuang updated HADOOP-96:
--------------------------------

    Attachment: namenodeLogging.patch

This patch adds the following features:
1. NameNode adds a static field "stateChangeLog" that keeps track of all the namenode state changes.
2. Various logging statements are added to NameNode, FSNamesystem, and FSDirectory. Basically, namespace (dir) changes are logged at the fine level and block changes are logged at the finer level.
3. initFileHandler is added to LogFormatter. All logs are directed to a log file instead of stdout. All logs are rolled and capped in size. Log file names end with .log.
4. In hadoop-daemon.sh, stdout is redirected to a .out file.
5. namenode logging levels, log file max size, and number of generations are configurable.
6. A JUnit test program is added to test the correctness of namespace change logging.
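
To make the fine/finer split in the list above concrete, here is a small sketch of a dedicated state-change logger built on java.util.logging. Only the field name "stateChangeLog" comes from the patch description; the class name, logger name, method names, and message texts are assumptions made for the example.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical class, not the actual NameNode code.
class StateChangeLogSketch {
    // A dedicated logger lets state-change tracing be switched on or off
    // independently of other logging. The logger name is an assumption.
    static final Logger stateChangeLog = Logger.getLogger("example.dfs.StateChange");

    static void logMkdirs(String path) {
        // Namespace (directory) changes go out at FINE.
        stateChangeLog.fine("mkdirs: " + path);
    }

    static void logBlockAdded(String block, String datanode) {
        // Block changes go out at FINER; the guard skips building the
        // message string when that level is disabled.
        if (stateChangeLog.isLoggable(Level.FINER)) {
            stateChangeLog.finer("block " + block + " added to " + datanode);
        }
    }

    public static void main(String[] args) {
        // The default console handler only shows INFO and above, so add one
        // that passes everything and let the logger level do the filtering.
        ConsoleHandler console = new ConsoleHandler();
        console.setLevel(Level.ALL);
        stateChangeLog.addHandler(console);

        stateChangeLog.setLevel(Level.FINE);      // namespace changes only
        logMkdirs("/user/demo");                  // emitted
        logBlockAdded("blk_1", "datanodeA");      // suppressed at FINE
    }
}
```

Setting the logger level to FINER would additionally emit the block-level line, which is the kind of per-level configurability item 5 describes.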

> name server should log decisions that affect data: block creation, removal, replication
> ---------------------------------------------------------------------------------------
>
>          Key: HADOOP-96
>          URL: http://issues.apache.org/jira/browse/HADOOP-96
>      Project: Hadoop
>         Type: Improvement

>   Components: dfs
>     Versions: 0.1.0
>     Reporter: Yoram Arnon
>     Assignee: Hairong Kuang
>     Priority: Critical
>      Fix For: 0.2
>  Attachments: namenodeLogging.patch

Re: [jira] Assigned: (HADOOP-96) name server should log decisions that affect data: block creation, removal, replication

Posted by Eric Baldeschwieler <er...@yahoo-inc.com>.

Do we have an easy way of loading some of the name nodes' (and
data nodes') local directories into HDFS space?  This is an easy trick
that would make it trivial to apply map-reduce to distributed log
grepping.  Should we add that?  I have had good success with this sort of
thing on previous systems.

E.g.

hdfs://blah:6666/SYSTEM/namenodes/<host:port>/logs/

would allow one to browse, through HDFS, the logs directory of a name
node.  One can then use the Plan 9 trick of putting all kinds of API info
into "files" on the data nodes (be they real files or CGI equivalents).

Should I file a bug on this?


On Mar 30, 2006, at 4:08 PM, Yoram Arnon (JIRA) wrote:

>      [ http://issues.apache.org/jira/browse/HADOOP-96?page=all ]
>
> Yoram Arnon reassigned HADOOP-96:
> ---------------------------------
>
>     Assign To: Hairong Kuang
>
>> name server should log decisions that affect data: block creation,  
>> removal, replication
>> --------------------------------------------------------------------- 
>> ------------------
>>
>>          Key: HADOOP-96
>>          URL: http://issues.apache.org/jira/browse/HADOOP-96
>>      Project: Hadoop
>>         Type: Improvement
>>   Components: dfs
>>     Versions: 0.1
>>     Reporter: Yoram Arnon
>>     Assignee: Hairong Kuang
>>     Priority: Critical
>

[jira] Assigned: (HADOOP-96) name server should log decisions that affect data: block creation, removal, replication

Posted by "Yoram Arnon (JIRA)" <ji...@apache.org>.

     [ http://issues.apache.org/jira/browse/HADOOP-96?page=all ]

Yoram Arnon reassigned HADOOP-96:
---------------------------------

    Assign To: Hairong Kuang

> name server should log decisions that affect data: block creation, removal, replication
> ---------------------------------------------------------------------------------------
>
>          Key: HADOOP-96
>          URL: http://issues.apache.org/jira/browse/HADOOP-96
>      Project: Hadoop
>         Type: Improvement
>   Components: dfs
>     Versions: 0.1
>     Reporter: Yoram Arnon
>     Assignee: Hairong Kuang
>     Priority: Critical


[jira] Commented: (HADOOP-96) name server should log decisions that affect data: block creation, removal, replication

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.

    [ http://issues.apache.org/jira/browse/HADOOP-96?page=comments#action_12377255 ] 

Doug Cutting commented on HADOOP-96:
------------------------------------

Overall I like this.  A few nits, however:

1. It would be nice if, before it switches logging to a file, the name of that file were logged.  That way, when folks upgrade they can figure out where their logs are written.  This will also help in debugging configuration issues.

2. When I run bin/start-all.sh on my Linux box, the log files end up in my current directory, HADOOP_HOME, not in HADOOP_HOME/logs.  When I add a print statement, it shows the correct directory for logDir, but that's not where the files are written.  I have not tested this on Windows, but it would be good to check with a simple configuration (i.e., a hadoop-site.xml that only specifies localhost for the jobtracker and namenode) that, on Windows and Linux, the files are written where expected.  Perhaps we could even add a unit test for this.

Thanks!

> name server should log decisions that affect data: block creation, removal, replication
> ---------------------------------------------------------------------------------------
>
>          Key: HADOOP-96
>          URL: http://issues.apache.org/jira/browse/HADOOP-96
>      Project: Hadoop
>         Type: Improvement

>   Components: dfs
>     Versions: 0.1.0
>     Reporter: Yoram Arnon
>     Assignee: Hairong Kuang
>     Priority: Critical
>      Fix For: 0.2
>  Attachments: namenodeLogging.patch

[jira] Commented: (HADOOP-96) name server should log decisions that affect data: block creation, removal, replication

Posted by "eric baldeschwieler (JIRA)" <ji...@apache.org>.

    [ http://issues.apache.org/jira/browse/HADOOP-96?page=comments#action_12372603 ] 

eric baldeschwieler commented on HADOOP-96:
-------------------------------------------

One thing that really helped us was to be able to specify the duration to keep log files.  So you could configure the system to keep up to N seconds of logs (think one month).  This way logs don't grow without bound, but you can be confident how much history will be available.  Also, logs gzip pretty well.  It would be nice to zip closed logs automatically.
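
A minimal sketch of the two ideas above (compress closed logs, and drop anything older than a configured number of seconds) might look like this. It uses only the JDK; the class and method names are hypothetical, and nothing here reflects what Hadoop actually does.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of Hadoop.
class LogRetentionSketch {
    /** Gzip a closed log file next to the original, then delete the original. */
    static Path gzip(Path log) {
        Path zipped = log.resolveSibling(log.getFileName() + ".gz");
        try {
            try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(zipped))) {
                Files.copy(log, out);
            }
            // Only remove the original once the compressed copy is fully written.
            Files.delete(log);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return zipped;
    }

    /** Delete regular files in dir last modified more than maxAgeSeconds before nowMillis. */
    static int purgeOlderThan(Path dir, long maxAgeSeconds, long nowMillis) {
        long cutoff = nowMillis - maxAgeSeconds * 1000L;
        int deleted = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)
                        && Files.getLastModifiedTime(file).toMillis() < cutoff) {
                    Files.delete(file);
                    deleted++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return deleted;
    }
}
```

For a one-month window, a periodic task could call purgeOlderThan(logDir, 30L * 24 * 3600, System.currentTimeMillis()) after compressing any newly closed files.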

Don't know how much of this you can get for free from the existing logs packages.  Should investigate this.

Are we just logging on the name node, or are data nodes logging all events too?  That seems desirable as well, using the same mechanisms of course.


> name server should log decisions that affect data: block creation, removal, replication
> ---------------------------------------------------------------------------------------
>
>          Key: HADOOP-96
>          URL: http://issues.apache.org/jira/browse/HADOOP-96
>      Project: Hadoop
>         Type: Improvement
>   Components: dfs
>     Versions: 0.1
>     Reporter: Yoram Arnon
>     Assignee: Hairong Kuang
>     Priority: Critical


[jira] Resolved: (HADOOP-96) name server should log decisions that affect data: block creation, removal, replication

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.

     [ http://issues.apache.org/jira/browse/HADOOP-96?page=all ]
     
Doug Cutting resolved HADOOP-96:
--------------------------------

    Resolution: Fixed

I just committed this.

I made one additional change.  You removed the 'cd $HADOOP_HOME' line in hadoop-daemon.sh.  I re-added this to hadoop-daemons.sh, so that, when starting remote daemons, they are always run from HADOOP_HOME rather than the user's home directory, which is more likely to be NFS mounted.  The CWD of a daemon is used for core dumps, java profiler output, etc. and it is generally best if it is not NFS mounted.

Thanks, Hairong!

> name server should log decisions that affect data: block creation, removal, replication
> ---------------------------------------------------------------------------------------
>
>          Key: HADOOP-96
>          URL: http://issues.apache.org/jira/browse/HADOOP-96
>      Project: Hadoop
>         Type: Improvement

>   Components: dfs
>     Versions: 0.1.0
>     Reporter: Yoram Arnon
>     Assignee: Hairong Kuang
>     Priority: Critical
>      Fix For: 0.2
>  Attachments: namenodeLogging.patch, namenodeLogging.patch, namenodeLogging.patch

[jira] Commented: (HADOOP-96) name server should log decisions that affect data: block creation, removal, replication

Posted by "Yoram Arnon (JIRA)" <ji...@apache.org>.

    [ http://issues.apache.org/jira/browse/HADOOP-96?page=comments#action_12372589 ] 

Yoram Arnon commented on HADOOP-96:
-----------------------------------

the plan is to add a log line for each change in the name space and each change in block placement or replication. What we get is effectively a trace of program execution for DFS changes.
the log will go to a new log object, to enable switching this (extensive) logging on or off.
name space changes will be logged at level fine, block commit changes at finer, and block pending changes at finest.
In order to facilitate tracing of multiple concurrent operations, each line will include the thread id of the name server's thread. For that we derive a logging class that places the thread id right after the date/time.

we log in the following methods of class name node, and in methods of class nameSystem called by them:
create (startFile)
abandonFileInProgress (abandonFileInProgress )
AbandonBlock (AbandonBlock )
reportWrittenBlock (blockReceived)
addBlock (getAdditionalBlock)
Complete (completeFile)
rename (renameTo)
delete (delete)
Mkdirs (Mkdirs)
sendHeartbeat (getHeartbeat)
blockReport (processReport)
blockReceived (blockReceived)
errorReport
getBlockWork (pendingTransfer, blocksToInvalidate)
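
The derived logging class described above (thread id placed right after the date/time) could be sketched as a java.util.logging Formatter subclass like the one below. The class name and the exact line layout are assumptions for illustration, not the format the patch uses.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.logging.Formatter;
import java.util.logging.LogRecord;

// Hypothetical formatter, not Hadoop's LogFormatter.
class ThreadIdFormatterSketch extends Formatter {
    // DateTimeFormatter is immutable and thread-safe, so one shared instance is fine.
    private static final DateTimeFormatter TIMESTAMP =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss", Locale.ROOT)
                             .withZone(ZoneId.systemDefault());

    @Override
    public String format(LogRecord record) {
        // Layout: <date time> <thread id> <level> <message>
        // getThreadID() is deprecated on newer JDKs in favor of
        // getLongThreadID(), but it is available on older ones too.
        return TIMESTAMP.format(Instant.ofEpochMilli(record.getMillis()))
                + " " + record.getThreadID()
                + " " + record.getLevel()
                + " " + formatMessage(record)
                + System.lineSeparator();
    }
}
```

Installing it is a matter of calling setFormatter(new ThreadIdFormatterSketch()) on whichever handler writes the state-change log; lines from concurrent operations can then be separated by filtering on the thread id column.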


> name server should log decisions that affect data: block creation, removal, replication
> ---------------------------------------------------------------------------------------
>
>          Key: HADOOP-96
>          URL: http://issues.apache.org/jira/browse/HADOOP-96
>      Project: Hadoop
>         Type: Improvement
>   Components: dfs
>     Versions: 0.1
>     Reporter: Yoram Arnon
>     Priority: Critical


[jira] Commented: (HADOOP-96) name server should log decisions that affect data: block creation, removal, replication

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.

    [ http://issues.apache.org/jira/browse/HADOOP-96?page=comments#action_12378099 ] 

Hairong Kuang commented on HADOOP-96:
-------------------------------------

Hi Doug,

Thanks for committing this patch.

The reason that I removed the line 'cd $HADOOP_HOME' is that I had difficulty starting hadoop from any directory except for "HADOOP_HOME". In my configuration, the log dir & pid dir are relative to "HADOOP_HOME".  But "HADOOP_HOME" is relative to the current directory ".". If the script changes the current directory, it is not able to get the log dir and pid dir right.

An alternative fix is to set HADOOP_HOME to be its absolute path.

> name server should log decisions that affect data: block creation, removal, replication
> ---------------------------------------------------------------------------------------
>
>          Key: HADOOP-96
>          URL: http://issues.apache.org/jira/browse/HADOOP-96
>      Project: Hadoop
>         Type: Improvement

>   Components: dfs
>     Versions: 0.1.0
>     Reporter: Yoram Arnon
>     Assignee: Hairong Kuang
>     Priority: Critical
>      Fix For: 0.2
>  Attachments: namenodeLogging.patch, namenodeLogging.patch, namenodeLogging.patch

[jira] Commented: (HADOOP-96) name server should log decisions that affect data: block creation, removal, replication

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.

    [ http://issues.apache.org/jira/browse/HADOOP-96?page=comments#action_12377257 ] 

Doug Cutting commented on HADOOP-96:
------------------------------------

Also, one of Owen's patches has made applying this patch require manual steps.  So if you make a new patch, please first update your tree and merge with Owen's changes.  Thanks!

> name server should log decisions that affect data: block creation, removal, replication
> ---------------------------------------------------------------------------------------
>
>          Key: HADOOP-96
>          URL: http://issues.apache.org/jira/browse/HADOOP-96
>      Project: Hadoop
>         Type: Improvement
>   Components: dfs
>     Versions: 0.1.0
>     Reporter: Yoram Arnon
>     Assignee: Hairong Kuang
>     Priority: Critical
>      Fix For: 0.2
>  Attachments: namenodeLogging.patch


[jira] Commented: (HADOOP-96) name server should log decisions that affect data: block creation, removal, replication

Posted by "Yoram Arnon (JIRA)" <ji...@apache.org>.

    [ http://issues.apache.org/jira/browse/HADOOP-96?page=comments#action_12372598 ] 

Yoram Arnon commented on HADOOP-96:
-----------------------------------

OK.
We'll use the current log formatter, and just turn on the thread id output when we're debugging.
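
For illustration only, here is a minimal sketch of a java.util.logging formatter whose thread id output can be switched on for debugging. The class name ThreadIdFormatter and the showThreadId flag are hypothetical; this is not Hadoop's own log formatter, and the output layout is an assumption.

```java
import java.util.logging.Formatter;
import java.util.logging.Level;
import java.util.logging.LogRecord;

// Hypothetical formatter (not Hadoop's own): prints the level and message,
// and adds the thread id only when showThreadId is true.
class ThreadIdFormatter extends Formatter {
    private final boolean showThreadId;

    ThreadIdFormatter(boolean showThreadId) {
        this.showThreadId = showThreadId;
    }

    @Override
    public String format(LogRecord record) {
        StringBuilder sb = new StringBuilder();
        sb.append(record.getLevel().getName()).append(' ');
        if (showThreadId) {
            // getThreadID() is the id java.util.logging assigned to the logging thread
            sb.append("[thread ").append(record.getThreadID()).append("] ");
        }
        // formatMessage() applies any parameters or resource bundle to the raw message
        sb.append(formatMessage(record)).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        LogRecord record = new LogRecord(Level.INFO, "block created");
        System.out.print(new ThreadIdFormatter(false).format(record));
        System.out.print(new ThreadIdFormatter(true).format(record));
    }
}
```

Keeping the flag in the formatter means the normal log stays compact and only the debugging setup pays for the extra field.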


> name server should log decisions that affect data: block creation, removal, replication
> ---------------------------------------------------------------------------------------
>
>          Key: HADOOP-96
>          URL: http://issues.apache.org/jira/browse/HADOOP-96
>      Project: Hadoop
>         Type: Improvement
>   Components: dfs
>     Versions: 0.1
>     Reporter: Yoram Arnon
>     Priority: Critical


[jira] Commented: (HADOOP-96) name server should log decisions that affect data: block creation, removal, replication

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.

    [ http://issues.apache.org/jira/browse/HADOOP-96?page=comments#action_12378108 ] 

Doug Cutting commented on HADOOP-96:
------------------------------------

I'm now having trouble under cygwin.  'bin/hadoop-daemon.sh start namenode' works, but 'bin/start-dfs.sh' silently fails to start any daemons.

> name server should log decisions that affect data: block creation, removal, replication
> ---------------------------------------------------------------------------------------
>
>          Key: HADOOP-96
>          URL: http://issues.apache.org/jira/browse/HADOOP-96
>      Project: Hadoop
>         Type: Improvement
>   Components: dfs
>     Versions: 0.1.0
>     Reporter: Yoram Arnon
>     Assignee: Hairong Kuang
>     Priority: Critical
>      Fix For: 0.2
>  Attachments: namenodeLogging.patch, namenodeLogging.patch, namenodeLogging.patch


[jira] Updated: (HADOOP-96) name server should log decisions that affect data: block creation, removal, replication

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.

     [ http://issues.apache.org/jira/browse/HADOOP-96?page=all ]

Hairong Kuang updated HADOOP-96:
--------------------------------

    Attachment: namenodeLogging.patch

Here is a patch that includes the changes that Doug suggested. I tested it on a RH Linux machine and it worked fine. But it seems to have a problem on cygwin: File.exists() returns true for a nonexistent directory. Doug, could you test it? If I am able to figure out what went wrong on cygwin, I will resubmit the patch tomorrow.
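
As a rough sketch of a more defensive directory check, the code below tests isDirectory() explicitly rather than relying on exists() alone, and creates the directory if it is missing. The class LogDirCheck and the method ensureDirectory are hypothetical names, not part of the patch, and whether this avoids the cygwin behaviour described above is an untested assumption.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical helper, not taken from namenodeLogging.patch.
class LogDirCheck {
    // Returns dir once it is known to be a directory, creating it if needed.
    static File ensureDirectory(File dir) throws IOException {
        if (dir.isDirectory()) {
            return dir; // already present as a directory
        }
        if (dir.exists()) {
            // something is there, but it is not a directory
            throw new IOException(dir + " exists but is not a directory");
        }
        // mkdirs() can return false if another process created it first,
        // so re-check isDirectory() before reporting a failure
        if (!dir.mkdirs() && !dir.isDirectory()) {
            throw new IOException("could not create " + dir);
        }
        return dir;
    }

    public static void main(String[] args) throws IOException {
        File base = Files.createTempDirectory("logdircheck").toFile();
        File logDir = ensureDirectory(new File(base, "logs"));
        System.out.println(logDir + " is a directory: " + logDir.isDirectory());
    }
}
```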

> name server should log decisions that affect data: block creation, removal, replication
> ---------------------------------------------------------------------------------------
>
>          Key: HADOOP-96
>          URL: http://issues.apache.org/jira/browse/HADOOP-96
>      Project: Hadoop
>         Type: Improvement
>   Components: dfs
>     Versions: 0.1.0
>     Reporter: Yoram Arnon
>     Assignee: Hairong Kuang
>     Priority: Critical
>      Fix For: 0.2
>  Attachments: namenodeLogging.patch, namenodeLogging.patch
