Posted to common-dev@hadoop.apache.org by "Konstantin Shvachko (JIRA)" <ji...@apache.org> on 2007/10/01 22:21:50 UTC

[jira] Created: (HADOOP-1978) Name-node should remove edits.new during startup rather than renaming it to edits.

Name-node should remove edits.new during startup rather than renaming it to edits.
----------------------------------------------------------------------------------

                 Key: HADOOP-1978
                 URL: https://issues.apache.org/jira/browse/HADOOP-1978
             Project: Hadoop
          Issue Type: Bug
          Components: dfs
    Affects Versions: 0.13.1
            Reporter: Konstantin Shvachko
            Priority: Blocker
             Fix For: 0.14.2, 0.15.0


The secondary name-node fails in the middle of a checkpoint. At that time the main name-node is writing its journal
transactions into edits.new.
If the name-node is then shut down and restarted, loadFSImage() reads the current image file and merges it with both the
edits file and the edits.new file.
Next, saveFSImage() saves the new image file, creates an empty edits file, and then calls rollFSImage(), which, among other
things, renames edits.new to edits. This is a mistake: during startup edits.new should simply be removed after it has been
merged into the image. Otherwise edits ends up holding transactions that are already contained in the saved image.
The purpose of calling rollFSImage() during startup is, imho, to recover from an unsuccessful checkpoint. So an easy fix
is to empty edits.new before calling rollFSImage(), the same way edits is emptied. rollFSImage() will then rename an empty
file over an empty file, which gives us the desired result.
We should fix this bug in both 0.14 and 0.15. I am making it a blocker for 0.15.
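
The startup sequence described above can be sketched as a small standalone Java program. This is only a toy model
under stated assumptions, not the actual FSImage code: the class name, the methods loadImage(), saveImage(), rollImage()
and restart(), and the file contents are hypothetical stand-ins for loadFSImage(), saveFSImage() and rollFSImage().

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Toy model of the name-node startup sequence; all names here are hypothetical. */
public class EditsNewStartupSketch {

    /** Stand-in for loadFSImage(): merge the image with edits and edits.new. */
    public static String loadImage(Path dir) throws IOException {
        StringBuilder merged = new StringBuilder();
        for (String name : new String[] {"fsimage", "edits", "edits.new"}) {
            Path p = dir.resolve(name);
            if (Files.exists(p)) {
                merged.append(new String(Files.readAllBytes(p), StandardCharsets.UTF_8));
            }
        }
        return merged.toString();
    }

    /** Stand-in for saveFSImage(): write the merged image and empty the edits file. */
    public static void saveImage(Path dir, String merged, boolean emptyEditsNew) throws IOException {
        Files.write(dir.resolve("fsimage"), merged.getBytes(StandardCharsets.UTF_8));
        // Files.write with default options creates the file or truncates it to zero length.
        Files.write(dir.resolve("edits"), new byte[0]);
        Path editsNew = dir.resolve("edits.new");
        if (emptyEditsNew && Files.exists(editsNew)) {
            // The proposed fix: empty edits.new the same way edits is emptied.
            Files.write(editsNew, new byte[0]);
        }
    }

    /** Stand-in for the part of rollFSImage() that renames edits.new to edits. */
    public static void rollImage(Path dir) throws IOException {
        Path editsNew = dir.resolve("edits.new");
        if (Files.exists(editsNew)) {
            Files.move(editsNew, dir.resolve("edits"), StandardCopyOption.REPLACE_EXISTING);
        }
    }

    /** Simulate one restart and return the length of edits afterwards. */
    public static long restart(Path dir, boolean emptyEditsNew) throws IOException {
        String merged = loadImage(dir);
        saveImage(dir, merged, emptyEditsNew);
        rollImage(dir);
        return Files.size(dir.resolve("edits"));
    }

    public static void main(String[] args) throws IOException {
        for (boolean fixed : new boolean[] {false, true}) {
            Path dir = Files.createTempDirectory("namenode");
            Files.write(dir.resolve("fsimage"), "img;".getBytes(StandardCharsets.UTF_8));
            Files.write(dir.resolve("edits"), "tx1;".getBytes(StandardCharsets.UTF_8));
            Files.write(dir.resolve("edits.new"), "tx2;".getBytes(StandardCharsets.UTF_8));
            long len = restart(dir, fixed);
            System.out.println((fixed ? "fixed" : "buggy") + ": edits length after restart = " + len);
        }
    }
}
```

In this model the unfixed sequence leaves the already merged "tx2;" record in edits after the restart, while the
fixed sequence leaves edits empty.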

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-1978) Name-node should remove edits.new during startup rather than renaming it to edits.

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1978:
-------------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I just committed this. Thanks Konstantin.



[jira] Commented: (HADOOP-1978) Name-node should remove edits.new during startup rather than renaming it to edits.

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12531984 ] 

Hadoop QA commented on HADOOP-1978:
-----------------------------------

+1 overall.  Here are the results of testing the latest attachment 
http://issues.apache.org/jira/secure/attachment/12366970/editsNew-0-15.patch
against trunk revision r581427.

    @author +1.  The patch does not contain any @author tags.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new compiler warnings.

    findbugs +1.  The patch does not introduce any new Findbugs warnings.

    core tests +1.  The patch passed core unit tests.

    contrib tests +1.  The patch passed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/871/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/871/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/871/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/871/console

This message is automatically generated.



[jira] Updated: (HADOOP-1978) Name-node should remove edits.new during startup rather than renaming it to edits.

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HADOOP-1978:
----------------------------------------

    Attachment: editsNew-0-14.patch

This is the patch for 0.14.



[jira] Commented: (HADOOP-1978) Name-node should remove edits.new during startup rather than renaming it to edits.

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12532407 ] 

Hudson commented on HADOOP-1978:
--------------------------------

Integrated in Hadoop-Nightly #260 (See [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/260/])



[jira] Updated: (HADOOP-1978) Name-node should remove edits.new during startup rather than renaming it to edits.

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HADOOP-1978:
----------------------------------------

    Status: Patch Available  (was: Open)



[jira] Updated: (HADOOP-1978) Name-node should remove edits.new during startup rather than renaming it to edits.

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HADOOP-1978:
----------------------------------------

    Attachment:     (was: editsNew-0-15.patch)



[jira] Assigned: (HADOOP-1978) Name-node should remove edits.new during startup rather than renaming it to edits.

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko reassigned HADOOP-1978:
-------------------------------------------

    Assignee: Konstantin Shvachko



[jira] Commented: (HADOOP-1978) Name-node should remove edits.new during startup rather than renaming it to edits.

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12531622 ] 

dhruba borthakur commented on HADOOP-1978:
------------------------------------------

This bug was introduced in 0.13. Does it mean that we need a fix for 0.13 release too?



[jira] Commented: (HADOOP-1978) Name-node should remove edits.new during startup rather than renaming it to edits.

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12531626 ] 

Doug Cutting commented on HADOOP-1978:
--------------------------------------

> Does it mean that we need a fix for 0.13 release too?

I don't think so.  Folks can upgrade from 0.13 to 0.14.2 to get this fix.




[jira] Updated: (HADOOP-1978) Name-node should remove edits.new during startup rather than renaming it to edits.

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HADOOP-1978:
----------------------------------------

    Attachment: editsNew-0-15.patch

- I fixed the problem as described above by emptying edits.new at the same time edits is emptied.
- I truncate the edits files when they are emptied; otherwise it is not clear what the real file size is.
- Based on that, I modified TestCheckpoint, which now checks the existence and lengths of the edits and image files.
  The test fails with the old code and succeeds with the new.
- Cleaned up EditLogOutputStream: it now calls force() on the FileChannel instead of flush() and sync().
  This makes the fd member obsolete, so I removed it.
- The Fsck test was failing consistently on my machine because safe-mode was still on when the test was trying
  to delete files. It should call waitActive() before deleting.
- Patch for 0.14 is coming soon.
- For Hadoop 0.13, one can avoid this problem by simply starting the cluster with -upgrade. edits.new
  will then be merged with the image along with edits, but it will not be renamed to edits, since it does not
  exist in the new image directory.
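
The truncation and force() points above can be illustrated with a minimal standalone sketch. It is not the code from
the patch: the class name and the methods appendAndSync() and empty() are hypothetical, and only the standard
java.nio FileChannel calls truncate() and force() are used.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;

/** Illustrative only; not the EditLogOutputStream from the patch. */
public class EditLogTruncateSketch {

    /** Append one record and force it to the storage device through the channel. */
    public static void appendAndSync(File f, String record) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            FileChannel ch = raf.getChannel();
            ch.position(ch.size());  // move to the end of the file before writing
            ByteBuffer buf = ByteBuffer.wrap(record.getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            // force(true) flushes file content and metadata, so no separate
            // FileDescriptor reference is needed for sync().
            ch.force(true);
        }
    }

    /** Empty the file by truncating it, so its length reflects its real content. */
    public static void empty(File f) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            FileChannel ch = raf.getChannel();
            ch.truncate(0);
            ch.force(true);
        }
    }

    public static void main(String[] args) throws IOException {
        File edits = File.createTempFile("edits", ".tmp");
        appendAndSync(edits, "tx1;");
        System.out.println("length after append: " + edits.length());
        empty(edits);
        System.out.println("length after empty:  " + edits.length());
        edits.delete();
    }
}
```

Because the file is truncated rather than overwritten in place, a test can check its length directly, which is
the kind of check the modified TestCheckpoint is described as doing.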



[jira] Commented: (HADOOP-1978) Name-node should remove edits.new during startup rather than renaming it to edits.

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12532153 ] 

dhruba borthakur commented on HADOOP-1978:
------------------------------------------

Code looks good.



[jira] Updated: (HADOOP-1978) Name-node should remove edits.new during startup rather than renaming it to edits.

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HADOOP-1978:
----------------------------------------

    Attachment: editsNew-0-15.patch
