Posted to common-dev@hadoop.apache.org by "Konstantin Shvachko (JIRA)" <ji...@apache.org> on 2008/05/08 02:29:55 UTC

[jira] Created: (HADOOP-3364) Faster image and log edits loading.

Faster image and log edits loading.
-----------------------------------

                 Key: HADOOP-3364
                 URL: https://issues.apache.org/jira/browse/HADOOP-3364
             Project: Hadoop Core
          Issue Type: Improvement
          Components: dfs
    Affects Versions: 0.18.0
            Reporter: Konstantin Shvachko
            Assignee: Konstantin Shvachko
             Fix For: 0.18.0
         Attachments: fastLoadImage.patch

This patch optimizes the code to load the fsimage and the edits log faster.
I implemented the ideas mentioned in HADOOP-3022. Namely, I removed unnecessary object allocations
and implemented optimized loading, which avoids unnecessary name-space tree lookups
if consecutive files belong to the same parent.
I changed the saveImage algorithm so that it first writes all children of a directory
and then descends into its sub-directories. This does not change the format of the image,
only the order of the stored objects.
This should make loading faster once the image has been saved with the new version.
The performance gains are
load/save fsimage: 15-20%
load edits: 5-10%
I expected somewhat more from these changes, especially for edits,
but it turned out that recent changes substantially slowed down
edits loading. ADD and CLOSE operations first remove the existing file with all its blocks,
then add it back with potentially new blocks, and then ADD additionally replaces
the just-inserted inode with an inode-under-construction for the same file.
This is very inefficient, but hard to fix.
I'll do it in a separate jira.
Other changes:
- I combined most of the UTF8 references in one place, at least for FSImage.
- Included log messages about the startup progress with load/save times and file sizes.
- Removed pre-crc-upgrade code from FSEdits, which was missed by the crc-remove patch.
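
To illustrate the save order and the parent reuse described above, here is a minimal sketch. It is not the code in fastLoadImage.patch: Node, Entry, save, load, lookup and the lookups counter are hypothetical names, and the list of entries only stands in for the image, whose real format is not shown in this issue. The assumption is that each stored object carries its full path, so the loader can compare the parent path of the current entry with that of the previous one.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ImageOrderSketch {
  /** Hypothetical name-space node; a non-null children map marks a directory. */
  static final class Node {
    final String name;
    final Map<String, Node> children;
    Node(String name, boolean isDir) {
      this.name = name;
      this.children = isDir ? new LinkedHashMap<String, Node>() : null;
    }
    boolean isDir() { return children != null; }
    Node add(Node child) { children.put(child.name, child); return child; }
  }

  /** One stored object: full path plus a directory flag (illustrative, not the real image format). */
  static final class Entry {
    final String path;
    final boolean isDir;
    Entry(String path, boolean isDir) { this.path = path; this.isDir = isDir; }
  }

  /** Counts full tree lookups from the root, so the effect of the ordering is visible. */
  static int lookups = 0;

  /** Writes all children of dir first, then descends into each sub-directory. */
  static void save(Node dir, String dirPath, List<Entry> out) {
    for (Node child : dir.children.values()) {
      out.add(new Entry(dirPath + "/" + child.name, child.isDir()));
    }
    for (Node child : dir.children.values()) {
      if (child.isDir()) {
        save(child, dirPath + "/" + child.name, out);
      }
    }
  }

  /** Resolves a directory path by walking down from the root. */
  static Node lookup(Node root, String path) {
    lookups++;
    Node cur = root;
    for (String part : path.split("/")) {
      if (!part.isEmpty()) {
        cur = cur.children.get(part);
      }
    }
    return cur;
  }

  /** Rebuilds the tree, reusing the previous parent when consecutive entries share it. */
  static Node load(List<Entry> entries) {
    Node root = new Node("", true);
    String lastParentPath = null;
    Node lastParent = null;
    for (Entry e : entries) {
      int slash = e.path.lastIndexOf('/');
      String parentPath = e.path.substring(0, slash);
      if (!parentPath.equals(lastParentPath)) {
        lastParent = lookup(root, parentPath);
        lastParentPath = parentPath;
      }
      lastParent.add(new Node(e.path.substring(slash + 1), e.isDir));
    }
    return root;
  }

  /** Small example tree: /a/{x,y}, /b/{z}, /f. */
  static Node exampleTree() {
    Node root = new Node("", true);
    Node a = root.add(new Node("a", true));
    Node b = root.add(new Node("b", true));
    root.add(new Node("f", false));
    a.add(new Node("x", false));
    a.add(new Node("y", false));
    b.add(new Node("z", false));
    return root;
  }

  public static void main(String[] args) {
    List<Entry> image = new ArrayList<Entry>();
    save(exampleTree(), "", image);
    load(image);
    System.out.println(image.size() + " entries loaded with " + lookups + " tree lookups");
  }
}
```

Because every directory's children are written contiguously, the loader in this sketch needs one tree lookup per directory rather than one per stored object. With an order that interleaves files and sub-directories, the same loader would have to look up a parent again each time the parent changes.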


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-3364) Faster image and log edits loading.

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HADOOP-3364:
----------------------------------------

    Attachment: fastLoadImage.patch



[jira] Commented: (HADOOP-3364) Faster image and log edits loading.

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12595816#action_12595816 ] 

Hudson commented on HADOOP-3364:
--------------------------------

Integrated in Hadoop-trunk #486 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/486/])



[jira] Updated: (HADOOP-3364) Faster image and log edits loading.

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HADOOP-3364:
----------------------------------------

    Attachment: fastLoadImage.patch

- Fixed one new findbugs warning and two old ones.
- Incorporated most of Nicholas's comments.
   Did not make FSImage.uStr a large array because UTF8 does not have a capacity, and because it does not affect the performance.
   Did not remove isRoot. This is the definition of the root.
   Removed getParentINode, although it is the method that will replace the current getParent().
   Refactored FSDirectory/INodeDirectory.addToParent(...) to reuse the code it duplicated with addNode().
   LeaseManager was printing logs in info mode, which polluted the log a lot during startup and affected the performance. Had to downgrade them to debug mode.




[jira] Commented: (HADOOP-3364) Faster image and log edits loading.

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12595574#action_12595574 ] 

Hadoop QA commented on HADOOP-3364:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12381748/fastLoadImage.patch
  against trunk revision 654315.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2438/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2438/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2438/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2438/console

This message is automatically generated.



[jira] Commented: (HADOOP-3364) Faster image and log edits loading.

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12595375#action_12595375 ] 

Hadoop QA commented on HADOOP-3364:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12381648/fastLoadImage.patch
  against trunk revision 654315.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 1 new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2426/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2426/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2426/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2426/console

This message is automatically generated.



[jira] Commented: (HADOOP-3364) Faster image and log edits loading.

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12595683#action_12595683 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-3364:
------------------------------------------------

+1 the new patch looks good.



[jira] Updated: (HADOOP-3364) Faster image and log edits loading.

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HADOOP-3364:
----------------------------------------

    Status: Patch Available  (was: Open)



[jira] Commented: (HADOOP-3364) Faster image and log edits loading.

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12595434#action_12595434 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-3364:
------------------------------------------------

fastLoadImage.patch simplifies the image reading code a lot.  We should do something similar for writing.  Here are some comments:

# Check whether LOG.isDebugEnabled() before calling LOG.debug(...).
# Keep NodeIterator private by defining a method BlockInfo.iterator().
# In FSDirectory.closeFile(...), remove the local variable fileNode.  Also, do we need "synchronized (rootDir)" there?
# In FSDirectory.addToParent(...), remove "parentINode = newParent;" and return newParent.
# Make FSImage.uStr final.  Initialize FSImage.uStr.bytes to a large enough array.
# Define a static final field EMPTY_ARRAY = new DatanodeDescriptor[0] in DatanodeDescriptor.  Then, change "lastLocations = new DatanodeDescriptor[0];" to "lastLocations = DatanodeDescriptor.EMPTY_ARRAY;" in case OP_CLOSE.
# For case OP_SET_GENSTAMP, call in.readLong() directly.
# In FSImage, change saveImage(..., INode current, ...) to saveImage(..., INodeDirectory curDir, ...), since current is always a directory.
# In INode, remove isRoot and getParentINode since they are not used.
# Add comments saying that FSDirectory/INodeDirectory.addToParent(...) should only be used in FSImage.
# Revert the change in LeaseManager.
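
Two of the comments above, the debug-level guard and the shared empty array, can be sketched as follows. This is only an illustration under stated assumptions: it uses java.util.logging (isLoggable(Level.FINE)) as a self-contained stand-in for the LOG.isDebugEnabled() call named in the comment, and Descriptor, describe, logProgress, lastLocationsOnClose and expensiveCalls are hypothetical names, not Hadoop code.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class ReviewCommentsSketch {
  // Stand-in logger; the review comment refers to LOG.isDebugEnabled()/LOG.debug(...).
  static final Logger LOG = Logger.getLogger("ReviewCommentsSketch");

  /** Hypothetical stand-in for DatanodeDescriptor. */
  static final class Descriptor {
    // A zero-length array has no elements to modify, so one instance can be shared safely.
    static final Descriptor[] EMPTY_ARRAY = new Descriptor[0];
  }

  /** Counts how often the message is actually built. */
  static int expensiveCalls = 0;

  /** Pretends to be a costly message construction. */
  static String describe(int blocks) {
    expensiveCalls++;
    return "loaded " + blocks + " blocks";
  }

  /** Builds the message only when the debug-equivalent level is enabled. */
  static void logProgress(int blocks) {
    if (LOG.isLoggable(Level.FINE)) {
      LOG.fine(describe(blocks));
    }
  }

  /** Returns the shared constant instead of allocating a new empty array per call. */
  static Descriptor[] lastLocationsOnClose() {
    return Descriptor.EMPTY_ARRAY;
  }

  public static void main(String[] args) {
    LOG.setLevel(Level.INFO);
    logProgress(10);
    System.out.println("message built " + expensiveCalls + " time(s) at INFO");
    LOG.setLevel(Level.FINE);
    logProgress(10);
    System.out.println("message built " + expensiveCalls + " time(s) after enabling FINE");
  }
}
```

The guard matters on a hot path such as startup, where building the log message costs something even if the message is then discarded. The shared constant avoids one allocation per processed operation.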




[jira] Updated: (HADOOP-3364) Faster image and log edits loading.

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HADOOP-3364:
----------------------------------------

    Status: Open  (was: Patch Available)



[jira] Updated: (HADOOP-3364) Faster image and log edits loading.

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HADOOP-3364:
----------------------------------------

    Hadoop Flags: [Reviewed]
          Status: Patch Available  (was: Open)



[jira] Updated: (HADOOP-3364) Faster image and log edits loading.

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HADOOP-3364:
----------------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I just committed this.
