
Posted to common-dev@hadoop.apache.org by "Vivek Ratan (JIRA)" <ji...@apache.org> on 2007/10/01 12:03:50 UTC

[jira] Created: (HADOOP-1974) Progress node should cache root object for faster progress computation

Progress node should cache root object for faster progress computation
----------------------------------------------------------------------

                 Key: HADOOP-1974
                 URL: https://issues.apache.org/jira/browse/HADOOP-1974
             Project: Hadoop
          Issue Type: Improvement
            Reporter: Vivek Ratan
            Assignee: Vivek Ratan
            Priority: Minor


In org.apache.hadoop.util.Progress.get(), we walk up the tree of objects to find the root of a node before computing the progress. This is wasteful, especially since get() is called frequently. Each Progress node should cache its root object, which is easy to do since nodes never change their parents.
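
A minimal sketch of the proposed caching follows. It assumes a simplified, hypothetical ProgressNode class (not Hadoop's actual Progress class), is single-threaded, and omits locking. Each node copies its parent's root in the constructor, so looking up the root becomes a field read instead of a walk up the tree.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified progress tree (NOT Hadoop's actual
// org.apache.hadoop.util.Progress). Single-threaded sketch: locking omitted.
class ProgressNode {
  private final ProgressNode parent;   // set once, never changed
  private final ProgressNode root;     // cached at construction time
  private final List<ProgressNode> phases = new ArrayList<>();
  private int currentPhase = 0;
  private float progress = 0.0f;       // meaningful only for leaf nodes

  // Creates a root node: it is its own root.
  ProgressNode() {
    this.parent = null;
    this.root = this;
  }

  // Creates a child: it inherits the parent's cached root in O(1).
  private ProgressNode(ProgressNode parent) {
    this.parent = parent;
    this.root = parent.root;
  }

  ProgressNode addPhase() {
    ProgressNode phase = new ProgressNode(this);
    phases.add(phase);
    return phase;
  }

  void set(float progress) { this.progress = progress; }

  void startNextPhase() { currentPhase++; }

  // Uncached lookup, for comparison: O(depth) on every call.
  ProgressNode walkToRoot() {
    ProgressNode node = this;
    while (node.parent != null) {
      node = node.parent;
    }
    return node;
  }

  // Cached lookup: O(1).
  ProgressNode getRoot() { return root; }

  // Overall progress of the whole tree, starting from the cached root.
  float get() { return root.getInternal(); }

  private float getInternal() {
    int n = phases.size();
    if (n == 0) {
      return progress;
    }
    float sub = currentPhase < n ? phases.get(currentPhase).getInternal() : 0.0f;
    return (currentPhase + sub) / n;
  }
}
```

Because the parent link is final and the root is assigned once in the constructor, the cached value cannot go stale; this relies on the same property the issue points out, that nodes do not change their parents.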

Keeping track of the root node also helps with synchronization (see HADOOP-1970 for details). The root node can be used to lock the entire structure for methods that need to traverse the tree in different directions and lock nodes along the way.
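
One way to use the root as a structure-wide lock is sketched below. The RootLockedNode class and its cascading complete() rule (finishing a node's last child also finishes that node) are hypothetical and chosen only to show an upward traversal; they are not Hadoop's Progress semantics. Every method that touches more than one node synchronizes on the shared root, so an upward walk and a downward walk can never hold locks in conflicting orders.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not Hadoop code): every operation that reads or writes
// more than one node synchronizes on the tree's cached root. With one lock
// per tree, upward and downward traversals cannot deadlock each other.
class RootLockedNode {
  private final RootLockedNode parent;
  private final RootLockedNode root;
  private final List<RootLockedNode> children = new ArrayList<>();
  private int currentChild = 0;
  private float progress = 0.0f;

  RootLockedNode() {
    this.parent = null;
    this.root = this;
  }

  private RootLockedNode(RootLockedNode parent) {
    this.parent = parent;
    this.root = parent.root;
  }

  RootLockedNode addChild() {
    synchronized (root) {
      RootLockedNode child = new RootLockedNode(this);
      children.add(child);
      return child;
    }
  }

  void set(float p) {
    synchronized (root) {
      progress = p;
    }
  }

  // Upward traversal under the root lock. Hypothetical rule: finishing a
  // node's last child also finishes that node, so the walk may continue up.
  void complete() {
    synchronized (root) {
      progress = 1.0f;
      RootLockedNode node = this;
      while (node.parent != null) {
        RootLockedNode p = node.parent;
        p.currentChild++;                       // parent moves to its next child
        if (p.currentChild < p.children.size()) {
          break;                                // parent still has work left
        }
        node = p;                               // parent is done: keep walking up
      }
    }
  }

  // Downward traversal under the same root lock.
  float get() {
    synchronized (root) {
      return root.compute();
    }
  }

  // Caller must hold the root lock.
  private float compute() {
    int n = children.size();
    if (n == 0) {
      return progress;
    }
    if (currentChild >= n) {
      return 1.0f;
    }
    return (currentChild + children.get(currentChild).compute()) / n;
  }
}
```

The trade-off is coarser locking: only one thread at a time can work on any node of a given tree, which is usually acceptable when each operation does very little work.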

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-1974) Progress node should cache root object for faster progress computation

Posted by "Vivek Ratan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vivek Ratan updated HADOOP-1974:
--------------------------------

    Attachment: 1974_patch01

From my comments in HADOOP-1970:

There are two or three methods that traverse the tree-like structure of Progress objects, and at least two of them traverse the tree (and obtain locks) in opposite directions, hence the deadlock. One correct solution is to obtain locks in one direction only, so we lock while going downwards from the root node. This happens in get(), getInternal(), and toString(). If you need to traverse upwards towards the root (as in complete()), you either release your own lock before acquiring your parent's (which is what I've chosen to do, since we don't need transactional semantics), or you acquire locks in the same direction as the other traversal methods.
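
The lock-ordering rule above can be sketched with per-node monitors. This assumes a hypothetical OrderedNode class rather than Hadoop's Progress; only the method names get(), getInternal(), and complete() mirror those in the comment. Locks are only ever taken parent-first, and complete() releases its own lock before calling into the parent, so no thread holds a child's lock while waiting for its parent's.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not Hadoop's Progress): locks are acquired only
// top-down (parent before child). complete() must touch the parent, so it
// releases its own lock first instead of holding it while going upwards.
class OrderedNode {
  private final OrderedNode parent;   // final, so reading it needs no lock
  private final List<OrderedNode> phases = new ArrayList<>();
  private int currentPhase = 0;
  private float progress = 0.0f;

  OrderedNode() { this.parent = null; }

  private OrderedNode(OrderedNode parent) { this.parent = parent; }

  synchronized OrderedNode addPhase() {
    OrderedNode phase = new OrderedNode(this);
    phases.add(phase);
    return phase;
  }

  synchronized void set(float p) { progress = p; }

  synchronized void startNextPhase() { currentPhase++; }

  // Find the root without locking (parent links are final), then lock downwards.
  float get() {
    OrderedNode node = this;
    while (node.parent != null) {
      node = node.parent;
    }
    return node.getInternal();
  }

  // Downward traversal: holds this node's lock, then takes the child's.
  synchronized float getInternal() {
    int n = phases.size();
    if (n == 0) {
      return progress;
    }
    float sub = currentPhase < n ? phases.get(currentPhase).getInternal() : 0.0f;
    return (currentPhase + sub) / n;
  }

  // Upward step: update this node under its own lock, release it, then
  // acquire only the parent's lock. No child-then-parent lock ordering.
  void complete() {
    synchronized (this) {
      progress = 1.0f;
    }
    if (parent != null) {
      parent.startNextPhase();
    }
  }
}
```

As the comment notes, this gives up transactional semantics: another thread may observe the child as complete before the parent has advanced to its next phase.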

Another, somewhat related, issue is that Progress.get(), which is called quite often, always traverses upwards to find the root of the structure. Since a node's root never changes, it should be cached at each node. This improves the performance of get(), and it also offers a synchronization mechanism should we ever need code that locks multiple nodes while traversing upwards towards the root: such methods can lock the root object to get exclusive access to the entire structure. We don't need this for now, but it's a good mechanism to have for the future.

[jira] Updated: (HADOOP-1974) Progress node should cache root object for faster progress computation

Posted by "Vivek Ratan (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-1974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vivek Ratan updated HADOOP-1974:
--------------------------------

    Attachment: 1974_patch02

In my earlier patch, I forgot to use relative paths for the files. That has been fixed in this patch (1974_patch02).
