You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-dev@hadoop.apache.org by "Hemanth Yamijala (JIRA)" <ji...@apache.org> on 2009/01/13 18:40:01 UTC

[jira] Created: (HADOOP-5022) [HOD] logcondense should delete all hod logs for a user, including jobtracker logs

[HOD] logcondense should delete all hod logs for a user, including jobtracker logs
----------------------------------------------------------------------------------

                 Key: HADOOP-5022
                 URL: https://issues.apache.org/jira/browse/HADOOP-5022
             Project: Hadoop Core
          Issue Type: Bug
          Components: contrib/hod
            Reporter: Hemanth Yamijala
            Assignee: Peeyush Bishnoi


Currently, logcondense.py does not delete jobtracker logs that it uploads to the DFS when the HOD cluster is deallocated. This will result in the hod-logs directory to slowly accumulate a whole bunch of jobtracker logs. Particularly for users who run a lot of user jobs, this could fill up the namespace.  Further these directories will cause the logcondense program to keep repeatedly looking at these directories stressing out the namenode. So, logcondense.py should optionally also delete the jobtracker logs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5022) [HOD] logcondense should delete all hod logs for a user, including jobtracker logs

Posted by "Peeyush Bishnoi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peeyush Bishnoi updated HADOOP-5022:
------------------------------------

    Attachment: hadoop-5022.txt

Patch is attached for logcondense.py that will optionally delete the JobTracker logs and also update the  logcondense.py documentation . In fact the patch will delete the complete job directory inside  hod-logs in DFS if option "-a" or "--all" is set to 'true' . 

TaskTracker logs gets deleted if option "-a" or "--all" is set to 'false' or  if option is not set . By default option is set to 'false'. 

For example:
python logcondense.py -p ~/hadoop-0.17.0/bin/hadoop -d 7 -c ~/hadoop-conf -l /user -a true

---

> [HOD] logcondense should delete all hod logs for a user, including jobtracker logs
> ----------------------------------------------------------------------------------
>
>                 Key: HADOOP-5022
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5022
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/hod
>            Reporter: Hemanth Yamijala
>            Assignee: Peeyush Bishnoi
>            Priority: Blocker
>             Fix For: 0.18.3
>
>         Attachments: hadoop-5022.txt
>
>
> Currently, logcondense.py does not delete jobtracker logs that it uploads to the DFS when the HOD cluster is deallocated. This will result in the hod-logs directory to slowly accumulate a whole bunch of jobtracker logs. Particularly for users who run a lot of user jobs, this could fill up the namespace.  Further these directories will cause the logcondense program to keep repeatedly looking at these directories stressing out the namenode. So, logcondense.py should optionally also delete the jobtracker logs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5022) [HOD] logcondense should delete all hod logs for a user, including jobtracker logs

Posted by "Marco Nicosia (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marco Nicosia updated HADOOP-5022:
----------------------------------

    Priority: Blocker  (was: Major)

Thanks Hemanth. This is having a big impact on our running Hadoop clusters, so marking as blocker for 0.18.3.


> [HOD] logcondense should delete all hod logs for a user, including jobtracker logs
> ----------------------------------------------------------------------------------
>
>                 Key: HADOOP-5022
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5022
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/hod
>            Reporter: Hemanth Yamijala
>            Assignee: Peeyush Bishnoi
>            Priority: Blocker
>             Fix For: 0.18.3
>
>
> Currently, logcondense.py does not delete jobtracker logs that it uploads to the DFS when the HOD cluster is deallocated. This will result in the hod-logs directory to slowly accumulate a whole bunch of jobtracker logs. Particularly for users who run a lot of user jobs, this could fill up the namespace.  Further these directories will cause the logcondense program to keep repeatedly looking at these directories stressing out the namenode. So, logcondense.py should optionally also delete the jobtracker logs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5022) [HOD] logcondense should delete all hod logs for a user, including jobtracker logs

Posted by "Peeyush Bishnoi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peeyush Bishnoi updated HADOOP-5022:
------------------------------------

    Attachment: hadoop-5022-2.txt

With this new patch minor change has been done in logcondense.py   "--retain-masters-logs" options  'help ' attribute .

Also testing of the patch has been done. 

If  options " -r " or "--retain-masters-logs" is set to 'true' then , log  of masters ( JobTracker and Namenode if "--dynamicdfs" is set ) , does not get removed .  But If  options " -r " or "--retain-masters-logs" is set to 'false' then complete job directory inside hod-logs in HDFS gets removed .

---

> [HOD] logcondense should delete all hod logs for a user, including jobtracker logs
> ----------------------------------------------------------------------------------
>
>                 Key: HADOOP-5022
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5022
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/hod
>            Reporter: Hemanth Yamijala
>            Assignee: Peeyush Bishnoi
>            Priority: Blocker
>             Fix For: 0.18.3
>
>         Attachments: hadoop-5022-1.txt, hadoop-5022-2.txt, hadoop-5022.txt
>
>
> Currently, logcondense.py does not delete jobtracker logs that it uploads to the DFS when the HOD cluster is deallocated. This will result in the hod-logs directory to slowly accumulate a whole bunch of jobtracker logs. Particularly for users who run a lot of user jobs, this could fill up the namespace.  Further these directories will cause the logcondense program to keep repeatedly looking at these directories stressing out the namenode. So, logcondense.py should optionally also delete the jobtracker logs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5022) [HOD] logcondense should delete all hod logs for a user, including jobtracker logs

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hemanth Yamijala updated HADOOP-5022:
-------------------------------------

    Attachment: hadoop-5022-3.txt

> [HOD] logcondense should delete all hod logs for a user, including jobtracker logs
> ----------------------------------------------------------------------------------
>
>                 Key: HADOOP-5022
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5022
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/hod
>            Reporter: Hemanth Yamijala
>            Assignee: Peeyush Bishnoi
>            Priority: Blocker
>             Fix For: 0.18.3
>
>         Attachments: hadoop-5022-1.txt, hadoop-5022-2.txt, hadoop-5022-3.txt, hadoop-5022.txt
>
>
> Currently, logcondense.py does not delete jobtracker logs that it uploads to the DFS when the HOD cluster is deallocated. This will result in the hod-logs directory to slowly accumulate a whole bunch of jobtracker logs. Particularly for users who run a lot of user jobs, this could fill up the namespace.  Further these directories will cause the logcondense program to keep repeatedly looking at these directories stressing out the namenode. So, logcondense.py should optionally also delete the jobtracker logs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5022) [HOD] logcondense should delete all hod logs for a user, including jobtracker logs

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664385#action_12664385 ] 

Hemanth Yamijala commented on HADOOP-5022:
------------------------------------------

I looked at the patch. Some comments:

- Agree with Vinod on the option's name. I think the default value can be 'false' and we can mark this as an incompatible change. This seems like a required behavior.
- The code that's deleting the files seems to be incorrectly indented. Note that the cmd variable is overwritten while iterating over prefixes. This is not how the code was previously. Can you please check.

Also, please do test this carefully.

> [HOD] logcondense should delete all hod logs for a user, including jobtracker logs
> ----------------------------------------------------------------------------------
>
>                 Key: HADOOP-5022
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5022
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/hod
>            Reporter: Hemanth Yamijala
>            Assignee: Peeyush Bishnoi
>            Priority: Blocker
>             Fix For: 0.18.3
>
>         Attachments: hadoop-5022.txt
>
>
> Currently, logcondense.py does not delete jobtracker logs that it uploads to the DFS when the HOD cluster is deallocated. This will result in the hod-logs directory to slowly accumulate a whole bunch of jobtracker logs. Particularly for users who run a lot of user jobs, this could fill up the namespace.  Further these directories will cause the logcondense program to keep repeatedly looking at these directories stressing out the namenode. So, logcondense.py should optionally also delete the jobtracker logs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5022) [HOD] logcondense should delete all hod logs for a user, including jobtracker logs

Posted by "Vinod K V (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664112#action_12664112 ] 

Vinod K V commented on HADOOP-5022:
-----------------------------------

The patch looks good, seems to complete its requirements. I don't have a cluster set up with old logs, so couldn't test it live.

Some minor code comments w.r.t logcondense.py:
 - Line +100 : the dictionary related to the newly added option can be indented. It won't give a compilation problem now, but it can be indented nevertheless for better readability
 - The option name can be something in the lines of "retain-masters-logs" instead of "cleanall". It should be true by default (current scenario). The description string can be something like "true if the logs of the masters(jobtracker and namenode if dynamicdfs is set) have to be retained, false if everything has to be removed"
 - Line +191 : ret = 0 not needed
 - Line +135 : spurious blank line to be removed

Another point to be noted is that the code path covered by this patch could have been much simpler had the job-specific directories(present in hod-logs parent directory) themselves had timestamp in their names. If that were the case, the output needed from dfs as well as the number of filenames processed would have been much lesser. But, this needs changes in hod and so the fix in the provided patch should suffice for now.

> [HOD] logcondense should delete all hod logs for a user, including jobtracker logs
> ----------------------------------------------------------------------------------
>
>                 Key: HADOOP-5022
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5022
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/hod
>            Reporter: Hemanth Yamijala
>            Assignee: Peeyush Bishnoi
>            Priority: Blocker
>             Fix For: 0.18.3
>
>         Attachments: hadoop-5022.txt
>
>
> Currently, logcondense.py does not delete jobtracker logs that it uploads to the DFS when the HOD cluster is deallocated. This will result in the hod-logs directory to slowly accumulate a whole bunch of jobtracker logs. Particularly for users who run a lot of user jobs, this could fill up the namespace.  Further these directories will cause the logcondense program to keep repeatedly looking at these directories stressing out the namenode. So, logcondense.py should optionally also delete the jobtracker logs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5022) [HOD] logcondense should delete all hod logs for a user, including jobtracker logs

Posted by "Vinod K V (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664503#action_12664503 ] 

Vinod K V commented on HADOOP-5022:
-----------------------------------

+1 for the changes in the patch.

> [HOD] logcondense should delete all hod logs for a user, including jobtracker logs
> ----------------------------------------------------------------------------------
>
>                 Key: HADOOP-5022
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5022
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/hod
>            Reporter: Hemanth Yamijala
>            Assignee: Peeyush Bishnoi
>            Priority: Blocker
>             Fix For: 0.18.3
>
>         Attachments: hadoop-5022-1.txt, hadoop-5022-2.txt, hadoop-5022.txt
>
>
> Currently, logcondense.py does not delete jobtracker logs that it uploads to the DFS when the HOD cluster is deallocated. This will result in the hod-logs directory to slowly accumulate a whole bunch of jobtracker logs. Particularly for users who run a lot of user jobs, this could fill up the namespace.  Further these directories will cause the logcondense program to keep repeatedly looking at these directories stressing out the namenode. So, logcondense.py should optionally also delete the jobtracker logs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (HADOOP-5022) [HOD] logcondense should delete all hod logs for a user, including jobtracker logs

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hemanth Yamijala resolved HADOOP-5022.
--------------------------------------

       Resolution: Fixed
    Fix Version/s:     (was: 0.18.3)
                   0.21.0
     Hadoop Flags: [Incompatible change, Reviewed]

I just committed this to trunk. Thanks, Peeyush !

> [HOD] logcondense should delete all hod logs for a user, including jobtracker logs
> ----------------------------------------------------------------------------------
>
>                 Key: HADOOP-5022
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5022
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/hod
>            Reporter: Hemanth Yamijala
>            Assignee: Peeyush Bishnoi
>            Priority: Blocker
>             Fix For: 0.21.0
>
>         Attachments: hadoop-5022-1.txt, hadoop-5022-2.txt, hadoop-5022-3.txt, hadoop-5022.txt
>
>
> Currently, logcondense.py does not delete jobtracker logs that it uploads to the DFS when the HOD cluster is deallocated. This will result in the hod-logs directory to slowly accumulate a whole bunch of jobtracker logs. Particularly for users who run a lot of user jobs, this could fill up the namespace.  Further these directories will cause the logcondense program to keep repeatedly looking at these directories stressing out the namenode. So, logcondense.py should optionally also delete the jobtracker logs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5022) [HOD] logcondense should delete all hod logs for a user, including jobtracker logs

Posted by "Marco Nicosia (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marco Nicosia updated HADOOP-5022:
----------------------------------

    Fix Version/s: 0.18.3

> [HOD] logcondense should delete all hod logs for a user, including jobtracker logs
> ----------------------------------------------------------------------------------
>
>                 Key: HADOOP-5022
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5022
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/hod
>            Reporter: Hemanth Yamijala
>            Assignee: Peeyush Bishnoi
>            Priority: Blocker
>             Fix For: 0.18.3
>
>
> Currently, logcondense.py does not delete jobtracker logs that it uploads to the DFS when the HOD cluster is deallocated. This will result in the hod-logs directory to slowly accumulate a whole bunch of jobtracker logs. Particularly for users who run a lot of user jobs, this could fill up the namespace.  Further these directories will cause the logcondense program to keep repeatedly looking at these directories stressing out the namenode. So, logcondense.py should optionally also delete the jobtracker logs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5022) [HOD] logcondense should delete all hod logs for a user, including jobtracker logs

Posted by "Peeyush Bishnoi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peeyush Bishnoi updated HADOOP-5022:
------------------------------------

    Attachment: hadoop-5022-1.txt

Thanks! for the comments. I incorporated the changes in the patch as per the comments provided and uploaded the new patch . This new patch has "-r " and "--retain-masters-logs" as option that need to be passed with value either "true" or "false" while running the logcondense.py script . By default the option is set to 'false' . It means it will delete the complete job directory inside hod-logs in HDFS. But if option is set to 'true' it will delete only the tasktracker logs . It will delete the Datanode logs , if "--dynamicdfs" is 'true'.

For example:
python logcondense.py -p ~/hadoop-0.17.0/bin/hadoop -d 7 -c ~/hadoop-conf -l /user -r true

---



> [HOD] logcondense should delete all hod logs for a user, including jobtracker logs
> ----------------------------------------------------------------------------------
>
>                 Key: HADOOP-5022
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5022
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/hod
>            Reporter: Hemanth Yamijala
>            Assignee: Peeyush Bishnoi
>            Priority: Blocker
>             Fix For: 0.18.3
>
>         Attachments: hadoop-5022-1.txt, hadoop-5022.txt
>
>
> Currently, logcondense.py does not delete jobtracker logs that it uploads to the DFS when the HOD cluster is deallocated. This will result in the hod-logs directory to slowly accumulate a whole bunch of jobtracker logs. Particularly for users who run a lot of user jobs, this could fill up the namespace.  Further these directories will cause the logcondense program to keep repeatedly looking at these directories stressing out the namenode. So, logcondense.py should optionally also delete the jobtracker logs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5022) [HOD] logcondense should delete all hod logs for a user, including jobtracker logs

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664750#action_12664750 ] 

Hemanth Yamijala commented on HADOOP-5022:
------------------------------------------

I made some very minor changes to the attached patch. The following:

- modified the comments in the code a bit to better reflect the current algorithm
- changed the name of the option to retain-master-logs instead of retain-masters-logs. Also updated the name in the documentation.
- changed the path being deleted if retain-master-logs is false to remove the final / in the patch. This was unnecessary. I've tested this on my local box and it seems to work fine.

test-patch results are as follows. Hod tests continue to be done manually, outside the unit test cycle of Hadoop.

     [exec] -1 overall.
     [exec]
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec]
     [exec]     -1 tests included.  The patch doesn't appear to include any new or modified tests.
     [exec]                         Please justify why no tests are needed for this patch.
     [exec]
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec]
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec]
     [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.


> [HOD] logcondense should delete all hod logs for a user, including jobtracker logs
> ----------------------------------------------------------------------------------
>
>                 Key: HADOOP-5022
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5022
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/hod
>            Reporter: Hemanth Yamijala
>            Assignee: Peeyush Bishnoi
>            Priority: Blocker
>             Fix For: 0.18.3
>
>         Attachments: hadoop-5022-1.txt, hadoop-5022-2.txt, hadoop-5022-3.txt, hadoop-5022.txt
>
>
> Currently, logcondense.py does not delete jobtracker logs that it uploads to the DFS when the HOD cluster is deallocated. This will result in the hod-logs directory to slowly accumulate a whole bunch of jobtracker logs. Particularly for users who run a lot of user jobs, this could fill up the namespace.  Further these directories will cause the logcondense program to keep repeatedly looking at these directories stressing out the namenode. So, logcondense.py should optionally also delete the jobtracker logs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5022) [HOD] logcondense should delete all hod logs for a user, including jobtracker logs

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hemanth Yamijala updated HADOOP-5022:
-------------------------------------

    Release Note: Implemented an option,  retain-master-logs, in logcondense support script that would indicate whether the script should delete hadoop master logs as part of its cleanup process. By default this option is false, that makes the change incompatible with earlier versions of logcondense.
    Hadoop Flags: [Incompatible change, Reviewed]  (was: [Reviewed, Incompatible change])

> [HOD] logcondense should delete all hod logs for a user, including jobtracker logs
> ----------------------------------------------------------------------------------
>
>                 Key: HADOOP-5022
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5022
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/hod
>            Reporter: Hemanth Yamijala
>            Assignee: Peeyush Bishnoi
>            Priority: Blocker
>             Fix For: 0.21.0
>
>         Attachments: hadoop-5022-1.txt, hadoop-5022-2.txt, hadoop-5022-3.txt, hadoop-5022.txt
>
>
> Currently, logcondense.py does not delete jobtracker logs that it uploads to the DFS when the HOD cluster is deallocated. This will result in the hod-logs directory to slowly accumulate a whole bunch of jobtracker logs. Particularly for users who run a lot of user jobs, this could fill up the namespace.  Further these directories will cause the logcondense program to keep repeatedly looking at these directories stressing out the namenode. So, logcondense.py should optionally also delete the jobtracker logs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.