Posted to commits@oozie.apache.org by "Kiran Nagasubramanian (JIRA)" <ji...@apache.org> on 2011/09/19 08:48:08 UTC

[jira] [Created] (OOZIE-562) Design of Oozie Logging System

Design of Oozie Logging System
------------------------------

                 Key: OOZIE-562
                 URL: https://issues.apache.org/jira/browse/OOZIE-562
             Project: Oozie
          Issue Type: Question
            Reporter: Kiran Nagasubramanian


When the log folder contains large log files, retrieving the log content for a small job (on the order of kilobytes) takes a significant amount of time, about as long as retrieving the log content for a very large job (on the order of gigabytes). This happens because the list of files to be scanned for log retrieval is the same for all jobs that ran at around the same time.

The chances of this happening in production systems are high, since hundreds of jobs would be logging to the same file for an hour and that file would become very large. Is it possible to keep the logs for each job separately, so that scanning the large log files of other jobs can be avoided? Would this be worth the effort?
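
To illustrate the cost described above, here is a minimal sketch in Python (not Oozie's actual implementation). It assumes a hypothetical log line format in which each line carries a JOB[...] tag, and it uses made-up job names. The point is only that extracting a small job's lines from a shared file still requires reading every line written by the other jobs.

```python
import io


def job_log_lines(log_stream, job_id):
    """Yield the lines that mention job_id; every line is still read."""
    for line in log_stream:
        if job_id in line:
            yield line.rstrip("\n")


def scan_cost(log_stream, job_id):
    """Return (lines_read, lines_matched) to make the scan cost visible."""
    lines_read = 0
    lines_matched = 0
    for line in log_stream:
        lines_read += 1
        if job_id in line:
            lines_matched += 1
    return lines_read, lines_matched


# Hypothetical shared hourly log: one line from a small job, 10000 from a large one.
shared_log = "".join(
    f"2011-09-19 08:00:00 DEBUG JOB[big-job] step {i}\n" for i in range(10000)
) + "2011-09-19 08:00:01 INFO JOB[small-job] started\n"

lines_read, lines_matched = scan_cost(io.StringIO(shared_log), "JOB[small-job]")
print(lines_read, lines_matched)  # 10001 1
```

With one file per job, the small job's retrieval would read a single line. With a shared file, it reads all 10001.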

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (OOZIE-562) Design of Oozie Logging System

Posted by "Kiran Nagasubramanian (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OOZIE-562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109211#comment-13109211 ] 

Kiran Nagasubramanian commented on OOZIE-562:
---------------------------------------------

Hi Alejandro,

    Thanks for responding. I generated a log of around 3.2 GB myself and noticed the delay while experimenting with it. I posted this question because I was unsure how large the individual log files (one hour of log content each) would be on a production system. If the log files are not on the order of gigabytes when the logging level is set to DEBUG, this would not be an issue at all.

Thanks,
Kiran



        

[jira] [Commented] (OOZIE-562) Design of Oozie Logging System

Posted by "Alejandro Abdelnur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OOZIE-562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107809#comment-13107809 ] 

Alejandro Abdelnur commented on OOZIE-562:
------------------------------------------

By default, the Oozie logging level is DEBUG for most of its modules. Before putting Oozie into production, the user should carefully review the log levels. Wouldn't this be enough?
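
As a rough analogy only: Oozie's own logging is configured separately, and the sketch below uses Python's standard logging module with hypothetical logger names and messages. It shows how raising the threshold from DEBUG to INFO reduces the number of lines written for the same activity.

```python
import io
import logging


def count_records(level):
    """Log 3 DEBUG messages and 1 INFO message; return how many lines were written."""
    buffer = io.StringIO()
    logger = logging.getLogger(f"demo.{level}")  # hypothetical logger name
    logger.setLevel(level)
    logger.propagate = False  # keep output confined to our in-memory handler
    handler = logging.StreamHandler(buffer)
    logger.addHandler(handler)
    for i in range(3):
        logger.debug("action %d detail", i)
    logger.info("job finished")
    logger.removeHandler(handler)
    return len(buffer.getvalue().splitlines())


print(count_records(logging.DEBUG), count_records(logging.INFO))  # 4 1
```

How much a real deployment would save depends on the ratio of DEBUG to higher-level messages, which this thread does not quantify.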

