Posted to dev@chukwa.apache.org by "Jiaqi Tan (JIRA)" <ji...@apache.org> on 2009/04/06 16:17:13 UTC

[jira] Created: (CHUKWA-94) SALSA state-machine extraction from Hadoop logs

SALSA state-machine extraction from Hadoop logs
-----------------------------------------------

                 Key: CHUKWA-94
                 URL: https://issues.apache.org/jira/browse/CHUKWA-94
             Project: Hadoop Chukwa
          Issue Type: New Feature
          Components: Data Processors
            Reporter: Jiaqi Tan


This is a proposed feature to extract state-machine views from Hadoop's logs (TaskTracker, JobTracker, and DataNode are currently supported; NameNode support is coming soon). These views are described in http://www.usenix.org/event/wasl08/tech/full_papers/tan/tan_html/ and will enable analysis and diagnosis algorithms to be built on top of them.

Building a full SALSA view involves two steps:

1. Incrementally parsing log entries on a per-node basis to extract states (line-by-line reading, assuming the entire log file from a given node is available to the same process)
2. "Stitching" and correlating states across all logs (across nodes and across types) to build a full state machine.
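Step 1 could be sketched roughly as follows. This is an illustrative assumption, not Chukwa's actual demux code: the log-line format, the "LaunchTaskAction" pattern, and the state names are hypothetical stand-ins for whatever the real TaskTracker/JobTracker/DataNode parsers would match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch of SALSA step 1: per-node, line-by-line state extraction.
 *  Assumes the whole log file for one node is read by one process, in order. */
public class StateExtractor {

    /** One extracted state on one node. */
    public static class StateEvent {
        public final String node, stateName, taskId;
        public StateEvent(String node, String stateName, String taskId) {
            this.node = node;
            this.stateName = stateName;
            this.taskId = taskId;
        }
    }

    // Illustrative pattern for a TaskTracker-style task-launch line.
    private static final Pattern LAUNCH =
            Pattern.compile("LaunchTaskAction.*?(task_\\S+)");

    /** Scan one node's log lines in order and emit the states they encode. */
    public static List<StateEvent> extract(String node, List<String> logLines) {
        List<StateEvent> states = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = LAUNCH.matcher(line);
            if (m.find()) {
                states.add(new StateEvent(node, "TaskLaunch", m.group(1)));
            }
        }
        return states;
    }
}
```

A real extractor would track both the start and end of each state to recover durations, as the SALSA paper describes; this sketch only shows the line-by-line, per-node structure of the pass.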

My idea is to add SALSA as two jobs in the demux stage, with the parsing job running first in demux, and then either:
(a) have the parsing job write its output to the permanent store, with the correlating job reading from and writing to the permanent store, or
(b) have the parsing job write its output back to the sink file, with the correlating job reading from the sink file and writing to the permanent store.
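Whichever store the correlating job reads from, the "stitching" of step 2 amounts to joining per-node states on a shared identifier (e.g. a task or block ID). A minimal sketch, with a hypothetical record shape of {node name, correlating ID}:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of SALSA step 2: stitching states from different nodes into one
 *  cross-node view by grouping on a shared identifier. */
public class StateStitcher {

    /** Group per-node state records by their correlating ID, yielding for
     *  each ID the list of nodes that participated in that task/block. */
    public static Map<String, List<String>> stitch(List<String[]> perNodeStates) {
        Map<String, List<String>> byId = new LinkedHashMap<>();
        for (String[] s : perNodeStates) {
            // s[0] = node name, s[1] = correlating ID (task or block ID)
            byId.computeIfAbsent(s[1], k -> new ArrayList<>()).add(s[0]);
        }
        return byId;
    }
}
```

In the full feature this join would run as a MapReduce job (the correlating ID as the shuffle key), which is why it fits naturally as a second job after demux.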



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (CHUKWA-94) SALSA state-machine extraction from Hadoop logs

Posted by "Jiaqi Tan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CHUKWA-94?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiaqi Tan updated CHUKWA-94:
----------------------------

    Attachment: tan.pdf

SALSA paper from the Workshop on Analysis of System Logs '08, San Diego, CA

> SALSA state-machine extraction from Hadoop logs
> -----------------------------------------------
>
>                 Key: CHUKWA-94
>                 URL: https://issues.apache.org/jira/browse/CHUKWA-94
>             Project: Hadoop Chukwa
>          Issue Type: New Feature
>          Components: Data Processors
>            Reporter: Jiaqi Tan
>         Attachments: tan.pdf
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>



[jira] Work logged: (CHUKWA-94) SALSA state-machine extraction from Hadoop logs

Posted by "Jiaqi Tan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CHUKWA-94?page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#action_10916 ]

Jiaqi Tan logged work on CHUKWA-94:
-----------------------------------

                Author: Jiaqi Tan
            Created on: 22/Sep/09 09:12 AM
            Start Date: 22/Sep/09 09:12 AM
    Worklog Time Spent: 48h 
      Work Description: Completed state-machine generation

Issue Time Tracking
-------------------

            Time Spent: 624h  (was: 576h)
    Remaining Estimate: 48h  (was: 96h)

> SALSA state-machine extraction from Hadoop logs
> -----------------------------------------------
>
>                 Key: CHUKWA-94
>                 URL: https://issues.apache.org/jira/browse/CHUKWA-94
>             Project: Hadoop Chukwa
>          Issue Type: New Feature
>          Components: Data Processors
>            Reporter: Jiaqi Tan
>            Assignee: Jiaqi Tan
>         Attachments: tan.pdf
>
>   Original Estimate: 672h
>          Time Spent: 624h
>  Remaining Estimate: 48h
>



[jira] Work logged: (CHUKWA-94) SALSA state-machine extraction from Hadoop logs

Posted by "Jiaqi Tan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CHUKWA-94?page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#action_10874 ]

Jiaqi Tan logged work on CHUKWA-94:
-----------------------------------

                Author: Jiaqi Tan
            Created on: 10/Jul/09 06:46 PM
            Start Date: 10/Jul/09 06:46 PM
    Worklog Time Spent: 576h 

Issue Time Tracking
-------------------

            Time Spent: 576h
    Remaining Estimate: 96h  (was: 672h)

> SALSA state-machine extraction from Hadoop logs
> -----------------------------------------------
>
>                 Key: CHUKWA-94
>                 URL: https://issues.apache.org/jira/browse/CHUKWA-94
>             Project: Hadoop Chukwa
>          Issue Type: New Feature
>          Components: Data Processors
>            Reporter: Jiaqi Tan
>            Assignee: Jiaqi Tan
>         Attachments: tan.pdf
>
>   Original Estimate: 672h
>          Time Spent: 576h
>  Remaining Estimate: 96h
>



[jira] Assigned: (CHUKWA-94) SALSA state-machine extraction from Hadoop logs

Posted by "Jiaqi Tan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CHUKWA-94?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiaqi Tan reassigned CHUKWA-94:
-------------------------------

    Assignee: Jiaqi Tan

> SALSA state-machine extraction from Hadoop logs
> -----------------------------------------------
>
>                 Key: CHUKWA-94
>                 URL: https://issues.apache.org/jira/browse/CHUKWA-94
>             Project: Hadoop Chukwa
>          Issue Type: New Feature
>          Components: Data Processors
>            Reporter: Jiaqi Tan
>            Assignee: Jiaqi Tan
>         Attachments: tan.pdf
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
