Posted to issues@kudu.apache.org by "Grant Henke (Jira)" <ji...@apache.org> on 2020/06/03 02:52:00 UTC

[jira] [Updated] (KUDU-2434) Improve kudu-log-parser.pl

     [ https://issues.apache.org/jira/browse/KUDU-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Henke updated KUDU-2434:
------------------------------
    Target Version/s:   (was: 1.8.0)

> Improve kudu-log-parser.pl
> --------------------------
>
>                 Key: KUDU-2434
>                 URL: https://issues.apache.org/jira/browse/KUDU-2434
>             Project: Kudu
>          Issue Type: Improvement
>          Components: supportability
>    Affects Versions: 1.7.0
>            Reporter: William Berkeley
>            Assignee: William Berkeley
>            Priority: Major
>
> cc4e3957ba29bb42112dc21bfa8242e3f7afeac6 introduced the kudu-log-parser.pl script. It takes a collection of Kudu logs, some possibly gzipped, uses regexes to categorize and extract information from some of the events in them, and then merges all the logs together in sorted order. It can be quite useful for investigating problems in a Kudu cluster ex post facto, especially when the exact timeframe or cause is not known.
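The pipeline described above (open possibly gzipped logs, categorize lines with regexes, then do a sorted merge) can be sketched in Python roughly as follows. This is a minimal sketch, not the logic of kudu-log-parser.pl itself. It assumes glog-style line prefixes (severity letter, MMDD, hh:mm:ss.uuuuuu, thread id, file:line]). The category names and regexes are hypothetical placeholders. Lines without a prefix, such as stack trace continuations, inherit the timestamp of the preceding line. Because glog timestamps omit the year, the sort key is only meaningful for logs from a single year.

```python
import gzip
import heapq
import re
from typing import Iterator, List, Optional, Tuple

# Assumed glog-style prefix: "I0603 02:52:00.123456  1234 file.cc:12] message"
GLOG_PREFIX = re.compile(r"^[IWEF](\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6})\s+\d+ \S+:\d+\] ")

# Hypothetical categories; the real script's regexes are not reproduced here.
CATEGORIES = [
    ("election", re.compile(r"election", re.IGNORECASE)),
    ("slow_execution", re.compile(r"Time spent")),
]

# (sort key "MMDD hh:mm:ss.uuuuuu", source file, category or None, raw line)
Record = Tuple[str, str, Optional[str], str]


def open_log(path: str):
    """Open a log as text, transparently decompressing files ending in .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "r", encoding="utf-8", errors="replace")


def categorize(line: str) -> Optional[str]:
    """Return the first category whose regex matches the line, else None."""
    for name, pattern in CATEGORIES:
        if pattern.search(line):
            return name
    return None


def read_records(path: str) -> Iterator[Record]:
    """Yield records in file order; unprefixed lines reuse the previous key."""
    last_key = ""
    with open_log(path) as f:
        for raw in f:
            line = raw.rstrip("\n")
            m = GLOG_PREFIX.match(line)
            if m:
                last_key = f"{m.group(1)} {m.group(2)}"
            yield (last_key, path, categorize(line), line)


def merge_logs(paths: List[str]) -> Iterator[Record]:
    """Lazy k-way merge of per-file streams that are each already time-ordered."""
    return heapq.merge(*(read_records(p) for p in paths), key=lambda r: r[0])
```

heapq.merge consumes each file lazily, so the merge never needs to hold every log in memory. It does assume each input file is already in time order. A typical use is `for key, path, category, line in merge_logs(paths): print(path, category, line)`.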
> There are a number of things that could be done to improve the script, including:
> 1. Eliminating or disambiguating some false matches. For example, "Time spent" is a matched prefix that appears both in slow-execution logging and in LBM startup messages.
> 2. Parallelizing the processing. In my experience, the script can take 30 minutes to munch through a 12-node cluster's logs if the logs are 100-200MB in size.
> 3. Mike wrote the script to look at a cluster with consensus issues, so most of the categorization is focused on those types of logs. We could generalize it to more types, and also allow filtering based on type.
> 4. The script is written in Perl. While that language is dear to Mike, most Kudu developers would be more comfortable using and tweaking the script if it were written in a more widely known language like Python. Of course, CPython's global interpreter lock prevents true multithreaded parallelism, so maybe something like Scala? That has more unusual prerequisites, but it's Java-like and can be run as a script.
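Items 1 to 3 can be illustrated together in a short Python sketch under stated assumptions. Ordered rules, where the first match wins, let a more specific pattern claim a line before the generic "Time spent" rule sees it. A process pool parses files in parallel, which sidesteps the global interpreter lock because each worker is a separate process. An optional set of wanted categories filters records before they are merged. The rule names and regexes are hypothetical; in particular, "block manager" is an assumed way of recognizing LBM startup lines, not a confirmed Kudu message format. Lines are assumed to carry glog-style prefixes, and unprefixed lines reuse the previous timestamp.

```python
import gzip
import heapq
import re
from concurrent.futures import ProcessPoolExecutor
from typing import FrozenSet, List, Optional, Tuple

# Assumed glog-style prefix: "I0603 02:52:00.123456  1234 file.cc:12] message"
GLOG_PREFIX = re.compile(r"^[IWEF](\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6})\s+\d+ \S+:\d+\] ")

# Ordered, hypothetical rules: the first match wins, so the specific LBM rule
# must precede the generic "Time spent" rule to avoid the false match.
RULES = [
    ("lbm_startup", re.compile(r"Time spent .*block manager", re.IGNORECASE)),
    ("slow_execution", re.compile(r"Time spent")),
    ("election", re.compile(r"election", re.IGNORECASE)),
]

# (sort key "MMDD hh:mm:ss.uuuuuu", source file, category or None, raw line)
Record = Tuple[str, str, Optional[str], str]


def categorize(line: str) -> Optional[str]:
    """Return the name of the first matching rule, or None."""
    for name, pattern in RULES:
        if pattern.search(line):
            return name
    return None


def process_file(path: str, wanted: Optional[FrozenSet[str]]) -> List[Record]:
    """Parse, categorize, and filter one log file (runs in a worker process)."""
    opener = gzip.open if path.endswith(".gz") else open
    records: List[Record] = []
    last_key = ""
    with opener(path, "rt", encoding="utf-8", errors="replace") as f:
        for raw in f:
            line = raw.rstrip("\n")
            m = GLOG_PREFIX.match(line)
            if m:
                last_key = f"{m.group(1)} {m.group(2)}"
            category = categorize(line)
            if wanted is None or category in wanted:
                records.append((last_key, path, category, line))
    return records


def collect(paths: List[str],
            wanted: Optional[FrozenSet[str]] = None,
            max_workers: Optional[int] = None) -> List[Record]:
    """Process files in parallel (or serially if max_workers == 1), then merge."""
    wanted_args = [wanted] * len(paths)
    if max_workers == 1:
        per_file = list(map(process_file, paths, wanted_args))
    else:
        with ProcessPoolExecutor(max_workers=max_workers) as pool:
            per_file = list(pool.map(process_file, paths, wanted_args))
    # Each per-file list is in file order; heapq.merge interleaves them by key.
    return list(heapq.merge(*per_file, key=lambda r: r[0]))
```

Each worker returns a fully materialized list, so filtering with `wanted` also limits how much data is sent back to the parent process. When the pool is used, call `collect` from under an `if __name__ == "__main__":` guard so worker processes can import the module safely, e.g. `collect(paths, wanted=frozenset({"election"}))`.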



--
This message was sent by Atlassian Jira
(v8.3.4#803005)