Posted to issues@kudu.apache.org by "Grant Henke (Jira)" <ji...@apache.org> on 2020/06/03 02:52:00 UTC
[jira] [Updated] (KUDU-2434) Improve kudu-log-parser.pl
[ https://issues.apache.org/jira/browse/KUDU-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Grant Henke updated KUDU-2434:
------------------------------
Target Version/s: (was: 1.8.0)
> Improve kudu-log-parser.pl
> --------------------------
>
> Key: KUDU-2434
> URL: https://issues.apache.org/jira/browse/KUDU-2434
> Project: Kudu
> Issue Type: Improvement
> Components: supportability
> Affects Versions: 1.7.0
> Reporter: William Berkeley
> Assignee: William Berkeley
> Priority: Major
>
> cc4e3957ba29bb42112dc21bfa8242e3f7afeac6 introduced the kudu-log-parser.pl script, which takes a collection of possibly-gzipped Kudu logs, categorizes and extracts information from some events in the logs using regexes, and then sorted-merges all the logs together. It can be pretty useful for looking at problems in a Kudu cluster ex post facto, especially when the exact timeframe or cause is not known.
> There are a number of things that can be done to make the script better, including:
> 1. Eliminating or disambiguating some false matches, e.g. "Time spent" is a prefix matched on that applies both to slow execution logging and to LBM startup messages.
> 2. Parallelizing the processing. In my experience, the script can take 30 minutes to munch a 12-node cluster's logs if the logs are 100-200MB in size.
> 3. Mike wrote the script to look at a cluster with consensus issues, so most of the categorization is focused on those types of logs. We could generalize it to more types, and also allow filtering based on types.
> 4. The script is written in Perl. While that language is dear to Mike, most Kudu developers would be more comfortable using and tweaking the script if it were written in a more widely-known language like Python. Of course, CPython's global interpreter lock limits thread-based parallelism, so maybe something like Scala? That has more unusual prerequisites, but it's Java-like and can be run as a script.
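
As a rough illustration of items 1 to 3, here is a minimal Python sketch. It is not the existing Perl script and not a proposed implementation. It assumes glog-style line prefixes (severity letter, mmdd, time with microseconds, thread id, file:line), and the category names, the regexes, and the helpers categorize, parse_file and merge_logs are all hypothetical. Because it uses processes rather than threads, per-file parsing can run in parallel even under CPython. Note that glog timestamps carry no year, so the string ordering used here is only valid within one calendar year.

```python
import gzip
import heapq
import re
from concurrent.futures import ProcessPoolExecutor

# Assumed glog-style prefix: severity, "mmdd hh:mm:ss.uuuuuu", thread id,
# source location, then the message.
LINE_RE = re.compile(
    r"^([IWEF])(\d{4} \d{2}:\d{2}:\d{2}\.\d{6})\s+(\d+) ([^\]]+)\] (.*)$"
)

# Hypothetical categories. Order matters: the more specific "Time spent"
# pattern comes first so block manager startup lines are not counted as
# slow-execution lines (item 1).
CATEGORIES = [
    ("lbm_startup", re.compile(r"^Time spent (opening|loading) .*block manager")),
    ("slow_execution", re.compile(r"^Time spent ")),
    ("election", re.compile(r"election", re.IGNORECASE)),
]


def categorize(message):
    """Return the first matching category name, or None if nothing matches."""
    for name, pattern in CATEGORIES:
        if pattern.search(message):
            return name
    return None


def parse_file(path):
    """Parse one plain or gzipped log into sorted (time, path, category, msg) tuples."""
    opener = gzip.open if path.endswith(".gz") else open
    events = []
    with opener(path, "rt", errors="replace") as f:
        for line in f:
            m = LINE_RE.match(line.rstrip("\n"))
            if not m:
                continue  # continuation lines, stack traces, etc.
            category = categorize(m.group(5))
            if category is not None:
                events.append((m.group(2), path, category, m.group(5)))
    events.sort()  # heapq.merge requires each input to be sorted
    return events


def merge_logs(paths, wanted=None, workers=None):
    """Parse files (in parallel unless workers == 1) and merge them by timestamp.

    `wanted` is an optional set of category names to keep (item 3).
    """
    if workers == 1:
        per_file = [parse_file(p) for p in paths]
    else:
        # One task per file, spread over worker processes (item 2).
        with ProcessPoolExecutor(max_workers=workers) as pool:
            per_file = list(pool.map(parse_file, paths))
    merged = heapq.merge(*per_file)
    return [e for e in merged if wanted is None or e[2] in wanted]
```

A call such as merge_logs(paths, wanted={"election"}) would return only the election-related events from all files in timestamp order. When run as a script on platforms that spawn rather than fork worker processes, the call should sit under an `if __name__ == "__main__":` guard.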