Posted to issues@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2019/07/21 00:15:00 UTC

[jira] [Commented] (DRILL-7327) Log Regex Plugin Won't Recognize Schema

    [ https://issues.apache.org/jira/browse/DRILL-7327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889612#comment-16889612 ] 

Paul Rogers commented on DRILL-7327:
------------------------------------

When run as a unit test in the first form (without a schema), the output is:

{noformat}
#: columns
0: ["Dec 12 2018 06:50:25", "sshd", "36669", "Failed password for root from 61.160.251.136 port 1313 ssh2", "61.160.251.136"]
...
{noformat}
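
For reference, the five values in the {{columns}} array correspond to the five capture groups of the regex in the issue description (the fifth group is nested inside the fourth). The following is a minimal standalone sketch, not the plugin's code. The class name {{LogRegexDemo}}, the helper {{parse}}, and the raw log line are assumptions for illustration; the line is reconstructed from the output above, not copied from firewall.ssdlog. The sketch also assumes full-line matching via {{matches()}}, which is consistent with the fourth column containing the whole message.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of Drill.
final class LogRegexDemo {

    // Regex taken from the "ssdlog" format config in the issue description.
    // The JSON escaping ("\\w") happens to be identical to Java string escaping.
    static final Pattern LINE = Pattern.compile(
        "(\\w{3}\\s\\d{1,2}\\s\\d{4}\\s\\d{2}:\\d{2}:\\d{2})\\s+(\\w+)\\[(\\d+)\\]:\\s"
        + "(.*?(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}).*?)");

    // Returns one string per capture group, or null if the line does not match.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {          // assumption: the whole line must match
            return null;
        }
        String[] fields = new String[m.groupCount()];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = m.group(i + 1);   // group 0 is the entire match; skip it
        }
        return fields;
    }

    public static void main(String[] args) {
        // Assumed raw line, reconstructed from the query output shown above.
        String line = "Dec 12 2018 06:50:25 sshd[36669]: "
            + "Failed password for root from 61.160.251.136 port 1313 ssh2";
        String[] fields = parse(line);
        for (int i = 0; i < fields.length; i++) {
            System.out.println("group " + (i + 1) + ": " + fields[i]);
        }
    }
}
```

With no schema, these group values are exposed as a single array column; with a schema, each group becomes its own named column.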

When I add a single field, as in the second example above, the output is:

{noformat}
#: eventDate, field_1, field_2, field_3, field_4
0: "Dec 12 2018 06:50:25", "sshd", "36669", "Failed password for root from 61.160.251.136 port 1313 ssh2", "61.160.251.136"
{noformat}

That the above works is odd, as Charles says that a name-only schema fails.

If I add a type and format, then I do get an exception:

{noformat}
    firewallConfig.getSchema().add(new LogFormatField("eventDate", "TIMESTAMP", "MMM dd yyyy hh:mm:ss"));
...
Cannot parse "Dec 12 2018 00:48:53": Value 0 for clockhourOfHalfday must be in the range [1,12]
{noformat}

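The cause is the hour pattern. Formats use the Joda format, not the Java 8 date/time format, and in Joda "hh" is the clock hour of the half day (1-12), so an hour of "00" is rejected. Changing hours to "HH" (hour of day, 0-23) works: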

{noformat}
    firewallConfig.getSchema().add(new LogFormatField("eventDate", "TIMESTAMP", "MMM dd yyyy HH:mm:ss"));
...
#: eventDate, field_1, field_2, field_3, field_4
0: 1544626225000, "sshd", "36669", "Failed password for root from 61.160.251.136 port 1313 ssh2", "61.160.251.136"
...
{noformat}
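
The {{eventDate}} value is now a number rather than a string. It appears to be milliseconds since the Unix epoch. The sketch below decodes it with {{java.time}} only to show what the number represents; it is not how the plugin parses dates, and the class name {{EpochMillisDemo}} and helper {{atOffset}} are made up for illustration. The UTC-8 offset is an inference from the arithmetic (presumably the time zone of the machine that ran the test), not something stated in this comment.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Hypothetical demo class; not part of Drill.
final class EpochMillisDemo {

    // Interprets epoch milliseconds as wall-clock time at a fixed UTC offset.
    static LocalDateTime atOffset(long epochMillis, int offsetHours) {
        return LocalDateTime.ofInstant(
            Instant.ofEpochMilli(epochMillis), ZoneOffset.ofHours(offsetHours));
    }

    public static void main(String[] args) {
        long eventDate = 1544626225000L;   // value from the output above

        // Read as UTC, the value is eight hours later than the log text.
        System.out.println(atOffset(eventDate, 0));    // 2018-12-12T14:50:25

        // At UTC-8 (assumed zone) it matches "Dec 12 2018 06:50:25".
        System.out.println(atOffset(eventDate, -8));   // 2018-12-12T06:50:25
    }
}
```

If that inference holds, the same log file would produce different numeric values on machines in different time zones, which is worth keeping in mind when comparing results.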

Finally, we add the remaining fields and get the expected result:

{noformat}
    firewallConfig.getSchema().add(new LogFormatField("process_name"));
    firewallConfig.getSchema().add(new LogFormatField("pid", "INT"));
    firewallConfig.getSchema().add(new LogFormatField("message"));
    firewallConfig.getSchema().add(new LogFormatField("src_ip"));
...
#: eventDate, process_name, pid, message, src_ip
0: 1544626225000, "sshd", 36669, "Failed password for root from 61.160.251.136 port 1313 ssh2", "61.160.251.136"
...
{noformat}

Conclusion: as explained in the e-mail exchange, it is necessary to use the Joda format to parse TIMESTAMP values. Once that is done, everything else seems to work.

Please try this again with the Joda format, and report back if you encounter additional errors.


> Log Regex Plugin Won't Recognize Schema
> ---------------------------------------
>
>                 Key: DRILL-7327
>                 URL: https://issues.apache.org/jira/browse/DRILL-7327
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.17.0
>            Reporter: Charles Givre
>            Assignee: Paul Rogers
>            Priority: Major
>         Attachments: firewall.ssdlog
>
>
> When I attempt to define a schema for the new `logRegex` plugin, Drill does not recognize the plugin if the configuration includes a schema.
> {code:json}
> {,
>     "ssdlog": {
>       "type": "logRegex",
>       "regex": "(\\w{3}\\s\\d{1,2}\\s\\d{4}\\s\\d{2}:\\d{2}:\\d{2})\\s+(\\w+)\\[(\\d+)\\]:\\s(.*?(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}).*?)",
>       "extension": "ssdlog",
>       "maxErrors": 10,
>       "schema": []
>     }
> {code}
> This configuration works; however, this one does not:
> {code:json}
> {,
>     "ssdlog": {
>       "type": "logRegex",
>       "regex": "(\\w{3}\\s\\d{1,2}\\s\\d{4}\\s\\d{2}:\\d{2}:\\d{2})\\s+(\\w+)\\[(\\d+)\\]:\\s(.*?(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}).*?)",
>       "extension": "ssdlog",
>       "maxErrors": 10,
>       "schema": [
> {"fieldName":"eventDate"}
> ]
>     }
> {code}
> [~paul-rogers]



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)