Posted to issues@drill.apache.org by "Charles Givre (JIRA)" <ji...@apache.org> on 2016/10/05 16:28:20 UTC
[jira] [Comment Edited] (DRILL-3423) Add New HTTPD format plugin
[ https://issues.apache.org/jira/browse/DRILL-3423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15549220#comment-15549220 ]
Charles Givre edited comment on DRILL-3423 at 10/5/16 4:28 PM:
---------------------------------------------------------------
I'm still working out how to use git, but I thought I'd explain what I've done so far:
1. I cleaned up the field names so that they are a little more user-friendly. For example, HTTP_PATH:request_firstline_uri_path is now request_firstline_uri_path.
2. I wrote two UDFs, which I'll include with the PR: parse_url() and parse_query(). parse_query() splits up a query string and returns a map of the key-value pairs. For example, parse_query('url?arg1=x&arg2=y') would return:
{
'arg1': 'x',
'arg2': 'y'
}
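To make the splitting behaviour concrete, here is a minimal standalone Java sketch. It is not the UDF code from the PR and does not use Drill's UDF API; the class name ParseQuerySketch, the method name parseQuery, and the choices to percent-decode keys and values and to drop any #fragment are all assumptions made for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not the actual Drill UDF: splits a query string into key/value pairs.
class ParseQuerySketch {

    static Map<String, String> parseQuery(String input) {
        // LinkedHashMap keeps the pairs in the order they appear in the input.
        Map<String, String> result = new LinkedHashMap<>();
        if (input == null) {
            return result;
        }
        // Keep only the part after the first '?', if there is one.
        int q = input.indexOf('?');
        String query = q >= 0 ? input.substring(q + 1) : input;
        // Assumption: a trailing #fragment is not part of the query string.
        int hash = query.indexOf('#');
        if (hash >= 0) {
            query = query.substring(0, hash);
        }
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            // A key without '=' is stored with an empty value.
            String key = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            result.put(decode(key), decode(value));
        }
        return result;
    }

    // Assumption: keys and values are percent-decoded as UTF-8.
    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this sketch, ParseQuerySketch.parseQuery("url?arg1=x&arg2=y") yields a map with arg1 mapped to x and arg2 mapped to y, matching the example above.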
parse_url() takes a URL and returns a map of its components. It is basically a wrapper for java.net.URL and returns a map of the port, path, query string, protocol, host and reference. Is this acceptable to everyone? I think it more or less follows what [~jnadeau] described in his earlier comments.
I'd also really like to add a user agent parser UDF. Niels has one that looked good, but it isn't under the Apache license.
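As a rough illustration of what wrapping java.net.URL in this way could look like, here is a small standalone Java sketch. It is not the UDF code from the PR and does not use Drill's UDF API; the class name ParseUrlSketch, the method name parseUrl, and the map keys (protocol, host, port, path, query, ref) are assumptions chosen for the example.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not the actual Drill UDF: exposes java.net.URL components as a map.
class ParseUrlSketch {

    static Map<String, Object> parseUrl(String spec) {
        URL url;
        try {
            url = new URL(spec);
        } catch (MalformedURLException e) {
            // Assumption: an unparseable URL is reported as an unchecked exception.
            throw new IllegalArgumentException("Not a valid URL: " + spec, e);
        }
        // Key names below are illustrative, not the names used in the PR.
        Map<String, Object> parts = new LinkedHashMap<>();
        parts.put("protocol", url.getProtocol());
        parts.put("host", url.getHost());
        parts.put("port", url.getPort());   // -1 when the URL has no explicit port
        parts.put("path", url.getPath());
        parts.put("query", url.getQuery()); // null when there is no query string
        parts.put("ref", url.getRef());     // null when there is no #reference
        return parts;
    }
}
```

For example, ParseUrlSketch.parseUrl("http://example.com:8080/docs/index.html?arg1=x#top") gives protocol http, host example.com, port 8080, path /docs/index.html, query arg1=x and ref top.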
> Add New HTTPD format plugin
> ---------------------------
>
> Key: DRILL-3423
> URL: https://issues.apache.org/jira/browse/DRILL-3423
> Project: Apache Drill
> Issue Type: New Feature
> Components: Storage - Other
> Reporter: Jacques Nadeau
> Assignee: Jim Scott
> Fix For: Future
>
>
> Add an HTTPD logparser-based format plugin. The author has been kind enough to move the logparser project to be released under the Apache License. It can be found here:
> <dependency>
>   <groupId>nl.basjes.parse.httpdlog</groupId>
>   <artifactId>httpdlog-parser</artifactId>
>   <version>2.0</version>
> </dependency>
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)