You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Manish Rangari <li...@gmail.com> on 2016/09/20 09:22:36 UTC

ELB Log processing

Guys,

I want to get the field of elb logs. A sample elb log is given below and I
am using below create table definition. It is working fine. I am getting
what I wanted but now I want the bold part as well. For example eid, tid,
aid. Can anyone help me how can I match them as well.

NOTE: The position of aid, eid, tid is not fixed and it may change.

2016-09-16T06:55:19.056871Z testelb 2.1.7.2:52399 192.168.1.5:80 0.000021
0.000596 0.00002 200 200 0 43 "GET https://site1.example.com:443/peek?
*eid=aw123&tid=fskc235n&aid=2ADSFGSDG* HTTP/1.1" "Mozilla/5.0 (Windows NT
6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85
Safari/537.36" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2


CREATE TABLE elblog (
Request_date STRING,
      ELBName STRING,
      RequestIP STRING,
      RequestPort INT,
      BackendIP STRING,
      BackendPort INT,
      RequestProcessingTime DOUBLE,
      BackendProcessingTime DOUBLE,
      ClientResponseTime DOUBLE,
      ELBResponseCode STRING,
      BackendResponseCode STRING,
      ReceivedBytes BIGINT,
      SentBytes BIGINT,
      RequestVerb STRING,
      URL STRING,
      Protocol STRING,
Useragent STRING,
ssl_cipher STRING,
ssl_protocol STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
          "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^
]*):([0-9]*) ([.0-9]*) ([.0-9]*) ([.0-9]*) (-|[0-9]*) (-|[0-9]*) ([-0-9]*)
([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" \"(.*)\" (.*) (.*)$"
)
STORED AS TEXTFILE;

Re: ELB Log processing

Posted by Manish Rangari <li...@gmail.com>.
Thanks Dudu, both the queries worked like a charm. I personally liked
second query as it is quite easy to remember.

--Manish

On Tue, Sep 20, 2016 at 8:41 PM, Markovitz, Dudu <dm...@paypal.com>
wrote:

> Or
>
>
>
> create view elb_raw_log_detailed
>
> as
>
> select request_date, elbname, requestip, requestport, backendip,
> backendport, requestprocessingtime, backendprocessingtime,
> clientresponsetime, elbresponsecode, backendresponsecode, receivedbytes,
> sentbytes, requestverb, url, parse_url(url, 'QUERY','aid') as aid,
> parse_url(url, 'QUERY','tid') as tid, parse_url(url, 'QUERY','eid') as eid,
> parse_url(url, 'QUERY','did') as did, protocol, useragent, ssl_cipher,
> ssl_protocol
>
> from elblog;
>
>
>
> Dudu
>
>
>
> *From:* Markovitz, Dudu [mailto:dmarkovitz@paypal.com]
> *Sent:* Tuesday, September 20, 2016 6:06 PM
> *To:* user@hive.apache.org
> *Subject:* RE: ELB Log processing
>
>
>
> create view elb_raw_log_detailed
>
> as
>
> select request_date, elbname, requestip, requestport, backendip,
> backendport, requestprocessingtime, backendprocessingtime,
> clientresponsetime, elbresponsecode, backendresponsecode, receivedbytes,
> sentbytes, requestverb, url, u.aid, u.tid, u.eid,u.did, protocol,
> useragent, ssl_cipher, ssl_protocol
>
> from elblog
>
> LATERAL VIEW parse_url_tuple(url,'QUERY:eid','QUERY:tid','QUERY:aid','QUERY:did')
> u as eid,tid,aid,did
>
> ;
>
>
>
> Dudu
>
>
>
> *From:* Manish Rangari [mailto:linuxtricksfordevops@gmail.com
> <li...@gmail.com>]
> *Sent:* Tuesday, September 20, 2016 4:09 PM
> *To:* user@hive.apache.org
> *Subject:* Re: ELB Log processing
>
>
>
> Guys,
>
>
>
> I am struggling to create this view. I am keep getting the error in bold.
> I found that I need to use lateral view but still I am not able to get the
> syntax right.
>
>
>
> hive> create view elb_raw_log_detailed as select request_date, elbname,
> requestip, requestport, backendip, backendport, requestprocessingtime,
> backendprocessingtime, clientresponsetime, elbresponsecode,
> backendresponsecode, receivedbytes, sentbytes, requestverb, url,
> parse_url_tuple(url, 'QUERY:aid') as aid, parse_url_tuple(url, 'QUERY:tid')
> as tid, parse_url_tuple(url, 'QUERY:eid') as eid, parse_url_tuple(url,
> 'QUERY:did') as did, protocol, useragent, ssl_cipher, ssl_protocol from
> elblogz;
>
>
>
> *FAILED: SemanticException [Error 10081]: UDTF's are not supported outside
> the SELECT clause, nor nested in expressions*
>
>
>
> On Tue, Sep 20, 2016 at 3:56 PM, Manish Rangari <
> linuxtricksfordevops@gmail.com> wrote:
>
> Yes views looks like a way to go
>
>
>
> On Tue, Sep 20, 2016 at 3:49 PM, Damien Carol <da...@gmail.com>
> wrote:
>
> The royal way to do that is a view IMHO.
>
>
>
> 2016-09-20 12:14 GMT+02:00 Manish Rangari <linuxtricksfordevops@gmail.com
> >:
>
> Thanks for the reply Damien. The suggestion you gave is really useful.
> Currently I am achieving my desired output by performing below steps. But I
> want to achieve the desired result in one step instead of two. Do we have
> any way so that I can get the aid, did etc in create table statement? If
> not I will have to look for the option that you mentioned
>
>
>
> 1.
>
> CREATE TABLE elblog (
>
> Request_date STRING,
>
>       ELBName STRING,
>
>       RequestIP STRING,
>
>       RequestPort INT,
>
>       BackendIP STRING,
>
>       BackendPort INT,
>
>       RequestProcessingTime DOUBLE,
>
>       BackendProcessingTime DOUBLE,
>
>       ClientResponseTime DOUBLE,
>
>       ELBResponseCode STRING,
>
>       BackendResponseCode STRING,
>
>       ReceivedBytes BIGINT,
>
>       SentBytes BIGINT,
>
>       RequestVerb STRING,
>
>       URL STRING,
>
>       Protocol STRING,
>
> Useragent STRING,
>
> ssl_cipher STRING,
>
> ssl_protocol STRING
>
> )
>
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
>
> WITH SERDEPROPERTIES (
>
>           "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^
> ]*):([0-9]*) ([.0-9]*) ([.0-9]*) ([.0-9]*) (-|[0-9]*) (-|[0-9]*) ([-0-9]*)
> ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" \"(.*)\" (.*) (.*)$"
>
> )
>
> STORED AS TEXTFILE;
>
>
>
> 2.
>
> create table elb_raw_log as select request_date, elbname, requestip,
> requestport, backendip, backendport, requestprocessingtime,
> backendprocessingtime, clientresponsetime, elbresponsecode,
> backendresponsecode, receivedbytes, sentbytes, requestverb, url,
> regexp_extract(url, '.*aid=([a-zA-Z0-9]+).*', 1) as aid,
> regexp_extract(url, '.*tid=([a-zA-Z0-9]+).*', 1) as tid,
> regexp_extract(url, '.*eid=([a-zA-Z0-9]+).*', 1) as eid,
> regexp_extract(url, '.*did=([a-zA-Z0-9]+).*', 1) as did, protocol,
> useragent, ssl_cipher, ssl_protocol from elblog;
>
>
>
> On Tue, Sep 20, 2016 at 3:12 PM, Damien Carol <da...@gmail.com>
> wrote:
>
> see the udf *parse_url_tuple*
>
> SELECT b.*
>
> FROM src LATERAL VIEW parse_url_tuple(fullurl, 'HOST', 'PATH', 'QUERY',
> 'QUERY:id') b as host, path, query, query_id LIMIT 1;
>
>
>
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#
> LanguageManualUDF-parse_url_tuple
>
>
>
> 2016-09-20 11:22 GMT+02:00 Manish Rangari <linuxtricksfordevops@gmail.com
> >:
>
> Guys,
>
>
>
> I want to get the field of elb logs. A sample elb log is given below and I
> am using below create table definition. It is working fine. I am getting
> what I wanted but now I want the bold part as well. For example eid, tid,
> aid. Can anyone help me how can I match them as well.
>
>
>
> NOTE: The position of aid, eid, tid is not fixed and it may change.
>
>
>
> 2016-09-16T06:55:19.056871Z testelb 2.1.7.2:52399 192.168.1.5:80 0.000021
> 0.000596 0.00002 200 200 0 43 "GET https://site1.example.com:443/peek?
> *eid=aw123&tid=fskc235n&aid=2ADSFGSDG* HTTP/1.1" "Mozilla/5.0 (Windows NT
> 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85
> Safari/537.36" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
>
>
>
>
>
> CREATE TABLE elblog (
>
> Request_date STRING,
>
>       ELBName STRING,
>
>       RequestIP STRING,
>
>       RequestPort INT,
>
>       BackendIP STRING,
>
>       BackendPort INT,
>
>       RequestProcessingTime DOUBLE,
>
>       BackendProcessingTime DOUBLE,
>
>       ClientResponseTime DOUBLE,
>
>       ELBResponseCode STRING,
>
>       BackendResponseCode STRING,
>
>       ReceivedBytes BIGINT,
>
>       SentBytes BIGINT,
>
>       RequestVerb STRING,
>
>       URL STRING,
>
>       Protocol STRING,
>
> Useragent STRING,
>
> ssl_cipher STRING,
>
> ssl_protocol STRING
>
> )
>
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
>
> WITH SERDEPROPERTIES (
>
>           "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^
> ]*):([0-9]*) ([.0-9]*) ([.0-9]*) ([.0-9]*) (-|[0-9]*) (-|[0-9]*) ([-0-9]*)
> ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" \"(.*)\" (.*) (.*)$"
>
> )
>
> STORED AS TEXTFILE;
>
>
>
>
>
>
>
>
>
>
>

RE: ELB Log processing

Posted by "Markovitz, Dudu" <dm...@paypal.com>.
Or

create view elb_raw_log_detailed
as
select request_date, elbname, requestip, requestport, backendip, backendport, requestprocessingtime, backendprocessingtime, clientresponsetime, elbresponsecode, backendresponsecode, receivedbytes, sentbytes, requestverb, url, parse_url(url, 'QUERY','aid') as aid, parse_url(url, 'QUERY','tid') as tid, parse_url(url, 'QUERY','eid') as eid, parse_url(url, 'QUERY','did') as did, protocol, useragent, ssl_cipher, ssl_protocol
from elblog;

Dudu

From: Markovitz, Dudu [mailto:dmarkovitz@paypal.com]
Sent: Tuesday, September 20, 2016 6:06 PM
To: user@hive.apache.org
Subject: RE: ELB Log processing

create view elb_raw_log_detailed
as
select request_date, elbname, requestip, requestport, backendip, backendport, requestprocessingtime, backendprocessingtime, clientresponsetime, elbresponsecode, backendresponsecode, receivedbytes, sentbytes, requestverb, url, u.aid, u.tid, u.eid,u.did, protocol, useragent, ssl_cipher, ssl_protocol
from elblog
LATERAL VIEW parse_url_tuple(url,'QUERY:eid','QUERY:tid','QUERY:aid','QUERY:did') u as eid,tid,aid,did
;

Dudu

From: Manish Rangari [mailto:linuxtricksfordevops@gmail.com]
Sent: Tuesday, September 20, 2016 4:09 PM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: Re: ELB Log processing

Guys,

I am struggling to create this view. I am keep getting the error in bold. I found that I need to use lateral view but still I am not able to get the syntax right.

hive> create view elb_raw_log_detailed as select request_date, elbname, requestip, requestport, backendip, backendport, requestprocessingtime, backendprocessingtime, clientresponsetime, elbresponsecode, backendresponsecode, receivedbytes, sentbytes, requestverb, url, parse_url_tuple(url, 'QUERY:aid') as aid, parse_url_tuple(url, 'QUERY:tid') as tid, parse_url_tuple(url, 'QUERY:eid') as eid, parse_url_tuple(url, 'QUERY:did') as did, protocol, useragent, ssl_cipher, ssl_protocol from elblogz;

FAILED: SemanticException [Error 10081]: UDTF's are not supported outside the SELECT clause, nor nested in expressions

On Tue, Sep 20, 2016 at 3:56 PM, Manish Rangari <li...@gmail.com>> wrote:
Yes views looks like a way to go

On Tue, Sep 20, 2016 at 3:49 PM, Damien Carol <da...@gmail.com>> wrote:
The royal way to do that is a view IMHO.

2016-09-20 12:14 GMT+02:00 Manish Rangari <li...@gmail.com>>:
Thanks for the reply Damien. The suggestion you gave is really useful. Currently I am achieving my desired output by performing below steps. But I want to achieve the desired result in one step instead of two. Do we have any way so that I can get the aid, did etc in create table statement? If not I will have to look for the option that you mentioned

1.
CREATE TABLE elblog (
Request_date STRING,
      ELBName STRING,
      RequestIP STRING,
      RequestPort INT,
      BackendIP STRING,
      BackendPort INT,
      RequestProcessingTime DOUBLE,
      BackendProcessingTime DOUBLE,
      ClientResponseTime DOUBLE,
      ELBResponseCode STRING,
      BackendResponseCode STRING,
      ReceivedBytes BIGINT,
      SentBytes BIGINT,
      RequestVerb STRING,
      URL STRING,
      Protocol STRING,
Useragent STRING,
ssl_cipher STRING,
ssl_protocol STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
          "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*):([0-9]*) ([.0-9]*) ([.0-9]*) ([.0-9]*) (-|[0-9]*) (-|[0-9]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" \"(.*)\" (.*) (.*)$"
)
STORED AS TEXTFILE;

2.
create table elb_raw_log as select request_date, elbname, requestip, requestport, backendip, backendport, requestprocessingtime, backendprocessingtime, clientresponsetime, elbresponsecode, backendresponsecode, receivedbytes, sentbytes, requestverb, url, regexp_extract(url, '.*aid=([a-zA-Z0-9]+).*', 1) as aid, regexp_extract(url, '.*tid=([a-zA-Z0-9]+).*', 1) as tid, regexp_extract(url, '.*eid=([a-zA-Z0-9]+).*', 1) as eid, regexp_extract(url, '.*did=([a-zA-Z0-9]+).*', 1) as did, protocol, useragent, ssl_cipher, ssl_protocol from elblog;

On Tue, Sep 20, 2016 at 3:12 PM, Damien Carol <da...@gmail.com>> wrote:
see the udf parse_url_tuple
SELECT b.*
FROM src LATERAL VIEW parse_url_tuple(fullurl, 'HOST', 'PATH', 'QUERY', 'QUERY:id') b as host, path, query, query_id LIMIT 1;


https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-parse_url_tuple

2016-09-20 11:22 GMT+02:00 Manish Rangari <li...@gmail.com>>:
Guys,

I want to get the field of elb logs. A sample elb log is given below and I am using below create table definition. It is working fine. I am getting what I wanted but now I want the bold part as well. For example eid, tid, aid. Can anyone help me how can I match them as well.

NOTE: The position of aid, eid, tid is not fixed and it may change.

2016-09-16T06:55:19.056871Z testelb 2.1.7.2:52399<http://2.1.7.2:52399> 192.168.1.5:80<http://192.168.1.5:80> 0.000021 0.000596 0.00002 200 200 0 43 "GET https://site1.example.com:443/peek?eid=aw123&tid=fskc235n&aid=2ADSFGSDG HTTP/1.1" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85 Safari/537.36" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2


CREATE TABLE elblog (
Request_date STRING,
      ELBName STRING,
      RequestIP STRING,
      RequestPort INT,
      BackendIP STRING,
      BackendPort INT,
      RequestProcessingTime DOUBLE,
      BackendProcessingTime DOUBLE,
      ClientResponseTime DOUBLE,
      ELBResponseCode STRING,
      BackendResponseCode STRING,
      ReceivedBytes BIGINT,
      SentBytes BIGINT,
      RequestVerb STRING,
      URL STRING,
      Protocol STRING,
Useragent STRING,
ssl_cipher STRING,
ssl_protocol STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
          "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*):([0-9]*) ([.0-9]*) ([.0-9]*) ([.0-9]*) (-|[0-9]*) (-|[0-9]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" \"(.*)\" (.*) (.*)$"
)
STORED AS TEXTFILE;






RE: ELB Log processing

Posted by "Markovitz, Dudu" <dm...@paypal.com>.
create view elb_raw_log_detailed
as
select request_date, elbname, requestip, requestport, backendip, backendport, requestprocessingtime, backendprocessingtime, clientresponsetime, elbresponsecode, backendresponsecode, receivedbytes, sentbytes, requestverb, url, u.aid, u.tid, u.eid,u.did, protocol, useragent, ssl_cipher, ssl_protocol
from elblog
LATERAL VIEW parse_url_tuple(url,'QUERY:eid','QUERY:tid','QUERY:aid','QUERY:did') u as eid,tid,aid,did
;

Dudu

From: Manish Rangari [mailto:linuxtricksfordevops@gmail.com]
Sent: Tuesday, September 20, 2016 4:09 PM
To: user@hive.apache.org
Subject: Re: ELB Log processing

Guys,

I am struggling to create this view. I am keep getting the error in bold. I found that I need to use lateral view but still I am not able to get the syntax right.

hive> create view elb_raw_log_detailed as select request_date, elbname, requestip, requestport, backendip, backendport, requestprocessingtime, backendprocessingtime, clientresponsetime, elbresponsecode, backendresponsecode, receivedbytes, sentbytes, requestverb, url, parse_url_tuple(url, 'QUERY:aid') as aid, parse_url_tuple(url, 'QUERY:tid') as tid, parse_url_tuple(url, 'QUERY:eid') as eid, parse_url_tuple(url, 'QUERY:did') as did, protocol, useragent, ssl_cipher, ssl_protocol from elblogz;

FAILED: SemanticException [Error 10081]: UDTF's are not supported outside the SELECT clause, nor nested in expressions

On Tue, Sep 20, 2016 at 3:56 PM, Manish Rangari <li...@gmail.com>> wrote:
Yes views looks like a way to go

On Tue, Sep 20, 2016 at 3:49 PM, Damien Carol <da...@gmail.com>> wrote:
The royal way to do that is a view IMHO.

2016-09-20 12:14 GMT+02:00 Manish Rangari <li...@gmail.com>>:
Thanks for the reply Damien. The suggestion you gave is really useful. Currently I am achieving my desired output by performing below steps. But I want to achieve the desired result in one step instead of two. Do we have any way so that I can get the aid, did etc in create table statement? If not I will have to look for the option that you mentioned

1.
CREATE TABLE elblog (
Request_date STRING,
      ELBName STRING,
      RequestIP STRING,
      RequestPort INT,
      BackendIP STRING,
      BackendPort INT,
      RequestProcessingTime DOUBLE,
      BackendProcessingTime DOUBLE,
      ClientResponseTime DOUBLE,
      ELBResponseCode STRING,
      BackendResponseCode STRING,
      ReceivedBytes BIGINT,
      SentBytes BIGINT,
      RequestVerb STRING,
      URL STRING,
      Protocol STRING,
Useragent STRING,
ssl_cipher STRING,
ssl_protocol STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
          "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*):([0-9]*) ([.0-9]*) ([.0-9]*) ([.0-9]*) (-|[0-9]*) (-|[0-9]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" \"(.*)\" (.*) (.*)$"
)
STORED AS TEXTFILE;

2.
create table elb_raw_log as select request_date, elbname, requestip, requestport, backendip, backendport, requestprocessingtime, backendprocessingtime, clientresponsetime, elbresponsecode, backendresponsecode, receivedbytes, sentbytes, requestverb, url, regexp_extract(url, '.*aid=([a-zA-Z0-9]+).*', 1) as aid, regexp_extract(url, '.*tid=([a-zA-Z0-9]+).*', 1) as tid, regexp_extract(url, '.*eid=([a-zA-Z0-9]+).*', 1) as eid, regexp_extract(url, '.*did=([a-zA-Z0-9]+).*', 1) as did, protocol, useragent, ssl_cipher, ssl_protocol from elblog;

On Tue, Sep 20, 2016 at 3:12 PM, Damien Carol <da...@gmail.com>> wrote:
see the udf parse_url_tuple
SELECT b.*
FROM src LATERAL VIEW parse_url_tuple(fullurl, 'HOST', 'PATH', 'QUERY', 'QUERY:id') b as host, path, query, query_id LIMIT 1;


https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-parse_url_tuple

2016-09-20 11:22 GMT+02:00 Manish Rangari <li...@gmail.com>>:
Guys,

I want to get the field of elb logs. A sample elb log is given below and I am using below create table definition. It is working fine. I am getting what I wanted but now I want the bold part as well. For example eid, tid, aid. Can anyone help me how can I match them as well.

NOTE: The position of aid, eid, tid is not fixed and it may change.

2016-09-16T06:55:19.056871Z testelb 2.1.7.2:52399<http://2.1.7.2:52399> 192.168.1.5:80<http://192.168.1.5:80> 0.000021 0.000596 0.00002 200 200 0 43 "GET https://site1.example.com:443/peek?eid=aw123&tid=fskc235n&aid=2ADSFGSDG HTTP/1.1" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85 Safari/537.36" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2


CREATE TABLE elblog (
Request_date STRING,
      ELBName STRING,
      RequestIP STRING,
      RequestPort INT,
      BackendIP STRING,
      BackendPort INT,
      RequestProcessingTime DOUBLE,
      BackendProcessingTime DOUBLE,
      ClientResponseTime DOUBLE,
      ELBResponseCode STRING,
      BackendResponseCode STRING,
      ReceivedBytes BIGINT,
      SentBytes BIGINT,
      RequestVerb STRING,
      URL STRING,
      Protocol STRING,
Useragent STRING,
ssl_cipher STRING,
ssl_protocol STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
          "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*):([0-9]*) ([.0-9]*) ([.0-9]*) ([.0-9]*) (-|[0-9]*) (-|[0-9]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" \"(.*)\" (.*) (.*)$"
)
STORED AS TEXTFILE;






Re: ELB Log processing

Posted by Manish Rangari <li...@gmail.com>.
Guys,

I am struggling to create this view. I am keep getting the error in bold. I
found that I need to use lateral view but still I am not able to get the
syntax right.

hive> create view elb_raw_log_detailed as select request_date, elbname,
requestip, requestport, backendip, backendport, requestprocessingtime,
backendprocessingtime, clientresponsetime, elbresponsecode,
backendresponsecode, receivedbytes, sentbytes, requestverb, url,
parse_url_tuple(url, 'QUERY:aid') as aid, parse_url_tuple(url, 'QUERY:tid')
as tid, parse_url_tuple(url, 'QUERY:eid') as eid, parse_url_tuple(url,
'QUERY:did') as did, protocol, useragent, ssl_cipher, ssl_protocol from
elblogz;

*FAILED: SemanticException [Error 10081]: UDTF's are not supported outside
the SELECT clause, nor nested in expressions*

On Tue, Sep 20, 2016 at 3:56 PM, Manish Rangari <
linuxtricksfordevops@gmail.com> wrote:

> Yes views looks like a way to go
>
> On Tue, Sep 20, 2016 at 3:49 PM, Damien Carol <da...@gmail.com>
> wrote:
>
>> The royal way to do that is a view IMHO.
>>
>> 2016-09-20 12:14 GMT+02:00 Manish Rangari <linuxtricksfordevops@gmail.com
>> >:
>>
>>> Thanks for the reply Damien. The suggestion you gave is really useful.
>>> Currently I am achieving my desired output by performing below steps. But I
>>> want to achieve the desired result in one step instead of two. Do we have
>>> any way so that I can get the aid, did etc in create table statement? If
>>> not I will have to look for the option that you mentioned
>>>
>>> 1.
>>> CREATE TABLE elblog (
>>> Request_date STRING,
>>>       ELBName STRING,
>>>       RequestIP STRING,
>>>       RequestPort INT,
>>>       BackendIP STRING,
>>>       BackendPort INT,
>>>       RequestProcessingTime DOUBLE,
>>>       BackendProcessingTime DOUBLE,
>>>       ClientResponseTime DOUBLE,
>>>       ELBResponseCode STRING,
>>>       BackendResponseCode STRING,
>>>       ReceivedBytes BIGINT,
>>>       SentBytes BIGINT,
>>>       RequestVerb STRING,
>>>       URL STRING,
>>>       Protocol STRING,
>>> Useragent STRING,
>>> ssl_cipher STRING,
>>> ssl_protocol STRING
>>> )
>>> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
>>> WITH SERDEPROPERTIES (
>>>           "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^
>>> ]*):([0-9]*) ([.0-9]*) ([.0-9]*) ([.0-9]*) (-|[0-9]*) (-|[0-9]*) ([-0-9]*)
>>> ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" \"(.*)\" (.*) (.*)$"
>>> )
>>> STORED AS TEXTFILE;
>>>
>>> 2.
>>> create table elb_raw_log as select request_date, elbname, requestip,
>>> requestport, backendip, backendport, requestprocessingtime,
>>> backendprocessingtime, clientresponsetime, elbresponsecode,
>>> backendresponsecode, receivedbytes, sentbytes, requestverb, url,
>>> regexp_extract(url, '.*aid=([a-zA-Z0-9]+).*', 1) as aid,
>>> regexp_extract(url, '.*tid=([a-zA-Z0-9]+).*', 1) as tid,
>>> regexp_extract(url, '.*eid=([a-zA-Z0-9]+).*', 1) as eid,
>>> regexp_extract(url, '.*did=([a-zA-Z0-9]+).*', 1) as did, protocol,
>>> useragent, ssl_cipher, ssl_protocol from elblog;
>>>
>>> On Tue, Sep 20, 2016 at 3:12 PM, Damien Carol <da...@gmail.com>
>>> wrote:
>>>
>>>> see the udf
>>>> *parse_url_tuple*
>>>> SELECT b.*
>>>> FROM src LATERAL VIEW parse_url_tuple(fullurl, 'HOST', 'PATH', 'QUERY',
>>>> 'QUERY:id') b as host, path, query, query_id LIMIT 1;
>>>>
>>>>
>>>> https://cwiki.apache.org/confluence/display/Hive/LanguageMan
>>>> ual+UDF#LanguageManualUDF-parse_url_tuple
>>>>
>>>> 2016-09-20 11:22 GMT+02:00 Manish Rangari <
>>>> linuxtricksfordevops@gmail.com>:
>>>>
>>>>> Guys,
>>>>>
>>>>> I want to get the field of elb logs. A sample elb log is given below
>>>>> and I am using below create table definition. It is working fine. I am
>>>>> getting what I wanted but now I want the bold part as well. For example
>>>>> eid, tid, aid. Can anyone help me how can I match them as well.
>>>>>
>>>>> NOTE: The position of aid, eid, tid is not fixed and it may change.
>>>>>
>>>>> 2016-09-16T06:55:19.056871Z testelb 2.1.7.2:52399 192.168.1.5:80
>>>>> 0.000021 0.000596 0.00002 200 200 0 43 "GET
>>>>> https://site1.example.com:443/peek?
>>>>> *eid=aw123&tid=fskc235n&aid=2ADSFGSDG* HTTP/1.1" "Mozilla/5.0
>>>>> (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85
>>>>> Safari/537.36" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
>>>>>
>>>>>
>>>>> CREATE TABLE elblog (
>>>>> Request_date STRING,
>>>>>       ELBName STRING,
>>>>>       RequestIP STRING,
>>>>>       RequestPort INT,
>>>>>       BackendIP STRING,
>>>>>       BackendPort INT,
>>>>>       RequestProcessingTime DOUBLE,
>>>>>       BackendProcessingTime DOUBLE,
>>>>>       ClientResponseTime DOUBLE,
>>>>>       ELBResponseCode STRING,
>>>>>       BackendResponseCode STRING,
>>>>>       ReceivedBytes BIGINT,
>>>>>       SentBytes BIGINT,
>>>>>       RequestVerb STRING,
>>>>>       URL STRING,
>>>>>       Protocol STRING,
>>>>> Useragent STRING,
>>>>> ssl_cipher STRING,
>>>>> ssl_protocol STRING
>>>>> )
>>>>> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
>>>>> WITH SERDEPROPERTIES (
>>>>>           "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^
>>>>> ]*):([0-9]*) ([.0-9]*) ([.0-9]*) ([.0-9]*) (-|[0-9]*) (-|[0-9]*) ([-0-9]*)
>>>>> ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" \"(.*)\" (.*) (.*)$"
>>>>> )
>>>>> STORED AS TEXTFILE;
>>>>>
>>>>
>>>>
>>>
>>
>

Re: ELB Log processing

Posted by Manish Rangari <li...@gmail.com>.
Yes views looks like a way to go

On Tue, Sep 20, 2016 at 3:49 PM, Damien Carol <da...@gmail.com>
wrote:

> The royal way to do that is a view IMHO.
>
> 2016-09-20 12:14 GMT+02:00 Manish Rangari <li...@gmail.com>
> :
>
>> Thanks for the reply Damien. The suggestion you gave is really useful.
>> Currently I am achieving my desired output by performing below steps. But I
>> want to achieve the desired result in one step instead of two. Do we have
>> any way so that I can get the aid, did etc in create table statement? If
>> not I will have to look for the option that you mentioned
>>
>> 1.
>> CREATE TABLE elblog (
>> Request_date STRING,
>>       ELBName STRING,
>>       RequestIP STRING,
>>       RequestPort INT,
>>       BackendIP STRING,
>>       BackendPort INT,
>>       RequestProcessingTime DOUBLE,
>>       BackendProcessingTime DOUBLE,
>>       ClientResponseTime DOUBLE,
>>       ELBResponseCode STRING,
>>       BackendResponseCode STRING,
>>       ReceivedBytes BIGINT,
>>       SentBytes BIGINT,
>>       RequestVerb STRING,
>>       URL STRING,
>>       Protocol STRING,
>> Useragent STRING,
>> ssl_cipher STRING,
>> ssl_protocol STRING
>> )
>> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
>> WITH SERDEPROPERTIES (
>>           "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^
>> ]*):([0-9]*) ([.0-9]*) ([.0-9]*) ([.0-9]*) (-|[0-9]*) (-|[0-9]*) ([-0-9]*)
>> ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" \"(.*)\" (.*) (.*)$"
>> )
>> STORED AS TEXTFILE;
>>
>> 2.
>> create table elb_raw_log as select request_date, elbname, requestip,
>> requestport, backendip, backendport, requestprocessingtime,
>> backendprocessingtime, clientresponsetime, elbresponsecode,
>> backendresponsecode, receivedbytes, sentbytes, requestverb, url,
>> regexp_extract(url, '.*aid=([a-zA-Z0-9]+).*', 1) as aid,
>> regexp_extract(url, '.*tid=([a-zA-Z0-9]+).*', 1) as tid,
>> regexp_extract(url, '.*eid=([a-zA-Z0-9]+).*', 1) as eid,
>> regexp_extract(url, '.*did=([a-zA-Z0-9]+).*', 1) as did, protocol,
>> useragent, ssl_cipher, ssl_protocol from elblog;
>>
>> On Tue, Sep 20, 2016 at 3:12 PM, Damien Carol <da...@gmail.com>
>> wrote:
>>
>>> see the udf
>>> *parse_url_tuple*
>>> SELECT b.*
>>> FROM src LATERAL VIEW parse_url_tuple(fullurl, 'HOST', 'PATH', 'QUERY',
>>> 'QUERY:id') b as host, path, query, query_id LIMIT 1;
>>>
>>>
>>> https://cwiki.apache.org/confluence/display/Hive/LanguageMan
>>> ual+UDF#LanguageManualUDF-parse_url_tuple
>>>
>>> 2016-09-20 11:22 GMT+02:00 Manish Rangari <linuxtricksfordevops@gmail.co
>>> m>:
>>>
>>>> Guys,
>>>>
>>>> I want to get the field of elb logs. A sample elb log is given below
>>>> and I am using below create table definition. It is working fine. I am
>>>> getting what I wanted but now I want the bold part as well. For example
>>>> eid, tid, aid. Can anyone help me how can I match them as well.
>>>>
>>>> NOTE: The position of aid, eid, tid is not fixed and it may change.
>>>>
>>>> 2016-09-16T06:55:19.056871Z testelb 2.1.7.2:52399 192.168.1.5:80
>>>> 0.000021 0.000596 0.00002 200 200 0 43 "GET
>>>> https://site1.example.com:443/peek?
>>>> *eid=aw123&tid=fskc235n&aid=2ADSFGSDG* HTTP/1.1" "Mozilla/5.0 (Windows
>>>> NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85
>>>> Safari/537.36" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
>>>>
>>>>
>>>> CREATE TABLE elblog (
>>>> Request_date STRING,
>>>>       ELBName STRING,
>>>>       RequestIP STRING,
>>>>       RequestPort INT,
>>>>       BackendIP STRING,
>>>>       BackendPort INT,
>>>>       RequestProcessingTime DOUBLE,
>>>>       BackendProcessingTime DOUBLE,
>>>>       ClientResponseTime DOUBLE,
>>>>       ELBResponseCode STRING,
>>>>       BackendResponseCode STRING,
>>>>       ReceivedBytes BIGINT,
>>>>       SentBytes BIGINT,
>>>>       RequestVerb STRING,
>>>>       URL STRING,
>>>>       Protocol STRING,
>>>> Useragent STRING,
>>>> ssl_cipher STRING,
>>>> ssl_protocol STRING
>>>> )
>>>> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
>>>> WITH SERDEPROPERTIES (
>>>>           "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^
>>>> ]*):([0-9]*) ([.0-9]*) ([.0-9]*) ([.0-9]*) (-|[0-9]*) (-|[0-9]*) ([-0-9]*)
>>>> ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" \"(.*)\" (.*) (.*)$"
>>>> )
>>>> STORED AS TEXTFILE;
>>>>
>>>
>>>
>>
>

Re: ELB Log processing

Posted by Damien Carol <da...@gmail.com>.
The royal way to do that is a view IMHO.

2016-09-20 12:14 GMT+02:00 Manish Rangari <li...@gmail.com>:

> Thanks for the reply Damien. The suggestion you gave is really useful.
> Currently I am achieving my desired output by performing below steps. But I
> want to achieve the desired result in one step instead of two. Do we have
> any way so that I can get the aid, did etc in create table statement? If
> not I will have to look for the option that you mentioned
>
> 1.
> CREATE TABLE elblog (
> Request_date STRING,
>       ELBName STRING,
>       RequestIP STRING,
>       RequestPort INT,
>       BackendIP STRING,
>       BackendPort INT,
>       RequestProcessingTime DOUBLE,
>       BackendProcessingTime DOUBLE,
>       ClientResponseTime DOUBLE,
>       ELBResponseCode STRING,
>       BackendResponseCode STRING,
>       ReceivedBytes BIGINT,
>       SentBytes BIGINT,
>       RequestVerb STRING,
>       URL STRING,
>       Protocol STRING,
> Useragent STRING,
> ssl_cipher STRING,
> ssl_protocol STRING
> )
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
> WITH SERDEPROPERTIES (
>           "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^
> ]*):([0-9]*) ([.0-9]*) ([.0-9]*) ([.0-9]*) (-|[0-9]*) (-|[0-9]*) ([-0-9]*)
> ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" \"(.*)\" (.*) (.*)$"
> )
> STORED AS TEXTFILE;
>
> 2.
> create table elb_raw_log as select request_date, elbname, requestip,
> requestport, backendip, backendport, requestprocessingtime,
> backendprocessingtime, clientresponsetime, elbresponsecode,
> backendresponsecode, receivedbytes, sentbytes, requestverb, url,
> regexp_extract(url, '.*aid=([a-zA-Z0-9]+).*', 1) as aid,
> regexp_extract(url, '.*tid=([a-zA-Z0-9]+).*', 1) as tid,
> regexp_extract(url, '.*eid=([a-zA-Z0-9]+).*', 1) as eid,
> regexp_extract(url, '.*did=([a-zA-Z0-9]+).*', 1) as did, protocol,
> useragent, ssl_cipher, ssl_protocol from elblog;
>
> On Tue, Sep 20, 2016 at 3:12 PM, Damien Carol <da...@gmail.com>
> wrote:
>
>> see the udf
>> *parse_url_tuple*
>> SELECT b.*
>> FROM src LATERAL VIEW parse_url_tuple(fullurl, 'HOST', 'PATH', 'QUERY',
>> 'QUERY:id') b as host, path, query, query_id LIMIT 1;
>>
>>
>> https://cwiki.apache.org/confluence/display/Hive/LanguageMan
>> ual+UDF#LanguageManualUDF-parse_url_tuple
>>
>> 2016-09-20 11:22 GMT+02:00 Manish Rangari <linuxtricksfordevops@gmail.com
>> >:
>>
>>> Guys,
>>>
>>> I want to get the field of elb logs. A sample elb log is given below and
>>> I am using below create table definition. It is working fine. I am getting
>>> what I wanted but now I want the bold part as well. For example eid, tid,
>>> aid. Can anyone help me how can I match them as well.
>>>
>>> NOTE: The position of aid, eid, tid is not fixed and it may change.
>>>
>>> 2016-09-16T06:55:19.056871Z testelb 2.1.7.2:52399 192.168.1.5:80
>>> 0.000021 0.000596 0.00002 200 200 0 43 "GET
>>> https://site1.example.com:443/peek?
>>> *eid=aw123&tid=fskc235n&aid=2ADSFGSDG* HTTP/1.1" "Mozilla/5.0 (Windows
>>> NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85
>>> Safari/537.36" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
>>>
>>>
>>> CREATE TABLE elblog (
>>> Request_date STRING,
>>>       ELBName STRING,
>>>       RequestIP STRING,
>>>       RequestPort INT,
>>>       BackendIP STRING,
>>>       BackendPort INT,
>>>       RequestProcessingTime DOUBLE,
>>>       BackendProcessingTime DOUBLE,
>>>       ClientResponseTime DOUBLE,
>>>       ELBResponseCode STRING,
>>>       BackendResponseCode STRING,
>>>       ReceivedBytes BIGINT,
>>>       SentBytes BIGINT,
>>>       RequestVerb STRING,
>>>       URL STRING,
>>>       Protocol STRING,
>>> Useragent STRING,
>>> ssl_cipher STRING,
>>> ssl_protocol STRING
>>> )
>>> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
>>> WITH SERDEPROPERTIES (
>>>           "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^
>>> ]*):([0-9]*) ([.0-9]*) ([.0-9]*) ([.0-9]*) (-|[0-9]*) (-|[0-9]*) ([-0-9]*)
>>> ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" \"(.*)\" (.*) (.*)$"
>>> )
>>> STORED AS TEXTFILE;
>>>
>>
>>
>

Re: ELB Log processing

Posted by Manish Rangari <li...@gmail.com>.
Thanks for the reply Damien. The suggestion you gave is really useful.
Currently I am achieving my desired output by performing below steps. But I
want to achieve the desired result in one step instead of two. Do we have
any way so that I can get the aid, did etc in create table statement? If
not I will have to look for the option that you mentioned

1.
CREATE TABLE elblog (
Request_date STRING,
      ELBName STRING,
      RequestIP STRING,
      RequestPort INT,
      BackendIP STRING,
      BackendPort INT,
      RequestProcessingTime DOUBLE,
      BackendProcessingTime DOUBLE,
      ClientResponseTime DOUBLE,
      ELBResponseCode STRING,
      BackendResponseCode STRING,
      ReceivedBytes BIGINT,
      SentBytes BIGINT,
      RequestVerb STRING,
      URL STRING,
      Protocol STRING,
Useragent STRING,
ssl_cipher STRING,
ssl_protocol STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
          "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^
]*):([0-9]*) ([.0-9]*) ([.0-9]*) ([.0-9]*) (-|[0-9]*) (-|[0-9]*) ([-0-9]*)
([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" \"(.*)\" (.*) (.*)$"
)
STORED AS TEXTFILE;

2.
create table elb_raw_log as select request_date, elbname, requestip,
requestport, backendip, backendport, requestprocessingtime,
backendprocessingtime, clientresponsetime, elbresponsecode,
backendresponsecode, receivedbytes, sentbytes, requestverb, url,
regexp_extract(url, '.*aid=([a-zA-Z0-9]+).*', 1) as aid,
regexp_extract(url, '.*tid=([a-zA-Z0-9]+).*', 1) as tid,
regexp_extract(url, '.*eid=([a-zA-Z0-9]+).*', 1) as eid,
regexp_extract(url, '.*did=([a-zA-Z0-9]+).*', 1) as did, protocol,
useragent, ssl_cipher, ssl_protocol from elblog;

On Tue, Sep 20, 2016 at 3:12 PM, Damien Carol <da...@gmail.com>
wrote:

> see the udf
> *parse_url_tuple*
> SELECT b.*
> FROM src LATERAL VIEW parse_url_tuple(fullurl, 'HOST', 'PATH', 'QUERY',
> 'QUERY:id') b as host, path, query, query_id LIMIT 1;
>
>
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#
> LanguageManualUDF-parse_url_tuple
>
> 2016-09-20 11:22 GMT+02:00 Manish Rangari <li...@gmail.com>
> :
>
>> Guys,
>>
>> I want to get the field of elb logs. A sample elb log is given below and
>> I am using below create table definition. It is working fine. I am getting
>> what I wanted but now I want the bold part as well. For example eid, tid,
>> aid. Can anyone help me how can I match them as well.
>>
>> NOTE: The position of aid, eid, tid is not fixed and it may change.
>>
>> 2016-09-16T06:55:19.056871Z testelb 2.1.7.2:52399 192.168.1.5:80
>> 0.000021 0.000596 0.00002 200 200 0 43 "GET
>> https://site1.example.com:443/peek?*eid=aw123&tid=fskc235n&aid=2ADSFGSDG*
>> HTTP/1.1" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like
>> Gecko) Chrome/45.0.2454.85 Safari/537.36" ECDHE-RSA-AES128-GCM-SHA256
>> TLSv1.2
>>
>>
>> CREATE TABLE elblog (
>> Request_date STRING,
>>       ELBName STRING,
>>       RequestIP STRING,
>>       RequestPort INT,
>>       BackendIP STRING,
>>       BackendPort INT,
>>       RequestProcessingTime DOUBLE,
>>       BackendProcessingTime DOUBLE,
>>       ClientResponseTime DOUBLE,
>>       ELBResponseCode STRING,
>>       BackendResponseCode STRING,
>>       ReceivedBytes BIGINT,
>>       SentBytes BIGINT,
>>       RequestVerb STRING,
>>       URL STRING,
>>       Protocol STRING,
>> Useragent STRING,
>> ssl_cipher STRING,
>> ssl_protocol STRING
>> )
>> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
>> WITH SERDEPROPERTIES (
>>           "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^
>> ]*):([0-9]*) ([.0-9]*) ([.0-9]*) ([.0-9]*) (-|[0-9]*) (-|[0-9]*) ([-0-9]*)
>> ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" \"(.*)\" (.*) (.*)$"
>> )
>> STORED AS TEXTFILE;
>>
>
>

Re: ELB Log processing

Posted by Damien Carol <da...@gmail.com>.
see the udf
*parse_url_tuple*
SELECT b.*
FROM src LATERAL VIEW parse_url_tuple(fullurl, 'HOST', 'PATH', 'QUERY',
'QUERY:id') b as host, path, query, query_id LIMIT 1;


https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-parse_url_tuple

2016-09-20 11:22 GMT+02:00 Manish Rangari <li...@gmail.com>:

> Guys,
>
> I want to get the field of elb logs. A sample elb log is given below and I
> am using below create table definition. It is working fine. I am getting
> what I wanted but now I want the bold part as well. For example eid, tid,
> aid. Can anyone help me how can I match them as well.
>
> NOTE: The position of aid, eid, tid is not fixed and it may change.
>
> 2016-09-16T06:55:19.056871Z testelb 2.1.7.2:52399 192.168.1.5:80 0.000021
> 0.000596 0.00002 200 200 0 43 "GET https://site1.example.com:443/peek?
> *eid=aw123&tid=fskc235n&aid=2ADSFGSDG* HTTP/1.1" "Mozilla/5.0 (Windows NT
> 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85
> Safari/537.36" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
>
>
> CREATE TABLE elblog (
> Request_date STRING,
>       ELBName STRING,
>       RequestIP STRING,
>       RequestPort INT,
>       BackendIP STRING,
>       BackendPort INT,
>       RequestProcessingTime DOUBLE,
>       BackendProcessingTime DOUBLE,
>       ClientResponseTime DOUBLE,
>       ELBResponseCode STRING,
>       BackendResponseCode STRING,
>       ReceivedBytes BIGINT,
>       SentBytes BIGINT,
>       RequestVerb STRING,
>       URL STRING,
>       Protocol STRING,
> Useragent STRING,
> ssl_cipher STRING,
> ssl_protocol STRING
> )
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
> WITH SERDEPROPERTIES (
>           "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^
> ]*):([0-9]*) ([.0-9]*) ([.0-9]*) ([.0-9]*) (-|[0-9]*) (-|[0-9]*) ([-0-9]*)
> ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" \"(.*)\" (.*) (.*)$"
> )
> STORED AS TEXTFILE;
>