Posted to dev@pig.apache.org by "Rohit Laddha (JIRA)" <ji...@apache.org> on 2014/02/28 08:44:20 UTC
[jira] [Commented] (PIG-3119) Aggregation not working in conjunction with REGEX_EXTRACT_ALL
[ https://issues.apache.org/jira/browse/PIG-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915532#comment-13915532 ]
Rohit Laddha commented on PIG-3119:
-----------------------------------
I think the problem is not the aggregation in combination with REGEX_EXTRACT_ALL. The problem is that B does not have the expected output: it contains empty tuples. So the problem lies in REGEX_EXTRACT_ALL, which is not returning the expected fields.
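One way to check this outside Pig is to run the pattern from statement B through java.util.regex directly. This is a minimal sketch, and it rests on one assumption: that REGEX_EXTRACT_ALL in 0.9.1 only returns groups when Matcher.matches() succeeds, meaning the pattern must cover the entire input line. The class and method names are hypothetical.

The regex string is copied from the script. Pig's '\\' escapes mean the same thing in a Java string literal, so only the double quotes need escaping.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical standalone check, not part of Pig: runs the pattern from
// statement B against the sample line using whole-input vs. substring matching.
public class Pig3119Check {

    // Copied from the script; note the trailing space after the last (\\S+).
    static final String PATTERN =
        "^(\\S+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] \"(.+?)\" (\\S+) (\\S+) "
        + "\"([^\"]*)\" \"([^\"]*)\" \"([^\"]*)\" (\\S+) ";

    // Sample input line from the issue description.
    static final String LINE =
        "192.168.90.36 - - [16/May/2012:16:00:11 -0700] "
        + "\"GET /img/explore/encyclopedia/characters/yoda_card.jpg HTTP/1.1\" 200 22620 "
        + "\"http://www.starwars.com/explore/encyclopedia/characters/2/featured/\" "
        + "\"Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)\" "
        + "\"Wookie-Cookie=474ca6b302a46696a1ec55f4b656f8c3; "
        + "__utma=181359608.119611689.1337206567.1337206567.1337206567.1; "
        + "__utmb=181359608.79.9.1337209104786; __utmc=181359608; "
        + "__utmz=181359608.1337206567.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); "
        + "JSESSIONID=aHX_NQheRq08\" \"-\" 0";

    // Number of capturing groups in the pattern.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    // Whole-input match (Matcher.matches()), the semantics assumed for REGEX_EXTRACT_ALL.
    static boolean wholeLineMatches(String regex, String line) {
        return Pattern.compile(regex).matcher(line).matches();
    }

    // Substring search (Matcher.find()); the leading ^ still pins it to the line start.
    static boolean prefixFound(String regex, String line) {
        return Pattern.compile(regex).matcher(line).find();
    }

    public static void main(String[] args) {
        System.out.println("capturing groups: " + groupCount(PATTERN));
        System.out.println("matches(): " + wholeLineMatches(PATTERN, LINE));
        System.out.println("find():    " + prefixFound(PATTERN, LINE));
    }
}
```

Run as-is, it reports three things:

- 11 capturing groups, although the AS clause declares 12 fields.
- matches() is false.
- find() is true.

The pattern ends with "(\\S+) ", so it stops right after the "-" field and leaves the trailing 0 unmatched. A whole-line match therefore fails, even though the line starts in the expected format.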
> Aggregation not working in conjunction with REGEX_EXTRACT_ALL
> -------------------------------------------------------------
>
> Key: PIG-3119
> URL: https://issues.apache.org/jira/browse/PIG-3119
> Project: Pig
> Issue Type: Bug
> Components: build, grunt
> Affects Versions: 0.9.1
> Environment: OS version
> ================================
> Linux version 2.6.18-194.3.1.el5 (mockbuild@builder10.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-48))
> Software installed
> =======================
> hadoop-1.0.4
> pig-0.9.1
> Hardware details
> ====================================
> processor : 0
> vendor_id : GenuineIntel
> cpu family : 6
> model : 26
> model name : Intel(R) Xeon(R) CPU X5560 @ 2.80GHz
> stepping : 4
> cpu MHz : 2800.098
> cache size : 8192 KB
> fpu : yes
> fpu_exception : yes
> cpuid level : 11
> Reporter: siddhartha Pattanaik
> Priority: Critical
> Labels: newbie
> Fix For: 0.9.1
>
> Attachments: starwar_log1.txt
>
> Original Estimate: 276h
> Remaining Estimate: 276h
>
> Hi,
> I have a use case in my project. The input file consists of lines in the following format:
> 192.168.90.36 - - [16/May/2012:16:00:11 -0700] "GET /img/explore/encyclopedia/characters/yoda_card.jpg HTTP/1.1" 200 22620 "http://www.starwars.com/explore/encyclopedia/characters/2/featured/" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)" "Wookie-Cookie=474ca6b302a46696a1ec55f4b656f8c3; __utma=181359608.119611689.1337206567.1337206567.1337206567.1; __utmb=181359608.79.9.1337209104786; __utmc=181359608; __utmz=181359608.1337206567.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); JSESSIONID=aHX_NQheRq08" "-" 0
> I want to run an aggregate function on the data extracted with REGEX_EXTRACT_ALL.
> The input file parses, but the aggregate function does not work on the result.
> Please find the Pig script below:
> ***************Ip_adress-count************************
> Ip_adress_count.pig
>
> A = LOAD 'starwar_log1' USING TextLoader AS (line:chararray);
> B = FOREACH A GENERATE FLATTEN (REGEX_EXTRACT_ALL(line,'^(\\S+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] "(.+?)" (\\S+) (\\S+) "([^"]*)" "([^"]*)" "([^"]*)" (\\S+) ') ) AS
> (
> remoteAddr: chararray,
> remoteLogname: chararray,
> user: chararray,
> time: chararray,
> request: chararray,
> status: int,
> bytes_string: chararray,
> referrer: chararray,
> Mozilla: chararray,
> wookie_cookie: chararray,
> browser3: chararray,
> acess_status:int
> );
> C = group B by remoteAddr;
> D = foreach C generate COUNT(B) as ip_adress_count;
> E = order D by ip_adress_count;
> F = STORE E INTO 'ip_adress_count/' using PigStorage(',');
> Expected output
> ===========================
> ip_adress_count
> remoteAddr,ip_adress_count
> 192.168.90.36,19
> 192.168.90.37,1
> There is no parsing issue, but the aggregate function COUNT() does not work on the output of REGEX_EXTRACT_ALL.
> Please look into this. The requirement is to get the count of each IP address in the input data.
> Thanks,
> siddharth
> contact -8763666372
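If the whole-line match is the cause, one possible fix is a pattern that has one group per declared field and consumes the entire line. The sketch below makes two changes to the pattern:

- It drops the trailing space.
- It captures the quoted "-" field and the final 0 as groups 11 and 12 (browser3 and acess_status).

It then counts lines per remoteAddr as a rough stand-in for GROUP/COUNT. The class name Pig3119Fix is hypothetical, and the sketch assumes REGEX_EXTRACT_ALL requires a whole-line match.

In the Pig literal, the same change would be replacing the final (\\S+) ' with "([^"]*)" (\\S+)'. That change has not been tested in Pig here.

Separately, D in the script emits only COUNT(B). To get the remoteAddr column shown in the expected output, it would also need the group key, e.g. D = FOREACH C GENERATE group AS remoteAddr, COUNT(B) AS ip_adress_count;

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not Pig code: a revised pattern with one capturing group
// per field in the script's AS clause (12), written to cover the whole line.
public class Pig3119Fix {

    // Same as the original up to the cookie field; the trailing space is gone and
    // groups 11 and 12 capture the final "-" (without quotes) and 0.
    static final Pattern PATTERN = Pattern.compile(
        "^(\\S+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] \"(.+?)\" (\\S+) (\\S+) "
        + "\"([^\"]*)\" \"([^\"]*)\" \"([^\"]*)\" \"([^\"]*)\" (\\S+)");

    // Sample input line from the issue description.
    static final String LINE =
        "192.168.90.36 - - [16/May/2012:16:00:11 -0700] "
        + "\"GET /img/explore/encyclopedia/characters/yoda_card.jpg HTTP/1.1\" 200 22620 "
        + "\"http://www.starwars.com/explore/encyclopedia/characters/2/featured/\" "
        + "\"Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)\" "
        + "\"Wookie-Cookie=474ca6b302a46696a1ec55f4b656f8c3; "
        + "__utma=181359608.119611689.1337206567.1337206567.1337206567.1; "
        + "__utmb=181359608.79.9.1337209104786; __utmc=181359608; "
        + "__utmz=181359608.1337206567.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); "
        + "JSESSIONID=aHX_NQheRq08\" \"-\" 0";

    // Returns the captured fields, or null if the whole line does not match
    // (the Matcher.matches() behaviour assumed for REGEX_EXTRACT_ALL).
    static List<String> extractAll(String line) {
        Matcher m = PATTERN.matcher(line);
        if (!m.matches()) {
            return null;
        }
        List<String> fields = new ArrayList<>();
        for (int i = 1; i <= m.groupCount(); i++) {
            fields.add(m.group(i));
        }
        return fields;
    }

    // Rough analogue of GROUP B BY remoteAddr plus COUNT(B): lines per first field,
    // keyed and sorted by address. Non-matching lines are skipped.
    static Map<String, Long> countByAddr(List<String> lines) {
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            List<String> fields = extractAll(line);
            if (fields != null) {
                counts.merge(fields.get(0), 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> fields = extractAll(LINE);
        System.out.println("fields: " + fields.size());
        System.out.println("browser3=" + fields.get(10) + ", acess_status=" + fields.get(11));
        // One "remoteAddr,count" row per address, the shape of the expected output.
        for (Map.Entry<String, Long> e : countByAddr(Arrays.asList(LINE)).entrySet()) {
            System.out.println(e.getKey() + "," + e.getValue());
        }
    }
}
```

For the single sample line, this should print 12 fields, browser3=- and acess_status=0, followed by the row 192.168.90.36,1. The map is sorted by address, not by count as ORDER D BY ip_adress_count would do.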
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)