Posted to user@pig.apache.org by Sid Stuart <si...@meez.com> on 2012/02/02 06:07:14 UTC
Question on generate semantics
Hi,
I'm using Pig to analyze some log files. We would like to find the last
time a URL was accessed. I've pulled out the path and the time, but
I'm having difficulty creating a relation of paths and latest access times.
My thought was to group the relation by path and then order each bag by
time. The code looks like this:
paths = group formatted by path;
final = foreach paths {
    -- Pull out the bag and sort the tuples by the date field.
    sorted = order formatted by lastAccess DESC;
    biggest = limit sorted 1;
    -- Return the first tuple, it should have the largest timestamp.
    generate group, biggest;
};
When I illustrate this, though, the result is the group and then a bag of
tuples, not a single tuple:
--------------------------------------------------------------------------------
| raw | line:chararray |
--------------------------------------------------------------------------------
| | 5f49db3521a7480e2a3ff2da407bdae72dbd22a6d9db4c81377182bedb68d090 images.meez.com[09/Oct/2010:23:29:30 +0000] 68.96.191.249 Anonymous 90B50123876665F7 REST.GET.OBJECT resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png "GET /resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png HTTP/1.1" 200 - 54023 54023 15 11 "http://www.meez.com/roomz.dm?act=hangoutz&hood_id=arcadia" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/534.3 (KHTML, like Gecko) Chrome/6.0.472.63 Safari/534.3" - |
| | 5f49db3521a7480e2a3ff2da407bdae72dbd22a6d9db4c81377182bedb68d090 images.meez.com[09/Oct/2010:23:28:46 +0000] 75.16.59.201 Anonymous B539D577B740FDEA REST.GET.OBJECT resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png "GET /resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png HTTP/1.1" 304 - - 54023 5 - "http://www.meez.com/roomz.dm?act=hangoutz&hood_id=arcadia" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; Media Center PC 3.1; .NET CLR 1.1.4322)" - |
| | 5f49db3521a7480e2a3ff2da407bdae72dbd22a6d9db4c81377182bedb68d090 images.meez.com[09/Oct/2010:23:29:30 +0000] 68.96.191.249 Anonymous 90B50123876665F7 REST.GET.OBJECT resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png "GET /resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png HTTP/1.1" 200 - 54023 54023 15 11 "http://www.meez.com/roomz.dm?act=hangoutz&hood_id=arcadia" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/534.3 (KHTML, like Gecko) Chrome/6.0.472.63 Safari/534.3" - |
| | 5f49db3521a7480e2a3ff2da407bdae72dbd22a6d9db4c81377182bedb68d090 images.meez.com[09/Oct/2010:23:29:30 +0000] 68.96.191.249 Anonymous 90B50123876665F7 REST.GET.OBJECT resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png "GET /resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png HTTP/1.1" 200 - 54023 54023 15 11 "http://www.meez.com/roomz.dm?act=hangoutz&hood_id=arcadia" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/534.3 (KHTML, like Gecko) Chrome/6.0.472.63 Safari/534.3" - |
| | 5f49db3521a7480e2a3ff2da407bdae72dbd22a6d9db4c81377182bedb68d090 images.meez.com[09/Oct/2010:23:28:46 +0000] 75.16.59.201 Anonymous B539D577B740FDEA REST.GET.OBJECT resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png "GET /resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png HTTP/1.1" 304 - - 54023 5 - "http://www.meez.com/roomz.dm?act=hangoutz&hood_id=arcadia" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; Media Center PC 3.1; .NET CLR 1.1.4322)" - |
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
| pieces | garbage:chararray | host:chararray | date:chararray | ip:chararray | user:chararray | garbage2:chararray | operation:chararray | path:chararray | command:chararray |
--------------------------------------------------------------------------------
| | 5f49db3521a7480e2a3ff2da407bdae72dbd22a6d9db4c81377182bedb68d090 | images.meez.com | 09/Oct/2010:23:29:30 +0000 | 68.96.191.249 | Anonymous | 90B50123876665F7 | REST.GET.OBJECT | resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png | "GET |
| | 5f49db3521a7480e2a3ff2da407bdae72dbd22a6d9db4c81377182bedb68d090 | images.meez.com | 09/Oct/2010:23:28:46 +0000 | 75.16.59.201 | Anonymous | B539D577B740FDEA | REST.GET.OBJECT | resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png | "GET |
| | 5f49db3521a7480e2a3ff2da407bdae72dbd22a6d9db4c81377182bedb68d090 | images.meez.com | 09/Oct/2010:23:29:30 +0000 | 68.96.191.249 | Anonymous | 90B50123876665F7 | REST.GET.OBJECT | resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png | "GET |
| | 5f49db3521a7480e2a3ff2da407bdae72dbd22a6d9db4c81377182bedb68d090 | images.meez.com | 09/Oct/2010:23:29:30 +0000 | 68.96.191.249 | Anonymous | 90B50123876665F7 | REST.GET.OBJECT | resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png | "GET |
| | 5f49db3521a7480e2a3ff2da407bdae72dbd22a6d9db4c81377182bedb68d090 | images.meez.com | 09/Oct/2010:23:28:46 +0000 | 75.16.59.201 | Anonymous | B539D577B740FDEA | REST.GET.OBJECT | resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png | "GET |
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------
| formatted | path:chararray                                               | lastAccess:long |
--------------------------------------------------------------------------------------------
|           | resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png | 1286666970000   |
|           | resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png | 1286666926000   |
|           | resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png | 1286666970000   |
|           | resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png | 1286666970000   |
|           | resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png | 1286666926000   |
--------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------
| paths | group:chararray | formatted:bag{:tuple(path:chararray,lastAccess:long)} |
--------------------------------------------------------------------------------
| | resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png | {(resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png, 1286666970000), ..., (resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png, 1286666926000)} |
--------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
| final.sorted | path:chararray                                               | lastAccess:long |
-----------------------------------------------------------------------------------------------
|              | resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png | 1286666970000   |
-----------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------
| final.biggest | path:chararray                                               | lastAccess:long |
------------------------------------------------------------------------------------------------
|               | resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png | 1286666970000   |
------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------
| final | group:chararray | biggest:bag{:tuple(path:chararray,lastAccess:long)} |
--------------------------------------------------------------------------------
| | resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png | {(resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png, 1286666970000), ..., (resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png, 1286666926000)} |
--------------------------------------------------------------------------------
Re: Question on generate semantics
Posted by Xiaomeng Wan <sh...@gmail.com>.
A bag of one tuple is still a bag; you need to flatten it:
generate group, FLATTEN(biggest);
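
For context, here is a sketch of how the whole script might look with that
change applied. It assumes `formatted` has the schema
(path:chararray, lastAccess:long) shown in the illustrate output. The alias
`latest` and the AS clause are illustrative choices, not part of the original
script. Because the flattened tuple already carries the path, this version
leaves `group` out of the generate to avoid a duplicate column.

```pig
-- Assumes formatted: (path:chararray, lastAccess:long).
paths = GROUP formatted BY path;

latest = FOREACH paths {
    -- Sort each group's bag so the newest access comes first.
    sorted = ORDER formatted BY lastAccess DESC;
    -- LIMIT still returns a bag, here holding at most one tuple.
    biggest = LIMIT sorted 1;
    -- FLATTEN un-nests that bag into top-level fields.
    GENERATE FLATTEN(biggest) AS (path, lastAccess);
};
```

If only the latest timestamp per path is needed, the built-in MAX over the
grouped bag should give the same result without a nested sort (again a
sketch under the same schema assumption):

```pig
latest = FOREACH paths GENERATE group AS path,
                                MAX(formatted.lastAccess) AS lastAccess;
```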
Shawn