Posted to user@pig.apache.org by Sid Stuart <si...@meez.com> on 2012/02/02 06:07:14 UTC

Question on generate semantics

Hi,

I'm using Pig to analyze some log files. We would like to find the last
time a URL has been accessed. I've pulled out the path and the time, but
I'm having difficulty creating a relation of paths and latest access.  My
thought was to group the relation by the path and then order the bags on
the time. The code looks like,

paths = group formatted by path;

final = foreach paths {
    -- Pull out the bag and sort the tuples by the date field.
    sorted = order formatted by lastAccess DESC;
    biggest = limit sorted 1;
    -- Return the first tuple, it should have the largest timestamp.
    generate group, biggest;
};


When I illustrate this, though, the result is the group and then a bag of
tuples, not a single tuple:

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| raw     | line:chararray |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|         | 5f49db3521a7480e2a3ff2da407bdae72dbd22a6d9db4c81377182bedb68d090 images.meez.com[09/Oct/2010:23:29:30 +0000] 68.96.191.249 Anonymous 90B50123876665F7 REST.GET.OBJECT resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png "GET /resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png HTTP/1.1" 200 - 54023 54023 15 11 "http://www.meez.com/roomz.dm?act=hangoutz&hood_id=arcadia" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/534.3 (KHTML, like Gecko) Chrome/6.0.472.63 Safari/534.3" - |
|         | 5f49db3521a7480e2a3ff2da407bdae72dbd22a6d9db4c81377182bedb68d090 images.meez.com[09/Oct/2010:23:28:46 +0000] 75.16.59.201 Anonymous B539D577B740FDEA REST.GET.OBJECT resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png "GET /resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png HTTP/1.1" 304 - - 54023 5 - "http://www.meez.com/roomz.dm?act=hangoutz&hood_id=arcadia" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; Media Center PC 3.1; .NET CLR 1.1.4322)" -          |
|         | 5f49db3521a7480e2a3ff2da407bdae72dbd22a6d9db4c81377182bedb68d090 images.meez.com[09/Oct/2010:23:29:30 +0000] 68.96.191.249 Anonymous 90B50123876665F7 REST.GET.OBJECT resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png "GET /resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png HTTP/1.1" 200 - 54023 54023 15 11 "http://www.meez.com/roomz.dm?act=hangoutz&hood_id=arcadia" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/534.3 (KHTML, like Gecko) Chrome/6.0.472.63 Safari/534.3" - |
|         | 5f49db3521a7480e2a3ff2da407bdae72dbd22a6d9db4c81377182bedb68d090 images.meez.com[09/Oct/2010:23:29:30 +0000] 68.96.191.249 Anonymous 90B50123876665F7 REST.GET.OBJECT resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png "GET /resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png HTTP/1.1" 200 - 54023 54023 15 11 "http://www.meez.com/roomz.dm?act=hangoutz&hood_id=arcadia" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/534.3 (KHTML, like Gecko) Chrome/6.0.472.63 Safari/534.3" - |
|         | 5f49db3521a7480e2a3ff2da407bdae72dbd22a6d9db4c81377182bedb68d090 images.meez.com[09/Oct/2010:23:28:46 +0000] 75.16.59.201 Anonymous B539D577B740FDEA REST.GET.OBJECT resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png "GET /resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png HTTP/1.1" 304 - - 54023 5 - "http://www.meez.com/roomz.dm?act=hangoutz&hood_id=arcadia" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; Media Center PC 3.1; .NET CLR 1.1.4322)" -          |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| pieces     | garbage:chararray                                                | host:chararray     | date:chararray             | ip:chararray     | user:chararray     | garbage2:chararray     | operation:chararray     | path:chararray                                               | command:chararray     |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|            | 5f49db3521a7480e2a3ff2da407bdae72dbd22a6d9db4c81377182bedb68d090 | images.meez.com    | 09/Oct/2010:23:29:30 +0000 | 68.96.191.249    | Anonymous          | 90B50123876665F7       | REST.GET.OBJECT         | resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png | "GET                  |
|            | 5f49db3521a7480e2a3ff2da407bdae72dbd22a6d9db4c81377182bedb68d090 | images.meez.com    | 09/Oct/2010:23:28:46 +0000 | 75.16.59.201     | Anonymous          | B539D577B740FDEA       | REST.GET.OBJECT         | resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png | "GET                  |
|            | 5f49db3521a7480e2a3ff2da407bdae72dbd22a6d9db4c81377182bedb68d090 | images.meez.com    | 09/Oct/2010:23:29:30 +0000 | 68.96.191.249    | Anonymous          | 90B50123876665F7       | REST.GET.OBJECT         | resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png | "GET                  |
|            | 5f49db3521a7480e2a3ff2da407bdae72dbd22a6d9db4c81377182bedb68d090 | images.meez.com    | 09/Oct/2010:23:29:30 +0000 | 68.96.191.249    | Anonymous          | 90B50123876665F7       | REST.GET.OBJECT         | resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png | "GET                  |
|            | 5f49db3521a7480e2a3ff2da407bdae72dbd22a6d9db4c81377182bedb68d090 | images.meez.com    | 09/Oct/2010:23:28:46 +0000 | 75.16.59.201     | Anonymous          | B539D577B740FDEA       | REST.GET.OBJECT         | resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png | "GET                  |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------
| formatted     | path:chararray                                               | lastAccess:long     |
------------------------------------------------------------------------------------------------------
|               | resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png | 1286666970000       |
|               | resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png | 1286666926000       |
|               | resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png | 1286666970000       |
|               | resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png | 1286666970000       |
|               | resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png | 1286666926000       |
------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| paths     | group:chararray                                              | formatted:bag{:tuple(path:chararray,lastAccess:long)}                                                                                                               |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|           | resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png | {(resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png, 1286666970000), ..., (resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png, 1286666926000)} |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------
| final.sorted     | path:chararray                                               | lastAccess:long     |
---------------------------------------------------------------------------------------------------------
|                  | resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png | 1286666970000       |
---------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------
| final.biggest     | path:chararray                                               | lastAccess:long     |
----------------------------------------------------------------------------------------------------------
|                   | resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png | 1286666970000       |
----------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| final     | group:chararray                                              | biggest:bag{:tuple(path:chararray,lastAccess:long)}                                                                                                                 |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|           | resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png | {(resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png, 1286666970000), ..., (resources/world/spaces-1.0/Icons/Arcadia/graveyardchapel.png, 1286666926000)} |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Re: Question on generate semantics

Posted by Xiaomeng Wan <sh...@gmail.com>.
A bag of one tuple is still a bag; you need to flatten it:

generate group, FLATTEN(biggest);

Shawn
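
For reference, here is a minimal sketch of the whole script with the
FLATTEN fix applied. It assumes `formatted` has the schema shown in the
ILLUSTRATE output, (path:chararray, lastAccess:long). The alias names
`latest` and `latest_max` are made up for this example and do not appear
in the original posts. The second statement is an alternative using Pig's
built-in MAX instead of the nested ORDER/LIMIT; it was not suggested in
the thread and is only one possible way to get the same timestamp.

```pig
-- Assumption: formatted is (path:chararray, lastAccess:long).
paths = GROUP formatted BY path;

latest = FOREACH paths {
    -- Sort each group's bag so the newest access comes first.
    sorted  = ORDER formatted BY lastAccess DESC;
    -- LIMIT still yields a bag, here a bag holding one tuple.
    biggest = LIMIT sorted 1;
    -- FLATTEN un-nests that bag so its fields become top-level columns.
    GENERATE group, FLATTEN(biggest);
};

-- Alternative sketch (hypothetical alias latest_max): take the largest
-- timestamp per path directly, without sorting the bag.
latest_max = FOREACH paths GENERATE group AS path,
                                    MAX(formatted.lastAccess) AS lastAccess;
```

With FLATTEN, each output row should carry the group key followed by the
fields of the single remaining tuple, rather than the group key and a bag.
If several accesses share the same latest timestamp, LIMIT 1 keeps only
one of them.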
