Posted to dev@pig.apache.org by "Zhijie Shen (JIRA)" <ji...@apache.org> on 2011/03/22 13:35:05 UTC

[jira] [Commented] (PIG-1916) Nested cross

    [ https://issues.apache.org/jira/browse/PIG-1916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009626#comment-13009626 ] 

Zhijie Shen commented on PIG-1916:
----------------------------------

Hi developers,

I'm a graduate student interested in big data, and I'd like to apply for GSoC with Pig. I've already built Pig locally and read two papers about Pig published at SIGMOD and VLDB. I'm now looking into this issue and the related one, PIG-1631; both seem to be interesting additions to Pig. However, there is quite limited information here. Could anybody give me some hints about where in the Pig code I should start in order to work on this issue?



> Nested cross
> ------------
>
>                 Key: PIG-1916
>                 URL: https://issues.apache.org/jira/browse/PIG-1916
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Daniel Dai
>              Labels: gsoc2011
>             Fix For: 0.10
>
>
> It is useful to have cross inside a nested foreach statement. One typical use case for nested foreach is that, after cogrouping two relations, we want to flatten the records that share a key and do some processing on them. This is naturally achieved by cross. E.g.:
> {code}
> C = cogroup user by uid, session by uid;
> D = foreach C {
>     crossed = cross user, session; -- To flatten two input bags
>     filtered = filter crossed by user::region == session::region;
>     result = foreach filtered generate processSession(user::age, user::gender, session::ip);  -- Nested foreach, Jira: PIG-1631
>     generate result;
> }
> {code}
> Without cross, the user has to write a UDF that processes the bags user and session. That is much harder than writing a UDF that processes flattened tuples. This is especially true once we have the nested foreach statement (PIG-1631).
> This is a candidate project for Google Summer of Code 2011. More information about the program can be found at http://wiki.apache.org/pig/GSoc2011
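
For readers unfamiliar with the proposal, the intended per-key semantics of the quoted example (cogroup, then cross the two bags, then filter on region) can be sketched in plain Python. This is only an illustration under assumptions, not how Pig implements or would implement the feature: the helper names cogroup and nested_cross, the dict-based records, and the sample data are all hypothetical.

```python
from collections import defaultdict
from itertools import product


def cogroup(left, right, key):
    """Group two lists of dict records by `key`, loosely mimicking COGROUP."""
    groups = defaultdict(lambda: ([], []))  # key -> (left bag, right bag)
    for rec in left:
        groups[rec[key]][0].append(rec)
    for rec in right:
        groups[rec[key]][1].append(rec)
    return groups


def nested_cross(users, sessions):
    """For each uid, cross the two bags and keep pairs whose regions match."""
    out = {}
    for uid, (user_bag, session_bag) in cogroup(users, sessions, "uid").items():
        # All (user, session) pairs for this uid; empty if either bag is empty.
        crossed = product(user_bag, session_bag)
        out[uid] = [(u, s) for u, s in crossed if u["region"] == s["region"]]
    return out


# Hypothetical sample data, made up for this sketch.
users = [
    {"uid": 1, "region": "us", "age": 30},
    {"uid": 2, "region": "eu", "age": 41},
]
sessions = [
    {"uid": 1, "region": "us", "ip": "10.0.0.1"},
    {"uid": 1, "region": "eu", "ip": "10.0.0.2"},
    {"uid": 3, "region": "us", "ip": "10.0.0.3"},
]
result = nested_cross(users, sessions)
```

In this sketch, a uid that appears in only one input still gets a group, but its crossed bag is empty, so a per-tuple function applied afterwards would see nothing for that key.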
