Posted to hdfs-user@hadoop.apache.org by Mix Nin <pi...@gmail.com> on 2013/11/15 00:03:03 UTC

Using variables generated by FOREACH command

Hi

I have a GROUP and a FOREACH statement as below:

grouped = GROUP filterdata BY (page_name, web_session_id);
x = FOREACH grouped {
    distinct_web_cookie_id = DISTINCT filterdata.web_cookie_id;
    distinct_encrypted_customer_id = DISTINCT filterdata.encrypted_customer_id;
    distinct_web_session_id = DISTINCT filterdata.web_session_id;
    distinct_event_time = DISTINCT filterdata.event_time;
    distinct_customer_id = DISTINCT filterdata.customer_id;
    GENERATE FLATTEN(group),
        COUNT_STAR(distinct_web_cookie_id) AS distinct_web_cookie_id,
        COUNT_STAR(distinct_encrypted_customer_id) AS distinct_encrypted_customer_id,
        COUNT_STAR(distinct_customer_id) AS distinct_customer_id,
        COUNT_STAR(distinct_web_session_id) AS distinct_web_session_id,
        COUNT_STAR(filterdata) AS cnt_events;
};


Now I want to group x on Session_id and get the sum of cnt_events, so I
wrote the commands below:

grouped2 = GROUP x BY page_name;
d = foreach grouped2 generate group, COUNT_STAR(cnt_events) tot_events;

When I run "grouped2 = GROUP x BY page_name;", I get the error below:

[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025:
<line 31, column 23> Invalid field projection. Projected field [page_name]
does not exist in schema: event_time:chararray.


When I run DESCRIBE x, the output is x: {event_time: chararray}

I am not sure whether the schema from the FOREACH statement is being
applied. How do I solve this problem?
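
For reference, here is a minimal sketch of the two-step aggregation described
above. It is not a confirmed fix for ERROR 1025, only one way the script could
be laid out under some assumptions. The LOAD path, the field types other than
event_time, and the uniq_* alias names are hypothetical and do not come from
the original script. The nested aliases are renamed so that they differ from
the output field names, which is a precaution and may be unrelated to the
error. The second step uses SUM(x.cnt_events) because COUNT_STAR counts the
tuples in a bag rather than adding up their values, and because cnt_events has
to be projected out of the bag x after the GROUP.

```pig
-- Hypothetical input: the path and all types except event_time are assumptions.
filterdata = LOAD 'events.tsv' USING PigStorage('\t') AS (
    page_name:chararray,
    web_session_id:chararray,
    web_cookie_id:chararray,
    encrypted_customer_id:chararray,
    customer_id:chararray,
    event_time:chararray);

grouped = GROUP filterdata BY (page_name, web_session_id);

x = FOREACH grouped {
    -- Nested aliases (uniq_*) are illustrative names, kept distinct
    -- from the output field names declared with AS below.
    uniq_cookies       = DISTINCT filterdata.web_cookie_id;
    uniq_enc_customers = DISTINCT filterdata.encrypted_customer_id;
    uniq_customers     = DISTINCT filterdata.customer_id;
    uniq_sessions      = DISTINCT filterdata.web_session_id;
    GENERATE
        FLATTEN(group) AS (page_name, web_session_id),
        COUNT_STAR(uniq_cookies)       AS distinct_web_cookie_id,
        COUNT_STAR(uniq_enc_customers) AS distinct_encrypted_customer_id,
        COUNT_STAR(uniq_customers)     AS distinct_customer_id,
        COUNT_STAR(uniq_sessions)      AS distinct_web_session_id,
        COUNT_STAR(filterdata)         AS cnt_events;
};

-- Check that x carries page_name and cnt_events before grouping again.
DESCRIBE x;

grouped2 = GROUP x BY page_name;

-- After GROUP, the rows of x sit in a bag named x, so the column is x.cnt_events.
d = FOREACH grouped2 GENERATE group AS page_name, SUM(x.cnt_events) AS tot_events;

DESCRIBE d;
```

If DESCRIBE x still prints only event_time with a layout like this, it may be
worth checking whether the alias x is assigned again somewhere else in the
script before the second GROUP.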

Thanks