Posted to dev@pig.apache.org by Keren Ouaknine <ke...@gmail.com> on 2013/12/22 11:36:08 UTC
flattening a map generated by Pigmix
Hi,
Pigmix generates a map called page_info (see the end of this email for links), which I am
flattening in a script as follows:
register pigperf.jar;
A1 = load '/data/pigmix/page_views' using
org.apache.pig.test.udf.storefunc.PigPerformanceLoader()
as (user, action, timespent, query_term, ip_addr, timestamp, estimated_revenue,
page_info, page_links);
A = foreach A1 generate user, RANDOM() as rval:double, action, timespent,
query_term, ip_addr, timestamp, estimated_revenue, page_info, page_links;
B = foreach A generate user, rval, action, timespent, query_term, ip_addr,
timestamp, estimated_revenue;
*C = foreach A generate user, rval, flatten((map[])page_info);*
D = foreach A generate user, rval, flatten((bag{tuple(map[])})page_links);
store B into 'PV_master1' using PigStorage('\t');
store C into 'PV_page_info1' using PigStorage('\t');
store D into 'PV_page_links1' using PigStorage('\t');
I get an error while flattening the map as follows:
2013-12-22 01:17:21,554 [pool-1-thread-1] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map
- Aliases being processed per job phase (AliasName[line,offset]): M:
A1[3,5],A[7,4],B[8,4],C[9,4],D[10,4] C: R:
2013-12-22 01:17:21,594 [Thread-3] INFO
org.apache.hadoop.mapred.LocalJobRunner - Map task executor complete.
2013-12-22 01:17:21,596 [Thread-3] WARN
org.apache.hadoop.mapred.LocalJobRunner - job_local909428065_0001
*java.lang.Exception: org.apache.pig.backend.executionengine.ExecException:
ERROR 0: Exception while executing (Name: C:
Store(file:///media/work/EDBT/data/tony_s/PV_page_info1:PigStorage(' ')) -
scope-52 Operator Key: scope-52):
org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception
while executing [POCast (Name: Cast[map:[]] - scope-49 Operator Key:
scope-49) children: [[POProject (Name: Project[bytearray][8] - scope-48
Operator Key: scope-48) children: null at []]] at []]:
java.lang.UnsupportedOperationException*
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0:
Exception while executing (Name: C:
Store(file:///media/work/EDBT/data/tony_s/PV_page_info1:PigStorage(' ')) -
scope-52 Operator Key: scope-52):
org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception
while executing [POCast (Name: Cast[map:[]] - scope-49 Operator Key:
scope-49) children: [[POProject (Name: Project[bytearray][8] - scope-48
Operator Key: scope-48) children: null at []]] at []]:
java.lang.UnsupportedOperationException
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:289)
The Pigmix page says that no null values are generated for page_info (see the
table below). I also looked into the data, but the random generator doesn't
produce human-readable output :)
Any ideas what this cast error means?
Thanks,
Keren
Pigmix: https://cwiki.apache.org/confluence/display/PIG/PigMix
DataGenerator: http://wiki.apache.org/pig/DataGeneratorHadoop
Name               Type         Average Length  Cardinality  Distribution  Percent Null
user               string       20              1.6M         zipf          7
timestamp          long         X               86400        uniform       0
timespent          int          X               20           zipf          0
query_term         string       10              1.8M         zipf          20
page_links         bag of maps  50              X            zipf          20
*page_info*        *map*        *15*            *X*          *zipf*        *0*
ip_addr            long         X               1M           zipf          0
estimated_revenue  double       X               100k         zipf          5
action             int          X               2            uniform       0
--
Keren Ouaknine
www.kereno.com