Posted to user@pig.apache.org by "Sabbidi, Prashanth" <Pr...@VerizonWireless.com.INVALID> on 2015/09/30 04:10:21 UTC
OLAP like CUBE operations in Pig
Hi,
I am using the Pig CUBE operator to generate OLAP-like cube aggregations. I have 8 dimensions to aggregate on.
However, the performance is very poor: aggregating over 50K rows takes 2-3 hours or more. The reduce phase of the job that executes CUBE takes a long time (see job_1440641036158_61560 in the stats below, whose single reducer ran for 6464 seconds).
Can someone please suggest where I should start in order to improve the performance?
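For a rough sense of scale, CUBE groups on every subset of the listed dimensions, so 8 dimensions give 2^8 = 256 groupings, and each input row contributes one record to each of them. The Python sketch below only illustrates that arithmetic; it is not Pig code and not a measurement of this job. The dimension names are hypothetical, since they are not listed in this post, and the row count is taken as roughly 50K.

```python
from itertools import combinations


def cube_groupings(dimensions):
    """Return every subset of the dimensions, from the empty set to all of them."""
    return [
        combo
        for size in range(len(dimensions) + 1)
        for combo in combinations(dimensions, size)
    ]


# Hypothetical dimension names; the post does not list the real ones.
dims = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"]
groupings = cube_groupings(dims)

rows = 50_000  # approximate input size mentioned in the post

print(len(groupings))         # number of groupings CUBE aggregates over
print(len(groupings) * rows)  # records that reach the grouping step
```

If all of those records end up on one reducer, as the stats below suggest for the cube job, that single task has to process the whole expanded data set.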
Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MedianMapTime MaxReduceTime MinReduceTime AvgReduceTime MedianReducetime Alias Feature Outputs
job_1440641036158_61555 1 1 23 23 23 23 10 10 10 10 tmp,tmp1,tmp2,userAccessData,userAccessData1,userAccessData2,userAccessData3 GROUP_BY
job_1440641036158_61556 1 1 9 9 9 9 7 7 7 7 activities_1,activityData,activityData_1,activityData_2,activityData_3,activityData_4,activityData_5 GROUP_BY
job_1440641036158_61558 2 1 11 7 9 9 9 9 9 9 joinedData,joinedData_1,joinedData_2,rawData,rawData1,userData HASH_JOIN
job_1440641036158_61560 2 1 8 7 8 8 6464 6464 6464 6464 cube,data,data2 HASH_JOIN
job_1440641036158_61615 1516 204 249 24 76 43 373 276 308 301 sessions,sessions2,sessions_new,sessions_new_distinct,sessions_return,sessions_return_distinct,summary_Day,summary_Day_1,summary_Day_2,summary_Day_3,summary_Day_4,summary_Day_5,users,users2,users_new,users_new_distinct,users_return,users_return_distinct,users_tmp GROUP_BY,DISTINCT mobile_diag_dev_tbls.appAnalytic_users_cumulative3,
Regards
Reddy