Posted to dev@pig.apache.org by "Carlos Flores (JIRA)" <ji...@apache.org> on 2016/11/27 17:40:58 UTC

[jira] [Created] (PIG-5069) Skewed Join is crashing job

Carlos Flores created PIG-5069:
----------------------------------

             Summary: Skewed Join is crashing job
                 Key: PIG-5069
                 URL: https://issues.apache.org/jira/browse/PIG-5069
             Project: Pig
          Issue Type: Bug
            Reporter: Carlos Flores


The script below was working fine, but after I added the skewed join it began failing with this error:
ERROR: java.lang.Long cannot be cast to org.apache.pig.data.Tuple


{code:sql}
SET mapred.job.queue.name marathon;
SET pig.maxCombinedSplitSize 2147483648;
SET default_parallel 500;

dim_member_skill_final_opp_1 = LOAD '/user/caflores/SkillsDashboardUS/OPP-JOIN' USING LiAvroStorage();

top_skills_1 = LOAD '/user/caflores/SkillsDashboardUS/Top_Skills_Only' USING LiAvroStorage();

----------------------------------------------------------------------------
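-- Grouping by two fields makes 'group' a tuple (country_sk, skill) in both relations
dim_member_skill_final_opp = GROUP dim_member_skill_final_opp_1 by (country_sk,skill);

top_skills = GROUP top_skills_1 by (country_sk,skill);

-- Adding using 'skewed' to this join is the change after which the job started failing
opp_country = JOIN dim_member_skill_final_opp BY (group), top_skills BY (group) using 'skewed';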

opp_country_generate = FOREACH opp_country GENERATE
FLATTEN(top_skills::group) as (country_sk,skill),
FLATTEN(top_skills::top_skills_1) as (country_sk2,title_sk,skill2,sum_of_members),
FLATTEN(dim_member_skill_final_opp::dim_member_skill_final_opp_1) as (member_sk,country_sk1,skill1);

opp_generate = FOREACH opp_country_generate GENERATE
country_sk,
title_sk,
member_sk;

opp_distinct = DISTINCT opp_generate;

opp_grouping = GROUP opp_distinct BY (country_sk,title_sk);

opp_count = FOREACH opp_grouping GENERATE
FLATTEN(group) AS (country_sk,title_sk),
COUNT(opp_distinct) AS sum_of_members;

STORE opp_count INTO '/user/caflores/JonathansUpdate/OPP-Index-US-skewed' USING LiAvroStorage();
{code}
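For comparison, this is how the JOIN statement presumably read in the version of the script that worked. It is a reconstruction, not the reporter's original script. It assumes, based only on the description above, that adding the skewed hint was the only change, with everything else left as is:

{code:sql}
-- Assumed pre-change form (reconstructed, not copied from the working script):
-- default join with no USING clause, same grouped (country_sk, skill) tuple keys
opp_country = JOIN dim_member_skill_final_opp BY (group), top_skills BY (group);
{code}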



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)