You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pig.apache.org by "Carlos Flores (JIRA)" <ji...@apache.org> on 2016/11/27 17:40:58 UTC
[jira] [Created] (PIG-5069) Skewed Join is crashing job
Carlos Flores created PIG-5069:
----------------------------------
Summary: Skewed Join is crashing job
Key: PIG-5069
URL: https://issues.apache.org/jira/browse/PIG-5069
Project: Pig
Issue Type: Bug
Reporter: Carlos Flores
Script below was working fine, but when i added the skewed join it began to give errors.
ERROR: java.lang.Long cannot be cast to org.apache.pig.data.Tuple
{code:sql}
SET mapred.job.queue.name marathon;
SET pig.maxCombinedSplitSize 2147483648;
SET default_parallel 500;
dim_member_skill_final_opp_1 = LOAD '/user/caflores/SkillsDashboardUS/OPP-JOIN' USING LiAvroStorage();
top_skills_1 = LOAD '/user/caflores/SkillsDashboardUS/Top_Skills_Only' using LiAvroStorage();
----------------------------------------------------------------------------
dim_member_skill_final_opp = GROUP dim_member_skill_final_opp_1 by (country_sk,skill);
top_skills = GROUP top_skills_1 by (country_sk,skill);
opp_country = JOIN dim_member_skill_final_opp BY (group), top_skills BY (group) using 'skewed';
opp_country_generate = FOREACH opp_country GENERATE
FLATTEN(top_skills::group) as (country_sk,skill),
FLATTEN(top_skills::top_skills_1) as (country_sk2,title_sk,skill2,sum_of_members),
FLATTEN(dim_member_skill_final_opp::dim_member_skill_final_opp_1) as (member_sk,country_sk1,skill1);
opp_generate = FOREACH opp_country_generate GENERATE
country_sk,
title_sk,
member_sk;
opp_distinct = DISTINCT opp_generate;
opp_grouping = GROUP opp_distinct BY (country_sk,title_sk);
opp_count = FOREACH opp_grouping GENERATE
FLATTEN(group) AS (country_sk,title_sk),
COUNT(opp_distinct) AS sum_of_members;
store opp_count into '/user/caflores/JonathansUpdate/OPP-Index-US-skewed' using LiAvroStorage();{code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)