Posted to dev@tez.apache.org by "sundy_baba_zhouzy (Jira)" <ji...@apache.org> on 2021/12/22 06:34:00 UTC

[jira] [Created] (TEZ-4360) IOException: Multiple partitions for one merge mapper

sundy_baba_zhouzy created TEZ-4360:
--------------------------------------

             Summary: IOException: Multiple partitions for one merge mapper
                 Key: TEZ-4360
                 URL: https://issues.apache.org/jira/browse/TEZ-4360
             Project: Apache Tez
          Issue Type: Bug
            Reporter: sundy_baba_zhouzy


Hive version 2.3.7, Tez version 0.9.2.

The following error occurs when the SQL below is executed in Hive:

[CF-100001]execute sql failed:org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=File Merge, vertexId=vertex_1631161845409_0980_2_00, diagnostics=[Task failed, taskId=task_1631161845409_0980_2_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1631161845409_0980_2_00_000000_0:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: Multiple partitions for one merge mapper: hdfs://ns1/hive/warehouse/ns_dadev.db/dws_air_qlt_stat_20211029152731035/.hive-staging_hive_2021-11-25_16-25-07_185_3959810947578240944-5/-ext-10002/time_type=hour/time_col=2021102600/space_type=station/etl_script_id=air_sta_aqi_pp_1h/1 NOT EQUAL TO hdfs://ns1/hive/warehouse/ns_dadev.db/dws_air_qlt_stat_20211029152731035/.hive-staging_hive_2021-11-25_16-25-07_185_3959810947578240944-5/-ext-10002/time_type=hour/time_col=2021102600/space_type=station/etl_script_id=air_sta_aqi_pp_1h/2
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
at org.apache.hadoop.hive.ql.exec.tez.MergeFileTezProcessor.run(MergeFileTezProcessor.java:42)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: Multiple partitions for one merge mapper: hdfs://ns1/hive/warehouse/ns_dadev.db/dws_air_qlt_stat_20211029152731035/.hive-staging_hive_2021-11-25_16-25-07_185_3959810947578240944-5/-ext-10002/time_type=hour/time_col=2021102600/space_type=station/etl_script_id=air_sta_aqi_pp_1h/1 NOT EQUAL TO hdfs://ns1/hive/warehouse/ns_dadev.db/dws_air_qlt_stat_20211029152731035/.hive-staging_hive_2021-11-25_16-25-07_185_3959810947578240944-5/-ext-10002/time_type=hour/time_col=2021102600/space_type=station/etl_script_id=air_sta_aqi_pp_1h/2
at org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.processRow(MergeFileRecordProcessor.java:221)
at org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.run(MergeFileRecordProcessor.java:154)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: Multiple partitions for one merge mapper: hdfs://ns1/hive/warehouse/ns_dadev.db/dws_air_qlt_stat_20211029152731035/.hive-staging_hive_2021-11-25_16-25-07_185_3959810947578240944-5/-ext-10002/time_type=hour/time_col=2021102600/space_type=station/etl_script_id=air_sta_aqi_pp_1h/1 NOT EQUAL TO hdfs://ns1/hive/warehouse/ns_dadev.db/dws_air_qlt_stat_20211029152731035/.hive-staging_hive_2021-11-25_16-25-07_185_3959810947578240944-5/-ext-10002/time_type=hour/time_col=2021102600/space_type=station/etl_script_id=air_sta_aqi_pp_1h/2
at org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.processKeyValuePairs(OrcFileMergeOperator.java:169)
at org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.process(OrcFileMergeOperator.java:72)
at org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.processRow(MergeFileRecordProcessor.java:212)
... 16 more
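
Note on reading the message: the two paths it reports are very long, and they appear to differ only in their last component (1 versus 2) under the same partition directory. That would be consistent with each branch of the UNION ALL writing into its own subdirectory, but that interpretation is an assumption and is not confirmed anywhere in this report. The following is a minimal Python sketch for checking this on any such message. It is not Hive code, and the helper names split_merge_paths and compare_paths are hypothetical.

```python
import posixpath
from urllib.parse import urlparse

MARKER = "Multiple partitions for one merge mapper: "
SEPARATOR = " NOT EQUAL TO "


def split_merge_paths(message):
    """Extract the two paths from the exception message text."""
    tail = message.split(MARKER, 1)[1]
    left, right = tail.split(SEPARATOR, 1)
    return left.strip(), right.strip()


def compare_paths(left, right):
    """Return (shared_parent, left_leaf, right_leaf) if both paths share
    the same parent directory, otherwise None."""
    left_parent, left_leaf = posixpath.split(urlparse(left).path)
    right_parent, right_leaf = posixpath.split(urlparse(right).path)
    if left_parent != right_parent:
        return None
    return left_parent, left_leaf, right_leaf


# The partition directory taken from the error above.
base = ("hdfs://ns1/hive/warehouse/ns_dadev.db/dws_air_qlt_stat_20211029152731035/"
        ".hive-staging_hive_2021-11-25_16-25-07_185_3959810947578240944-5/-ext-10002/"
        "time_type=hour/time_col=2021102600/space_type=station/"
        "etl_script_id=air_sta_aqi_pp_1h/")
message = "java.io.IOException: " + MARKER + base + "1" + SEPARATOR + base + "2"

result = compare_paths(*split_merge_paths(message))
print(result[1], result[2])  # prints: 1 2
```

For the message in this report the helper returns the shared partition directory together with the leaves 1 and 2, so the mismatch is confined to the last path component.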

SQL executed:

insert overwrite table dws_air_qlt_stat partition (time_type,time_col,space_type,etl_script_id) 
select 
    a.space_num as space_num,             
    a.space_name as space_name,            
    '1h_aqi' as pltt_item,                 
    '1AQI' as pltt_item_desc,          
    cast(floor(a.stat_rslt)as string) as stat_rslt,           
    a.stat_rslt_valid_ind as stat_rslt_valid_ind,  
    current_timestamp as insert_time,       
    'hour' as time_type,                   
    a.time_col as time_col,               
    'station' as space_type,               
    'air_sta_aqi_pp_1h' as etl_script_id   
from 
    (select
        space_num,
        space_name,
        stat_rslt_valid_ind,
        time_col,
        max(stat_rslt) as stat_rslt        
    from dws_air_qlt_stat
    where space_type = 'station' and time_type = 'hour' 
        and pltt_item in ('1h_avg_iaqi_co','1h_avg_iaqi_no2','1h_avg_iaqi_o3','8h_mavg_iaqi_o3','24h_mavg_iaqi_pm10','24h_mavg_iaqi_pm2_5','1h_avg_iaqi_so2')
        and time_col='${phour}'
        and stat_rslt_valid_ind='1'
    group by space_num,space_name,stat_rslt_valid_ind,time_col
    ) a 
union all
select 
    b.space_num as space_num,             
    b.space_name as space_name,            
    '1h_pp' as pltt_item,                  
    '1xxxx' as pltt_item_desc,    
    concat_ws(',',collect_set(cast (b.pltt_item as string))) as stat_rslt, 
    b.stat_rslt_valid_ind as stat_rslt_valid_ind,   
    current_timestamp as insert_time,       
    'hour' as time_type,                    
    b.time_col as time_col,                
    'station' as space_type,              
    'air_sta_aqi_pp_1h' as etl_script_id   
from 
    (select
        space_num,
        max(stat_rslt) as stat_rslt         
    from dws_air_qlt_stat
    where space_type = 'station' and time_type = 'hour' 
        and pltt_item in ('1h_avg_iaqi_co','1h_avg_iaqi_no2','1h_avg_iaqi_o3','8h_mavg_iaqi_o3','24h_mavg_iaqi_pm10','24h_mavg_iaqi_pm2_5','1h_avg_iaqi_so2')
        and time_col='${phour}'
        and stat_rslt_valid_ind='1'
    group by space_type,time_type,time_col,space_num
    ) a 
join dws_air_qlt_stat b
    on a.space_num = b.space_num  
    and a.stat_rslt = b.stat_rslt and b.space_type = 'station' 
    and b.time_type = 'hour' and b.pltt_item in ('1h_avg_iaqi_co','1h_avg_iaqi_no2','1h_avg_iaqi_o3','8h_mavg_iaqi_o3','24h_mavg_iaqi_pm10','24h_mavg_iaqi_pm2_5','1h_avg_iaqi_so2')
    and b.time_col='${phour}'
    and b.stat_rslt_valid_ind='1'
group by b.space_num,b.space_name,b.stat_rslt_valid_ind,b.time_col


The above SQL runs without problems in Hive on MR, but fails with the error shown above in Hive on Tez.


--
This message was sent by Atlassian Jira
(v8.20.1#820001)