Posted to common-issues@hadoop.apache.org by "Srinivas (JIRA)" <ji...@apache.org> on 2018/11/03 15:40:00 UTC

[jira] [Updated] (HADOOP-15898) 1 TB Data size fails to run with the following error

     [ https://issues.apache.org/jira/browse/HADOOP-15898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Srinivas updated HADOOP-15898:
------------------------------
    Summary: 1 TB Data size fails to run with the following error   (was: 1 TB TeraGen fails to run with the following error )

> 1 TB Data size fails to run with the following error 
> -----------------------------------------------------
>
>                 Key: HADOOP-15898
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15898
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: performance
>    Affects Versions: 2.6.0
>         Environment: Hadoop 2.6.0-cdh5.5.1
>  
>  
>            Reporter: Srinivas
>            Priority: Major
>              Labels: performance
>             Fix For: 2.6.0
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> A business-critical MR job runs every day at 2:00 PM PST on about 1 to 1.5 TB of data (depending on the business day). Its ideal elapsed time is 4 hours. However, multiple mappers of this job fail simultaneously with the following error, so the job sometimes takes 11 or even 13 hours.
> Steps taken so far to prevent this problem: 1. Migrated the environment to YARN. 2. Increased the ulimit. 3. Added extra nodes to the cluster. 4. Replaced disks regularly. None of these has helped.
> WARN [DataStreamer for file /analytical_profile/DMP_analytical_profile/Turn/SAUP/2018_11_02_tmp/tmp/part-01357.5789 block BP-854530680-69.194.253.58-1430267558563:blk_4683766046_1108754130089] org.apache.hadoop.hdfs.DFSClient: Error Recovery for block BP-854530680-69.194.253.58-1430267558563:blk_4683766046_1108754130089 in pipeline DatanodeInfoWithStorage[10.0.1.37:50010,DS-ed333d2e-839a-4029-a1c9-b6615c322ed2,DISK], DatanodeInfoWithStorage[74.120.143.19:50010,DS-5d10576e-adc3-474f-bc9d-f0d6fb3ae4c3,DISK], DatanodeInfoWithStorage[74.120.143.6:50010,DS-a5299d68-2858-46c3-8e37-d2559895f979,DISK]: bad datanode DatanodeInfoWithStorage[10.0.1.37:50010,DS-ed333d2e-839a-4029-a1c9-b6615c322ed2,DISK]
>  
> WARN [DataStreamer for file /analytical_profile/DMP_analytical_profile/Turn/SAUP/2018_11_02_tmp/tmp/part-01357.5789 block BP-854530680-69.194.253.58-1430267558563:blk_4683766046_1108754130089] org.apache.hadoop.hdfs.DFSClient: Error Recovery for block BP-854530680-69.194.253.58-1430267558563:blk_4683766046_1108754130089 in pipeline DatanodeInfoWithStorage[74.120.143.19:50010,DS-5d10576e-adc3-474f-bc9d-f0d6fb3ae4c3,DISK], DatanodeInfoWithStorage[74.120.143.6:50010,DS-a5299d68-2858-46c3-8e37-d2559895f979,DISK]: bad datanode DatanodeInfoWithStorage[74.120.143.19:50010,DS-5d10576e-adc3-474f-bc9d-f0d6fb3ae4c3,DISK]
>  
> WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: java.io.IOException: All datanodes DatanodeInfoWithStorage[74.120.143.6:50010,DS-a5299d68-2858-46c3-8e37-d2559895f979,DISK] are bad. Aborting...
>     at com.turn.platform.cheetah.storage.dmp.analytical_profile.merge.IncrementalProfileMergerMapper.close(IncrementalProfileMergerMapper.java:1185)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
>  
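The two DFSClient warnings show the write pipeline for one block shrinking as datanodes are marked bad, until the last remaining node is rejected as well. A minimal sketch for pulling that sequence out of task logs is given here, assuming the warnings have the single-line shape shown in this report. The names `NODE_RE`, `parse_recovery`, and `summarize` are hypothetical helpers and not part of Hadoop; the sample lines are trimmed from the warnings in the description.

```python
import re
from collections import Counter

# Matches one pipeline entry such as
# DatanodeInfoWithStorage[10.0.1.37:50010,DS-...,DISK] and captures the IP.
NODE_RE = re.compile(r"DatanodeInfoWithStorage\s*\[([\d.]+):(\d+),")


def parse_recovery(line):
    """Return (pipeline_ips, bad_ip) for an 'Error Recovery' warning, else None."""
    if "Error Recovery for block" not in line or "bad datanode" not in line:
        return None
    pipeline_part, bad_part = line.split("bad datanode", 1)
    # Keep only the text after "in pipeline" so the block ID is ignored.
    pipeline_part = pipeline_part.split("in pipeline", 1)[-1]
    pipeline = [m.group(1) for m in NODE_RE.finditer(pipeline_part)]
    bad = NODE_RE.search(bad_part)
    return pipeline, (bad.group(1) if bad else None)


def summarize(lines):
    """Collect pipeline sizes in log order and count how often each node is bad."""
    sizes = []
    bad_counts = Counter()
    for line in lines:
        parsed = parse_recovery(line)
        if parsed is None:
            continue
        pipeline, bad = parsed
        sizes.append(len(pipeline))
        if bad:
            bad_counts[bad] += 1
    return sizes, bad_counts


# Sample input: the two warnings from this report, without the thread prefix.
BLOCK = "BP-854530680-69.194.253.58-1430267558563:blk_4683766046_1108754130089"
N37 = "DatanodeInfoWithStorage[10.0.1.37:50010,DS-ed333d2e-839a-4029-a1c9-b6615c322ed2,DISK]"
N19 = "DatanodeInfoWithStorage[74.120.143.19:50010,DS-5d10576e-adc3-474f-bc9d-f0d6fb3ae4c3,DISK]"
N6 = "DatanodeInfoWithStorage[74.120.143.6:50010,DS-a5299d68-2858-46c3-8e37-d2559895f979,DISK]"
PREFIX = "org.apache.hadoop.hdfs.DFSClient: Error Recovery for block " + BLOCK + " in pipeline "
SAMPLE = [
    PREFIX + N37 + ", " + N19 + ", " + N6 + ": bad datanode " + N37,
    PREFIX + N19 + ", " + N6 + ": bad datanode " + N19,
]

if __name__ == "__main__":
    sizes, bad_counts = summarize(SAMPLE)
    print("pipeline sizes:", sizes)
    print("bad datanode counts:", dict(bad_counts))
```

On the two warnings above, this reports a pipeline of three nodes and then two, with 10.0.1.37 and 74.120.143.19 each marked bad once. The final exception then names 74.120.143.6, the only node left. Running the same summary over the logs of all failed mappers could show whether the same few datanodes recur, which this report alone does not establish.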



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
