Posted to issues@hive.apache.org by "Sankar Hariappan (JIRA)" <ji...@apache.org> on 2018/04/19 18:42:01 UTC

[jira] [Work started] (HIVE-19248) Hive replication causes file copy failures if HDFS block size differs across clusters

     [ https://issues.apache.org/jira/browse/HIVE-19248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HIVE-19248 started by Sankar Hariappan.
-----------------------------------------------
> Hive replication causes file copy failures if HDFS block size differs across clusters
> -------------------------------------------------------------------------------------
>
>                 Key: HIVE-19248
>                 URL: https://issues.apache.org/jira/browse/HIVE-19248
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2, repl
>    Affects Versions: 3.0.0
>            Reporter: Sankar Hariappan
>            Assignee: Sankar Hariappan
>            Priority: Major
>              Labels: DR, pull-request-available, replication
>             Fix For: 3.1.0
>
>
> Hive replication uses Hadoop distcp to copy files from the primary to the replica warehouse. If the HDFS block size differs across the clusters, the file copies fail: the default HDFS file checksum (MD5 of per-block MD5s of CRCs) depends on the block size the file was written with, so identical data stored with different block sizes yields different checksums, and distcp's post-copy verification rejects the file.
> {code}
> 2018-04-09 14:32:06,690 ERROR [main] org.apache.hadoop.tools.mapred.CopyMapper: Failure in copying hdfs://chelsea/apps/hive/warehouse/tpch_flat_orc_1000.db/customer/000259_0 to hdfs://marilyn/apps/hive/warehouse/tpch_flat_orc_1000.db/customer/.hive-staging_hive_2018-04-09_14-30-45_723_7153496419225102220-2/-ext-10001/000259_0
> java.io.IOException: File copy failed: hdfs://chelsea/apps/hive/warehouse/tpch_flat_orc_1000.db/customer/000259_0 --> hdfs://marilyn/apps/hive/warehouse/tpch_flat_orc_1000.db/customer/.hive-staging_hive_2018-04-09_14-30-45_723_7153496419225102220-2/-ext-10001/000259_0
>  at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:299)
>  at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:266)
>  at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:52)
>  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
>  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
>  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
> Caused by: java.io.IOException: Couldn't run retriable-command: Copying hdfs://chelsea/apps/hive/warehouse/tpch_flat_orc_1000.db/customer/000259_0 to hdfs://marilyn/apps/hive/warehouse/tpch_flat_orc_1000.db/customer/.hive-staging_hive_2018-04-09_14-30-45_723_7153496419225102220-2/-ext-10001/000259_0
>  at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:101)
>  at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:296)
>  ... 10 more
> Caused by: java.io.IOException: Check-sum mismatch between hdfs://chelsea/apps/hive/warehouse/tpch_flat_orc_1000.db/customer/000259_0 and hdfs://marilyn/apps/hive/warehouse/tpch_flat_orc_1000.db/customer/.hive-staging_hive_2018-04-09_14-30-45_723_7153496419225102220-2/-ext-10001/.distcp.tmp.attempt_1522833620762_4416_m_000000_0. Source and target differ in block-size. Use -pb to preserve block-sizes during copy. Alternatively, skip checksum-checks altogether, using -skipCrc. (NOTE: By skipping checksums, one runs the risk of masking data-corruption during file-transfer.)
>  at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.compareCheckSums(RetriableFileCopyCommand.java:212)
>  at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:130)
>  at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:99)
>  at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87)
>  ... 11 more
> {code}
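> A rough illustration of why the copy is rejected (a sketch against the standard Hadoop FileSystem API, not the actual distcp code; paths and cluster names are placeholders): distcp compares the source and target file checksums after the copy, and those checksums depend on the block size the file was written with.
> {code:java}
> // Sketch only: reproduce the kind of post-copy comparison that fails above.
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileChecksum;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> public class ChecksumMismatchDemo {
>   public static void main(String[] args) throws Exception {
>     Configuration conf = new Configuration();
>     Path src = new Path("hdfs://primary/warehouse/db/table/000000_0");
>     Path dst = new Path("hdfs://replica/warehouse/db/table/000000_0");
>     FileSystem srcFs = src.getFileSystem(conf);
>     FileSystem dstFs = dst.getFileSystem(conf);
>
>     // MD5-of-MD5-of-CRC file checksums are computed per block, so the same
>     // bytes written with different block sizes yield different checksums.
>     System.out.println("block sizes: "
>         + srcFs.getFileStatus(src).getBlockSize() + " vs "
>         + dstFs.getFileStatus(dst).getBlockSize());
>     FileChecksum srcSum = srcFs.getFileChecksum(src);
>     FileChecksum dstSum = dstFs.getFileChecksum(dst);
>     System.out.println("checksums equal: "
>         + (srcSum != null && srcSum.equals(dstSum)));
>   }
> }
> {code}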
> Also, REPL LOAD returns success even if the distcp jobs failed.
> So, two things need to be done:
>  # Pass proper options to distcp so that it preserves the block size and skips the CRC check, using options such as *-pugpbx, -update* and *-skipcrccheck* (see the first sketch after this list).
>  # If the copy of multiple files fails for some reason, check whether any files were already copied completely by verifying the checksum and file size, and skip those when retrying (see the second sketch after this list).
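>
> A minimal sketch of item 1, assuming the options are handed straight to Hadoop's DistCp tool via its command-line switches (this is not Hive's actual copy code; paths are placeholders):
> {code:java}
> // Sketch only: run distcp with options that preserve the block size and skip
> // the CRC check, as proposed in item 1.
> import java.util.Arrays;
> import java.util.List;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.tools.DistCp;
> import org.apache.hadoop.util.ToolRunner;
>
> public class ReplCopySketch {
>   public static void main(String[] args) throws Exception {
>     Configuration conf = new Configuration();
>     List<String> distcpArgs = Arrays.asList(
>         "-pugpbx",        // preserve user, group, permission, block size, xattrs
>         "-update",        // copy only files that are missing or differ on the target
>         "-skipcrccheck",  // skip the CRC comparison that fails across block sizes
>         "hdfs://primary/warehouse/db/table",
>         "hdfs://replica/warehouse/db/table");
>     int rc = ToolRunner.run(conf, new DistCp(conf, null),
>         distcpArgs.toArray(new String[0]));
>     if (rc != 0) {
>       // A failed distcp should surface as a failure to the caller instead of
>       // REPL LOAD reporting success, as described above.
>       throw new RuntimeException("distcp failed with exit code " + rc);
>     }
>   }
> }
> {code}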
>  
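> And a minimal sketch of the check in item 2, again against the standard FileSystem API (the helper name is hypothetical, not an existing Hive method):
> {code:java}
> // Sketch only: skip files that were already copied completely, so a retry of
> // a partially failed copy does not redo (or fail on) finished files.
> import java.io.IOException;
> import org.apache.hadoop.fs.FileChecksum;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> public final class CopyRetryCheck {
>   // Hypothetical helper: returns true when the target already holds a
>   // complete copy of the source file.
>   static boolean alreadyCopied(FileSystem srcFs, Path src,
>                                FileSystem dstFs, Path dst) throws IOException {
>     if (!dstFs.exists(dst)
>         || srcFs.getFileStatus(src).getLen() != dstFs.getFileStatus(dst).getLen()) {
>       return false;                 // missing or size mismatch: copy again
>     }
>     // With the block size preserved (item 1), the checksums are comparable;
>     // getFileChecksum may return null on filesystems that do not support it.
>     FileChecksum srcSum = srcFs.getFileChecksum(src);
>     FileChecksum dstSum = dstFs.getFileChecksum(dst);
>     if (srcSum != null && dstSum != null) {
>       return srcSum.equals(dstSum);
>     }
>     return true;                    // same length, no checksum available
>   }
> }
> {code}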



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)