Posted to mapreduce-dev@hadoop.apache.org by "Steven Rand (JIRA)" <ji...@apache.org> on 2019/07/23 06:55:00 UTC

[jira] [Created] (MAPREDUCE-7226) option to not fail distcp when source file is being written to

Steven Rand created MAPREDUCE-7226:
--------------------------------------

             Summary: option to not fail distcp when source file is being written to
                 Key: MAPREDUCE-7226
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7226
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: distcp
    Affects Versions: 3.2.0
            Reporter: Steven Rand


If a source file is being written to while distcp runs, then we'll throw an {{IOException}}, because the file's size at the target will be less than its size at the source by the time the lengths are compared:
{code:java}
private void compareFileLengths(CopyListingFileStatus source, Path target,
    Configuration configuration, long targetLen)
    throws IOException {
  final Path sourcePath = source.getPath();
  FileSystem fs = sourcePath.getFileSystem(configuration);
  long srcLen = fs.getFileStatus(sourcePath).getLen();
  if (srcLen != targetLen) {
    throw new IOException("Mismatch in length of source:" + sourcePath + " (" + srcLen +
        ") and target:" + target + " (" + targetLen + ")");
  }
}
{code}
This happens even when the {{-i}} flag is given to distcp to ignore failures, since the only exceptions distcp will ignore are instances of {{CopyReadException}}, and {{compareFileLengths}} throws a plain {{IOException}}.

This means that distcp can't be used for certain workflows. The particular one that I have in mind is incrementally copying data from a production cluster to a DR cluster. This can be handled nicely using distcp with the {{-update}} and {{-delete}} flags, but the problem is that clients might be modifying the production cluster while the distcp runs, thereby causing it to fail even when {{-i}} is given.

One idea is to:

1. Have {{compareFileLengths}} throw a custom exception. It doesn't make sense to use {{CopyReadException}}, since this isn't a failure on read, but we could add another subclass of {{IOException}} for this case.
2. Have the CopyMapper check for the new custom exception in the same way that it does for {{CopyReadException}} when the {{-i}} flag is given. Or, if we don't want to change the behavior of the existing flag, we could add a new flag.
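
The two steps above could look roughly like the sketch below. It is a standalone illustration, not a patch: Hadoop types are replaced by plain strings and longs so that it runs on its own, {{LengthMismatchException}} and {{shouldIgnore}} are hypothetical names that don't exist in distcp today, and the paths are made up. The real check in CopyMapper differs in detail.

```java
import java.io.IOException;

// Standalone sketch: Hadoop types are replaced by plain strings and longs.
class LengthMismatchSketch {

  // Hypothetical IOException subclass (step 1); not an existing distcp class.
  static class LengthMismatchException extends IOException {
    LengthMismatchException(String message) {
      super(message);
    }
  }

  // Same comparison as compareFileLengths, but throwing the dedicated type.
  static void compareFileLengths(String source, long srcLen,
      String target, long targetLen) throws IOException {
    if (srcLen != targetLen) {
      throw new LengthMismatchException("Mismatch in length of source:" + source
          + " (" + srcLen + ") and target:" + target + " (" + targetLen + ")");
    }
  }

  // Hypothetical mapper-side decision (step 2): skip the file only when the
  // ignore-failures option is set and the failure is a length mismatch.
  static boolean shouldIgnore(IOException e, boolean ignoreFailures) {
    return ignoreFailures && e instanceof LengthMismatchException;
  }

  public static void main(String[] args) throws IOException {
    try {
      // Illustrative values: the source grew from 100 to 120 bytes mid-copy.
      compareFileLengths("/data/part-0", 120L, "/backup/part-0", 100L);
    } catch (IOException e) {
      if (shouldIgnore(e, true)) {
        System.out.println("Skipping file: " + e.getMessage());
      } else {
        throw e;
      }
    }
  }
}
```

If we went with a new flag instead of reusing {{-i}}, the boolean passed to {{shouldIgnore}} would come from that flag, and the existing {{CopyReadException}} handling would stay untouched.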


