Posted to dev@hive.apache.org by "Vihang Karajgaonkar (JIRA)" <ji...@apache.org> on 2016/09/30 00:49:20 UTC

[jira] [Created] (HIVE-14864) Distcp is not called from MoveTask when src is a directory

Vihang Karajgaonkar created HIVE-14864:
------------------------------------------

             Summary: Distcp is not called from MoveTask when src is a directory
                 Key: HIVE-14864
                 URL: https://issues.apache.org/jira/browse/HIVE-14864
             Project: Hive
          Issue Type: Bug
            Reporter: Vihang Karajgaonkar
            Assignee: Vihang Karajgaonkar


In FileUtils.java, the following code is not executed even when the size of the src directory exceeds HIVE_EXEC_COPYFILE_MAXSIZE, because srcFS.getFileStatus(src).getLen() returns 0 when src is a directory. As a result, distcp is never launched from MoveTask for a directory source. We should use srcFS.getContentSummary(src).getLength() instead, which returns the total size of the files under src.

{noformat}
    /* Run distcp if source file/dir is too big */
    if (srcFS.getUri().getScheme().equals("hdfs") &&
        srcFS.getFileStatus(src).getLen() > conf.getLongVar(HiveConf.ConfVars.HIVE_EXEC_COPYFILE_MAXSIZE)) {
      LOG.info("Source is " + srcFS.getFileStatus(src).getLen() + " bytes. (MAX: " + conf.getLongVar(HiveConf.ConfVars.HIVE_EXEC_COPYFILE_MAXSIZE) + ")");
      LOG.info("Launch distributed copy (distcp) job.");
      HiveConfUtil.updateJobCredentialProviders(conf);
      copied = shims.runDistCp(src, dst, conf);
      if (copied && deleteSource) {
        srcFS.delete(src, true);
      }
    }
{noformat}
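
To illustrate the difference between the two size checks, here is a minimal, self-contained sketch. It does not use the Hadoop API: the Node class and the statusLength/contentLength helpers are hypothetical stand-ins that only model the behaviour described above (a status length of 0 for a directory versus a recursive content length), and the threshold of 100 bytes is an arbitrary illustrative value, not a Hive default.

```java
import java.util.ArrayList;
import java.util.List;

final class DistCpSizeCheck {

    /** Hypothetical stand-in for a path in a file system; not a Hadoop class. */
    static final class Node {
        final boolean directory;
        final long fileLength; // meaningful for files only
        final List<Node> children = new ArrayList<>();

        private Node(boolean directory, long fileLength) {
            this.directory = directory;
            this.fileLength = fileLength;
        }

        static Node file(long length) {
            return new Node(false, length);
        }

        static Node dir(Node... kids) {
            Node d = new Node(true, 0L);
            for (Node k : kids) {
                d.children.add(k);
            }
            return d;
        }
    }

    /** Models srcFS.getFileStatus(src).getLen(): 0 for a directory, as the report describes. */
    static long statusLength(Node n) {
        return n.directory ? 0L : n.fileLength;
    }

    /** Models srcFS.getContentSummary(src).getLength(): total bytes of all files under the path. */
    static long contentLength(Node n) {
        if (!n.directory) {
            return n.fileLength;
        }
        long total = 0L;
        for (Node c : n.children) {
            total += contentLength(c);
        }
        return total;
    }

    /** Size check as currently written: never true for a directory source. */
    static boolean exceedsWithStatus(Node src, long maxSize) {
        return statusLength(src) > maxSize;
    }

    /** Size check as proposed: compares the recursive content size against the limit. */
    static boolean exceedsWithSummary(Node src, long maxSize) {
        return contentLength(src) > maxSize;
    }

    public static void main(String[] args) {
        long maxSize = 100L; // illustrative threshold only
        // A directory holding an 80-byte file and a subdirectory with a 70-byte file.
        Node src = Node.dir(Node.file(80L), Node.dir(Node.file(70L)));

        System.out.println("status check:  " + exceedsWithStatus(src, maxSize));  // false
        System.out.println("summary check: " + exceedsWithSummary(src, maxSize)); // true
    }
}
```

In this model the directory holds 150 bytes in total, so only the content-summary style check crosses the threshold and would reach the distcp branch. For a plain file both checks agree, so the proposed change should only affect directory sources.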



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)