Posted to dev@pig.apache.org by "liyunzhang_intel (JIRA)" <ji...@apache.org> on 2017/05/25 06:28:04 UTC
[jira] [Created] (PIG-5241) Specify the hdfs path directly to spark and avoid the unnecessary download and upload in SparkLauncher.java
liyunzhang_intel created PIG-5241:
-------------------------------------
Summary: Specify the hdfs path directly to spark and avoid the unnecessary download and upload in SparkLauncher.java
Key: PIG-5241
URL: https://issues.apache.org/jira/browse/PIG-5241
Project: Pig
Issue Type: Sub-task
Reporter: liyunzhang_intel
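SparkLauncher.cacheFiles() currently downloads each cache file from HDFS into a local temporary directory and then uploads it again to the Spark job's working directory. The TODO in the method below asks to specify the HDFS path directly to Spark instead, avoiding this unnecessary download and upload: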
{code}
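// Excerpt from SparkLauncher.java. jobConf, LOG, extractFileName, extractFileUrl,
// addResourceToSparkJobWorkingDirectory and ResourceType are members of SparkLauncher.
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

private void cacheFiles(String cacheFiles) throws IOException {
    if (cacheFiles != null && !cacheFiles.isEmpty()) {
        // Local scratch directory that receives a copy of every cached file.
        File tmpFolder = Files.createTempDirectory("cache").toFile();
        tmpFolder.deleteOnExit();
        for (String file : cacheFiles.split(",")) {
            String spec = file.trim();
            String fileName = extractFileName(spec);
            Path src = new Path(extractFileUrl(spec));
            File tmpFile = new File(tmpFolder, fileName);
            Path tmpFilePath = new Path(tmpFile.getAbsolutePath());
            // Resolve the FileSystem from the source path so that fully qualified
            // URIs (e.g. hdfs://...) are read from the filesystem they point to.
            FileSystem fs = src.getFileSystem(jobConf);
            //TODO: Specify the hdfs path directly to spark and avoid the unnecessary download and upload in SparkLauncher.java
            // Download: copy the file from its source filesystem to the local scratch directory.
            fs.copyToLocalFile(src, tmpFilePath);
            tmpFile.deleteOnExit();
            LOG.info(String.format("CacheFile:%s", fileName));
            // Upload: ship the local copy to the Spark job's working directory under fileName.
            addResourceToSparkJobWorkingDirectory(tmpFile, fileName,
                    ResourceType.FILE);
        }
    }
}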
{code}
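A possible first step toward the TODO is to decide, per cache entry, whether the HDFS URL could be handed to Spark unchanged. The sketch below is a hypothetical helper, not Pig code: `CacheFilePlanner`, `CacheEntry` and `plan` are illustrative names. It assumes the Pig cache-file syntax `url#linkName` and assumes that Spark's `SparkContext.addFile`, which does accept HDFS paths, exposes a file under the last component of its source path. Under those assumptions, only entries whose link name differs from the source file name would still need the download/rename/upload round trip.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Pig): splits a comma-separated cache-file list of the
 * assumed form "url#linkName" and flags entries whose URL could be given to Spark directly.
 */
final class CacheFilePlanner {

    /** One parsed cache-file entry. */
    static final class CacheEntry {
        final String url;       // source location, e.g. hdfs://nn:8020/data/a.txt
        final String linkName;  // name expected in the task working directory
        final boolean direct;   // true if linkName equals the source file name

        CacheEntry(String url, String linkName, boolean direct) {
            this.url = url;
            this.linkName = linkName;
            this.direct = direct;
        }
    }

    /** Returns the last path component of a URL or path. */
    static String basename(String url) {
        int slash = url.lastIndexOf('/');
        return slash >= 0 ? url.substring(slash + 1) : url;
    }

    /**
     * Assumption: Spark distributes a file added by path under that path's file name,
     * so an entry can skip the local copy only when no rename is needed.
     */
    static List<CacheEntry> plan(String cacheFiles) {
        List<CacheEntry> result = new ArrayList<>();
        if (cacheFiles == null || cacheFiles.isEmpty()) {
            return result;
        }
        for (String raw : cacheFiles.split(",")) {
            String spec = raw.trim();
            if (spec.isEmpty()) {
                continue;
            }
            int hash = spec.indexOf('#');
            String url = hash >= 0 ? spec.substring(0, hash) : spec;
            // Without "#linkName" the file keeps its own name in the working directory.
            String linkName = hash >= 0 ? spec.substring(hash + 1) : basename(url);
            result.add(new CacheEntry(url, linkName, linkName.equals(basename(url))));
        }
        return result;
    }

    public static void main(String[] args) {
        String cacheFiles = "hdfs://nn:8020/data/lookup.txt#lookup.txt, hdfs://nn:8020/data/v2.dat#dict";
        for (CacheEntry e : plan(cacheFiles)) {
            // "direct" entries could be passed to Spark as-is; "copy" entries still need renaming.
            System.out.println(e.url + " -> " + e.linkName + (e.direct ? " (direct)" : " (copy)"));
        }
    }
}
```

In this sketch, entries flagged as direct could be passed by URL to `JavaSparkContext.addFile`, while the others would keep going through the existing local copy so that they appear under `fileName`. Whether Spark's naming really matches this assumption should be checked against the Spark version Pig targets.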