Posted to dev@pig.apache.org by "Nandor Kollar (JIRA)" <ji...@apache.org> on 2017/05/25 16:40:05 UTC

[jira] [Assigned] (PIG-5241) Specify the hdfs path directly to spark and avoid the unnecessary download and upload in SparkLauncher.java

     [ https://issues.apache.org/jira/browse/PIG-5241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nandor Kollar reassigned PIG-5241:
----------------------------------

    Assignee: Nandor Kollar

> Specify the hdfs path directly to spark and avoid the unnecessary download and upload in SparkLauncher.java
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-5241
>                 URL: https://issues.apache.org/jira/browse/PIG-5241
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: Nandor Kollar
>             Fix For: spark-branch
>
>
> //TODO: Specify the hdfs path directly to spark and avoid the unnecessary download and upload in SparkLauncher.java
> {code}
>     private void cacheFiles(String cacheFiles) throws IOException {
>         if (cacheFiles != null && !cacheFiles.isEmpty()) {
>             File tmpFolder = Files.createTempDirectory("cache").toFile();
>             tmpFolder.deleteOnExit();
>             for (String file : cacheFiles.split(",")) {
>                 String fileName = extractFileName(file.trim());
>                 Path src = new Path(extractFileUrl(file.trim()));
>                 File tmpFile = new File(tmpFolder, fileName);
>                 Path tmpFilePath = new Path(tmpFile.getAbsolutePath());
>                 FileSystem fs = tmpFilePath.getFileSystem(jobConf);
>                 //TODO: Specify the hdfs path directly to spark and avoid the unnecessary download and upload in SparkLauncher.java
>                 fs.copyToLocalFile(src, tmpFilePath);
>                 tmpFile.deleteOnExit();
>                 LOG.info(String.format("CacheFile:%s", fileName));
>                 addResourceToSparkJobWorkingDirectory(tmpFile, fileName,
>                         ResourceType.FILE);
>             }
>         }
>     }
> {code}
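
A minimal sketch of the first step the TODO implies: deciding, per cache entry, whether the local copy is needed at all. It assumes each entry has the form "<uri>#<alias>", which is inferred from the extractFileName/extractFileUrl calls above and not confirmed here. CacheFileEntry, parse, parseAll and isRemote are hypothetical names that do not exist in Pig. Entries that report isRemote() could in principle be passed straight to SparkContext.addFile, which accepts HDFS paths; as far as I know addFile keeps the original file name, so an alias that differs from it would still need separate handling.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Pig): one parsed cache-file entry. */
final class CacheFileEntry {
    final URI location;  // where the file lives, without the "#alias" fragment
    final String alias;  // name the file should have in the job working directory

    private CacheFileEntry(URI location, String alias) {
        this.location = location;
        this.alias = alias;
    }

    /** Parses "<uri>#<alias>"; without a fragment the alias is the last path segment. */
    static CacheFileEntry parse(String entry) {
        URI uri = URI.create(entry.trim());
        String path = uri.getPath();
        String alias = uri.getFragment() != null
                ? uri.getFragment()
                : path.substring(path.lastIndexOf('/') + 1);
        // Drop the fragment so the remaining URI could be handed over unchanged.
        URI location = URI.create(uri.toString().split("#", 2)[0]);
        return new CacheFileEntry(location, alias);
    }

    /** Splits a comma-separated list, mirroring the loop in cacheFiles(). */
    static List<CacheFileEntry> parseAll(String cacheFiles) {
        List<CacheFileEntry> entries = new ArrayList<>();
        if (cacheFiles == null || cacheFiles.trim().isEmpty()) {
            return entries;
        }
        for (String entry : cacheFiles.split(",")) {
            entries.add(parse(entry));
        }
        return entries;
    }

    /** True when the URI names a non-local filesystem, so no download is needed first. */
    boolean isRemote() {
        String scheme = location.getScheme();
        return scheme != null && !scheme.equals("file");
    }

    public static void main(String[] args) {
        // Example input only; the host and paths are made up for illustration.
        for (CacheFileEntry e : parseAll("hdfs://nn:8020/data/lookup.txt#lookup, /tmp/local.txt")) {
            System.out.println(e.location + " -> " + e.alias + " remote=" + e.isRemote());
        }
    }
}
```

With a check like this, only entries that are not remote would go through the copyToLocalFile path shown in the issue.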



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)