Posted to dev@pig.apache.org by "Xuzhou Yin (JIRA)" <ji...@apache.org> on 2018/09/27 04:57:00 UTC

[jira] [Created] (PIG-5360) Pig sets working directory of input file systems causes exception thrown

Xuzhou Yin created PIG-5360:
-------------------------------

             Summary: Pig sets working directory of input file systems causes exception thrown
                 Key: PIG-5360
                 URL: https://issues.apache.org/jira/browse/PIG-5360
             Project: Pig
          Issue Type: Bug
          Components: impl
    Affects Versions: 0.17.0
            Reporter: Xuzhou Yin
             Fix For: 0.18.0


In the getSplits() method of PigInputFormat, Pig sets the working directory of the input file system to jobContext.getWorkingDirectory(). That value is always the default working directory of the default file system (e.g. hdfs://host:port/user/userId for HDFS) unless "mapreduce.job.working.dir" is explicitly set to a non-default value. So if the input path uses a non-default file system (e.g. EmrFS), the call fails because it tries to set the working directory of EmrFS to an HDFS path.
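The failure mode can be illustrated with a small, self-contained sketch. This is a toy model and not Hadoop's FileSystem API: the ToyFileSystem class, the failsToSet helper, the port 8020 and the s3://bucket/ URI are all hypothetical stand-ins, and the check shown (scheme and authority must match) is only an assumption about how a file system might reject a foreign working directory.

```java
import java.net.URI;
import java.util.Objects;

// Toy model, NOT Hadoop's FileSystem API: each file system accepts only
// working directories whose scheme and authority match its own URI.
public class WorkingDirSketch {

    static final class ToyFileSystem {
        private final URI root;
        private URI workingDir;

        ToyFileSystem(String rootUri) {
            this.root = URI.create(rootUri);
            this.workingDir = this.root;
        }

        // Rejects a directory that belongs to a different file system.
        void setWorkingDirectory(String dir) {
            URI candidate = URI.create(dir);
            boolean sameScheme = Objects.equals(root.getScheme(), candidate.getScheme());
            boolean sameAuthority = Objects.equals(root.getAuthority(), candidate.getAuthority());
            if (!sameScheme || !sameAuthority) {
                throw new IllegalArgumentException(
                        "Wrong FS: " + candidate + ", expected: " + root);
            }
            this.workingDir = candidate;
        }

        URI getWorkingDirectory() {
            return workingDir;
        }
    }

    // Returns true if applying the job working directory to the input
    // file system is rejected.
    static boolean failsToSet(String inputFsUri, String jobWorkingDir) {
        try {
            new ToyFileSystem(inputFsUri).setWorkingDirectory(jobWorkingDir);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Hypothetical default working directory on the default file system.
        String jobWorkingDir = "hdfs://host:8020/user/userId";
        System.out.println(failsToSet("hdfs://host:8020/", jobWorkingDir));
        System.out.println(failsToSet("s3://bucket/", jobWorkingDir));
    }
}
```

In this model the HDFS input accepts the job working directory, while the input on the other file system rejects it, which mirrors the exception described above.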

The proposed change is to remove this working-directory logic completely, for several reasons.

Firstly, getSplits() is only supposed to return a list of input splits. It should not have side effects, especially one that can change the output path.

Secondly, the working directories of the input and output file systems are treated inconsistently. If "mapreduce.job.working.dir" is set to a non-default value, it affects only the output path (and only if that path is relative), because the input path has already been qualified before this logic runs.
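The asymmetry between a relative output path and an already-qualified input path can be sketched with plain java.net.URI resolution. This is only an analogy under stated assumptions: it does not use Hadoop's Path class, and the resolve helper, the port 8020 and the directory names are hypothetical.

```java
import java.net.URI;

// Analogy using java.net.URI, NOT Hadoop's Path: a relative path follows
// the working directory, a fully qualified path ignores it.
public class PathResolutionSketch {

    // Resolves a path against a working directory. An absolute URI
    // (one that carries a scheme) is returned unchanged.
    static URI resolve(String workingDir, String path) {
        String base = workingDir.endsWith("/") ? workingDir : workingDir + "/";
        return URI.create(base).resolve(path);
    }

    public static void main(String[] args) {
        // Hypothetical non-default value of mapreduce.job.working.dir.
        String workingDir = "hdfs://host:8020/tmp/custom";

        // Relative output path: its location depends on the working directory.
        System.out.println(resolve(workingDir, "out/result"));

        // Input path qualified earlier: the working directory has no effect.
        System.out.println(resolve(workingDir, "hdfs://host:8020/user/userId/in/data"));
    }
}
```

Only the relative path moves when the working directory changes, which is the inconsistency this point describes.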

Thirdly, there is already a "CD" functionality that allows customers to change the working directory. However, this logic overrides the "CD" setting if the input and output paths both use the default file system.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)