Posted to dev@pig.apache.org by "Rohini Palaniswamy (JIRA)" <ji...@apache.org> on 2014/03/18 21:10:44 UTC

[jira] [Reopened] (PIG-3815) Hadoop bug causes to pig to fail silently with jar cache

     [ https://issues.apache.org/jira/browse/PIG-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy reopened PIG-3815:
-------------------------------------


Actually, I see some issues with this patch. Reopening the JIRA.

   1) Changing os.close() to IOUtils.closeQuietly(os) is not good. You can close an input stream quietly, but not an output stream, especially an HDFS output stream. If os.close() fails, HDFS can be left with an empty file, without data, that is still accessible through the NN. We have been bitten by this many times. In internal projects, we delete the file and retry if os.close() fails. So please let the Pig script fail if os.close() fails rather than causing unexpected behavior.

   2) addFileToClassPath is already doing file.toUri().getPath(). I don't see where the Hadoop bug is coming from.

http://svn.apache.org/viewvc/hadoop/common/branches/branch-1.0/src/mapred/org/apache/hadoop/filecache/DistributedCache.java?revision=1206848&view=markup

{code}
public static void addFileToClassPath(Path file, Configuration conf, FileSystem fs)
    throws IOException {
  String filepath = file.toUri().getPath();
  String classpath = conf.get("mapred.job.classpath.files");
  conf.set("mapred.job.classpath.files", classpath == null
      ? filepath
      : classpath + System.getProperty("path.separator") + filepath);
  URI uri = fs.makeQualified(file).toUri();
  addCacheFile(uri, conf);
}
{code}
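
To illustrate point 1) above, here is a minimal sketch of the "delete and retry when close() fails" approach. It is not the patch and not Pig code: StrictCloseSketch, StreamOpener and FileDeleter are hypothetical names standing in for whatever opens and deletes the file (for example, HDFS calls), and only java.io is used so the sketch stays self-contained.

```java
import java.io.IOException;
import java.io.OutputStream;

final class StrictCloseSketch {
    /** Hypothetical hook that opens a fresh output stream to the target file. */
    public interface StreamOpener {
        OutputStream open() throws IOException;
    }

    /** Hypothetical hook that deletes the possibly empty or partial target file. */
    public interface FileDeleter {
        void delete() throws IOException;
    }

    /**
     * Writes data and treats a failed close() like any other write failure:
     * the file is deleted and the write is retried, up to maxAttempts times.
     * If every attempt fails, the last IOException is rethrown to the caller.
     */
    public static void writeWithRetry(StreamOpener opener, FileDeleter deleter,
                                      byte[] data, int maxAttempts) throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // try-with-resources calls close() and, unlike closeQuietly,
            // lets an IOException thrown by close() reach the catch below.
            try (OutputStream os = opener.open()) {
                os.write(data);
            } catch (IOException e) {
                last = e;
                deleter.delete(); // do not leave an empty file behind
                continue;
            }
            return; // written and closed successfully
        }
        throw last;
    }
}
```

The point of the sketch is only that a failure in close() stays visible to the caller; whether to retry or to fail the script immediately is a separate choice.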

> Hadoop bug causes to pig to fail silently with jar cache
> --------------------------------------------------------
>
>                 Key: PIG-3815
>                 URL: https://issues.apache.org/jira/browse/PIG-3815
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.13.0
>            Reporter: Aniket Mokashi
>            Assignee: Aniket Mokashi
>             Fix For: 0.13.0
>
>         Attachments: PIG-3815-1.patch, PIG-3815.patch
>
>
> Pig uses the DistributedCache.addFileToClassPath API, which adds jars to the distributed cache configuration. This uses ':' to separate the list of files to be put on the classpath via the distributed cache. If fs.default.name has a port number in it, the tokenization logic in Hadoop that retrieves the list of cache filenames in the backend fails.
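
The tokenization concern in the description, and why the getPath() call matters, can be illustrated with plain string handling. This is a sketch only: it uses java.net.URI in place of Hadoop's Path, assumes ':' as the separator, and the host name and jar paths are made up. It does not reproduce Hadoop's actual backend parsing.

```java
import java.net.URI;

final class ClasspathEntrySketch {
    /**
     * Same string handling as the quoted addFileToClassPath: only the path
     * component of the URI is appended to the separator-delimited list.
     */
    public static String appendToClasspath(String existing, URI file, String separator) {
        String filepath = file.getPath(); // drops scheme, host and port
        return existing == null ? filepath : existing + separator + filepath;
    }

    public static void main(String[] args) {
        String sep = ":"; // assumed separator, as in the issue description
        // Hypothetical fully qualified jar locations with a port number.
        URI a = URI.create("hdfs://namenode.example.com:8020/tmp/jars/a.jar");
        URI b = URI.create("hdfs://namenode.example.com:8020/tmp/jars/b.jar");

        String cp = appendToClasspath(appendToClasspath(null, a, sep), b, sep);
        System.out.println(cp);                   // /tmp/jars/a.jar:/tmp/jars/b.jar
        System.out.println(cp.split(sep).length); // 2 tokens, one per jar

        // If the full URI string were stored instead, ':' would also split
        // after the scheme and before the port.
        System.out.println(a.toString().split(sep).length); // 3 tokens for one jar
    }
}
```

With only the path component stored, splitting on ':' gives one token per jar; the extra tokens appear only when the scheme and port are part of the stored string.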



--
This message was sent by Atlassian JIRA
(v6.2#6252)