Posted to mapreduce-issues@hadoop.apache.org by "Zac Hopkinson (JIRA)" <ji...@apache.org> on 2015/12/30 20:12:49 UTC

[jira] [Created] (MAPREDUCE-6596) MultipleInputs does not escape Path characters

Zac Hopkinson created MAPREDUCE-6596:
----------------------------------------

             Summary: MultipleInputs does not escape Path characters
                 Key: MAPREDUCE-6596
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6596
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: mrv2
    Affects Versions: 2.6.2
            Reporter: Zac Hopkinson
            Assignee: Zac Hopkinson


Filenames containing commas or semicolons cause MultipleInputs to break, because it uses these characters as delimiters when joining and storing the path names.

MultipleInputs stores mapreduce.input.multipleinputs.dir.formats as:

```
path;inputFormatClass,path2;inputFormatClass2[, ...]
```

If a filename contains one of these delimiter characters, getInputFormatMap and getMapperTypeMap will fail.
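
To make the failure concrete, here is a minimal, dependency-free sketch of the unescaped join/split scheme described above. It is a simplified stand-in and not the Hadoop source: the class NaiveMultipleInputsParsing and its join/parse helpers are hypothetical names, and the format class is kept as a plain string instead of being loaded by reflection.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the unescaped storage scheme; not Hadoop code.
public class NaiveMultipleInputsParsing {

    // Joins entries as "path;formatClass,path2;formatClass2" with no escaping.
    static String join(Map<String, String> pathToFormat) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : pathToFormat.entrySet()) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(e.getKey()).append(';').append(e.getValue());
        }
        return sb.toString();
    }

    // Splits on ',' and then on ';', trusting that neither appears in a path.
    static Map<String, String> parse(String stored) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String mapping : stored.split(",")) {
            String[] parts = mapping.split(";");
            // A fragment without ';' has only one element, so parts[1] throws.
            result.put(parts[0], parts[1]);
        }
        return result;
    }

    public static void main(String[] args) {
        // A comma in the path splits one entry into two fragments.
        try {
            parse("/data/a,b.txt;TextInputFormat");
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("comma in path: " + e);
        }
        // A semicolon in the path makes part of the path look like the class name.
        System.out.println("semicolon in path: " + parse("/data/a;b.txt;TextInputFormat"));
    }
}
```

In this sketch a comma produces an exception during parsing, while a semicolon silently yields a truncated path mapped to a bogus class name.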

FileInputFormat.addInputPath() escapes each path with escapeString from StringUtils, and the paths are restored later with unEscapeString. I took the same approach for escaping in MultipleInputs.
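
The escaping idea can be sketched without Hadoop on the classpath as follows. This is an illustration under stated assumptions, not the attached patch: the class EscapedMultipleInputsSketch and its helpers are hypothetical, backslash is assumed as the escape character, and only the path (not the class name) is escaped. In Hadoop itself the corresponding helpers live in StringUtils.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of escape-aware storage; not the actual MAPREDUCE-6596 patch.
public class EscapedMultipleInputsSketch {
    static final char ESCAPE = '\\';    // assumed escape character
    static final char ENTRY_SEP = ',';  // separates path/format entries
    static final char FIELD_SEP = ';';  // separates a path from its format class

    // Prefixes the escape character and both separators with the escape character.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == ESCAPE || c == ENTRY_SEP || c == FIELD_SEP) {
                sb.append(ESCAPE);
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Drops each unescaped escape character and keeps the character after it.
    static String unescape(String s) {
        StringBuilder sb = new StringBuilder();
        boolean escaped = false;
        for (char c : s.toCharArray()) {
            if (!escaped && c == ESCAPE) {
                escaped = true;
                continue;
            }
            sb.append(c);
            escaped = false;
        }
        return sb.toString();
    }

    // Splits on sep only where it is not escaped; pieces keep their escape sequences.
    static List<String> splitUnescaped(String s, char sep) {
        List<String> parts = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean escaped = false;
        for (char c : s.toCharArray()) {
            if (escaped) {
                cur.append(c);
                escaped = false;
            } else if (c == ESCAPE) {
                cur.append(c);
                escaped = true;
            } else if (c == sep) {
                parts.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        parts.add(cur.toString());
        return parts;
    }

    // Stores entries as "escapedPath;formatClass,escapedPath2;formatClass2".
    static String join(Map<String, String> pathToFormat) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : pathToFormat.entrySet()) {
            if (sb.length() > 0) {
                sb.append(ENTRY_SEP);
            }
            sb.append(escape(e.getKey())).append(FIELD_SEP).append(e.getValue());
        }
        return sb.toString();
    }

    // Reverses join: split entries, split fields, then unescape the path.
    static Map<String, String> parse(String stored) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String mapping : splitUnescaped(stored, ENTRY_SEP)) {
            List<String> fields = splitUnescaped(mapping, FIELD_SEP);
            result.put(unescape(fields.get(0)), fields.get(1));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> inputs = new LinkedHashMap<>();
        inputs.put("/data/a,b;c.txt", "TextInputFormat");
        String stored = join(inputs);
        System.out.println(stored);
        System.out.println(parse(stored));
    }
}
```

The key design point is that splitting must itself be escape-aware: a plain String.split on the separator would still break on an escaped comma or semicolon.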




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)