Posted to user@hadoop.apache.org by Abdul Qadeer <qa...@gmail.com> on 2020/08/21 01:39:13 UTC

[Hadoop-streaming] How to handle very long list of input files

Hi,

Let's say I have so many files to pass via the "-input" switch of
Hadoop Streaming that I hit the shell's command-line length limit
("Argument list too long").
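
For context, the failing invocation looks roughly like this (the jar
and file paths are illustrative, not my real ones):

    hadoop jar hadoop-streaming.jar \
        -input /data/events/part-000001.log \
        -input /data/events/part-000002.log \
        ...thousands more -input arguments... \
        -output /out/run1 \
        -mapper ./map.py \
        -reducer ./reduce.py

Once the total length of the arguments exceeds the kernel's ARG_MAX
limit, the command fails with "Argument list too long" before Hadoop
even starts.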

I can't move my input files into a new directory (they might be in
use by someone else), and I can't copy them into a new directory
either, for performance reasons (too many big files). I also can't
use a pattern on the input file names, because files matching the
pattern may keep arriving that I don't want to include in the
processing yet (see the example below).
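
To make the pattern problem concrete: a glob like the one below would
also pick up any matching file that lands in the directory after I
prepare the job, which is exactly what I need to avoid (the path is
illustrative):

    -input '/data/events/2020-08-*.log'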

*So my question is*: Is there a way in Hadoop Streaming to handle the
above scenario, for example by providing a local file containing a
long list of HDFS paths instead of writing the file names directly on
the command line?
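
Something like the sketch below is what I have in mind, where
inputs.txt is a local file listing one HDFS path per line. Note that
the "-inputlist" switch is purely hypothetical; I don't know of any
such option, hence this question:

    $ cat inputs.txt
    /data/events/part-000001.log
    /data/events/part-000002.log
    /data/events/part-000003.log

    # "-inputlist" below is a hypothetical switch, shown only to
    # illustrate the desired behavior
    $ hadoop jar hadoop-streaming.jar \
        -inputlist inputs.txt \
        -output /out/run1 \
        -mapper ./map.py \
        -reducer ./reduce.py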


Thanks,
-Abdul