Posted to mapreduce-user@hadoop.apache.org by Jan Filipiak <Ja...@trivago.com> on 2015/07/24 16:52:48 UTC

Hdfs FSShell getmerge feature idea when using -nl

Hello hadoop users,

I have an idea for a small feature for the getmerge tool. I recently
needed the newline option -nl because the files I had to merge did not
end with a newline.
I was merging all the files from one directory, and unfortunately this
directory also contained empty files. Each empty file still got a
newline appended, so the merged output contained runs of consecutive
newlines.
I had to remove them manually afterwards.

In this situation it might be good to have another argument that allows
skipping empty files. I have written down two changes one could try at
the end of this mail. Do you consider this a good improvement to the
command line tools?

Things one could try in order to implement this feature:

1. The call IOUtils.copyBytes(in, out, getConf(), false); doesn't
   return the number of bytes copied. Returning it would be convenient,
   as one could then skip appending the newline when 0 bytes were
   copied.
2. Alternatively, one could check the file size before copying.
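
To make the first idea concrete, here is a rough sketch in plain Java.
It does not use the Hadoop classes: copyCounting is a hypothetical
stand-in for a copyBytes variant that returns the byte count, the
"files" are just in-memory byte arrays, and the skipEmpty flag is an
assumed name for the proposed option, not an existing getmerge
argument.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

class GetMergeSketch {

    // Hypothetical stand-in for a copy helper that reports how many
    // bytes it copied (the existing copyBytes call returns nothing).
    static long copyCounting(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[4096];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }

    // Merges the given "files" (byte arrays standing in for HDFS files).
    // addNewline mimics -nl; skipEmpty is the assumed new option that
    // suppresses the newline when a file contributed 0 bytes.
    static byte[] merge(List<byte[]> files, boolean addNewline, boolean skipEmpty)
            throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] file : files) {
            long copied;
            try (InputStream in = new ByteArrayInputStream(file)) {
                copied = copyCounting(in, out);
            }
            boolean emptyAndSkipped = skipEmpty && copied == 0;
            if (addNewline && !emptyAndSkipped) {
                out.write('\n');
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        List<byte[]> files = Arrays.asList(
                "a".getBytes(StandardCharsets.UTF_8),
                new byte[0],                              // an empty file
                "b".getBytes(StandardCharsets.UTF_8));

        String current = new String(merge(files, true, false), StandardCharsets.UTF_8);
        String proposed = new String(merge(files, true, true), StandardCharsets.UTF_8);

        // Show the newlines as a literal \n so the difference is visible.
        System.out.println(current.replace("\n", "\\n"));   // prints a\n\nb\n
        System.out.println(proposed.replace("\n", "\\n"));  // prints a\nb\n
    }
}
```

The second idea would put the same check before the copy instead of
after it: if the file's length (for example from FileStatus.getLen())
is 0, skip the file entirely. That avoids opening empty files at all,
at the cost of relying on the reported length.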

Please let me know if you would consider this useful and whether it is
worth a feature ticket in Jira.

Thank you
Jan