Posted to users@kafka.apache.org by Jan Filipiak <Ja...@trivago.com> on 2015/07/24 16:44:28 UTC

HDFS FsShell getmerge

Hello Hadoop users,

I have an idea for a small feature for the getmerge tool. I recently 
needed to use the newline option -nl because the files I needed to 
merge did not end with one.
I was merging all the files from one directory, and unfortunately this 
directory also included empty files, which led to multiple consecutive 
newlines being appended after some files.
I had to remove them manually afterwards.

In this situation it might be good to have another argument that allows 
skipping empty files. I wrote down two possible changes below. Would 
you consider this a good improvement to the command line tools?

Things one could try to implement this feature:

The call IOUtils.copyBytes(in, out, getConf(), false); doesn't return 
the number of bytes copied. That would be convenient, as one could 
skip appending the newline when 0 bytes were copied.
Alternatively, one could check the file size beforehand.
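
To make the second idea concrete, here is a rough sketch of the 
"check the file size beforehand" variant. It is not the FsShell code: it 
uses plain java.nio on local files as a stand-in for HDFS, and the class 
name MergeSkippingEmpty, the method merge, and the skipEmpty flag are 
all made up for illustration. On HDFS the size check would presumably 
use FileStatus.getLen() instead of Files.size().

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical local-filesystem stand-in for getmerge; not Hadoop code.
public class MergeSkippingEmpty {

    /**
     * Concatenates all regular files in srcDir (sorted by name) into target.
     * If addNewline is set, a newline is written after each merged file.
     * If skipEmpty is set, zero-length files are skipped entirely, so they
     * contribute no extra newline. Returns the number of files merged.
     * The target is assumed to live outside srcDir.
     */
    static int merge(Path srcDir, Path target, boolean addNewline,
                     boolean skipEmpty) throws IOException {
        List<Path> files;
        try (Stream<Path> listing = Files.list(srcDir)) {
            files = listing.filter(Files::isRegularFile)
                           .sorted()
                           .collect(Collectors.toList());
        }
        int merged = 0;
        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path file : files) {
                // Size check before copying: the second idea from the mail.
                if (skipEmpty && Files.size(file) == 0) {
                    continue;
                }
                Files.copy(file, out);
                if (addNewline) {
                    out.write('\n');
                }
                merged++;
            }
        }
        return merged;
    }
}
```

With files "one", an empty file, and "two", merging with both flags set 
would give "one\ntwo\n", while merging without skipEmpty gives 
"one\n\ntwo\n", which is the doubled newline I had to clean up by hand. 
Note that java.nio's Files.copy(Path, OutputStream) does return the 
number of bytes copied, so the first idea would look similar there.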

Please let me know if you would consider this useful and worth a 
feature ticket in Jira.

Thank you
Jan

Re: HDFS FsShell getmerge

Posted by Jan Filipiak <Ja...@trivago.com>.
Sorry, wrong mailing list.
