Posted to hdfs-user@hadoop.apache.org by Adeel Qureshi <ad...@gmail.com> on 2013/08/19 21:38:50 UTC

hdfs write files in streaming fashion

I have a servlet that receives files in a streaming fashion. Our original
design was to receive each file into the /tmp directory and then move it to
HDFS via an external process, but that seems to add an additional (maybe
unnecessary) step. My question is: if I receive a file in a servlet as a POST
request (the file is in the body of the request) and I open a buffered writer
on HDFS, then

1. Are the files really written in a streaming fashion, such that nothing is
held in memory? These are huge files, so buffering a whole file in memory and
only sending it to HDFS at the end won't make sense.

2. If for some reason we decide halfway through the file to reject it and not
keep it in HDFS, since it was being streamed, do we have to remove the file
ourselves, or will the file system clean it up automatically because the
write stream isn't closed or an exception is thrown?

Thanks
Adeel

Re: hdfs write files in streaming fashion

Posted by Shahab Yunus <sh...@gmail.com>.
For 1, yes, the data is written in chunks to HDFS if you are using the
FileSystem API. The whole file is not first stored in memory.

For 2, I think you shouldn't rely on an exception or on not closing the
writer to clean up the partially written file. That is not a safe or
recommended practice. You should do the cleanup explicitly: free resources
and delete anything you want removed, for better visibility, control and
robustness.
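To illustrate the pattern, here is a minimal sketch of a chunked copy with
explicit delete-on-failure. It uses java.nio.file as a stand-in so it runs
anywhere; on a real cluster, Files.newOutputStream(dest) would be
fs.create(path) and Files.deleteIfExists(dest) would be fs.delete(path, false)
on an org.apache.hadoop.fs.FileSystem (the class and method names below are
otherwise illustrative, not from the thread):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class StreamingUpload {

    // Streams 'body' to 'dest' in 64 KB chunks, so only one buffer is ever
    // held in memory. If the copy fails (or the caller throws to reject the
    // upload partway through), the partially written file is deleted
    // explicitly rather than relying on the file system to clean it up.
    public static void upload(InputStream body, Path dest) throws IOException {
        boolean ok = false;
        try (OutputStream out = Files.newOutputStream(dest)) {
            byte[] buf = new byte[64 * 1024];
            int n;
            while ((n = body.read(buf)) > 0) {
                out.write(buf, 0, n); // one chunk at a time, never the whole file
            }
            ok = true;
        } finally {
            if (!ok) {
                Files.deleteIfExists(dest); // explicit cleanup of the partial file
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dest = Files.createTempFile("upload", ".dat");
        upload(new ByteArrayInputStream("hello hdfs".getBytes()), dest);
        System.out.println(Files.size(dest));
        Files.deleteIfExists(dest);
    }
}
```

In the servlet, 'body' would be request.getInputStream(); the rejection case
is just a throw (or early return after closing) inside the copy loop, with
the delete in the finally block doing the cleanup either way.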

Regards,
Shahab

