Posted to common-user@hadoop.apache.org by Raghavendra K <ra...@gmail.com> on 2008/03/27 07:28:35 UTC

Append data in hdfs_write

Hi,
  I am using hdfsWrite to write data to a file.
Whenever I close the file and reopen it for writing, it starts writing from
position 0 (overwriting the old data).
Is there any way to append data to a file using hdfsWrite?
I cannot use hdfsTell, because it works only when the file is opened in
RDONLY mode, and I also don't know how many bytes were previously written to
the file.
Please shed some light on this.
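
For context, a minimal libhdfs (C) sketch of the pattern being described;
the namenode settings, path, and data are placeholders, and the second open
shows why the earlier contents are lost (there was no append support at the
time; see HADOOP-1700 mentioned later in this thread):

#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include "hdfs.h"   /* libhdfs C API */

int main(void) {
    /* "default"/0 means: use the filesystem configured in core-site.xml. */
    hdfsFS fs = hdfsConnect("default", 0);
    if (!fs) { fprintf(stderr, "hdfsConnect failed\n"); return 1; }

    const char *path = "/tmp/append-example.txt";   /* placeholder path */

    /* First write: creates the file. */
    hdfsFile f = hdfsOpenFile(fs, path, O_WRONLY, 0, 0, 0);
    const char *batch1 = "first batch\n";
    hdfsWrite(fs, f, batch1, strlen(batch1));
    hdfsCloseFile(fs, f);

    /* The previous size is recoverable via hdfsGetPathInfo... */
    hdfsFileInfo *info = hdfsGetPathInfo(fs, path);
    if (info) {
        fprintf(stderr, "size before reopen: %lld bytes\n",
                (long long)info->mSize);
        hdfsFreeFileInfo(info, 1);
    }

    /* ...but reopening with O_WRONLY starts a brand-new file at offset 0,
       so the first batch is gone. This is the behaviour the question is
       about: libhdfs had no append mode at this point. */
    f = hdfsOpenFile(fs, path, O_WRONLY, 0, 0, 0);
    const char *batch2 = "second batch\n";
    hdfsWrite(fs, f, batch2, strlen(batch2));
    hdfsCloseFile(fs, f);

    hdfsDisconnect(fs);
    return 0;
}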

-- 
Regards,
Raghavendra K

Re: Append data in hdfs_write

Posted by Ted Dunning <td...@veoh.com>.
Yes.

 The present work-arounds for this are pretty complicated.

Option 1) You can write small files relatively frequently, and every time
you have written some number of them, concatenate them and delete the
originals. These concatenations can receive the same treatment. If managed
carefully, in conjunction with a safe status-update mechanism like ZooKeeper,
you can have a pretty robust system that reflects new data with fairly low
latency (on the order of seconds behind).
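
A rough sketch of the concatenate-and-delete step using libhdfs; the helper
name, paths, and buffer size are made up for illustration, and the ZooKeeper
coordination is only indicated by a comment:

#include <fcntl.h>
#include "hdfs.h"

/* Concatenate a batch of small HDFS files into one rollup file, then
   delete the parts. */
static int concat_parts(hdfsFS fs, const char **parts, int nparts,
                        const char *rollup_path) {
    hdfsFile out = hdfsOpenFile(fs, rollup_path, O_WRONLY, 0, 0, 0);
    if (!out) return -1;

    char buf[65536];
    for (int i = 0; i < nparts; i++) {
        hdfsFile in = hdfsOpenFile(fs, parts[i], O_RDONLY, 0, 0, 0);
        if (!in) { hdfsCloseFile(fs, out); return -1; }
        tSize n;
        while ((n = hdfsRead(fs, in, buf, sizeof(buf))) > 0) {
            if (hdfsWrite(fs, out, buf, n) != n) {
                hdfsCloseFile(fs, in);
                hdfsCloseFile(fs, out);
                return -1;
            }
        }
        hdfsCloseFile(fs, in);
    }
    if (hdfsCloseFile(fs, out) != 0) return -1;

    /* Only after the rollup is safely closed (and, say, recorded in
       ZooKeeper as the authoritative copy) should the parts be removed.
       Note: hdfsDelete took two arguments in this era; later libhdfs
       versions add a 'recursive' flag. */
    for (int i = 0; i < nparts; i++) {
        hdfsDelete(fs, parts[i]);
    }
    return 0;
}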

Option 2) You can accumulate data in a non-HDFS location until it is big
enough to push to HDFS. This can be done in conjunction with option 1. The
danger is that you run the risk of losing data if the accumulator fails
before it has pushed its data to HDFS. This approach is very commonly used
for log files that are consolidated at the hourly level and then transferred
to HDFS.
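
A rough sketch of the push step, assuming the rotated local file is already
closed; the function name and paths are illustrative. hdfsConnect with a
NULL host gives the local filesystem, and hdfsCopy copies between two
filesystems in one call:

#include "hdfs.h"

/* Push a closed, rotated local log file into HDFS in one shot. Until the
   copy completes, the data exists only on the local disk, which is the
   risk mentioned above. */
int push_rotated_log(const char *local_path, const char *hdfs_path) {
    hdfsFS local  = hdfsConnect(NULL, 0);       /* NULL host: local filesystem */
    hdfsFS remote = hdfsConnect("default", 0);  /* configured HDFS cluster */
    if (!local || !remote) return -1;

    int rc = hdfsCopy(local, local_path, remote, hdfs_path);

    hdfsDisconnect(remote);
    hdfsDisconnect(local);
    return rc;
}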


On 3/27/08 12:02 AM, "Raghavendra K" <ra...@gmail.com> wrote:

> Hi,
> Thanks for the reply.
> Does this mean that once I close a file, I can open it only for reading?
> And if I reopen the same file to write any data, then the old data will be
> lost and it's as good as a new file being created with the same name?


Re: Append data in hdfs_write

Posted by Raghavendra K <ra...@gmail.com>.
Hi,
Thanks for the reply.
Does this mean that once I close a file, I can open it only for reading?
And if I reopen the same file to write any data, then the old data will be
lost and it's as good as a new file being created with the same name?

On Thu, Mar 27, 2008 at 12:23 PM, dhruba Borthakur <dh...@yahoo-inc.com>
wrote:

> HDFS files, once closed, cannot be reopened for writing. See HADOOP-1700
> for more details.



-- 
Regards,
Raghavendra K

RE: Append data in hdfs_write

Posted by dhruba Borthakur <dh...@yahoo-inc.com>.
HDFS files, once closed, cannot be reopened for writing. See HADOOP-1700
for more details.

Thanks,
dhruba


Re: Append data in hdfs_write

Posted by Khalil Honsali <k....@gmail.com>.
Hi,

As far as I know, there is this Jira:
http://issues.apache.org/jira/browse/HADOOP-1700
and a dozen classes in the org.apache.hadoop.dfs package of the source code
that relate to file upgrading, but it seems write-append is not yet fully
functional.

Hope it helps.

K. Honsali
