Posted to common-user@hadoop.apache.org by Stas Oskin <st...@gmail.com> on 2009/05/26 13:08:35 UTC

When directly writing to HDFS, the data is moved only on file close

Hi.

I'm trying to continuously write data to HDFS via an OutputStream, and want
to be able to read it at the same time from another client.

The problem is that after the file is created on HDFS with a size of 0, it
stays that way, and only fills up when I close the OutputStream.

Here is a simple code sample illustrating this issue:

try {
    // fileSystem is an org.apache.hadoop.fs.FileSystem obtained earlier
    FSDataOutputStream out = fileSystem.create(new Path("/test/test.bin")); // file is created with size 0
    for (int i = 0; i < 1000; i++) {
        out.write(1); // size still stays 0
        out.flush(); // even when I flush it?
    }
    Thread.sleep(10000); // sleep() is static, so call it on Thread directly
    out.close(); // only here does the file length update
} catch (Exception e) {
    e.printStackTrace();
}

So, two questions here:

1) How is it possible to write files directly to HDFS and have them
update there immediately?
2) Just for information: in this case, where does the file content stay the
whole time - on the server's local disk, in memory, etc.?

Thanks in advance.

Re: When directly writing to HDFS, the data is moved only on file close

Posted by Stas Oskin <st...@gmail.com>.
Hi.

You're probably referring to the following paragraph?

After some back and forth over a set of slides presented by Sanjay on
work being done by Hairong as part of HADOOP-5744, "Revising append",
the room settled on API3 from the list of options below as the
priority feature needed by HADOOP 0.21.0.  Readers must be able to
read up to the last writer's 'successful' flush. It's not important that
the file length is 'inexact'.

If I understand correctly, this means the data actually does get written to
the cluster, but it's not visible until the block is closed.
Work is ongoing for version 0.21 to make the file visible up to the last
flush().
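
(For illustration, a minimal sketch of what the write loop might look like
against such a release - hflush() is the flush-and-make-visible call discussed
as part of the HADOOP-5744 work, so treat the exact name and semantics as an
assumption until 0.21 actually ships:)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class VisibleWriter {
        public static void main(String[] args) throws Exception {
            // Assumption: a 0.21+ client where FSDataOutputStream supports
            // hflush(). FileSystem.get() reads the cluster settings from the
            // configuration on the classpath.
            FileSystem fileSystem = FileSystem.get(new Configuration());
            FSDataOutputStream out = fileSystem.create(new Path("/test/test.bin"));
            for (int i = 0; i < 1000; i++) {
                out.write(1);
                out.hflush(); // assumed API: push the buffered bytes to the
                              // datanodes and make them visible to new readers
            }
            out.close();
        }
    }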

Am I correct so far?

Regards.


2009/5/26 Tom White <to...@cloudera.com>

> This feature is not available yet, and is still under active
> discussion. (The current version of HDFS will make the previous block
> available to readers.) Michael Stack gave a good summary on the HBase
> dev list:
>
> http://mail-archives.apache.org/mod_mbox/hadoop-hbase-dev/200905.mbox/%3C7c962aed0905231601g533088ebj4a7a068505ba3f50@mail.gmail.com%3E
>
> Tom

Re: When directly writing to HDFS, the data is moved only on file close

Posted by Stas Oskin <st...@gmail.com>.
Hi.

Does it mean there is no way to access the data being written to HDFS
while it's being written?

Where is it stored during the write, then - on the cluster or on local disks?

Thanks.

2009/5/26 Tom White <to...@cloudera.com>

> This feature is not available yet, and is still under active
> discussion. (The current version of HDFS will make the previous block
> available to readers.) Michael Stack gave a good summary on the HBase
> dev list:
>
>
> http://mail-archives.apache.org/mod_mbox/hadoop-hbase-dev/200905.mbox/%3C7c962aed0905231601g533088ebj4a7a068505ba3f50@mail.gmail.com%3E
>
> Tom

Re: When directly writing to HDFS, the data is moved only on file close

Posted by Tom White <to...@cloudera.com>.
This feature is not available yet, and is still under active
discussion. (The current version of HDFS will make the previous block
available to readers.) Michael Stack gave a good summary on the HBase
dev list:

http://mail-archives.apache.org/mod_mbox/hadoop-hbase-dev/200905.mbox/%3C7c962aed0905231601g533088ebj4a7a068505ba3f50@mail.gmail.com%3E
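
To make that concrete, here is a minimal sketch of a second client polling
the file from the original mail - under the current semantics the length
reported for a file that is still open for writing only grows when a block
completes (or the writer closes the file), never on flush(). The one-second
poll interval is arbitrary:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class TailReader {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path path = new Path("/test/test.bin");
            long offset = 0;
            while (true) {
                // only reflects completed blocks while the writer is active
                long len = fs.getFileStatus(path).getLen();
                if (len > offset) {
                    FSDataInputStream in = fs.open(path);
                    try {
                        in.seek(offset);
                        byte[] buf = new byte[(int) (len - offset)];
                        in.readFully(buf);
                        // ... process the newly visible bytes in buf ...
                        offset = len;
                    } finally {
                        in.close();
                    }
                }
                Thread.sleep(1000);
            }
        }
    }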

Tom
