Posted to common-user@hadoop.apache.org by Stas Oskin <st...@gmail.com> on 2009/04/10 00:45:57 UTC

HDFS read/write speeds, and read optimization

Hi.

I have 2 questions about HDFS performance:

1) How fast are the read and write operations over the network, in Mbps?

2) If the chunk server is located on the same host as the client, is there any
optimization of read operations?
For example, Kosmos FS describes the following functionality:

"Localhost optimization: One copy of data
is placed on the chunkserver on the same
host as the client doing the write

Helps reduce network traffic"

Regards.

Re: HDFS read/write speeds, and read optimization

Posted by Stas Oskin <st...@gmail.com>.
Hi.

>
> Hypertable (a BigTable implementation) has a good KFS vs. HDFS breakdown: <
> http://code.google.com/p/hypertable/wiki/KFSvsHDFS>
>

From this comparison it seems KFS is quite a bit faster than HDFS for small data
transfers (for SQL commands).

Any idea if the same holds true for small-to-medium (20 MB - 150 MB) files?


>
> >
> >
> > 2) If the chunk server is located on the same host as the client, is there
> > any optimization of read operations?
> > For example, Kosmos FS describes the following functionality:
> >
> > "Localhost optimization: One copy of data
> > is placed on the chunkserver on the same
> > host as the client doing the write
> >
> > Helps reduce network traffic"
>
> In Hadoop-speak, we're interested in DataNodes (storage nodes) and
> TaskTrackers (compute nodes).  In terms of MapReduce, Hadoop does try to
> schedule tasks such that the data being processed by a given task on a given
> machine is also on that machine.  As for loading data into HDFS: if you load
> it from a machine that is itself a DataNode, one replica will be placed on
> that node.  However, if you're loading data from, say, your local machine,
> Hadoop will choose a DataNode at random.
>

Ah, so if a DataNode stores a file to HDFS, it will try to place a replica
on that same DataNode as well? And then if this DataNode later reads the file,
HDFS will try to read it from itself first?

Regards.

Re: HDFS read/write speeds, and read optimization

Posted by Alex Loddengaard <al...@cloudera.com>.
Answers in-line.

Alex

On Thu, Apr 9, 2009 at 3:45 PM, Stas Oskin <st...@gmail.com> wrote:

> Hi.
>
> I have 2 questions about HDFS performance:
>
> 1) How fast are the read and write operations over the network, in Mbps?

Hypertable (a BigTable implementation) has a good KFS vs. HDFS breakdown: <
http://code.google.com/p/hypertable/wiki/KFSvsHDFS>

>
>
> 2) If the chunk server is located on the same host as the client, is there any
> optimization of read operations?
> For example, Kosmos FS describes the following functionality:
>
> "Localhost optimization: One copy of data
> is placed on the chunkserver on the same
> host as the client doing the write
>
> Helps reduce network traffic"

In Hadoop-speak, we're interested in DataNodes (storage nodes) and
TaskTrackers (compute nodes).  In terms of MapReduce, Hadoop does try to
schedule tasks such that the data being processed by a given task on a given
machine is also on that machine.  As for loading data into HDFS: if you load
it from a machine that is itself a DataNode, one replica will be placed on
that node.  However, if you're loading data from, say, your local machine,
Hadoop will choose a DataNode at random.
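
For anyone who wants to verify this on their own cluster, here is a minimal
sketch using the FileSystem Java API (the class name and test path are made up,
and it assumes the cluster configuration is on the classpath). It writes a small
file and then prints which hosts hold the block replicas, so you can see whether
a replica landed on the local node:

import java.net.InetAddress;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LocalReplicaCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();        // picks up the cluster config
    FileSystem fs = FileSystem.get(conf);
    Path path = new Path("/tmp/locality-test.dat");  // hypothetical test path

    // Write some data; if this JVM runs on a DataNode, HDFS should place
    // one replica of the block on that same node.
    FSDataOutputStream out = fs.create(path, true);
    out.write(new byte[64 * 1024 * 1024]);
    out.close();

    // Ask the NameNode where the replicas ended up.
    String localhost = InetAddress.getLocalHost().getHostName();
    FileStatus status = fs.getFileStatus(path);
    for (BlockLocation block : fs.getFileBlockLocations(status, 0, status.getLen())) {
      for (String host : block.getHosts()) {
        System.out.println("replica on " + host + (host.equals(localhost) ? "  <-- local" : ""));
      }
    }
    fs.close();
  }
}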

>
>
> Regards.
>

Re: HDFS read/write speeds, and read optimization

Posted by Owen O'Malley <om...@apache.org>.
On Apr 10, 2009, at 9:07 AM, Stas Oskin wrote:

> From your experience, how RAM-hungry is HDFS? Meaning, does an additional
> 4 GB of RAM (to make it 8 GB as in your case) really change anything?

I don't think going from 4 GB to 8 GB would matter much for HDFS. For Map/Reduce,
it is very important.

-- Owen

Re: HDFS read/write speeds, and read optimization

Posted by Stas Oskin <st...@gmail.com>.
Thanks for sharing.

>
> For comparison, on a 1400 node cluster, I can checksum 100 TB in
> around 10 minutes, which means I'm seeing read averages of roughly 166
> GB/sec. For writes with replication of 3, I see roughly 40-50 minutes
> to write 100TB, so roughly 33 GB/sec average. Of course the peaks are
> much higher. Each node has 4 SATA disks, dual quad core, and 8 GB of
> ram.
>

From your experience, how RAM-hungry is HDFS? Meaning, does an additional 4 GB of
RAM (to make it 8 GB as in your case) really change anything?

Regards.

Re: HDFS read/write speeds, and read optimization

Posted by Owen O'Malley <ow...@gmail.com>.
On Thu, Apr 9, 2009 at 9:30 PM, Brian Bockelman <bb...@cse.unl.edu> wrote:
>
> On Apr 9, 2009, at 5:45 PM, Stas Oskin wrote:
>
>> Hi.
>>
>> I have 2 questions about HDFS performance:
>>
>> 1) How fast are the read and write operations over the network, in Mbps?
>>
>
> Depends.  What hardware?  How much hardware?  Is the cluster under load?
>  What does your I/O load look like?  As a rule of thumb, you'll probably
> expect very close to hardware speed.

For comparison, on a 1400 node cluster, I can checksum 100 TB in
around 10 minutes, which means I'm seeing read averages of roughly 166
GB/sec. For writes with replication of 3, I see roughly 40-50 minutes
to write 100TB, so roughly 33 GB/sec average. Of course the peaks are
much higher. Each node has 4 SATA disks, dual quad core, and 8 GB of
ram.
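
As a rough sanity check on those figures (decimal units assumed): 100 TB / (10 min
x 60 s) is about 167 GB/s aggregate read, matching the ~166 GB/s above, and
100 TB in 45-50 minutes is about 33-37 GB/s of user data written, or roughly
100 GB/s of raw disk writes once the 3 replicas are counted. Spread over 1400
nodes with 4 disks each, that is on the order of 120 MB/s read per node, or
~30 MB/s per disk.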

-- Owen

Re: HDFS read/write speeds, and read optimization

Posted by Stas Oskin <st...@gmail.com>.
Hi again.

By the way, I forgot to mention that I run the tests on the same machines that
serve as DataNodes, i.e. the same machine acts both as a client and as a
DataNode.

Regards.

Re: HDFS read/write speeds, and read optimization

Posted by Eli Collins <el...@cloudera.com>.
> I actually tested it with a simple Java test loader I quickly put together,
> which ran on each machine and continuously wrote random data to DFS. I
> tuned the write rate until I got ~77 Mb/s - above that, the iowait load on
> each disk (measured by iostat) rose above 50-60%, which is quite close
> to the disks' limits.

How many DNs are you using? How many copies of the benchmark are you
running? What results do you get just running a single copy of the
benchmark?

I see ~46 MB/s when hadoop fs -put'ing a local 1 GB file from one DN, using
3-way replication. Running the test on three DNs I get around 30 MB/s.
This is a little less than half the theoretical limit (using three
hosts, each with a single gigabit NIC). In these tests I purged the
buffer cache before running; with the input file cached in
memory (more similar to your test) I get 92 MB/s on one host, but about
the same rate for three hosts (we're network-bound). This is about 3x
faster than what you're seeing, so I suspect something's up with your
test. It would be useful for you to see what results you get running the
same test I did.
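
If it helps to reproduce this without the shell, below is a rough Java equivalent
of timing hadoop fs -put (a sketch only: the class name and argument handling are
made up, and it assumes the cluster config is on the classpath). As above, purge
the buffer cache between runs if you want cold-cache numbers.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PutTimer {
  public static void main(String[] args) throws Exception {
    // args[0]: local 1 GB test file, args[1]: HDFS destination path (both hypothetical)
    FileSystem fs = FileSystem.get(new Configuration());
    Path src = new Path(args[0]);
    Path dst = new Path(args[1]);

    long start = System.currentTimeMillis();
    fs.copyFromLocalFile(src, dst);                 // same data path as 'hadoop fs -put'
    double seconds = (System.currentTimeMillis() - start) / 1000.0;

    long bytes = fs.getFileStatus(dst).getLen();
    System.out.printf("wrote %d bytes in %.1f s -> %.1f MB/s%n",
        bytes, seconds, bytes / 1e6 / seconds);
    fs.close();
  }
}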

> You mentioned some TestDFSIO, any idea if it's present in 0.18.3?

It's in 0.18.3. See src/test/org/apache/hadoop/fs/TestDFSIO.java.

Thanks,
Eli

Re: HDFS read/write speeds, and read optimization

Posted by Stas Oskin <st...@gmail.com>.
Hi.

> Can you provide more information about your workload and the
> environment? eg are you running t.o.a.h.h.BenchmarkThroughput,
> TestDFSIO, or timing hadoop fs -put/get to transfer data to hdfs from
> another machine, looking at metrics, etc. What else is running on the
> cluster? Have you profiled? etc. 77 Mb/s (<10 MB/s) seems low, but without
> context it is not meaningful.
>


I actually tested it with a simple Java test loader I quickly put together,
which ran on each machine and continuously wrote random data to DFS. I
tuned the write rate until I got ~77 Mb/s - above that, the iowait load on
each disk (measured by iostat) rose above 50-60%, which is quite close
to the disks' limits.
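
For reference, a simplified sketch of this kind of loader (not the exact code;
the buffer size, target rate and duration below are arbitrary placeholders):

import java.util.Random;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RandomDfsLoader {
  public static void main(String[] args) throws Exception {
    long targetBytesPerSec = 10L * 1000 * 1000;   // tune this until iowait climbs
    long runMillis = 10 * 60 * 1000;              // run for 10 minutes
    byte[] buf = new byte[64 * 1024];
    new Random().nextBytes(buf);                  // random payload, written repeatedly

    FileSystem fs = FileSystem.get(new Configuration());
    FSDataOutputStream out = fs.create(new Path("/loadtest/" + System.currentTimeMillis()));

    long written = 0;
    long start = System.currentTimeMillis();
    while (System.currentTimeMillis() - start < runMillis) {
      out.write(buf);
      written += buf.length;
      // Crude throttle: sleep whenever we are ahead of the target rate.
      long expectedMillis = written * 1000 / targetBytesPerSec;
      long ahead = expectedMillis - (System.currentTimeMillis() - start);
      if (ahead > 0) Thread.sleep(ahead);
    }
    out.close();
    fs.close();
  }
}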

If there is a more official / better way to do it, I'll be happy to get some
pointers to it.
You mentioned some TestDFSIO - any idea if it's present in 0.18.3?

Regards.

Re: HDFS read/write speeds, and read optimization

Posted by Eli Collins <el...@cloudera.com>.
Hey Stas,

Can you provide more information about your workload and the
environment? eg are you running t.o.a.h.h.BenchmarkThroughput,
TestDFSIO, or timing hadoop fs -put/get to transfer data to hdfs from
another machine, looking at metrics, etc. What else is running on the
cluster? Have you profiled? etc. 77 Mb/s (<10 MB/s) seems low, but without
context it is not meaningful.

Thanks,
Eli

On Sat, Jan 2, 2010 at 2:54 PM, Stas Oskin <st...@gmail.com> wrote:
> Hi.
>
> Can anyone advise on the subject below?
>
> Thanks!
>
> On Mon, Dec 28, 2009 at 9:01 PM, Stas Oskin <st...@gmail.com> wrote:
>
>> Hi.
>>
>> Going back to the subject, has anyone ever benchmarked small (10 - 20
>> node) HDFS clusters?
>>
>> I did my own speed checks, and it seems I can reach ~77Mbps, on a quad-disk
>> node. This comes to ~19Mbps per disk, which seems quite low in my opinion.
>>
>> Can anyone advise about this?
>>
>> Thanks.
>>
>

Re: HDFS read/write speeds, and read optimization

Posted by Stas Oskin <st...@gmail.com>.
Hi.

> We run with 2-way replication.  The wonderful folks at Yahoo! worked through
> most of the bugs during 0.19.x IIRC.  There were never any bugs with 2-way
> replication per se, but running a cluster with 2 replicas exposed other bugs
> at a 100x rate compared to running with 3 replicas (due to the fact that a
> silent corruption + loss of a single data node = file loss).
>
> I'd estimate we lose files at a rate of about 1 per month for 200TB of
> actual data.  That number would probably go down an order of magnitude or
> more if we were running with 3 replicas.
>
> Hope this helps.
>
>
Thanks for sharing!

So there is good reason to believe that version 0.19 and higher have the
file storage / silent corruption issues sorted out?

Regards.

Re: HDFS read/write speeds, and read optimization

Posted by Brian Bockelman <bb...@cse.unl.edu>.
Hi,

We run with 2-way replication.  The wonderful folks at Yahoo! worked through most of the bugs during 0.19.x IIRC.  There were never any bugs with 2-way replication per se, but running a cluster with 2 replicas exposed other bugs at a 100x rate compared to running with 3 replicas (due to the fact that a silent corruption + loss of a single data node = file loss).

I'd estimate we lose files at a rate of about 1 per month for 200TB of actual data.  That number would probably go down an order of magnitude or more if we were running with 3 replicas.

Hope this helps.

Brian

On Jan 10, 2010, at 3:55 AM, Eli Collins wrote:

>> data.replication = 2
>> 
>> A bit off topic - is it safe to have such a number? About a year ago I heard
>> only 3-way replication was fully tested, while 2-way had some issues - was
>> that fixed in subsequent versions?
> 
> I think that's still a relatively untested configuration, though I'm
> not aware of any known bugs with it. I know of at least one cluster
> that uses 2-way replication.  Note that 3-way replication is used both
> for availability and performance, though in a write benchmark 2-way
> replication should be faster than 3-way.
> 
> Thanks,
> Eli


Re: HDFS read/write speeds, and read optimization

Posted by Eli Collins <el...@cloudera.com>.
> data.replication = 2
>
> A bit off topic - is it safe to have such a number? About a year ago I heard
> only 3-way replication was fully tested, while 2-way had some issues - was
> that fixed in subsequent versions?

I think that's still a relatively untested configuration, though I'm
not aware of any known bugs with it. I know of at least one cluster
that uses 2-way replication.  Note that 3-way replication is used both
for availability and performance, though in a write benchmark 2-way
replication should be faster than 3-way.
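
As an aside, for benchmarking 2-way against 3-way writes you don't have to touch
the cluster-wide default (dfs.replication in stock configs); replication can be
set per file through the FileSystem API. A hedged sketch (the path and sizes are
made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationDemo {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path p = new Path("/benchmarks/replication-test.dat");   // hypothetical path

    // Create the file with 2 replicas regardless of the configured default.
    FSDataOutputStream out = fs.create(p, true, 64 * 1024, (short) 2, fs.getDefaultBlockSize());
    out.write(new byte[128 * 1024 * 1024]);
    out.close();

    // Or change the replication factor of an existing file afterwards.
    fs.setReplication(p, (short) 3);
    fs.close();
  }
}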

Thanks,
Eli

Re: HDFS read/write speeds, and read optimization

Posted by Stas Oskin <st...@gmail.com>.
Hi.

Also, It would be interesting to know "data.replication" setting you have
> for this benchmark?
>
>
data.replication = 2

A bit off topic - is it safe to have such a number? About a year ago I heard
only 3-way replication was fully tested, while 2-way had some issues - was
that fixed in subsequent versions?

Regards.

Re: HDFS read/write speeds, and read optimization

Posted by Rajesh Balamohan <ra...@gmail.com>.
Also, It would be interesting to know "data.replication" setting you have
for this benchmark?

On Sun, Jan 3, 2010 at 8:51 AM, Andreas Kostyrka <an...@kostyrka.org>wrote:

> Well, that all depends on many details, but:
>
> -) are you really using 4 disks (configured correctly as data
> directories)?
>
> -) What hdd/connection technology?
>
> -) And 77MB/s would match up curiously well with 1Gbit networking cards?
> So are you sure that you are testing a completely local setup? Where's your
> name node running then?
>
> Andreas
>
>
> Am Sonntag, den 03.01.2010, 00:54 +0200 schrieb Stas Oskin:
> > Hi.
> >
> > Can anyone advise on the subject below?
> >
> > Thanks!
> >
> > On Mon, Dec 28, 2009 at 9:01 PM, Stas Oskin <st...@gmail.com>
> wrote:
> >
> > > Hi.
> > >
> > > Going back to the subject, has anyone ever benchmarked small (10 - 20
> > > node) HDFS clusters?
> > >
> > > I did my own speed checks, and it seems I can reach ~77Mbps, on a
> quad-disk
> > > node. This comes to ~19Mbps per disk, which seems quite low in my
> opinion.
> > >
> > > Can anyone advise about this?
> > >
> > > Thanks.
> > >
>
>


-- 
~Rajesh.B

Re: HDFS read/write speeds, and read optimization

Posted by Stas Oskin <st...@gmail.com>.
Hi.

> Well, that all depends on many details, but:
>
> -) are you really using 4 disks (configured correctly as data
> directories)?
>
>
Yes, 4 directories, one per disk.


> -) What hdd/connection technology?
>
>
SATA 3 Gb/s


> -) And 77MB/s would match up curiously well with 1Gbit networking cards?
> So are you sure that you are testing a completely local setup? Where's your
> name node running then?
>
>
I actually meant ~77 Mb/s (bits, not bytes) - sorry for the confusion.

Regards.

Re: HDFS read/write speeds, and read optimization

Posted by Andreas Kostyrka <an...@kostyrka.org>.
Well, that all depends on many details, but:

-) are you really using 4 disks (configured correctly as data
directories)?

-) What hdd/connection technology?

-) And 77MB/s would match up curiously well with 1Gbit networking cards?
So are you sure that you are testing a completely local setup? Where's your
name node running then?

Andreas 


Am Sonntag, den 03.01.2010, 00:54 +0200 schrieb Stas Oskin:
> Hi.
> 
> Can anyone advise on the subject below?
> 
> Thanks!
> 
> On Mon, Dec 28, 2009 at 9:01 PM, Stas Oskin <st...@gmail.com> wrote:
> 
> > Hi.
> >
> > Going back to the subject, has anyone ever benchmarked small (10 - 20
> > node) HDFS clusters?
> >
> > I did my own speed checks, and it seems I can reach ~77Mbps, on a quad-disk
> > node. This comes to ~19Mbps per disk, which seems quite low in my opinion.
> >
> > Can anyone advise about this?
> >
> > Thanks.
> >


Re: HDFS read/write speeds, and read optimization

Posted by Stas Oskin <st...@gmail.com>.
Hi.

Can anyone advise on the subject below?

Thanks!

On Mon, Dec 28, 2009 at 9:01 PM, Stas Oskin <st...@gmail.com> wrote:

> Hi.
>
> Going back to the subject, has anyone ever benchmarked small (10 - 20
> node) HDFS clusters?
>
> I did my own speed checks, and it seems I can reach ~77Mbps, on a quad-disk
> node. This comes to ~19Mbps per disk, which seems quite low in my opinion.
>
> Can anyone advise about this?
>
> Thanks.
>

Re: HDFS read/write speeds, and read optimization

Posted by Stas Oskin <st...@gmail.com>.
Hi.

Going back to the subject, has anyone ever benchmarked small (10 - 20 node)
HDFS clusters?

I did my own speed checks, and it seems I can reach ~77 Mbps on a quad-disk
node. This comes to ~19 Mbps per disk, which seems quite low in my opinion.

Can anyone advise about this?

Thanks.

Re: HDFS read/write speeds, and read optimization

Posted by Konstantin Shvachko <sh...@yahoo-inc.com>.
I just wanted to add one other published benchmark to this:
http://developer.yahoo.net/blogs/hadoop/2008/09/scaling_hadoop_to_4000_nodes_a.html
In this example, on a very busy cluster of 4000 nodes, both read and write throughputs
were close to the local disk bandwidth.
This benchmark (called TestDFSIO) uses large sequential writes and reads.
You can run it yourself on your hardware to compare.

> Is it more efficient to unify the disks into one volume (RAID or LVM) and
> then present them as a single space? Or is it better to specify each disk
> separately?

There was a discussion recently on this list about RAID0 vs separate disks.
Please search the archives. Separate disks turn out to perform better.
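
For completeness, specifying the disks separately just means listing one directory
per physical disk in dfs.data.dir (comma-separated) in hadoop-site.xml, or
hdfs-site.xml on newer releases; the DataNode then spreads blocks across them.
The paths below are only examples:

<property>
  <name>dfs.data.dir</name>
  <value>/data/1/dfs/data,/data/2/dfs/data,/data/3/dfs/data,/data/4/dfs/data</value>
</property>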

> Reliability-wise, the latter sounds more correct, as a single/several (up to
> 3) disks going down won't take the whole node with them. But perhaps there
> is a performance penalty?

You always have block replicas on other nodes, so one node going down should not be a problem.

Thanks,
--Konstantin

Re: HDFS read/write speeds, and read optimization

Posted by Stas Oskin <st...@gmail.com>.
Hi.

>
> Depends on what kind of I/O you do - are you going to be using MapReduce
> and co-locating jobs and data?  If so, it's possible to get close to those
> speeds if you are I/O bound in your job and read right through each chunk.
>  If you have multiple disks mounted individually, you'll need the number of
> streams equal to the number of disks.  If you're going to do I/O that's not
> through MapReduce, you'll probably be bound by the network interface.
>

Btw, this what I wanted to ask as well:

Is it more efficient to unify the disks into one volume (RAID or LVM) and
then present them as a single space? Or is it better to specify each disk
separately?

Reliability-wise, the latter sounds more correct, as one or several (up to
3) disks going down won't take the whole node down with them. But perhaps there
is a performance penalty?

Re: HDFS read/write speeds, and read optimization

Posted by Brian Bockelman <bb...@cse.unl.edu>.
On Apr 10, 2009, at 9:40 AM, Stas Oskin wrote:

> Hi.
>
>
>> Depends.  What hardware?  How much hardware?  Is the cluster under  
>> load?
>> What does your I/O load look like?  As a rule of thumb, you'll  
>> probably
>> expect very close to hardware speed.
>>
>
> Standard Xeon dual cpu, quad core servers, 4 GB RAM.
> The DataNodes also do some processing, with usual loads about ~4  
> (from 8
> recommended). The IO load is linear, there are almost no write or read
> peaks.
>

Interesting -- machines are fairly RAM-poor for data processing ... I  
guess your tasks must be fairly efficient.

> By close to hardware speed, you mean results very near the results I  
> get via
> iozone?

Depends on what kind of I/O you do - are you going to be using  
MapReduce and co-locating jobs and data?  If so, it's possible to get  
close to those speeds if you are I/O bound in your job and read right  
through each chunk.  If you have multiple disks mounted individually,  
you'll need the number of streams equal to the number of disks.  If  
you're going to do I/O that's not through MapReduce, you'll probably  
be bound by the network interface.

Brian

Re: HDFS read/write speeds, and read optimization

Posted by Stas Oskin <st...@gmail.com>.
Hi.


> Depends.  What hardware?  How much hardware?  Is the cluster under load?
>  What does your I/O load look like?  As a rule of thumb, you'll probably
> expect very close to hardware speed.
>

Standard dual-CPU, quad-core Xeon servers, 4 GB RAM.
The DataNodes also do some processing, with usual loads of about 4 (out of the 8
recommended). The I/O load is steady; there are almost no write or read
peaks.

By close to hardware speed, do you mean results very near the results I get via
iozone?

Thanks.

Re: HDFS read/write speeds, and read optimization

Posted by Brian Bockelman <bb...@cse.unl.edu>.
On Apr 9, 2009, at 5:45 PM, Stas Oskin wrote:

> Hi.
>
> I have 2 questions about HDFS performance:
>
> 1) How fast are the read and write operations over the network, in Mbps?
>

Depends.  What hardware?  How much hardware?  Is the cluster under
load?  What does your I/O load look like?  As a rule of thumb, you can
probably expect something very close to hardware speed.

At this instant, our cluster is doing about 10k reads / sec, each read  
is 128KB.  About 1.2GB / s.  The max we've recorded on this cluster is  
8GB/s.  http://rcf.unl.edu/ganglia/?c=red-workers  However, you  
probably don't care much about that unless you get to run on our  
hardware ;)

Unfortunately, this question is kind of like asking "How fast is  
Linux?"  There's just so many different performance characteristics  
that giving a good answer is always tough.  My recommendation is to  
take an afternoon and play around with it.

> 2) If the chunk server is located on the same host as the client, is there any
> optimization of read operations?
> For example, Kosmos FS describes the following functionality:
>
> "Localhost optimization: One copy of data
> is placed on the chunkserver on the same
> host as the client doing the write
>
> Helps reduce network traffic"

Yes, this optimization is performed.

Brian

Re: HDFS as a logfile ??

Posted by Brian Bockelman <bb...@cse.unl.edu>.
Also, Chukwa (a project already in Hadoop contrib) is designed to do  
something similar with Hadoop directly:

http://wiki.apache.org/hadoop/Chukwa

I think some of the examples even mention Apache logs.  Haven't used  
it personally, but it looks nice.

Brian

On Apr 9, 2009, at 11:14 PM, Alex Loddengaard wrote:

> This is a great idea and a common application, Ricky.  Scribe is  
> probably
> useful for you as well:
>
> <http://sourceforge.net/projects/scribeserver/>
> <
> http://images.google.com/imgres?imgurl=http://farm3.static.flickr.com/2211/2197670659_b42810b8ba.jpg&imgrefurl=http://www.flickr.com/photos/niallkennedy/2197670659/&usg=__WLc-p9Gi_p3AdA-YuKLRZ-bdgvo=&h=375&w=500&sz=131&hl=en&start=2&sig2=P22LVO1KObby6_DDy8ujYg&um=1&tbnid=QudxiEyFOk1EpM:&tbnh=98&tbnw=130&prev=/images%3Fq%3Dfacebook%2Bscribe%2Bhadoop%26hl%3Den%26client%3Dfirefox-a%26rls%3Dorg.mozilla:en-US:official%26sa%3DN%26um%3D1&ei=48beSa74L4H-swORnPmjDg
>>
>
> Scribe is what Facebook uses to get its Apache logs to Hadoop.
> Unfortunately, HDFS doesn't (yet) have append, so you'll have to  
> batch log
> files and load them into HDFS in bulk.
>
> Alex
>
> On Thu, Apr 9, 2009 at 9:06 PM, Ricky Ho <rh...@adobe.com> wrote:
>
>> I want to analyze the traffic pattern and statistics of a distributed
>> application.  I am thinking of having the application write the  
>> events as
>> log entries into HDFS and then later I can use a Map/Reduce task to  
>> do the
>> analysis in parallel.  Is this a good approach ?
>>
>> In this case, does HDFS support concurrent write (append) to a file?
>> Another question is whether the write API is thread-safe?
>>
>> Rgds,
>> Ricky
>>


Re: HDFS as a logfile ??

Posted by Alex Loddengaard <al...@cloudera.com>.
This is a great idea and a common application, Ricky.  Scribe is probably
useful for you as well:

<http://sourceforge.net/projects/scribeserver/>
<
http://images.google.com/imgres?imgurl=http://farm3.static.flickr.com/2211/2197670659_b42810b8ba.jpg&imgrefurl=http://www.flickr.com/photos/niallkennedy/2197670659/&usg=__WLc-p9Gi_p3AdA-YuKLRZ-bdgvo=&h=375&w=500&sz=131&hl=en&start=2&sig2=P22LVO1KObby6_DDy8ujYg&um=1&tbnid=QudxiEyFOk1EpM:&tbnh=98&tbnw=130&prev=/images%3Fq%3Dfacebook%2Bscribe%2Bhadoop%26hl%3Den%26client%3Dfirefox-a%26rls%3Dorg.mozilla:en-US:official%26sa%3DN%26um%3D1&ei=48beSa74L4H-swORnPmjDg
>

Scribe is what Facebook uses to get its Apache logs to Hadoop.
Unfortunately, HDFS doesn't (yet) have append, so you'll have to batch log
files and load them into HDFS in bulk.
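
For the bulk-load step, a minimal sketch using the FileSystem API (directory
names are made up; a real setup would also need to handle log rotation, retries
and cleanup):

import java.io.File;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LogBatchUploader {
  public static void main(String[] args) throws Exception {
    File localBatchDir = new File("/var/log/myapp/closed");   // rotated, already-closed log files
    Path hdfsDir = new Path("/logs/myapp/" + (System.currentTimeMillis() / 3600000)); // hourly bucket

    FileSystem fs = FileSystem.get(new Configuration());
    fs.mkdirs(hdfsDir);
    File[] batch = localBatchDir.listFiles();
    if (batch != null) {
      for (File f : batch) {
        // copyFromLocalFile(delSrc, src, dst): remove the local copy once it is in HDFS.
        fs.copyFromLocalFile(true, new Path(f.getAbsolutePath()), hdfsDir);
      }
    }
    fs.close();
  }
}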

Alex

On Thu, Apr 9, 2009 at 9:06 PM, Ricky Ho <rh...@adobe.com> wrote:

> I want to analyze the traffic pattern and statistics of a distributed
> application.  I am thinking of having the application write the events as
> log entries into HDFS and then later I can use a Map/Reduce task to do the
> analysis in parallel.  Is this a good approach ?
>
> In this case, does HDFS support concurrent write (append) to a file?
> Another question is whether the write API is thread-safe?
>
> Rgds,
> Ricky
>

Re: HDFS as a logfile ??

Posted by Ariel Rabkin <as...@gmail.com>.
Everything gets dumped into the same files.

We don't assume anything at all about the format of the input data; it
gets dumped into Hadoop sequence files, tagged with some metadata to
say what machine and app it came from, and where it was in the
original stream.

There is a slight penalty from logging to local disk first. In practice, you
often want a local copy anyway, both for redundancy and because it's
much more convenient for human inspection.  Having a separate
collector process is indeed inelegant. However, HDFS copes badly with
many small files, so that pushes you to merge entries across either
hosts or time partitions. And since HDFS doesn't have a flush(),
having one log per source means that log entries don't become visible
quickly enough.   Hence, collectors.

It isn't gorgeous, but it seems to work fine in practice.
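
To make that concrete, here is a hedged sketch (not Chukwa's actual code; the key
layout and paths are invented) of a collector-style writer that folds chunks from
many sources into one sequence file, tagging each chunk with where it came from:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class CollectorSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path sink = new Path("/logs/collected/" + System.currentTimeMillis() + ".seq");

    // One writer per collector: chunks from many hosts and apps are merged into a
    // single sequence file, with the origin of each chunk carried in the key.
    SequenceFile.Writer writer =
        SequenceFile.createWriter(fs, conf, sink, Text.class, BytesWritable.class);

    byte[] chunk = "GET /index.html 200 ...".getBytes("UTF-8");   // made-up log bytes
    Text key = new Text("host=web01,app=apache,offset=0");        // made-up metadata tags
    writer.append(key, new BytesWritable(chunk));

    writer.close();
    fs.close();
  }
}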

On Mon, Apr 13, 2009 at 8:01 AM, Ricky Ho <rh...@adobe.com> wrote:
> Ari, thanks for your note.
>
> I'd like to understand more about how Chukwa groups log entries ... If I have appA running on machines X, Y and appB running on machines Y, Z, each of them calling the Chukwa log API:
>
> Do all entries go into the same HDFS file?  Or into 4 separate HDFS files based on the app/machine combination?
>
> If the answer to the first question is "yes", what if appA and appB have different log entry formats?
> If the answer to the second question is "yes", are all these HDFS files cut at the same time boundary?
>
> It looks like in Chukwa the application first logs to a daemon, which buffer-writes the log entries into a local file.  A separate process then ships this data to a remote collector daemon, which issues the actual HDFS writes.  I observe the following overhead ...
>
> 1) The overhead of the extra write to local disk and of shipping the data over to the collector.  If HDFS supported append, the application could go directly to HDFS.
>
> 2) The centralized collector introduces a bottleneck in the otherwise perfectly parallel HDFS architecture.
>
> Am I missing something here?
>

-- 
Ari Rabkin asrabkin@gmail.com
UC Berkeley Computer Science Department

RE: HDFS as a logfile ??

Posted by Ricky Ho <rh...@adobe.com>.
Ari, thanks for your note.

I'd like to understand more about how Chukwa groups log entries ... If I have appA running on machines X, Y and appB running on machines Y, Z, each of them calling the Chukwa log API:

Do all entries go into the same HDFS file?  Or into 4 separate HDFS files based on the app/machine combination?

If the answer to the first question is "yes", what if appA and appB have different log entry formats?
If the answer to the second question is "yes", are all these HDFS files cut at the same time boundary?

It looks like in Chukwa the application first logs to a daemon, which buffer-writes the log entries into a local file.  A separate process then ships this data to a remote collector daemon, which issues the actual HDFS writes.  I observe the following overhead ...

1) The overhead of the extra write to local disk and of shipping the data over to the collector.  If HDFS supported append, the application could go directly to HDFS.

2) The centralized collector introduces a bottleneck in the otherwise perfectly parallel HDFS architecture.

Am I missing something here?

Rgds,
Ricky

-----Original Message-----
From: Ariel Rabkin [mailto:asrabkin@gmail.com] 
Sent: Monday, April 13, 2009 7:38 AM
To: core-user@hadoop.apache.org
Subject: Re: HDFS as a logfile ??

Chukwa is a Hadoop subproject aiming to do something similar, though
particularly for the case of Hadoop logs.  You may find it useful.

Hadoop unfortunately does not support concurrent appends.  As a
result, the Chukwa project found itself creating a whole new daemon,
the chukwa collector, precisely to merge the event streams and write
it out, just once. We're set to do a release within the next week or
two, but in the meantime you can check it out from SVN at
https://svn.apache.org/repos/asf/hadoop/chukwa/trunk

--Ari

On Fri, Apr 10, 2009 at 12:06 AM, Ricky Ho <rh...@adobe.com> wrote:
> I want to analyze the traffic pattern and statistics of a distributed application.  I am thinking of having the application write the events as log entries into HDFS and then later I can use a Map/Reduce task to do the analysis in parallel.  Is this a good approach ?
>
> In this case, does HDFS support concurrent write (append) to a file?  Another question is whether the write API is thread-safe?
>
> Rgds,
> Ricky
>



-- 
Ari Rabkin asrabkin@gmail.com
UC Berkeley Computer Science Department

Re: HDFS as a logfile ??

Posted by Ariel Rabkin <as...@gmail.com>.
Chukwa is a Hadoop subproject aiming to do something similar, though
particularly for the case of Hadoop logs.  You may find it useful.

Hadoop unfortunately does not support concurrent appends.  As a
result, the Chukwa project found itself creating a whole new daemon,
the chukwa collector, precisely to merge the event streams and write
it out, just once. We're set to do a release within the next week or
two, but in the meantime you can check it out from SVN at
https://svn.apache.org/repos/asf/hadoop/chukwa/trunk

--Ari

On Fri, Apr 10, 2009 at 12:06 AM, Ricky Ho <rh...@adobe.com> wrote:
> I want to analyze the traffic pattern and statistics of a distributed application.  I am thinking of having the application write the events as log entries into HDFS and then later I can use a Map/Reduce task to do the analysis in parallel.  Is this a good approach ?
>
> In this case, does HDFS support concurrent write (append) to a file?  Another question is whether the write API is thread-safe?
>
> Rgds,
> Ricky
>



-- 
Ari Rabkin asrabkin@gmail.com
UC Berkeley Computer Science Department

HDFS as a logfile ??

Posted by Ricky Ho <rh...@adobe.com>.
I want to analyze the traffic pattern and statistics of a distributed application.  I am thinking of having the application write the events as log entries into HDFS, and then later I can use a Map/Reduce task to do the analysis in parallel.  Is this a good approach?

In this case, does HDFS support concurrent write (append) to a file?  Another question is whether the write API is thread-safe?

Rgds,
Ricky