Posted to user@flume.apache.org by Frank Grimes <fr...@yahoo.com> on 2012/02/15 16:22:56 UTC

Flume-NG, Avro and NodeJS

Hi All,

We are currently using Flume-0.9.5-SNAPSHOT to send Flume Thrift events to Node.js for realtime monitoring of events.
To do so we are making use of the node-thrift module (https://github.com/wadey/node-thrift) to run a Flume Thrift source within Node.js.

We were wondering if with a move to Flume-NG and/or Avro, we could configure a Flume sink to send events as Avro JSON instead of Avro binary to ease/simplify the consumption of events by Node.js.

Simplification aside, we're also noticing that our Node.js process uses a fair amount of CPU to handle our current load of about 1500 events per second.
Specifically, 40%-50% with occasional spikes to 100%. (not sure what's causing the spikes yet, perhaps V8 GC)
So we're also hoping that the latest Node.js/V8 might be able to parse JSON with less CPU usage than node-thrift can parse Thrift.

Any ideas/suggestions are welcome!

Thanks,

Frank Grimes

Re: Flume-NG, Avro and NodeJS

Posted by Arvind Prabhakar <ar...@cloudera.com>.
Hi Frank,

For Flume 1.x (NG), the event payload is always a byte array. You could
generate this payload from the JSON representation and convert it back
into the string form you want at the terminal sink.
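
As a minimal sketch of the round trip described above: the JSON text is
encoded to a byte array for the event body and decoded again on the
consuming side. Python is used here only for illustration; the function
names and the record fields are hypothetical, and no Flume API is involved.

```python
import json


def encode_event(record: dict) -> bytes:
    """Serialize a record to UTF-8 JSON bytes, suitable as an event body."""
    return json.dumps(record, separators=(",", ":")).encode("utf-8")


def decode_event(body: bytes) -> dict:
    """Rebuild the record from the byte-array body on the consuming side."""
    return json.loads(body.decode("utf-8"))


# Hypothetical event fields, for illustration only.
record = {"host": "web01", "level": "INFO", "message": "user login"}
body = encode_event(record)      # what would travel as the event payload
restored = decode_event(body)    # what the consumer would see
```

The same two steps map directly onto a Node.js consumer: turn the received
bytes into a UTF-8 string, then parse that string as JSON.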

Regarding performance expectations from using Avro events - I suggest that
you run a simple performance test to see if this meets your
expectations. If it does not, we will be happy to investigate it further
and add more optimizations where they seem necessary.
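
A simple test of this kind could be sketched as below: it times how many
JSON payloads per second a single process can parse, which can then be
compared against the roughly 1500 events per second mentioned in the
original message. This is only an illustration in Python; the payload
fields and the function name are hypothetical, and the numbers it prints
say nothing about Node.js/V8 or node-thrift, so the same loop would need
to be ported to the real consumer to be meaningful.

```python
import json
import time


def measure_parse_rate(n_events: int = 15000) -> float:
    """Parse the same JSON payload n_events times; return events per second."""
    # Hypothetical event shape; substitute a real sample event when testing.
    payload = json.dumps(
        {"host": "web01", "timestamp": 1329319376, "body": "x" * 200}
    ).encode("utf-8")

    start = time.perf_counter()
    for _ in range(n_events):
        json.loads(payload)
    elapsed = time.perf_counter() - start
    return n_events / elapsed


rate = measure_parse_rate()
print(f"parsed about {rate:.0f} events/second")
```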

Thanks,
Arvind

On Wed, Feb 15, 2012 at 7:22 AM, Frank Grimes <fr...@yahoo.com> wrote:

> Hi All,
>
> We are currently using Flume-0.9.5-SNAPSHOT to send Flume Thrift events to
> Node.js for realtime monitoring of events.
> To do so we are making use of the node-thrift module (
> https://github.com/wadey/node-thrift) to run a Flume Thrift source within
> Node.js.
>
> We were wondering if with a move to Flume-NG and/or Avro, we could
> configure a Flume sink to send events as Avro JSON instead of Avro binary
> to ease/simplify the consumption of events by Node.js.
>
> Simplification aside, we're also noticing that our Node.js process uses a
> fair amount of CPU to handle our current load of about 1500 events per
> second.
> Specifically, 40%-50% with occasional spikes to 100%. (not sure what's
> causing the spikes yet, perhaps V8 GC)
> So we're also hoping that the latest Node.js/V8 might be able to parse
> JSON with less CPU usage than node-thrift can parse Thrift.
>
> Any ideas/suggestions are welcome!
>
> Thanks,
>
> Frank Grimes