Posted to user@flume.apache.org by Kamal Bahadur <ma...@gmail.com> on 2011/10/17 22:58:08 UTC

Cassandra Sink using Hector

Hi,

I have written a sink that writes data into Cassandra using the Hector API.
Hector seems to do a great job of connection pooling and load balancing: as
soon as I start the collector, I can see 16 connections being established
between the collector and Cassandra. I am not sure whether Flume is taking
advantage of those pooled connections. I am assuming that the collector's
append method is not multi-threaded, and that therefore only one connection
is in use at any point in time. Can someone confirm this?

Thanks,
Kamal

Re: Cassandra Sink using Hector

Posted by Dani Rayan <da...@gmail.com>.
Hi Kamal,

Flume is designed to scale horizontally: add more boxes and run more
collector daemons. Scaled out that way, it should be able to handle 2000
messages per second.
You can configure a failover chain to avoid losing events.

IMHO, the downside of the multi-threaded approach is a lack of manageability.

On Mon, Oct 17, 2011 at 9:58 PM, Kamal Bahadur <ma...@gmail.com> wrote:

> Hi Dani,
>
> Thanks for the reply. I am using E2E reliability mode. If I spawn a new
> thread for each append call, I am not sure the acks will be handled
> properly: I might lose an event if the child thread fails with an exception.
> Do you have any suggestions for my use case? With the current setup, I am
> able to write only 500 events per second, and the expected event rate is
> over 2000 per second. I tried increasing the number of collectors and it
> seems to help. Is this my only option?
>
> Thanks,
> Kamal


-- 
-Dani Abel Rayan

Re: Cassandra Sink using Hector

Posted by Kamal Bahadur <ma...@gmail.com>.
Hi Dani,

Thanks for the reply. I am using E2E reliability mode. If I spawn a new
thread for each append call, I am not sure the acks will be handled properly:
I might lose an event if the child thread fails with an exception. Do you
have any suggestions for my use case? With the current setup, I am able to
write only 500 events per second, and the expected event rate is over 2000
per second. I tried increasing the number of collectors and it seems to help.
Is this my only option?

Thanks,
Kamal
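
[Editor's sketch] The concern above (a write failing silently on a child
thread) can be illustrated in plain Java without touching Flume or Hector.
This is only a sketch under assumptions: ParallelBatchWriter is a
hypothetical helper, the Consumer<String> stands in for whatever call
actually writes to Cassandra, and how a batch boundary would map onto
Flume's E2E acks is not something this thread confirms. The idea is that
append hands the write to a fixed pool, while flush waits for every
outstanding write and rethrows the first failure, so the caller only treats
a batch as safe to acknowledge when flush returns normally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

/** Hypothetical helper, not part of Flume or Hector. Requires Java 8+. */
final class ParallelBatchWriter implements AutoCloseable {
    private final ExecutorService pool;
    // Stand-in for the real write (for example, a call made through Hector).
    private final Consumer<String> writer;
    private final List<Future<?>> pending = new ArrayList<>();

    ParallelBatchWriter(int threads, Consumer<String> writer) {
        this.pool = Executors.newFixedThreadPool(threads);
        this.writer = writer;
    }

    /** Queues one event for writing on the pool and returns immediately. */
    synchronized void append(String event) {
        pending.add(pool.submit(() -> writer.accept(event)));
    }

    /**
     * Waits for every queued write. If any write threw, the first failure is
     * rethrown as an ExecutionException after all writes have finished, so
     * an exception on a worker thread is never lost.
     */
    synchronized void flush() throws InterruptedException, ExecutionException {
        ExecutionException first = null;
        for (Future<?> f : pending) {
            try {
                f.get();
            } catch (ExecutionException e) {
                if (first == null) {
                    first = e;
                }
            }
        }
        pending.clear();
        if (first != null) {
            throw first;
        }
    }

    /** Stops accepting work; already queued writes are allowed to finish. */
    @Override
    public void close() {
        pool.shutdown();
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated writer that fails for one particular event.
        Consumer<String> flaky = event -> {
            if (event.equals("bad")) {
                throw new IllegalStateException("write failed");
            }
        };
        try (ParallelBatchWriter w = new ParallelBatchWriter(4, flaky)) {
            w.append("good");
            w.append("bad");
            try {
                w.flush();
                System.out.println("batch written; safe to acknowledge");
            } catch (ExecutionException e) {
                System.out.println("batch failed; do not acknowledge: " + e.getCause());
            }
        }
    }
}
```

The trade-off is that the caller has to decide what a "batch" is and must
retry or fail the whole batch when flush throws; running more collectors
avoids that extra bookkeeping.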

On Mon, Oct 17, 2011 at 4:42 PM, Dani Rayan <da...@gmail.com> wrote:

> Hey Kamal,
>
> You are correct. The append method would not spawn new threads by itself.
> However, you can still override it.

Re: Cassandra Sink using Hector

Posted by Dani Rayan <da...@gmail.com>.
Hey Kamal,

You are correct. The append method would not spawn new threads by itself.
However, you can still override it.

On Mon, Oct 17, 2011 at 1:58 PM, Kamal Bahadur <ma...@gmail.com> wrote:

> Hi,
>
> I have written a sink that writes data into Cassandra using the Hector API.
> Hector seems to do a great job of connection pooling and load balancing: as
> soon as I start the collector, I can see 16 connections being established
> between the collector and Cassandra. I am not sure whether Flume is taking
> advantage of those pooled connections. I am assuming that the collector's
> append method is not multi-threaded, and that therefore only one connection
> is in use at any point in time. Can someone confirm this?
>
> Thanks,
> Kamal
>



-- 
-Dani Abel Rayan