Posted to user@storm.apache.org by Raphael Hsieh <ra...@gmail.com> on 2014/06/04 00:08:08 UTC

how does PersistentAggregate distribute the DB Calls ?

How does PersistentAggregate distribute the database calls across all the
worker nodes?
Does it do the global aggregation and then choose a single host to do a
multiGet/multiPut against the external DB?

Thanks
-- 
Raphael Hsieh

Re: how does PersistentAggregate distribute the DB Calls ?

Posted by Raphael Hsieh <ra...@gmail.com>.
Thanks for your quick reply, Nathan.
I'm debugging my topology, and I've removed all the logic from my multiPut
function, replacing it with a single System.out.println(). I am monitoring
my logs to check when this gets printed.
It looks like every single one of my hosts (workers) hits this. Does this
indicate that I am processing many partitions, each of which hits this
multiPut and prints?
Thanks.


On Tue, Jun 3, 2014 at 3:29 PM, Nathan Marz <na...@nathanmarz.com> wrote:

> When possible, it does as much aggregation as it can on the Storm side, to
> minimize how much it needs to interact with the database. So if you do a
> persistent global count, for example, it will compute the count for the
> batch (in parallel), and then the task that finishes the global count will
> do a single get/update/put to the database.



-- 
Raphael Hsieh

Re: how does PersistentAggregate distribute the DB Calls ?

Posted by Nathan Marz <na...@nathanmarz.com>.
When possible, it does as much aggregation as it can on the Storm side, to
minimize how much it needs to interact with the database. So if you do a
persistent global count, for example, it will compute the count for the
batch (in parallel), and then the task that finishes the global count will
do a single get/update/put to the database.
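
The flow described above can be sketched in plain Java. This is only a
simulation of the idea, not Storm or Trident code: the class
GlobalCountSketch, the FakeDb stand-in for the external database, and the
"count" key are all hypothetical names made up for illustration. Each
partition of a batch is counted without touching the database, the partial
counts are summed, and only then is one get and one put issued.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GlobalCountSketch {

    /** Hypothetical stand-in for the external database; it records how often it is called. */
    public static class FakeDb {
        public final Map<String, Long> store = new HashMap<>();
        public int gets = 0;
        public int puts = 0;

        public long get(String key) {
            gets++;
            return store.getOrDefault(key, 0L);
        }

        public void put(String key, long value) {
            puts++;
            store.put(key, value);
        }
    }

    /** Storm-side step: a partition counts its own tuples with no database access. */
    public static long partialCount(List<String> partition) {
        return partition.size();
    }

    /** One batch: sum the partial counts (in parallel), then do a single get/update/put. */
    public static void processBatch(List<List<String>> partitions, FakeDb db) {
        long batchCount = partitions.parallelStream()
                .mapToLong(GlobalCountSketch::partialCount)
                .sum();
        long stored = db.get("count");        // single get
        db.put("count", stored + batchCount); // single update + put
    }

    public static void main(String[] args) {
        FakeDb db = new FakeDb();
        List<List<String>> batch = Arrays.asList(
                Arrays.asList("a", "b", "c"),
                Arrays.asList("d"),
                Arrays.asList("e", "f"));
        processBatch(batch, db);
        // Prints: count=6 gets=1 puts=1
        System.out.println("count=" + db.store.get("count")
                + " gets=" + db.gets + " puts=" + db.puts);
    }
}
```

In this sketch the number of database calls depends on the number of
batches, not on the number of partitions or tuples in a batch.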





-- 
Twitter: @nathanmarz
http://nathanmarz.com