Posted to user@storm.apache.org by Sa Li <sa...@gmail.com> on 2014/08/14 01:37:27 UTC

anyone use Storm kafkaSpout implement a HyperLoglog

Hi all,

I am thinking of implementing HyperLogLog in Storm with KafkaSpout, and
outputting not only the distinct counts but also some kind of bitmap string.
Has anyone done something similar? A guide for getting started would be
highly appreciated.

thanks

Alec

Re: anyone use Storm kafkaSpout implement a HyperLoglog

Posted by Sa Li <sa...@gmail.com>.
You are right, my plan was to store the cardinalities, and maybe a bitmap
string as well, in the database, which would surely save a lot of space.
However, we already have a channel that populates the web events into
Postgres for some other analytics use, which runs more or less in parallel
with the process in which Kafka listens to the web server. My question is:
if I can do the distinct counting in Postgres, which already exists, what is
the advantage of doing the same thing in Storm? Of course, the
implementation would help me learn Storm and Kafka. Maybe it would even be
faster because of the parallelism in Storm?

thanks

Alec
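
On the parallelism question, one property worth noting is that HyperLogLog
register arrays built independently can be combined by taking the
register-wise maximum, so each parallel task (for example, each bolt
instance) could keep its own partial sketch. The following is only a minimal
sketch of that idea: the class name HllMerge, the constant P, and the mix()
stand-in hash are assumptions made for illustration, and nothing here uses
the Storm or Kafka APIs.

```java
import java.util.Arrays;

/** Illustrative only: merging HyperLogLog registers from parallel tasks. */
final class HllMerge {
    static final int P = 12; // index bits (assumed); 2^12 = 4096 registers

    /** SplitMix64 finalizer, a stand-in for hashing a real event key. */
    static long mix(long x) {
        x += 0x9e3779b97f4a7c15L;
        x = (x ^ (x >>> 30)) * 0xbf58476d1ce4e5b9L;
        x = (x ^ (x >>> 27)) * 0x94d049bb133111ebL;
        return x ^ (x >>> 31);
    }

    /** Records one id: the top P hash bits pick a register, the rest give the rank. */
    static void update(byte[] regs, long id) {
        long h = mix(id);
        int idx = (int) (h >>> (64 - P));
        int rank = Math.min(Long.numberOfLeadingZeros(h << P) + 1, 64 - P + 1);
        if (rank > regs[idx]) regs[idx] = (byte) rank;
    }

    /** Register-wise maximum of two sketches of the same size. */
    static byte[] merge(byte[] a, byte[] b) {
        if (a.length != b.length) throw new IllegalArgumentException("register counts differ");
        byte[] out = new byte[a.length];
        for (int i = 0; i < a.length; i++) out[i] = (byte) Math.max(a[i], b[i]);
        return out;
    }

    public static void main(String[] args) {
        byte[] taskA = new byte[1 << P];
        byte[] taskB = new byte[1 << P];
        byte[] single = new byte[1 << P];
        for (long id = 0; id < 50_000; id++) {
            update(id % 2 == 0 ? taskA : taskB, id); // split the stream across two "tasks"
            update(single, id);                      // reference: one sketch sees everything
        }
        // The merged partial sketches match the single-sketch registers exactly.
        System.out.println(Arrays.equals(merge(taskA, taskB), single));
    }
}
```

Because the merge is exact, splitting the stream across tasks should not
change the resulting estimate; whether it is actually faster than counting
in the database depends on the workload.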


On Fri, Aug 15, 2014 at 3:58 PM, Sam Goodwin <sa...@gmail.com>
wrote:

> I'm not too sure how the Postgres HLL extension works, but I'm assuming
> you would have to send every tuple to the remote Postgres DB, which is
> very expensive. If you build your HLL data structure in Storm instead, you
> only have to persist the fixed-size serialized version of the HLL to the
> database once per transaction. That sort of solution scales much better.

Re: anyone use Storm kafkaSpout implement a HyperLoglog

Posted by Sam Goodwin <sa...@gmail.com>.
I'm not too sure how the Postgres HLL extension works, but I'm assuming you
would have to send every tuple to the remote Postgres DB, which is very
expensive. If you build your HLL data structure in Storm instead, you only
have to persist the fixed-size serialized version of the HLL to the
database once per transaction. That sort of solution scales much better.
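
To make the "fixed-size serialized version" point concrete, here is a
minimal sketch of an HLL structure whose serialized form is always one byte
per register, no matter how many items were added. Everything in it is an
assumption for illustration: the class name HllSketch, the choice of p, and
the hash (64-bit FNV-1a plus the MurmurHash3 finalizer) are stand-ins rather
than what any particular library or Storm topology uses, and the estimator
applies only the basic small-range correction.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative only: a minimal HyperLogLog with a fixed-size serialized form. */
final class HllSketch {
    private final int p;            // number of index bits
    private final byte[] registers; // m = 2^p registers, one byte each

    HllSketch(int p) {
        if (p < 7 || p > 16) throw new IllegalArgumentException("p must be in [7, 16]");
        this.p = p;
        this.registers = new byte[1 << p];
    }

    /** 64-bit FNV-1a followed by the MurmurHash3 fmix64 finalizer (stand-in hash). */
    static long hash64(byte[] data) {
        long h = 0xcbf29ce484222325L;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    void add(String item) {
        long h = hash64(item.getBytes(StandardCharsets.UTF_8));
        int idx = (int) (h >>> (64 - p)); // top p bits choose the register
        long rest = h << p;               // remaining bits, left-aligned
        int rank = Math.min(Long.numberOfLeadingZeros(rest) + 1, 64 - p + 1);
        if (rank > registers[idx]) registers[idx] = (byte) rank;
    }

    double estimate() {
        int m = registers.length;
        double sum = 0.0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.pow(2.0, -r);
            if (r == 0) zeros++;
        }
        double alpha = 0.7213 / (1.0 + 1.079 / m); // bias constant for m >= 128
        double raw = alpha * m * m / sum;
        if (raw <= 2.5 * m && zeros > 0) {
            return m * Math.log((double) m / zeros); // linear counting for small counts
        }
        return raw;
    }

    /** Serialized form: always exactly 2^p bytes. */
    byte[] toBytes() {
        return registers.clone();
    }

    public static void main(String[] args) {
        HllSketch sketch = new HllSketch(12); // 4096 registers
        for (int i = 0; i < 100_000; i++) sketch.add("user-" + i);
        System.out.printf("estimate=%.0f, serialized bytes=%d%n",
                sketch.estimate(), sketch.toBytes().length);
    }
}
```

With p = 12 the byte array stays at 4096 bytes however many tuples pass
through, so only that array would need to be written to the database,
rather than every tuple.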


On Fri, Aug 15, 2014 at 1:42 PM, Sa Li <sa...@gmail.com> wrote:

> postgresql-hll, the PostgreSQL extension that adds HyperLogLog data
> structures, seems pretty good if we do the counting directly in the
> Postgres DB.

Re: anyone use Storm kafkaSpout implement a HyperLoglog

Posted by Sa Li <sa...@gmail.com>.
postgresql-hll, the PostgreSQL extension that adds HyperLogLog data
structures, seems pretty good if we do the counting directly in the
Postgres DB.


On Fri, Aug 15, 2014 at 1:38 PM, Sa Li <sa...@gmail.com> wrote:

> Hi all,
>
> Continuing this topic: I am a bit confused about whether I should
> implement HyperLogLog in Storm or use the postgresql-hll extension in the
> Postgres DB. If I can effectively count the uniques with postgresql-hll
> and write them into a separate distinct-count table, why would I
> implement that in Storm? I know some developers are implementing HLL in
> Storm, and I am just unclear about the advantage of doing that in Storm
> rather than in the database with the HLL extension.
>
> thanks
>
> Alec

Re: anyone use Storm kafkaSpout implement a HyperLoglog

Posted by Sa Li <sa...@gmail.com>.
Hi all,

Continuing this topic: I am a bit confused about whether I should implement
HyperLogLog in Storm or use the postgresql-hll extension in the Postgres DB.
If I can effectively count the uniques with postgresql-hll and write them
into a separate distinct-count table, why would I implement that in Storm?
I know some developers are implementing HLL in Storm, and I am just unclear
about the advantage of doing that in Storm rather than in the database with
the HLL extension.

thanks

Alec


On Wed, Aug 13, 2014 at 4:37 PM, Sa Li <sa...@gmail.com> wrote:

> Hi all,
>
> I am thinking of implementing HyperLogLog in Storm with KafkaSpout, and
> outputting not only the distinct counts but also some kind of bitmap
> string. Has anyone done something similar? A guide for getting started
> would be highly appreciated.
>
> thanks
>
> Alec
>