Posted to user@storm.apache.org by Raphael Hsieh <ra...@gmail.com> on 2015/01/08 20:12:36 UTC

DataStore ?

Doing a persistentAggregate to an external datastore seems like a pretty
standard use case. However, Storm/Trident processes so many batches every
second that few databases can keep up with that volume of reads and
writes.

How have people been storing their Storm aggregations so that external
services can access the data?

-- 
Raphael Hsieh

Re: DataStore ?

Posted by Otis Gospodnetic <ot...@gmail.com>.
HBase, Cassandra, even Solr, and Elasticsearch will likely do.

If you want to decouple things a bit, have Trident write to Kafka and
then consume from Kafka.

With so many moving pieces, make sure you have a good
ops/monitoring/tracing/logging tool for troubleshooting and tuning.

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On Mon, Jan 12, 2015 at 6:47 PM, Nathan Marz <na...@nathanmarz.com> wrote:

> Just have Trident write directly to whatever datastore you want. Trident's
> ability to interact with external state is completely generic, and the
> auto-batching will let you make efficient use of whatever database you
> choose.
>
> On Mon, Jan 12, 2015 at 6:27 PM, Raphael Hsieh <ra...@gmail.com>
> wrote:
>
>> Thanks for your replies.
>> Nathan do you have any suggestions for external datastores? How were you
>> envisioning the use case for this? Just to stick it into a memcache and
>> from there transfer the data to a different external datastore ?
>>
>> Thanks
>>
>> On Thu, Jan 8, 2015 at 12:49 PM, Nathan Marz <na...@nathanmarz.com>
>> wrote:
>>
>>> Trident typically processes just a few batches per second. Actually
>>> you'll get much better db performance through Trident than you typically
>>> would manually *because* of the batching (instead of lots of individual
>>> round trips).
>>>
>>> On Thu, Jan 8, 2015 at 2:12 PM, Raphael Hsieh <ra...@gmail.com>
>>> wrote:
>>>
>>>> Doing a persistentAggregate to an external datastore seems like a
>>>> pretty standard use case. However Storm/Trident processes so many batches
>>>> every second, there are not many databases that can keep up with that large
>>>> amount of read/write throughput.
>>>>
>>>> How have people been deciding to store their storm aggregations in a
>>>> way that external services might be able to access this data ?
>>>>
>>>> --
>>>> Raphael Hsieh
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Twitter: @nathanmarz
>>> http://nathanmarz.com
>>>
>>
>>
>>
>> --
>> Raphael Hsieh
>>
>>
>>
>
>
>
> --
> Twitter: @nathanmarz
> http://nathanmarz.com
>

Re: DataStore ?

Posted by Nathan Marz <na...@nathanmarz.com>.
Just have Trident write directly to whatever datastore you want. Trident's
ability to interact with external state is completely generic, and the
auto-batching will let you make efficient use of whatever database you
choose.
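
To illustrate the point about batching against generic external state, here
is a minimal sketch in plain Java. It does not use any Storm classes:
KeyValueStore, InMemoryStore, and applyBatch are hypothetical names made up
for this example, and the in-memory map only stands in for a real database.
The idea it shows is that a whole batch is pre-aggregated, then applied with
one bulk read and one bulk write.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class BatchedCountSketch {
    // Apply one batch of keys as count increments using two bulk calls.
    static void applyBatch(KeyValueStore store, List<String> batch) {
        // Pre-aggregate within the batch so each key is touched only once.
        Map<String, Long> deltas = new LinkedHashMap<>();
        for (String key : batch) {
            deltas.merge(key, 1L, Long::sum);
        }
        List<String> keys = new ArrayList<>(deltas.keySet());
        List<Long> current = store.multiGet(keys);   // one bulk read
        List<Long> updated = new ArrayList<>();
        for (int i = 0; i < keys.size(); i++) {
            updated.add(current.get(i) + deltas.get(keys.get(i)));
        }
        store.multiPut(keys, updated);               // one bulk write
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        applyBatch(store, Arrays.asList("a", "b", "a", "c", "a"));
        System.out.println(store.data + " roundTrips=" + store.roundTrips);
    }
}

// Hypothetical bulk-oriented store interface (an assumption for this sketch,
// not a Storm API): one call reads or writes many keys at once.
interface KeyValueStore {
    List<Long> multiGet(List<String> keys);
    void multiPut(List<String> keys, List<Long> values);
}

// In-memory stand-in for an external database; it counts round trips.
class InMemoryStore implements KeyValueStore {
    final Map<String, Long> data = new HashMap<>();
    int roundTrips = 0;

    @Override
    public List<Long> multiGet(List<String> keys) {
        roundTrips++;
        List<Long> out = new ArrayList<>();
        for (String k : keys) {
            out.add(data.getOrDefault(k, 0L)); // missing keys count as zero
        }
        return out;
    }

    @Override
    public void multiPut(List<String> keys, List<Long> values) {
        roundTrips++;
        for (int i = 0; i < keys.size(); i++) {
            data.put(keys.get(i), values.get(i));
        }
    }
}
```

In this sketch a batch of five tuples costs two round trips instead of up to
ten. A real datastore binding would replace InMemoryStore with bulk calls to
the chosen database.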

-- 
Twitter: @nathanmarz
http://nathanmarz.com

Re: DataStore ?

Posted by Raphael Hsieh <ra...@gmail.com>.
Thanks for your replies.
Nathan, do you have any suggestions for external datastores? How were you
envisioning the use case for this? Just to stick it into a memcache and
from there transfer the data to a different external datastore?

Thanks

-- 
Raphael Hsieh

Re: DataStore ?

Posted by Nathan Marz <na...@nathanmarz.com>.
Trident typically processes just a few batches per second. Actually you'll
get much better db performance through Trident than you typically would
manually *because* of the batching (instead of lots of individual round
trips).
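
As a rough back-of-the-envelope sketch of the round-trip argument, the Java
snippet below compares per-tuple and per-batch database access. The rates
(10,000 tuples per second, 5 batches per second) are illustrative
assumptions, not measurements, and the model assumes one read plus one write
per unit of work; a partitioned topology would multiply the per-batch figure
by the number of partitions.

```java
class RoundTripEstimate {
    // One read plus one write for every tuple handled on its own.
    static long perTupleRoundTrips(long tuplesPerSecond) {
        return 2 * tuplesPerSecond;
    }

    // One bulk read plus one bulk write per batch, whatever the batch size.
    static long perBatchRoundTrips(long batchesPerSecond) {
        return 2 * batchesPerSecond;
    }

    public static void main(String[] args) {
        long tuplesPerSecond = 10_000; // illustrative assumption
        long batchesPerSecond = 5;     // "just a few batches per second"
        System.out.println("per-tuple: " + perTupleRoundTrips(tuplesPerSecond));
        System.out.println("per-batch: " + perBatchRoundTrips(batchesPerSecond));
    }
}
```

Under these assumed numbers the database sees 20,000 round trips per second
without batching and 10 with it.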

-- 
Twitter: @nathanmarz
http://nathanmarz.com

Re: DataStore ?

Posted by Nathan Leung <nc...@gmail.com>.
Not sure I agree about Trident batches / second, though to be fair I
haven't used it myself. That said, you can either batch yourself in Storm
(thus reducing the number of DB transactions / s), or use something like
HBase or Redis / MongoDB / Couchbase depending on your needs.
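
For the "batch yourself" option, a minimal sketch in plain Java might look
like the following. BufferingWriter and its sink callback are hypothetical
names invented for this example; the sink stands in for whatever bulk write
the chosen database offers. A real bolt would also need a time-based flush
and would have to deal with acking, both of which this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class BufferingWriter<T> {
    private final int maxBatchSize;
    private final Consumer<List<T>> sink; // hypothetical bulk-write callback
    private final List<T> buffer = new ArrayList<>();

    BufferingWriter(int maxBatchSize, Consumer<List<T>> sink) {
        if (maxBatchSize < 1) {
            throw new IllegalArgumentException("maxBatchSize must be >= 1");
        }
        this.maxBatchSize = maxBatchSize;
        this.sink = sink;
    }

    // Buffer one item and write the buffer out once it is full.
    void add(T item) {
        buffer.add(item);
        if (buffer.size() >= maxBatchSize) {
            flush();
        }
    }

    // Hand a copy of the buffered items to the sink in a single call.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer));
        buffer.clear();
    }

    public static void main(String[] args) {
        List<List<Integer>> writes = new ArrayList<>();
        BufferingWriter<Integer> writer = new BufferingWriter<>(3, writes::add);
        for (int i = 1; i <= 7; i++) {
            writer.add(i);
        }
        writer.flush(); // write out the partial final batch
        System.out.println(writes); // prints [[1, 2, 3], [4, 5, 6], [7]]
    }
}
```

Seven items end up as three writes here; the batch size trades latency
against the number of DB transactions per second.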
