Posted to user@storm.apache.org by Mike Heffner <mi...@librato.com> on 2014/04/17 17:00:22 UTC

Clearing tuple payloads after processing

We have a topology that we're trying to push as much throughput through as
possible. While profiling it, we found that we are holding onto a lot of
memory in our list of tuples prior to acking them. Most of this memory
appears to come from holding onto the original message payload in its raw
format (char[] in our case). Our topology performs online aggregation, so
our internal tracking memory is typically quite small, as we aggregate
thousands of messages into a single bucket. However, maintaining the list
of all raw tuple payloads that went into the aggregation bucket for the
duration of our checkpointing interval can consume a significant amount of
memory.

Is there a way to clear the tuple's Values() after it has been processed,
but before acking it? Our alternative solution is to try a different
serialization format that requires a smaller payload. While this would
potentially reduce our footprint by a good factor, it would still have
limits. Ideally we could strip the tuple list down to only the message ID
bits required for proper Storm message acking.

Any ideas? We are on version 0.9.0.1.

Thanks,

Mike

-- 

  Mike Heffner <mi...@librato.com>
  Librato, Inc.

Re: Clearing tuple payloads after processing

Posted by Mike Heffner <mi...@librato.com>.
Jon,

We actually took that exact approach in our testing:

tuple.getValues().clear()

Good to hear that others have recognized this as a pain point and that
there's room for improvement.
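
For readers following along, here is a rough sketch of the pattern being
described: aggregate the payload, clear the tuple's values, and keep only
the emptied tuple around until the checkpoint so it can still be acked.
PendingTuple and AggregatingBuffer are hypothetical stand-ins, not Storm
classes, and the sketch assumes getValues() returns the tuple's backing
list rather than a copy. If a given Storm version returns a copy, clearing
it would free nothing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a Storm tuple (not the real TupleImpl).
final class PendingTuple {
    private final List<Object> values;

    PendingTuple(Object... payload) {
        this.values = new ArrayList<>(Arrays.asList(payload));
    }

    // Assumption: the getter exposes the backing list, with no defensive copy.
    List<Object> getValues() {
        return values;
    }
}

// Hypothetical aggregator that holds tuples until the next checkpoint.
final class AggregatingBuffer {
    private List<PendingTuple> pending = new ArrayList<>();
    private long sum;

    // Fold the first field into the running sum, then drop the raw payload.
    void process(PendingTuple tuple) {
        sum += ((Number) tuple.getValues().get(0)).longValue();
        tuple.getValues().clear();   // payload becomes eligible for GC
        pending.add(tuple);          // the tuple itself is kept for acking
    }

    long sum() {
        return sum;
    }

    int pendingCount() {
        return pending.size();
    }

    // Total number of payload values still reachable through pending tuples.
    int retainedValueCount() {
        int count = 0;
        for (PendingTuple tuple : pending) {
            count += tuple.getValues().size();
        }
        return count;
    }

    // Hand back the tuples to ack at checkpoint time and start a new batch.
    List<PendingTuple> checkpoint() {
        List<PendingTuple> toAck = pending;
        pending = new ArrayList<>();
        return toAck;
    }
}
```

In a real bolt, the tuples returned by checkpoint() would be the ones
passed to the collector's ack call. Anything downstream that still expects
to read the fields of a cleared tuple would break, so this only suits
tuples whose payload is no longer needed.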

Cheers,

Mike


On Thu, Apr 17, 2014 at 8:16 PM, Jon Logan <jm...@buffalo.edu> wrote:

> I've run into a similar issue. There's been talk in the past about fixing
> this, but it hasn't happened. As a workaround, you can use reflection to
> get hold of the private "values" variable and call clear() on it.
>
>
> https://github.com/apache/incubator-storm/blob/master/storm-core/src/jvm/backtype/storm/tuple/TupleImpl.java


-- 

  Mike Heffner <mi...@librato.com>
  Librato, Inc.

Re: Clearing tuple payloads after processing

Posted by Jon Logan <jm...@buffalo.edu>.
I've run into a similar issue. There's been talk in the past about fixing
this, but it hasn't happened. As a workaround, you can use reflection to
get hold of the private "values" variable and call clear() on it.

https://github.com/apache/incubator-storm/blob/master/storm-core/src/jvm/backtype/storm/tuple/TupleImpl.java
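
A minimal sketch of the reflection workaround described above, for
illustration only. StandInTuple is a hypothetical class that mimics a
tuple with a private "values" list, and TuplePayloadClearer is a
hypothetical helper; neither is part of Storm. The field name "values"
comes from the message above, and whether it matches TupleImpl in a given
Storm release is an assumption worth checking against the linked source.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a tuple class with a private "values" field.
final class StandInTuple {
    private final List<Object> values;
    private final String messageId; // stands in for ack bookkeeping, left untouched

    StandInTuple(String messageId, Object... payload) {
        this.messageId = messageId;
        this.values = new ArrayList<>(Arrays.asList(payload));
    }

    int size() {
        return values.size();
    }

    String messageId() {
        return messageId;
    }
}

// Hypothetical helper: empties the list held in an object's private "values" field.
final class TuplePayloadClearer {
    private TuplePayloadClearer() {
    }

    static void clearValues(Object tuple) {
        try {
            // Look up the private field on the object's own class.
            Field field = tuple.getClass().getDeclaredField("values");
            field.setAccessible(true);
            Object held = field.get(tuple);
            if (held instanceof List) {
                // Empty the list in place; the tuple object itself is unchanged.
                ((List<?>) held).clear();
            }
        } catch (ReflectiveOperationException e) {
            // Field missing or inaccessible: the class layout differs from the assumption.
            throw new IllegalStateException("could not clear tuple values", e);
        }
    }
}
```

Usage would be a single call such as TuplePayloadClearer.clearValues(tuple)
once the payload has been folded into the aggregate. Because this reaches
into a private field, it can break silently when the class changes between
releases.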

