Posted to users@kafka.apache.org by Pradeep Gollakota <pr...@gmail.com> on 2015/10/05 20:20:53 UTC

Dealing with large messages

Fellow Kafkaers,

We have a pretty heavyweight legacy event logging system for batch
processing. We're now sending the events into Kafka for real-time
analytics. But some of our messages are quite large (> 40 MB).

I'm wondering if any of you have use cases where you have to send large
messages to Kafka and how you're dealing with them.

Thanks,
Pradeep

Re: Dealing with large messages

Posted by Pradeep Gollakota <pr...@gmail.com>.
Thanks for the replies!

I was rather hoping not to have to implement a side channel solution. :/

If we have to do this, we may use an HBase table with a TTL matching our
topic's retention so the large objects are "gc'ed" along with the messages
that reference them... thoughts?
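[Editor's note: the TTL-matching idea above can be sketched as follows. The table/family names and the helper are illustrative, not a real API; the point is deriving the HBase TTL (seconds) from the topic's retention.ms, with a little slack so a blob outlives the Kafka message that points at it.]

```python
# Sketch: derive an HBase column-family TTL from a Kafka topic's retention,
# so blobs referenced from the topic expire at roughly the same time.
# RETENTION_MS and build_hbase_alter are illustrative names.

RETENTION_MS = 7 * 24 * 60 * 60 * 1000  # e.g. the topic's retention.ms (7 days)

def build_hbase_alter(table, family, retention_ms, slack_s=3600):
    """Return an `hbase shell` command setting the family TTL in seconds.

    The slack keeps the blob alive slightly longer than the Kafka message
    that references it, so a consumer reading near the retention boundary
    does not dereference an already-expired row.
    """
    ttl_s = retention_ms // 1000 + slack_s
    return "alter '%s', {NAME => '%s', TTL => %d}" % (table, family, ttl_s)

print(build_hbase_alter('large_events', 'payload', RETENTION_MS))
```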

On Tue, Oct 6, 2015 at 8:45 AM, Gwen Shapira <gw...@confluent.io> wrote:

> Storing large blobs in S3 or HDFS and placing URIs in Kafka is the most
> common solution I've seen in use.
>
> On Tue, Oct 6, 2015 at 8:32 AM, Joel Koshy <jj...@gmail.com> wrote:
>
> > The best practice I think is to just put large objects in a blob store
> > and have messages embed references to those blobs. Interestingly we
> > ended up having to implement large-message-support at LinkedIn but for
> > various reasons were forced to put messages inline (i.e., against the
> > above recommendation). So we ended up having to break up large
> > messages into smaller chunks. This obviously adds considerable
> > complexity to the consumer since the checkpointing can become pretty
> > complicated. There are other nuances as well - we can probably do a
> > short talk on this at an upcoming meetup.
> >
> > Joel
> >
> >
> > On Mon, Oct 5, 2015 at 9:31 PM, Rahul Jain <ra...@gmail.com> wrote:
> > > In addition to the config changes mentioned in that post, you may also
> > > have to change producer config if you are using the new producer.
> > >
> > > Specifically, *max.request.size* and *request.timeout.ms* have to be
> > > increased to allow the producer to send large messages.
> > >
> > >
> > > On 6 Oct 2015 02:02, "James Cheng" <jc...@tivo.com> wrote:
> > >
> > >> Here’s an article that Gwen wrote earlier this year on handling large
> > >> messages in Kafka.
> > >>
> > >> http://ingest.tips/2015/01/21/handling-large-messages-kafka/
> > >>
> > >> -James
> > >>
> > >> > On Oct 5, 2015, at 11:20 AM, Pradeep Gollakota <pradeepg26@gmail.com>
> > >> wrote:
> > >> >
> > >> > Fellow Kafkaers,
> > >> >
> > >> > We have a pretty heavyweight legacy event logging system for batch
> > >> > processing. We're now sending the events into Kafka for real-time
> > >> > analytics. But some of our messages are quite large (> 40 MB).
> > >> >
> > >> > I'm wondering if any of you have use cases where you have to send
> > >> > large messages to Kafka and how you're dealing with them.
> > >> >
> > >> > Thanks,
> > >> > Pradeep
> > >>
> > >>
> > >> ________________________________
> > >>
> > >> This email and any attachments may contain confidential and privileged
> > >> material for the sole use of the intended recipient. Any review,
> > copying,
> > >> or distribution of this email (or any attachments) by others is
> > prohibited.
> > >> If you are not the intended recipient, please contact the sender
> > >> immediately and permanently delete this email and any attachments. No
> > >> employee or agent of TiVo Inc. is authorized to conclude any binding
> > >> agreement on behalf of TiVo Inc. by email. Binding agreements with
> TiVo
> > >> Inc. may only be made by a signed written agreement.
> > >>
> >
>

Re: Dealing with large messages

Posted by Gwen Shapira <gw...@confluent.io>.
Storing large blobs in S3 or HDFS and placing URIs in Kafka is the most
common solution I've seen in use.
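[Editor's note: the blob-store-plus-pointer pattern Gwen describes can be sketched roughly as below. The blob store and record format here are in-memory stand-ins, not a real S3/Kafka client; only the shape of the idea is the point.]

```python
# Upload the blob, publish only a small pointer record to Kafka.
import hashlib
import json

blob_store = {}  # stand-in for S3/HDFS: uri -> bytes

def put_blob(payload: bytes) -> str:
    """Store the blob under a content-addressed URI and return the URI."""
    key = hashlib.sha256(payload).hexdigest()
    uri = f"s3://event-blobs/{key}"  # illustrative bucket name
    blob_store[uri] = payload
    return uri

def make_pointer_record(event_type: str, payload: bytes) -> bytes:
    """Build the small message that actually goes to Kafka."""
    return json.dumps({
        "event_type": event_type,
        "blob_uri": put_blob(payload),
        "size_bytes": len(payload),
    }).encode()

def dereference(record: bytes) -> bytes:
    """Consumer side: follow the URI back to the blob."""
    return blob_store[json.loads(record)["blob_uri"]]

big = b"x" * (40 * 1024 * 1024)          # a 40 MB payload
rec = make_pointer_record("page_view", big)
assert len(rec) < 200                     # what Kafka carries stays tiny
assert dereference(rec) == big
```

One caveat with this pattern: the blob store's retention has to be coordinated with the topic's, or consumers can find dangling URIs.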

On Tue, Oct 6, 2015 at 8:32 AM, Joel Koshy <jj...@gmail.com> wrote:

> The best practice I think is to just put large objects in a blob store
> and have messages embed references to those blobs. Interestingly we
> ended up having to implement large-message-support at LinkedIn but for
> various reasons were forced to put messages inline (i.e., against the
> above recommendation). So we ended up having to break up large
> messages into smaller chunks. This obviously adds considerable
> complexity to the consumer since the checkpointing can become pretty
> complicated. There are other nuances as well - we can probably do a
> short talk on this at an upcoming meetup.
>
> Joel
>
>
> On Mon, Oct 5, 2015 at 9:31 PM, Rahul Jain <ra...@gmail.com> wrote:
> > In addition to the config changes mentioned in that post, you may also
> > have to change producer config if you are using the new producer.
> >
> > Specifically, *max.request.size* and *request.timeout.ms* have to be
> > increased to allow the producer to send large messages.
> >
> >
> > On 6 Oct 2015 02:02, "James Cheng" <jc...@tivo.com> wrote:
> >
> >> Here’s an article that Gwen wrote earlier this year on handling large
> >> messages in Kafka.
> >>
> >> http://ingest.tips/2015/01/21/handling-large-messages-kafka/
> >>
> >> -James
> >>
> >> > On Oct 5, 2015, at 11:20 AM, Pradeep Gollakota <pr...@gmail.com>
> >> wrote:
> >> >
> >> > Fellow Kafkaers,
> >> >
> >> > We have a pretty heavyweight legacy event logging system for batch
> >> > processing. We're now sending the events into Kafka for real-time
> >> > analytics. But some of our messages are quite large (> 40 MB).
> >> >
> >> > I'm wondering if any of you have use cases where you have to send
> >> > large messages to Kafka and how you're dealing with them.
> >> >
> >> > Thanks,
> >> > Pradeep
> >>
>

Re: Dealing with large messages

Posted by Joel Koshy <jj...@gmail.com>.
The best practice I think is to just put large objects in a blob store
and have messages embed references to those blobs. Interestingly we
ended up having to implement large-message-support at LinkedIn but for
various reasons were forced to put messages inline (i.e., against the
above recommendation). So we ended up having to break up large
messages into smaller chunks. This obviously adds considerable
complexity to the consumer since the checkpointing can become pretty
complicated. There are other nuances as well - we can probably do a
short talk on this at an upcoming meetup.

Joel
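[Editor's note: the chunking approach Joel describes might look roughly like this. The framing fields are illustrative; the actual LinkedIn implementation is not public in this thread.]

```python
# Split a large message into chunks that each fit under the broker limit,
# tag them so a consumer can reassemble the original payload.
import uuid

CHUNK_SIZE = 900 * 1024  # stay under a ~1 MB message.max.bytes

def split(payload: bytes):
    """Yield (message_id, index, total, chunk) tuples, one per Kafka record."""
    msg_id = uuid.uuid4().hex
    chunks = [payload[i:i + CHUNK_SIZE]
              for i in range(0, len(payload), CHUNK_SIZE)]
    total = len(chunks)
    for i, chunk in enumerate(chunks):
        yield (msg_id, i, total, chunk)

def reassemble(records):
    """Buffer chunks per message id; emit each payload once complete.

    This is where the consumer-side complexity Joel mentions lives:
    offsets can only be safely committed once every chunk of a message
    has been consumed, which complicates checkpointing.
    """
    buffers = {}
    for msg_id, i, total, chunk in records:
        parts = buffers.setdefault(msg_id, [None] * total)
        parts[i] = chunk
        if all(p is not None for p in parts):
            yield b"".join(buffers.pop(msg_id))

payload = b"a" * (3 * CHUNK_SIZE + 123)   # needs 4 chunks
assert list(reassemble(split(payload))) == [payload]
```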


On Mon, Oct 5, 2015 at 9:31 PM, Rahul Jain <ra...@gmail.com> wrote:
> In addition to the config changes mentioned in that post, you may also have
> to change producer config if you are using the new producer.
>
> Specifically, *max.request.size* and *request.timeout.ms* have to be
> increased to allow the producer to send large messages.
>
>
> On 6 Oct 2015 02:02, "James Cheng" <jc...@tivo.com> wrote:
>
>> Here’s an article that Gwen wrote earlier this year on handling large
>> messages in Kafka.
>>
>> http://ingest.tips/2015/01/21/handling-large-messages-kafka/
>>
>> -James
>>
>> > On Oct 5, 2015, at 11:20 AM, Pradeep Gollakota <pr...@gmail.com>
>> wrote:
>> >
>> > Fellow Kafkaers,
>> >
>> > We have a pretty heavyweight legacy event logging system for batch
>> > processing. We're now sending the events into Kafka for real-time
>> > analytics. But some of our messages are quite large (> 40 MB).
>> >
>> > I'm wondering if any of you have use cases where you have to send large
>> > messages to Kafka and how you're dealing with them.
>> >
>> > Thanks,
>> > Pradeep
>>

Re: Dealing with large messages

Posted by Rahul Jain <ra...@gmail.com>.
In addition to the config changes mentioned in that post, you may also have
to change producer config if you are using the new producer.

Specifically, *max.request.size* and *request.timeout.ms* have to be
increased to allow the producer to send large messages.
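[Editor's note: with the new (Java-style) producer API as exposed by kafka-python, the settings Rahul names map to constructor arguments whose names mirror the dotted config keys. The values below are examples sized for this thread's 40 MB events, not recommendations, and the broker address is a placeholder; the broker (message.max.bytes) and consumer fetch size must be raised to match.]

```python
# Producer settings for large messages, as kafka-python constructor kwargs.
producer_config = {
    "bootstrap_servers": "localhost:9092",   # placeholder address
    "max_request_size": 50 * 1024 * 1024,    # max.request.size: > 40 MB events
    "request_timeout_ms": 120_000,           # request.timeout.ms: big sends take longer
}

# Requires a running broker, hence commented out here:
# from kafka import KafkaProducer            # pip install kafka-python
# producer = KafkaProducer(**producer_config)

assert producer_config["max_request_size"] > 40 * 1024 * 1024
```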


On 6 Oct 2015 02:02, "James Cheng" <jc...@tivo.com> wrote:

> Here’s an article that Gwen wrote earlier this year on handling large
> messages in Kafka.
>
> http://ingest.tips/2015/01/21/handling-large-messages-kafka/
>
> -James
>
> > On Oct 5, 2015, at 11:20 AM, Pradeep Gollakota <pr...@gmail.com>
> wrote:
> >
> > Fellow Kafkaers,
> >
> > We have a pretty heavyweight legacy event logging system for batch
> > processing. We're now sending the events into Kafka for real-time
> > analytics. But some of our messages are quite large (> 40 MB).
> >
> > I'm wondering if any of you have use cases where you have to send large
> > messages to Kafka and how you're dealing with them.
> >
> > Thanks,
> > Pradeep
>
>
> ________________________________
>
> This email and any attachments may contain confidential and privileged
> material for the sole use of the intended recipient. Any review, copying,
> or distribution of this email (or any attachments) by others is prohibited.
> If you are not the intended recipient, please contact the sender
> immediately and permanently delete this email and any attachments. No
> employee or agent of TiVo Inc. is authorized to conclude any binding
> agreement on behalf of TiVo Inc. by email. Binding agreements with TiVo
> Inc. may only be made by a signed written agreement.
>

Re: Dealing with large messages

Posted by James Cheng <jc...@tivo.com>.
Here’s an article that Gwen wrote earlier this year on handling large messages in Kafka.

http://ingest.tips/2015/01/21/handling-large-messages-kafka/

-James

> On Oct 5, 2015, at 11:20 AM, Pradeep Gollakota <pr...@gmail.com> wrote:
>
> Fellow Kafkaers,
>
> We have a pretty heavyweight legacy event logging system for batch
> processing. We're now sending the events into Kafka for real-time
> analytics. But some of our messages are quite large (> 40 MB).
>
> I'm wondering if any of you have use cases where you have to send large
> messages to Kafka and how you're dealing with them.
>
> Thanks,
> Pradeep

