Posted to commits@samza.apache.org by "Tao Feng (JIRA)" <ji...@apache.org> on 2015/12/19 00:01:46 UTC

[jira] [Commented] (SAMZA-845) Reduce memory footprint for SystemConsumer

    [ https://issues.apache.org/jira/browse/SAMZA-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064919#comment-15064919 ] 

Tao Feng commented on SAMZA-845:
--------------------------------

Looking at the code: when SystemConsumers.poll is called (from the choose method), it calls the poll method of the underlying SystemConsumer (KafkaSystemConsumer) to hand over the messages. In the poll method of BlockingEnvelopeMap (the KafkaSystemConsumer helper class), the messages are actually drained from the relevant SSP's queue, which makes me question whether the two-copies issue mentioned in SAMZA-245 and SAMZA-775 really exists. Any suggestions?
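
To illustrate the point about draining, here is a minimal sketch of a per-SSP queue whose poll moves messages out rather than copying them. It is a simplified model, not the Samza source: the class DrainSketch, its methods, and the use of a String as the SSP key are all hypothetical names chosen for the example. The only library behavior it relies on is that java.util.concurrent.BlockingQueue.drainTo removes the elements it transfers.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for a consumer-side buffer keyed by SSP.
class DrainSketch {
    // One queue of raw messages per SSP (a String key is an assumption for brevity).
    private final Map<String, BlockingQueue<byte[]>> buffered = new HashMap<>();

    // Simulates the fetcher thread adding a message to the SSP's queue.
    public void put(String ssp, byte[] message) {
        buffered.computeIfAbsent(ssp, k -> new LinkedBlockingQueue<>()).add(message);
    }

    // Hands over everything currently queued for the SSP.
    // drainTo removes the elements from the queue as it transfers them,
    // so after this call only the returned list references the messages.
    public List<byte[]> poll(String ssp) {
        List<byte[]> out = new ArrayList<>();
        BlockingQueue<byte[]> queue = buffered.get(ssp);
        if (queue != null) {
            queue.drainTo(out);
        }
        return out;
    }

    // Number of messages still held in the consumer-side queue.
    public int bufferedCount(String ssp) {
        BlockingQueue<byte[]> queue = buffered.get(ssp);
        return queue == null ? 0 : queue.size();
    }

    public static void main(String[] args) {
        DrainSketch consumer = new DrainSketch();
        consumer.put("topic-0", new byte[] {1});
        consumer.put("topic-0", new byte[] {2});
        List<byte[]> handedOver = consumer.poll("topic-0");
        System.out.println("handed over: " + handedOver.size());
        System.out.println("still buffered: " + consumer.bufferedCount("topic-0"));
    }
}
```

If the real poll path behaves like this model, a message is held by only one of the two buffers at any moment, although the fetcher may refill the first queue while the second still holds undelivered messages.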

> Reduce memory footprint for SystemConsumer
> ------------------------------------------
>
>                 Key: SAMZA-845
>                 URL: https://issues.apache.org/jira/browse/SAMZA-845
>             Project: Samza
>          Issue Type: Improvement
>            Reporter: Tao Feng
>
> Currently KafkaSystemConsumer prefetches 50000 messages by default, a behavior introduced in SAMZA-203. According to Chris's comments in SAMZA-245 and SAMZA-775, each message may be buffered twice: once in KafkaSystemConsumer (bufferedMessages) and once in SystemConsumers (unprocessedMessagesBySSP). If each message is around 10 KB, we need 10 KB * 50k * 2 (roughly 1 GB) of memory for buffering, according to the comment.
> The reason for buffering twice is that BrokerProxy actively fetches messages whenever the total number of buffered messages is below the fetchThreshold (50000), to avoid potential message latency issues, and inserts them into KafkaSystemConsumer's bufferedMessages queue. Whenever a message is chosen (SystemConsumers.choose is called), SystemConsumers fetches messages from bufferedMessages, inserts them into its own buffer, "unprocessedMessagesBySSP", and later hands them over to the streamTask for processing.
> It would be good to reduce the memory footprint here if this is the case. I would like to hear from others whether this is an issue, and I would like to take a stab at it myself if so.
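
The worst-case estimate in the description can be worked out with a few lines of arithmetic. The sketch below assumes "10k byte" means 10,000 bytes and that both buffers are full at the same time; the class and method names are hypothetical and exist only for this example.

```java
// Hypothetical helper for the back-of-the-envelope estimate in the description.
class BufferFootprint {
    // Worst-case bytes held if every buffered message exists in `copies` buffers.
    // multiplyExact throws ArithmeticException on overflow instead of wrapping.
    public static long worstCaseBytes(long messageBytes, long bufferedMessages, int copies) {
        long oneCopy = Math.multiplyExact(messageBytes, bufferedMessages);
        return Math.multiplyExact(oneCopy, (long) copies);
    }

    public static void main(String[] args) {
        long single = worstCaseBytes(10_000L, 50_000L, 1);
        long doubled = worstCaseBytes(10_000L, 50_000L, 2);
        System.out.println("one copy:   " + single + " bytes");
        System.out.println("two copies: " + doubled + " bytes");
    }
}
```

Under these assumptions one copy is 500,000,000 bytes (about 500 MB) and two copies are 1,000,000,000 bytes (about 1 GB), so removing the second copy, if it exists, would halve the worst-case buffer footprint.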



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)