Posted to commits@samza.apache.org by "Mark Mindenhall (JIRA)" <ji...@apache.org> on 2016/10/31 18:37:58 UTC
[jira] [Created] (SAMZA-1044) Checkpointing requires log.cleaner.enable=true
Mark Mindenhall created SAMZA-1044:
--------------------------------------
Summary: Checkpointing requires log.cleaner.enable=true
Key: SAMZA-1044
URL: https://issues.apache.org/jira/browse/SAMZA-1044
Project: Samza
Issue Type: Bug
Components: docs
Environment: linux
Reporter: Mark Mindenhall
Priority: Minor
We're running Samza 0.9.1 with Kafka 0.8.2.1, which has a default setting of {{log.cleaner.enable=false}}. We didn't think we needed to enable this, because we never created any topics with {{cleanup.policy=compact}}. However, this morning we had a disk alert, and when I looked at the broker that triggered it, one of the Samza checkpoint topics was consuming 29GB within the {{/logs}} folder.
Long story short, I eventually figured out that all of the checkpoint topics were created with {{cleanup.policy=compact}} and were growing without bound. I set {{log.cleaner.enable=true}} on each broker and restarted them. Within minutes, the 29GB was reduced to 200-300KB.
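For reference, the broker-side fix described above amounts to one line per broker, followed by a restart. This assumes a standard install where broker settings live in {{server.properties}}; adjust the location to your deployment.

```properties
# Enable the log cleaner so that topics created with cleanup.policy=compact
# (such as the Samza checkpoint topics) are actually compacted.
# Without this, compacted topics are never cleaned and grow without bound.
log.cleaner.enable=true
```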
I thought I must have missed this when I created our jobs with checkpointing enabled, so I went and scoured the docs. There's no mention of the {{log.cleaner.enable}} setting within the documentation (unless I missed it _again_).
I should add that we've been running most of these jobs for about a year, and I noticed that each time we deployed, it took longer and longer to transition from {{ACCEPTED}} to {{RUNNING}} in the YARN cluster. Eventually, it was taking 10-15 minutes per job, and we didn't understand why. After bouncing our staging cluster with {{log.cleaner.enable=true}} (and letting the log cleaner finish its work), I redeployed one of our jobs, and it once again took only 15-20 seconds to go from {{ACCEPTED}} to {{RUNNING}}.
Please mention in the documentation that {{log.cleaner.enable}} must be set to {{true}} for checkpointing to work correctly.
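Until the docs cover this, a deploy-time sanity check can catch the misconfiguration. Below is a minimal sketch, not part of Samza or Kafka. The helper names {{parse_properties}} and {{cleaner_enabled}} are hypothetical. It handles only simple {{key=value}} lines, not the full Java properties syntax (no {{:}} separators or line continuations). It also treats a missing key as disabled, matching the Kafka 0.8.2.1 default described above.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and #/! comments.

    Hypothetical helper: deliberately simplified, not a full Java
    .properties parser.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def cleaner_enabled(text):
    """Return True only if log.cleaner.enable is explicitly set to true.

    Assumes a broker where the default is false (as with Kafka 0.8.2.1
    in this report), so a missing key counts as disabled.
    """
    value = parse_properties(text).get("log.cleaner.enable", "false")
    return value.lower() == "true"


sample = """
# broker settings
broker.id=0
log.cleaner.enable=true
"""
print(cleaner_enabled(sample))  # True
```

A commented-out line such as {{# log.cleaner.enable=true}} is ignored, so the check only passes when the setting is actually in effect in the file.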
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)