Posted to dev@samza.apache.org by Malcolm McFarland <mm...@cavulus.com> on 2019/09/25 22:14:26 UTC

Questions about using custom groupers

Hey folks,

We implemented a custom grouper several months ago to do some basic
stats collection prior to startup. After our most recent restart, I
started seeing this error in our system:

Grouper mismatch. Configured:
com.cavulus.grouper.OurSystemStreamPartitionGrouperFactory Actual:
org.apache.samza.container.grouper.stream.GroupByPartitionFactory

Nothing has changed in either our Kafka cluster or our grouper code in
many months. In sleuthing out the cause, it occurred to me that
perhaps a data retention or cleanup policy was causing older messages
in the *_checkpoint_* or *_coordinator_* topics to be removed.

I have two questions:

1) Where within these queues is the grouper configuration stored?
2) Would a Kafka topic cleanup.policy of "compact" cause trouble here?

Cheers,
Malcolm McFarland
Cavulus


This correspondence is from HealthPlanCRM, LLC, d/b/a Cavulus. Any
unauthorized or improper disclosure, copying, distribution, or use of
the contents of this message is prohibited. The information contained
in this message is intended only for the personal and confidential use
of the recipient(s) named above. If you have received this message in
error, please notify the sender immediately and delete the original
message.

Re: Questions about using custom groupers

Posted by Yi Pan <ni...@gmail.com>.
Hi Malcolm,

The configuration should be in the *_coordinator_* topic. If the cleanup
policy for this topic is compact only, you should not lose the
configuration. If the policy on this topic combines compaction with
time-based retention (i.e. a newer Kafka version on the broker side has
enabled this feature), you may lose your configuration.
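
As a rough illustration of the distinction above, here is a minimal sketch
that classifies a topic's cleanup.policy value. It assumes the value is a
comma-separated list drawn from "compact" and "delete", as Kafka accepts;
the function name is hypothetical and is not part of Samza or Kafka.

```python
def retention_may_drop_records(cleanup_policy: str) -> bool:
    """Return True if the policy includes "delete", meaning time- or
    size-based retention can remove records even from a compacted topic.

    Hypothetical helper for illustration only; it inspects the policy
    string and does not talk to a broker.
    """
    # Split on commas and normalise whitespace and case.
    policies = {p.strip().lower() for p in cleanup_policy.split(",") if p.strip()}

    # Reject anything other than the two policy names assumed above.
    unknown = policies - {"compact", "delete"}
    if unknown or not policies:
        raise ValueError(f"unexpected cleanup.policy value: {cleanup_policy!r}")

    # Compact-only topics keep the latest record per key; once "delete"
    # is also present, retention limits apply as well.
    return "delete" in policies
```

With this sketch, "compact" is reported as safe, while "compact,delete"
is flagged as a policy under which the stored configuration could age out.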

-Yi