You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@pulsar.apache.org by Alejandro Abdelnur <tu...@gmail.com> on 2018/08/01 19:34:56 UTC

Subscriber seek earliest or latest only if subscriber never did an ack before

Hello,

I'm working on a Pulsar (Java) client application that creates a subscriber
to a topic.

The client application is stateless, it does not keep a state other than
the configuration for the subscriber (pulsar connection info, topic, name,
etc). If the client application dies a new one will be bootstrapped
(potentially in another host/container/VM) to continue with the processing
of topic messages.

Because pulsar keeps track of the last messageId delivered and ack-ed to/by
the client, the client application failover described in the previous
paragraph it works just fine. Well, almost ...

Depending on the use case, on the first creation of the subscriber, the
client application needs to process all existing messages available in the
topic or only new messages.

Using the Subscriber seek MessageId.earliest or to MesageId.latest I can
indicate from where I can start on the first run of the client application.
And, on subsequent runs, in order to continue from next message to the last
ack-ed one, I simply have to start consuming without doing a seek.

This means that I have to keep a state on the client application to know if
its the first run or not. As the client application is meant to be
stateless this complicates things.

In Kafka-land the 'auto.offset.reset' configuration allows to handle this
use case because the 'auto.offset.reset' is only honored if there consumer
group does not have a topic offset in the Kafka cluster.

Am I missing some some configuration to be able to do this in Pulsar?

Is it a Pulsar limitation?

If it is a Pulsar limitation, are the plans to address it?

Please let me know and I'll create a JIRA if it corresponds and try to
work/contribute to it.

Thanks and regards.

Alejandro

Re: Subscriber seek earliest or latest only if subscriber never did an ack before

Posted by Alejandro Abdelnur <tu...@gmail.com>.
Thanks Matteo, I've missed that method.

On Wed, Aug 1, 2018 at 10:11 PM Matteo Merli <ma...@gmail.com> wrote:

> Hi Alejandro,
>
> It is possible to specify the position where a subscription should be
> initialized (if it doesn't exist).
>
> See `ConsumerBuilder.subscriptionInitialPosition()`
>
> http://pulsar.apache.org/api/client/org/apache/pulsar/client/api/ConsumerBuilder.html#subscriptionInitialPosition-org.apache.pulsar.client.api.SubscriptionInitialPosition-
>
> The subscription can be either initialized on "earliest" or "latest"
> (default).
>
> Also keep in mind that, by default, data is not retained in Pulsar if
> there is no subscription.
> To enable data retention see:
>
> bin/pulsar-admin namespaces set-retention $MY_NAMESPACE --size 100G --time
> 1d
>
> (it's also possible to set that as the default behavior in broker.conf)
>
>
> Matteo
>
>
>
> On Wed, Aug 1, 2018 at 7:35 PM Alejandro Abdelnur <tu...@gmail.com>
> wrote:
>
>> Hello,
>>
>> I'm working on a Pulsar (Java) client application that creates a
>> subscriber to a topic.
>>
>> The client application is stateless, it does not keep a state other than
>> the configuration for the subscriber (pulsar connection info, topic, name,
>> etc). If the client application dies a new one will be bootstrapped
>> (potentially in another host/container/VM) to continue with the processing
>> of topic messages.
>>
>> Because pulsar keeps track of the last messageId delivered and ack-ed
>> to/by the client, the client application failover described in the previous
>> paragraph it works just fine. Well, almost ...
>>
>> Depending on the use case, on the first creation of the subscriber, the
>> client application needs to process all existing messages available in the
>> topic or only new messages.
>>
>> Using the Subscriber seek MessageId.earliest or to MesageId.latest I can
>> indicate from where I can start on the first run of the client application.
>> And, on subsequent runs, in order to continue from next message to the last
>> ack-ed one, I simply have to start consuming without doing a seek.
>>
>> This means that I have to keep a state on the client application to know
>> if its the first run or not. As the client application is meant to be
>> stateless this complicates things.
>>
>> In Kafka-land the 'auto.offset.reset' configuration allows to handle this
>> use case because the 'auto.offset.reset' is only honored if there consumer
>> group does not have a topic offset in the Kafka cluster.
>>
>> Am I missing some some configuration to be able to do this in Pulsar?
>>
>> Is it a Pulsar limitation?
>>
>> If it is a Pulsar limitation, are the plans to address it?
>>
>> Please let me know and I'll create a JIRA if it corresponds and try to
>> work/contribute to it.
>>
>> Thanks and regards.
>>
>> Alejandro
>>
>> --
> Matteo Merli
> <mm...@apache.org>
>

Re: Subscriber seek earliest or latest only if subscriber never did an ack before

Posted by Matteo Merli <ma...@gmail.com>.
Hi Alejandro,

It is possible to specify the position where a subscription should be
initialized (if it doesn't exist).

See `ConsumerBuilder.subscriptionInitialPosition()`
http://pulsar.apache.org/api/client/org/apache/pulsar/client/api/ConsumerBuilder.html#subscriptionInitialPosition-org.apache.pulsar.client.api.SubscriptionInitialPosition-

The subscription can be either initialized on "earliest" or "latest"
(default).

Also keep in mind that, by default, data is not retained in Pulsar if there
is no subscription.
To enable data retention see:

bin/pulsar-admin namespaces set-retention $MY_NAMESPACE --size 100G --time
1d

(it's also possible to set that as the default behavior in broker.conf)


Matteo



On Wed, Aug 1, 2018 at 7:35 PM Alejandro Abdelnur <tu...@gmail.com> wrote:

> Hello,
>
> I'm working on a Pulsar (Java) client application that creates a
> subscriber to a topic.
>
> The client application is stateless, it does not keep a state other than
> the configuration for the subscriber (pulsar connection info, topic, name,
> etc). If the client application dies a new one will be bootstrapped
> (potentially in another host/container/VM) to continue with the processing
> of topic messages.
>
> Because pulsar keeps track of the last messageId delivered and ack-ed
> to/by the client, the client application failover described in the previous
> paragraph it works just fine. Well, almost ...
>
> Depending on the use case, on the first creation of the subscriber, the
> client application needs to process all existing messages available in the
> topic or only new messages.
>
> Using the Subscriber seek MessageId.earliest or to MesageId.latest I can
> indicate from where I can start on the first run of the client application.
> And, on subsequent runs, in order to continue from next message to the last
> ack-ed one, I simply have to start consuming without doing a seek.
>
> This means that I have to keep a state on the client application to know
> if its the first run or not. As the client application is meant to be
> stateless this complicates things.
>
> In Kafka-land the 'auto.offset.reset' configuration allows to handle this
> use case because the 'auto.offset.reset' is only honored if there consumer
> group does not have a topic offset in the Kafka cluster.
>
> Am I missing some some configuration to be able to do this in Pulsar?
>
> Is it a Pulsar limitation?
>
> If it is a Pulsar limitation, are the plans to address it?
>
> Please let me know and I'll create a JIRA if it corresponds and try to
> work/contribute to it.
>
> Thanks and regards.
>
> Alejandro
>
> --
Matteo Merli
<mm...@apache.org>