Posted to users@pulsar.apache.org by Brian Candler <b....@pobox.com> on 2019/10/31 18:09:08 UTC

Question about set_state / get_state

A few questions about state storage 
<https://pulsar.apache.org/docs/en/next/functions-develop/#state-storage> 
in pulsar functions.

(1) When I do putState("foo", val), what's the scope of the name "foo"?  
Just this function, or all functions in this tenant/namespace, or 
something else?

In other words, if I write a separate function which does 
getState("foo") does it access the same value or not?  What about 
functions in a different namespace or a different tenant?

(2) Is getState("foo") distinct from getCounter("foo"), or do they share 
the same namespace?

(3) Is there a CLI tool which can be used to examine state?  Or do I 
have to write a pulsar function and send queries to it via a topic?

Thanks,

Brian.


Re: Question about set_state / get_state

Posted by Brian Candler <b....@pobox.com>.
On 02/11/2019 04:11, Sijie Guo wrote:
>
>     2. Can a newly-deployed function be told to consume the compacted
>     version of a topic?
>
>
> In theory, yes. However, this is not exposed either. Can you create a
> GitHub issue for this?
Created: https://github.com/apache/pulsar/issues/5538


>
>     4. From the command line, can I wipe *all* the state stored for a
>     function?
>
>
> The mechanism for wiping the state is in place. You can use the bk tool to
> do so. However, it might make more sense to add a command to
> pulsar-admin.
>
>
>     5. Can I list the stored state keys, so I can iterate over them?
>     (Aside:
>     I looked in the REST API documentation, but it doesn't seem to cover
>     administering functions at all)
>
>
> The mechanism is also in place. We can also add that.
Created: https://github.com/apache/pulsar/issues/5539


Thanks!

Brian.


Re: Question about set_state / get_state

Posted by Sijie Guo <gu...@gmail.com>.
Brian,

Comments inline.

On Fri, Nov 1, 2019 at 8:01 PM Brian Candler <b....@pobox.com> wrote:

> A few other function and state questions.  When there is existing data
> in a topic:
>
> 1. Does a newly-deployed function start consuming from the start of a
> topic, or the end? (Or is this selectable?)
>

A function will place a subscription on input topics it consumes from.

By default, a newly-deployed function will create a subscription
starting from the latest position (the end of the topic).

You can work around this by using pulsar-admin to reset the cursor.

There is a pull request to expose a setting to specify the initial position
when creating a function.

https://github.com/apache/pulsar/pull/5532
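To make the default concrete, here is a toy model in plain Python (it does not use the Pulsar client) of which messages a newly created subscription delivers when the topic already holds data. The function name `delivered` and the position strings "latest"/"earliest" are illustrative assumptions, not Pulsar API.

```python
from typing import List


def delivered(existing: List[str], new: List[str], initial_position: str = "latest") -> List[str]:
    """Messages a freshly created subscription would receive (toy model).

    'latest'   -> skip the backlog already in the topic (the default described above)
    'earliest' -> replay the existing backlog, then the new messages
    """
    if initial_position == "latest":
        return list(new)
    if initial_position == "earliest":
        return list(existing) + list(new)
    raise ValueError(f"unknown initial position: {initial_position!r}")


backlog = ["Hello-1", "Hello-2"]
incoming = ["Hello-3"]
print(delivered(backlog, incoming))              # ['Hello-3']
print(delivered(backlog, incoming, "earliest"))  # ['Hello-1', 'Hello-2', 'Hello-3']
```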



>
> 2. Can a newly-deployed function be told to consume the compacted
> version of a topic?
>

In theory, yes. However, this is not exposed either. Can you create a GitHub
issue for this?


>
> 3. Is it possible to "rewind" the subscription of a deployed function to
> a previous point in time?
>

Yes. You can use `pulsar-admin topics reset-cursor` to rewind the
subscription.
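Rewinding means moving the subscription back to an earlier point, often given as a time. As a toy model of what that step does (plain Python, no Pulsar calls), the sketch below finds where a rewound subscription would resume, assuming a list of message publish times in milliseconds already sorted in ascending order; `resume_index` is a hypothetical name.

```python
import bisect
from typing import List


def resume_index(publish_times_ms: List[int], target_ms: int) -> int:
    """Index of the first message published at or after target_ms.

    Assumes publish_times_ms is sorted ascending. Returns
    len(publish_times_ms) when no message qualifies (nothing to replay).
    """
    return bisect.bisect_left(publish_times_ms, target_ms)


times = [1000, 2000, 3000, 4000]
print(resume_index(times, 2500))  # 2: redelivery starts with the 3000 ms message
```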


>
> 3b. If so, will this also reset stored state to how it was at that point
> in time?
>

Unfortunately, you can only rewind the subscriptions on the input topics;
the stored state is not rewound with them.

In order to also support rewinding stored state, we would have to introduce
a checkpointing mechanism to Pulsar Functions.
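Why checkpointing matters can be shown with a toy model (hypothetical, not Pulsar code): if only the cursor is rewound, replayed messages are applied on top of state that already counted them, so a counter double-counts; restoring a state snapshot taken at the same input position avoids that. `CountingFunction`, `consume` and `rewind` are illustrative names only.

```python
from typing import Dict, List


class CountingFunction:
    """Toy stand-in for a function that keeps a counter in its state store."""

    def __init__(self) -> None:
        self.state: Dict[str, int] = {"global_count": 0}
        # Hypothetical checkpoints: state snapshot keyed by input position.
        self.checkpoints: Dict[int, Dict[str, int]] = {}

    def consume(self, messages: List[str], start: int) -> None:
        """Process messages[start:], as if the cursor were placed at `start`."""
        for pos in range(start, len(messages)):
            # Keep the first snapshot of the state just before this position.
            self.checkpoints.setdefault(pos, dict(self.state))
            self.state["global_count"] += 1

    def rewind(self, messages: List[str], start: int, restore_state: bool) -> None:
        """Rewind the cursor to `start`; optionally roll the state back too."""
        if restore_state:
            self.state = dict(self.checkpoints[start])
        self.consume(messages, start)


msgs = ["Hello-1", "Hello-2", "Hello-3", "Hello-4"]

cursor_only = CountingFunction()
cursor_only.consume(msgs, 0)
cursor_only.rewind(msgs, 2, restore_state=False)
print(cursor_only.state["global_count"])  # 6: two messages counted twice

with_checkpoint = CountingFunction()
with_checkpoint.consume(msgs, 0)
with_checkpoint.rewind(msgs, 2, restore_state=True)
print(with_checkpoint.state["global_count"])  # 4: state matches the replayed input
```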


>
> 4. From the command line, can I wipe *all* the state stored for a function?
>

The mechanism for wiping the state is in place. You can use the bk tool to do
so. However, it might make more sense to add a command to pulsar-admin.


>
> 5. Can I list the stored state keys, so I can iterate over them? (Aside:
> I looked in the REST API documentation, but it doesn't seem to cover
> administering functions at all)
>
>
The mechanism is also in place. We can also add that.



Re: Question about set_state / get_state

Posted by Brian Candler <b....@pobox.com>.
A few other function and state questions.  When there is existing data 
in a topic:

1. Does a newly-deployed function start consuming from the start of a 
topic, or the end? (Or is this selectable?)

2. Can a newly-deployed function be told to consume the compacted 
version of a topic?

3. Is it possible to "rewind" the subscription of a deployed function to 
a previous point in time?

3b. If so, will this also reset stored state to how it was at that point 
in time?

4. From the command line, can I wipe *all* the state stored for a function?

5. Can I list the stored state keys, so I can iterate over them? (Aside: 
I looked in the REST API documentation, but it doesn't seem to cover 
administering functions at all)


AFAICS, functions have subscription(s) to their input topic(s); indeed I 
can see them in the output of "pulsar-admin topics subscriptions 
<topicname>"

So for (3) I guess it's possible to use "reset-cursor" on the 
subscription - but I'm not sure if that's safe/recommended with pulsar 
functions.  It surely wouldn't update state though (3b).

The reason for question (3b) is I wonder if it's possible to change the 
logic in a function (e.g. make a logic fix) and re-run it over the last 
hour/day/whatever, with the state back as it was then.  That avoids 
having to re-run from the beginning of time.

Thanks,

Brian.


Re: Question about set_state / get_state

Posted by Brian Candler <b....@pobox.com>.
On 31/10/2019 22:48, Jerry Peng wrote:
>> (3) Is there a CLI tool which can be used to examine state?  Or do I have to write a pulsar function and send queries to it via a topic?
> You can use the pulsar-admin CLI to interact with a Pulsar function's state, e.g.
>
> ./bin/pulsar-admin functions querystate

That's excellent, thank you!

$ apache-pulsar-2.4.1/bin/pulsar-admin functions querystate --name womble --key lastmsg
{
   "key": "lastmsg",
   "stringValue": "Hello-9",
   "version": 259
}

$ apache-pulsar-2.4.1/bin/pulsar-admin functions querystate --name womble --key global_count
{
   "key": "global_count",
   "numberValue": 460,
   "version": 229
}
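The querystate output is JSON, so it can be post-processed in a script. A minimal sketch, assuming the output has exactly the shape shown above (a `stringValue` or a `numberValue` field next to `key` and `version`); the helper name `state_value` is hypothetical.

```python
import json
from typing import Union


def state_value(querystate_output: str) -> Union[str, int]:
    """Extract the stored value from `pulsar-admin functions querystate` JSON."""
    doc = json.loads(querystate_output)
    if "stringValue" in doc:
        return doc["stringValue"]
    if "numberValue" in doc:
        return doc["numberValue"]
    raise KeyError(f"no value field for key {doc.get('key')!r}")


lastmsg = '{"key": "lastmsg", "stringValue": "Hello-9", "version": 259}'
global_count = '{"key": "global_count", "numberValue": 460, "version": 229}'
print(state_value(lastmsg))       # Hello-9
print(state_value(global_count))  # 460
```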

$ apache-pulsar-2.4.1/bin/pulsar-admin functions putstate --name womble -s '{"key":"lastmsg", "stringValue":"blahblah"}'
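Hand-writing the JSON for `-s` invites quoting mistakes. A sketch that builds the argument list with `json.dumps` instead, so no shell quoting is involved; the path and flags are copied from the putstate command above, and `putstate_argv` is a hypothetical helper. Passing the list to `subprocess.run` would act on a real cluster, so the example only prints the equivalent shell command.

```python
import json
import shlex
from typing import List

PULSAR_ADMIN = "apache-pulsar-2.4.1/bin/pulsar-admin"  # path as used above


def putstate_argv(function_name: str, key: str, string_value: str) -> List[str]:
    """Build argv for `pulsar-admin functions putstate` with a string value."""
    state = json.dumps({"key": key, "stringValue": string_value})
    return [PULSAR_ADMIN, "functions", "putstate", "--name", function_name, "-s", state]


argv = putstate_argv("womble", "lastmsg", "blahblah")
print(shlex.join(argv))
# To run it for real: subprocess.run(argv, check=True)
```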



Re: Question about set_state / get_state

Posted by Jerry Peng <je...@gmail.com>.
Hello Brian,

Replying to your questions inline below:

> (1) When I do putState("foo", val), what's the scope of the name "foo"?  Just this function, or all functions in this tenant/namespace, or something else?

The state is scoped to the specific function.
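One way to picture "scoped to the specific function" is a separate key space per function. The sketch below is a hypothetical in-memory model, not the Pulsar implementation or API; it assumes the scope is the (tenant, namespace, function name) triple, and `ScopedStateStore` plus the example names are illustrative.

```python
from typing import Dict, Optional, Tuple

FunctionId = Tuple[str, str, str]  # (tenant, namespace, function name)


class ScopedStateStore:
    """Hypothetical model: one independent key space per function."""

    def __init__(self) -> None:
        self._tables: Dict[FunctionId, Dict[str, bytes]] = {}

    def put_state(self, fn: FunctionId, key: str, value: bytes) -> None:
        self._tables.setdefault(fn, {})[key] = value

    def get_state(self, fn: FunctionId, key: str) -> Optional[bytes]:
        return self._tables.get(fn, {}).get(key)


store = ScopedStateStore()
womble = ("public", "default", "womble")
other = ("public", "default", "other")
store.put_state(womble, "foo", b"bar")
print(store.get_state(womble, "foo"))  # b'bar'
print(store.get_state(other, "foo"))   # None: same key, different function
```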

> (2) Is getState("foo") distinct from getCounter("foo"), or do they share the same namespace?

You have already answered this.

> (3) Is there a CLI tool which can be used to examine state?  Or do I have to write a pulsar function and send queries to it via a topic?

You can use the pulsar-admin CLI to interact with a Pulsar function's state, e.g.

./bin/pulsar-admin functions querystate

or

./bin/pulsar-admin functions putstate

Best,

Jerry

On Thu, Oct 31, 2019 at 11:36 AM Brian Candler <b....@pobox.com> wrote:
>
> On 31/10/2019 18:09, Brian Candler wrote:
> > (2) Is getState("foo") distinct from getCounter("foo"), or do they
> > share the same namespace?
>
> I've tested it, and they share the same namespace: counters are just 64-bit big-endian values.
>
>      get_counter("foo") => 460
>
>      get_state("foo") => b'\x00\x00\x00\x00\x00\x00\x01\xcc'
>
> Also, the Java API has "deleteState" but the Python API has "del_counter", so
> they're clearly the same thing.
>

Re: Question about set_state / get_state

Posted by Brian Candler <b....@pobox.com>.
On 31/10/2019 18:09, Brian Candler wrote:
> (2) Is getState("foo") distinct from getCounter("foo"), or do they 
> share the same namespace?

I've tested it, and they share the same namespace: counters are just 64-bit big-endian values.

     get_counter("foo") => 460

     get_state("foo") => b'\x00\x00\x00\x00\x00\x00\x01\xcc'

Also, the Java API has "deleteState" but the Python API has "del_counter", so
they're clearly the same thing.
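The observation can be checked directly: decoding the bytes returned by get_state("foo") as a 64-bit big-endian integer gives back the counter value. A minimal sketch; `decode_counter` is a hypothetical helper, and treating the value as signed (like a Java long) is an assumption, since the test above only shows 64-bit and big-endian.

```python
import struct


def decode_counter(raw: bytes) -> int:
    """Interpret get_state() bytes for a counter key as a 64-bit big-endian int.

    ">q" means big-endian, signed, 8 bytes (signedness is an assumption).
    """
    (value,) = struct.unpack(">q", raw)
    return value


raw = b'\x00\x00\x00\x00\x00\x00\x01\xcc'  # get_state("foo") from the test above
print(decode_counter(raw))                      # 460
print(int.from_bytes(raw, "big", signed=True))  # 460, same result without struct
assert struct.pack(">q", 460) == raw            # and back again
```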