Posted to users@pulsar.apache.org by jun aoki <ja...@apache.org> on 2018/05/06 06:42:25 UTC

Re: Any stream API for pulsar apps?

(I switched to users@. I should've started there.)
Thank you, Sanjeev, for explaining. It is exciting.
Is it true that a Pulsar cluster does some kind of resource
management/scheduling?
For example, when there are many functions registered and many messages
come in, does each function get certain resource (memory/CPU per node)
allocations, dynamically, based on its tenant/namespace, in a similar
way to what Spark does?

On Sat, May 5, 2018 at 12:04 AM, Sanjeev Kulkarni <sa...@gmail.com>
wrote:

> Hi,
> While introducing stream-native computing in Pulsar, we wanted to start
> with the most basic primitive, aka functions. Thus 2.0 added Pulsar
> Functions. This
> <https://pulsar.incubator.apache.org/docs/latest/functions/quickstart/> is
> a good page to get started with functions. Advanced libraries
> based on the functions primitive (like WindowFunction and streamlet APIs)
> are targeted for 2.1.
> Thanks!
>
> On Fri, May 4, 2018 at 10:53 PM, jun aoki <ja...@apache.org> wrote:
>
> > I did a 10-minute Google search for a Pulsar stream API, something
> > comparable to the Kafka Streams API
> > (https://kafka.apache.org/documentation/streams/),
> > but I haven't been able to find any. Does such a thing exist? Or am I
> > conceptually mistaken?
> >
> > --
> > -jun
> >
>
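For readers following the quickstart link above: in its simplest ("native") Python form, a Pulsar Function is just a plain function mapping each input message to an output message. A minimal sketch (deployment details like tenant, namespace, and topics are omitted):

```python
# Minimal sketch of a Pulsar Function in its simplest ("native") Python form:
# a plain function that transforms each incoming message into an output message.
def process(input):
    # Append an exclamation mark to every message that arrives.
    return input + "!"

# Locally, the logic can be exercised like any ordinary function:
print(process("hello"))  # hello!
```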



-- 
-jun

Re: Any stream API for pulsar apps?

Posted by jun aoki <ja...@apache.org>.
Thank you, Matteo! Just for the others: I read
https://pulsar.incubator.apache.org/docs/latest/getting-started/ConceptsAndArchitecture/#subscription-modes
and learned about the subscription modes.

On Sun, May 6, 2018 at 3:37 PM, Matteo Merli <ma...@gmail.com> wrote:

> > Do you know if the number of pulsar functions with localrun can be
> seamlessly increased?
>
> Yes, you can just spawn more processes (or shut down some of them)
>
> > Would there be a downtime (rebalancing etc.)?
>
> There is no downtime involved. The "rebalancing" of consumers happens
> immediately.
> If you're using a "Shared" subscription, adding more consumers just
> changes the dispatch schedule to include the newly added consumers.
>
> If it's a "Failover" subscription, the brokers will update the list of
> available and active consumers. This decision is taken by the broker that's
> serving a particular partition, without coordination with other brokers,
> though the algorithm ensures partitions are evenly assigned to consumers.
> By default, there is a 1-second artificial delay (configurable, even to 0)
> before starting delivery on the new consumer, to avoid the biggest portion
> of duplicated messages when switching the active consumer.
>
> Matteo
>
> On Sun, May 6, 2018 at 2:58 PM jun aoki <ja...@apache.org> wrote:
>
>> Hi Jerry, thank you for sharing the information.
>>
>> Do you know if the number of pulsar functions with localrun can be
>> seamlessly increased?
>> Say a single instance (or kube pod) of FunctionA with localrun is
>> consuming a topic and all of a sudden (there may be a peak time for my
>> service) I need to increase it to 10 instances. Would there be a downtime
>> (rebalancing etc.)? Is it just as seamless when decreasing the number of
>> function instances?
>>
>>
>> On Sat, May 5, 2018 at 11:49 PM, Jerry Peng <je...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> Currently, submitting Pulsar functions to run in an existing Pulsar
>>> cluster does not support resource allocation/enforcement or quotas. We
>>> may add such features in the future. However, you can run functions in
>>> existing Kubernetes clusters via localrun mode. Kubernetes will provide
>>> the resource allocation and enforcement mechanisms.
>>>
>>> On Sat, May 5, 2018 at 11:42 PM jun aoki <ja...@apache.org> wrote:
>>>
>>> > (I switched to users@. I should've started there.)
>>> > Thank you, Sanjeev, for explaining. It is exciting.
>>> > Is it true that a Pulsar cluster does some kind of resource
>>> > management/scheduling?
>>> > For example, when there are many functions registered and many messages
>>> > come in, does each function get certain resource (memory/CPU per node)
>>> > allocations, dynamically, based on its tenant/namespace, in a similar
>>> > way to what Spark does?
>>>
>>> > On Sat, May 5, 2018 at 12:04 AM, Sanjeev Kulkarni <sanjeevrk@gmail.com>
>>> wrote:
>>>
>>> >> Hi,
>>> >> While introducing stream-native computing in Pulsar, we wanted to start
>>> >> with the most basic primitive, aka functions. Thus 2.0 added Pulsar
>>> >> Functions. This
>>> >> <https://pulsar.incubator.apache.org/docs/latest/functions/quickstart/> is
>>> >> a good page to get started with functions. Advanced libraries
>>> >> based on the functions primitive (like WindowFunction and streamlet APIs)
>>> >> are targeted for 2.1.
>>> >> Thanks!
>>>
>>> >> On Fri, May 4, 2018 at 10:53 PM, jun aoki <ja...@apache.org> wrote:
>>>
>>> >> > I did a 10-minute Google search for a Pulsar stream API, something
>>> >> > comparable to the Kafka Streams API
>>> >> > (https://kafka.apache.org/documentation/streams/),
>>> >> > but I haven't been able to find any. Does such a thing exist? Or am I
>>> >> > conceptually mistaken?
>>> >> >
>>> >> > --
>>> >> > -jun
>>> >> >
>>>
>>>
>>>
>>>
>>> > --
>>> > -jun
>>>
>>
>>
>>
>> --
>> -jun
>>
> --
> Matteo Merli
> <mm...@apache.org>
>



-- 
-jun

Re: Any stream API for pulsar apps?

Posted by Matteo Merli <ma...@gmail.com>.
> Do you know if the number of pulsar functions with localrun can be
> seamlessly increased?

Yes, you can just spawn more processes (or shut down some of them)

> Would there be a downtime (rebalancing etc.)?

There is no downtime involved. The "rebalancing" of consumers happens
immediately.
If you're using a "Shared" subscription, adding more consumers just changes
the dispatch schedule to include the newly added consumers.

If it's a "Failover" subscription, the brokers will update the list of
available and active consumers. This decision is taken by the broker that's
serving a particular partition, without coordination with other brokers,
though the algorithm ensures partitions are evenly assigned to consumers.
By default, there is a 1-second artificial delay (configurable, even to 0)
before starting delivery on the new consumer, to avoid the biggest portion
of duplicated messages when switching the active consumer.
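The behavior described above can be sketched as a toy simulation in plain Python (not Pulsar APIs; the round-robin dispatch and sorted partition assignment here are illustrative assumptions, not the broker's exact algorithm):

```python
# Toy simulation of how adding a consumer changes dispatch under the two modes.
def shared_dispatch(messages, consumers):
    """Shared mode: round-robin each message across the current consumer list;
    a newly added consumer is simply included in the schedule."""
    assignment = {c: [] for c in consumers}
    for i, msg in enumerate(messages):
        assignment[consumers[i % len(consumers)]].append(msg)
    return assignment

def failover_active(partition, consumers):
    """Failover mode: exactly one active consumer per partition, chosen
    deterministically so no cross-broker coordination is needed."""
    ordered = sorted(consumers)
    return ordered[partition % len(ordered)]

msgs = list(range(6))
print(shared_dispatch(msgs, ["c1"]))        # one consumer receives everything
print(shared_dispatch(msgs, ["c1", "c2"]))  # new consumer is included at once
print(failover_active(0, ["c1", "c2"]))     # the single active consumer
```

Scaling down works the same way in the sketch: dropping a name from the consumer list immediately changes the schedule (Shared) or the active-consumer choice (Failover).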

Matteo

On Sun, May 6, 2018 at 2:58 PM jun aoki <ja...@apache.org> wrote:

> Hi Jerry, thank you for sharing the information.
>
> Do you know if the number of pulsar functions with localrun can be
> seamlessly increased?
> Say a single instance (or kube pod) of FunctionA with localrun is
> consuming a topic and all of a sudden (there may be a peak time for my
> service) I need to increase it to 10 instances. Would there be a downtime
> (rebalancing etc.)? Is it just as seamless when decreasing the number of
> function instances?
>
>
> On Sat, May 5, 2018 at 11:49 PM, Jerry Peng <je...@gmail.com>
> wrote:
>
>> Hi,
>>
>> Currently, submitting Pulsar functions to run in an existing Pulsar
>> cluster does not support resource allocation/enforcement or quotas. We
>> may add such features in the future. However, you can run functions in
>> existing Kubernetes clusters via localrun mode. Kubernetes will provide
>> the resource allocation and enforcement mechanisms.
>>
>> On Sat, May 5, 2018 at 11:42 PM jun aoki <ja...@apache.org> wrote:
>>
>> > (I switched to users@. I should've started there.)
>> > Thank you, Sanjeev, for explaining. It is exciting.
>> > Is it true that a Pulsar cluster does some kind of resource
>> > management/scheduling?
>> > For example, when there are many functions registered and many messages
>> > come in, does each function get certain resource (memory/CPU per node)
>> > allocations, dynamically, based on its tenant/namespace, in a similar
>> > way to what Spark does?
>>
>> > On Sat, May 5, 2018 at 12:04 AM, Sanjeev Kulkarni <sa...@gmail.com>
>> wrote:
>>
>> >> Hi,
>> >> While introducing stream-native computing in Pulsar, we wanted to start
>> >> with the most basic primitive, aka functions. Thus 2.0 added Pulsar
>> >> Functions. This
>> >> <https://pulsar.incubator.apache.org/docs/latest/functions/quickstart/> is
>> >> a good page to get started with functions. Advanced libraries
>> >> based on the functions primitive (like WindowFunction and streamlet APIs)
>> >> are targeted for 2.1.
>> >> Thanks!
>>
>> >> On Fri, May 4, 2018 at 10:53 PM, jun aoki <ja...@apache.org> wrote:
>>
>> >> > I did a 10-minute Google search for a Pulsar stream API, something
>> >> > comparable to the Kafka Streams API
>> >> > (https://kafka.apache.org/documentation/streams/),
>> >> > but I haven't been able to find any. Does such a thing exist? Or am I
>> >> > conceptually mistaken?
>> >> >
>> >> > --
>> >> > -jun
>> >> >
>>
>>
>>
>>
>> > --
>> > -jun
>>
>
>
>
> --
> -jun
>
-- 
Matteo Merli
<mm...@apache.org>

Re: Any stream API for pulsar apps?

Posted by jun aoki <ja...@apache.org>.
Hi Jerry, thank you for sharing the information.

Do you know if the number of pulsar functions with localrun can be
seamlessly increased?
Say a single instance (or kube pod) of FunctionA with localrun is
consuming a topic and all of a sudden (there may be a peak time for my
service) I need to increase it to 10 instances. Would there be a downtime
(rebalancing etc.)? Is it just as seamless when decreasing the number of
function instances?


On Sat, May 5, 2018 at 11:49 PM, Jerry Peng <je...@gmail.com>
wrote:

> Hi,
>
> Currently, submitting Pulsar functions to run in an existing Pulsar
> cluster does not support resource allocation/enforcement or quotas. We
> may add such features in the future. However, you can run functions in
> existing Kubernetes clusters via localrun mode. Kubernetes will provide
> the resource allocation and enforcement mechanisms.
>
> On Sat, May 5, 2018 at 11:42 PM jun aoki <ja...@apache.org> wrote:
>
> > (I switched to users@. I should've started there.)
> > Thank you, Sanjeev, for explaining. It is exciting.
> > Is it true that a Pulsar cluster does some kind of resource
> > management/scheduling?
> > For example, when there are many functions registered and many messages
> > come in, does each function get certain resource (memory/CPU per node)
> > allocations, dynamically, based on its tenant/namespace, in a similar
> > way to what Spark does?
>
> > On Sat, May 5, 2018 at 12:04 AM, Sanjeev Kulkarni <sa...@gmail.com>
> wrote:
>
> >> Hi,
> >> While introducing stream-native computing in Pulsar, we wanted to start
> >> with the most basic primitive, aka functions. Thus 2.0 added Pulsar
> >> Functions. This
> >> <https://pulsar.incubator.apache.org/docs/latest/functions/quickstart/> is
> >> a good page to get started with functions. Advanced libraries
> >> based on the functions primitive (like WindowFunction and streamlet APIs)
> >> are targeted for 2.1.
> >> Thanks!
>
> >> On Fri, May 4, 2018 at 10:53 PM, jun aoki <ja...@apache.org> wrote:
>
> >> > I did a 10-minute Google search for a Pulsar stream API, something
> >> > comparable to the Kafka Streams API
> >> > (https://kafka.apache.org/documentation/streams/),
> >> > but I haven't been able to find any. Does such a thing exist? Or am I
> >> > conceptually mistaken?
> >> >
> >> > --
> >> > -jun
> >> >
>
>
>
>
> > --
> > -jun
>



-- 
-jun

Re: Any stream API for pulsar apps?

Posted by Jerry Peng <je...@gmail.com>.
Hi,

Currently, submitting Pulsar functions to run in an existing Pulsar
cluster does not support resource allocation/enforcement or quotas. We
may add such features in the future. However, you can run functions in
existing Kubernetes clusters via localrun mode. Kubernetes will provide
the resource allocation and enforcement mechanisms.
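As a rough illustration of this pattern (a sketch under assumptions, not an official recipe: the Deployment name, labels, image tag, and function file are all hypothetical), a Kubernetes Deployment could run several localrun instances with per-pod resource limits, scaled with `kubectl scale`:

```yaml
# Hypothetical sketch: run 10 localrun instances of a function as pods,
# letting Kubernetes enforce per-instance CPU/memory limits.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: function-a-localrun        # hypothetical name
spec:
  replicas: 10                     # adjust with `kubectl scale --replicas=N`
  selector:
    matchLabels:
      app: function-a
  template:
    metadata:
      labels:
        app: function-a
    spec:
      containers:
      - name: localrun
        image: apachepulsar/pulsar:2.0.0   # assumed image tag
        command: ["bin/pulsar-admin", "functions", "localrun",
                  "--py", "/functions/function_a.py",       # hypothetical path
                  "--classname", "function_a.FunctionA"]    # hypothetical class
        resources:
          limits:
            cpu: "1"
            memory: 512Mi
```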

On Sat, May 5, 2018 at 11:42 PM jun aoki <ja...@apache.org> wrote:

> (I switched to users@. I should've started there.)
> Thank you, Sanjeev, for explaining. It is exciting.
> Is it true that a Pulsar cluster does some kind of resource
> management/scheduling?
> For example, when there are many functions registered and many messages
> come in, does each function get certain resource (memory/CPU per node)
> allocations, dynamically, based on its tenant/namespace, in a similar
> way to what Spark does?

> On Sat, May 5, 2018 at 12:04 AM, Sanjeev Kulkarni <sa...@gmail.com>
wrote:

>> Hi,
>> While introducing stream-native computing in Pulsar, we wanted to start
>> with the most basic primitive, aka functions. Thus 2.0 added Pulsar
>> Functions. This
>> <https://pulsar.incubator.apache.org/docs/latest/functions/quickstart/> is
>> a good page to get started with functions. Advanced libraries
>> based on the functions primitive (like WindowFunction and streamlet APIs)
>> are targeted for 2.1.
>> Thanks!

>> On Fri, May 4, 2018 at 10:53 PM, jun aoki <ja...@apache.org> wrote:

>> > I did a 10-minute Google search for a Pulsar stream API, something
>> > comparable to the Kafka Streams API
>> > (https://kafka.apache.org/documentation/streams/),
>> > but I haven't been able to find any. Does such a thing exist? Or am I
>> > conceptually mistaken?
>> >
>> > --
>> > -jun
>> >




> --
> -jun