Posted to dev@openwhisk.apache.org by Michele Sciabarra <mi...@sciabarra.com> on 2019/01/09 13:12:21 UTC

Persistent Kafka Connections

Hello whiskers,

I am working on a simple webchat based on OpenWhisk (full disclosure: it is for the book on OpenWhisk for O'Reilly!) and I am using MessageHub.

However, in the documentation for the messaging package I read that /whisk.system/messaging/messageHubProduce is deprecated, and that the recommended way is to use a persistent connection to Kafka.

This statement puzzles me. My first thought was "why don't they cache the connection?", but I checked the code for messageHubProduce.py here:

https://github.com/apache/incubator-openwhisk-package-kafka/blob/master/action/messageHubProduce.py

and it looks like the producer is cached, so I wonder whether the deprecation of messageHubProduce is still valid.

Furthermore, I wrote an action that sends messages to MessageHub using a cached connection, and it apparently works:

https://github.com/learning-apache-openwhisk/chapter11-messages/blob/master/src/persistent/main/send.go

So the question is: is caching connections the correct way to send messages to Kafka? Can I recommend this practice in the book?
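
To make the question concrete, here is a minimal sketch of the caching pattern being asked about, written in Python like messageHubProduce.py. It is not the code from either linked file. FakeProducer is a hypothetical stand-in for a real Kafka client that only counts how many connections get opened, and get_producer, brokers, topic and value are illustrative names.

```python
# Hypothetical stand-in for a real Kafka client; it only counts connections.
class FakeProducer:
    connections_opened = 0

    def __init__(self, brokers):
        FakeProducer.connections_opened += 1
        self.brokers = brokers
        self.sent = []

    def send(self, topic, value):
        self.sent.append((topic, value))


# Module-level cache: it survives across invocations only while the
# container that loaded this module stays warm.
_producer = None


def get_producer(brokers):
    """Return the cached producer, creating it on the first (cold) call."""
    global _producer
    if _producer is None:
        _producer = FakeProducer(brokers)
    return _producer


def main(args):
    """Action entry point: send one message using the cached producer."""
    producer = get_producer(args["brokers"])
    producer.send(args["topic"], args["value"])
    return {"sent": len(producer.sent)}
```

Two calls to main in the same process open a single connection. The cache is per container, though, so every additional container opens a connection of its own.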


-- 
  Michele Sciabarra
  michele@sciabarra.com

Re: Persistent Kafka Connections

Posted by Markus Thömmes <ma...@apache.org>.

Hi,

with Tyson's concurrency > 1 support we should be able to rehash the
requirements here. A single action that supports high concurrency on one
shared connection should fulfill what the Kafka backend teams need
here.
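
A rough sketch of that idea, under stated assumptions: threads stand in for concurrent activations inside one container, and FakeConnection is a hypothetical placeholder for a single Kafka socket connection. The names shared_connection, invoke and run_concurrently are illustrative and are not OpenWhisk APIs.

```python
import threading


class FakeConnection:
    """Hypothetical stand-in for one Kafka socket connection."""
    opened = 0

    def __init__(self):
        FakeConnection.opened += 1
        self.messages = []

    def send(self, message):
        self.messages.append(message)


_lock = threading.Lock()
_connection = None


def shared_connection():
    """Create the connection once, even when invocations race to get it."""
    global _connection
    with _lock:
        if _connection is None:
            _connection = FakeConnection()
        return _connection


def invoke(message):
    """One concurrent activation: reuse the shared connection to send."""
    conn = shared_connection()
    with _lock:  # serialize writes; a real client may already be thread-safe
        conn.send(message)


def run_concurrently(count):
    """Run `count` activations in parallel and return the connection used."""
    threads = [threading.Thread(target=invoke, args=(f"msg-{i}",))
               for i in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared_connection()
```

However many activations run at once, the container opens one connection, so the load on the backend grows with the number of containers rather than with the number of invocations.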

Outscaling a backend through a lot of functions is a common problem in the
FaaS space, by the way, @Michele. It might well be worth mentioning that in
your book as a common pitfall.

Cheers,
Markus

On Wed, Jan 9, 2019 at 15:58, Michele Sciabarra <
michele@sciabarra.com> wrote:

> This is also what I was thinking! But what should I recommend?
>
> I was thinking of a crazy approach: create a NON-OPENWHISK service for
> sending messages, for example a websocket service, use that to send
> messages to Kafka, and handle the answers in OpenWhisk.
>
> There is a project that makes it very simple to create websocket servers:
>
> http://websocketd.com/
>
> The nice thing is that this websocket server works in a way very similar
> to the ActionLoop, so in principle I could write an action the same way I
> do in OpenWhisk, but instead of running it in OpenWhisk I can run that
> action in a Kubernetes cluster in order to serve a websocket.
>
> I wonder if I should write this in an OpenWhisk book, though. It probably
> makes sense to show the limits of OpenWhisk and how to complement it with
> Kubernetes...
>
> --
>   Michele Sciabarra
>   michele@sciabarra.com
>

Re: Persistent Kafka Connections

Posted by Carlos Santana <cs...@gmail.com>.

Yeah, it should be a very simple service that serves as a network transport
from your websocket clients to the Kafka service.

It can run in a PaaS like Cloud Foundry or in a simple container on
Kubernetes, like you suggested.

On Wed, Jan 9, 2019 at 9:58 AM Michele Sciabarra <mi...@sciabarra.com>
wrote:

> This is also what I was thinking! But what should I recommend?
>
> I was thinking of a crazy approach: create a NON-OPENWHISK service for
> sending messages, for example a websocket service, use that to send
> messages to Kafka, and handle the answers in OpenWhisk.
>
> There is a project that makes it very simple to create websocket servers:
>
> http://websocketd.com/
>
> The nice thing is that this websocket server works in a way very similar
> to the ActionLoop, so in principle I could write an action the same way I
> do in OpenWhisk, but instead of running it in OpenWhisk I can run that
> action in a Kubernetes cluster in order to serve a websocket.
>
> I wonder if I should write this in an OpenWhisk book, though. It probably
> makes sense to show the limits of OpenWhisk and how to complement it with
> Kubernetes...
>
> --
>   Michele Sciabarra
>   michele@sciabarra.com
>
-- 
Carlos Santana
<cs...@gmail.com>

Re: Persistent Kafka Connections

Posted by Michele Sciabarra <mi...@sciabarra.com>.

This is also what I was thinking! But what should I recommend?

I was thinking of a crazy approach: create a NON-OPENWHISK service for sending messages, for example a websocket service, use that to send messages to Kafka, and handle the answers in OpenWhisk.

There is a project that makes it very simple to create websocket servers:

http://websocketd.com/

The nice thing is that this websocket server works in a way very similar to the ActionLoop, so in principle I could write an action the same way I do in OpenWhisk, but instead of running it in OpenWhisk I can run that action in a Kubernetes cluster in order to serve a websocket.
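
A minimal sketch of what such a program could look like. websocketd passes each incoming websocket message to the program as a line on standard input and sends each line the program prints back to the client. The handle and serve functions are hypothetical names, and the echo reply is only a placeholder for real chat logic such as forwarding to Kafka.

```python
import json
import sys


def handle(line):
    """Turn one incoming chat line into one outgoing JSON line."""
    text = line.strip()
    return json.dumps({"echo": text, "length": len(text)})


def serve(stdin=sys.stdin, stdout=sys.stdout):
    """Read one message per line, write one reply per line."""
    for line in stdin:
        stdout.write(handle(line) + "\n")
        stdout.flush()  # replies must not sit in a buffer
```

Saved as a script (say chat.py, an assumed name) with a final serve() call, it could be started with something like: websocketd --port=8080 python3 chat.py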

I wonder if I should write this in an OpenWhisk book, though. It probably makes sense to show the limits of OpenWhisk and how to complement it with Kubernetes...

-- 
  Michele Sciabarra
  michele@sciabarra.com

----- Original message -----
From: Carlos Santana <cs...@gmail.com>
To: dev@openwhisk.apache.org
Subject: Re: Persistent Kafka Connections
Date: Wed, 9 Jan 2019 09:32:17 -0500

It's one of those things that works well for a hello-world, low-scale
deployment.

In practice we saw a lot of actions running in parallel from multiple
tenants, and all of them, even with each one doing its own caching, were
opening too many new socket connections to a Kafka service. Eventually the
Kafka service started to throttle and rate limit the number of new
connections.

This is the reason the code is there in the repo but deprecated and not
recommended. Kafka experts at IBM told us that having stateless clients
open a lot of new connections is an antipattern.

So they recommended keeping a low number of connections open, with high
throughput on each of them. This means that if you want to produce from
actions at scale and across tenants, the actions would need to go through
some type of broker via HTTP, and this broker is a stateful service that
maintains the socket connection to the Kafka service.

— Carlos

-- 
Carlos Santana
<cs...@gmail.com>

Re: Persistent Kafka Connections

Posted by Carlos Santana <cs...@gmail.com>.

It's one of those things that works well for a hello-world, low-scale
deployment.

In practice we saw a lot of actions running in parallel from multiple
tenants, and all of them, even with each one doing its own caching, were
opening too many new socket connections to a Kafka service. Eventually the
Kafka service started to throttle and rate limit the number of new
connections.

This is the reason the code is there in the repo but deprecated and not
recommended. Kafka experts at IBM told us that having stateless clients
open a lot of new connections is an antipattern.

So they recommended keeping a low number of connections open, with high
throughput on each of them. This means that if you want to produce from
actions at scale and across tenants, the actions would need to go through
some type of broker via HTTP, and this broker is a stateful service that
maintains the socket connection to the Kafka service.
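
The broker idea can be sketched roughly as follows, using only the Python standard library. This is an illustration under assumptions, not an existing OpenWhisk or MessageHub component. FakeProducer is a hypothetical stand-in for a real Kafka producer, and the JSON body with topic and value fields is an invented wire format.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class FakeProducer:
    """Hypothetical stand-in for a Kafka producer; counts connections."""
    connections_opened = 0

    def __init__(self):
        FakeProducer.connections_opened += 1
        self.sent = []
        self.lock = threading.Lock()

    def send(self, topic, value):
        with self.lock:
            self.sent.append((topic, value))


# The broker process owns exactly one long-lived producer.
PRODUCER = FakeProducer()


class BrokerHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length))
        PRODUCER.send(body["topic"], body["value"])
        reply = json.dumps({"ok": True}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, *args):
        pass  # keep the sketch quiet


def start_broker():
    """Start the broker on a free local port and return the server."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), BrokerHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def action_send(broker_url, topic, value):
    """What an action would do: a short HTTP call, no Kafka socket."""
    data = json.dumps({"topic": topic, "value": value}).encode()
    request = urllib.request.Request(
        broker_url, data=data, headers={"Content-Type": "application/json"})
    # Bypass any proxy configured in the environment for this local call.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request) as response:
        return json.loads(response.read())
```

Every call to action_send is a stateless HTTP request, while the connection count on the Kafka side stays at one per broker instance no matter how many actions are running.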

— Carlos

On Wed, Jan 9, 2019 at 8:12 AM Michele Sciabarra <mi...@sciabarra.com>
wrote:

> Hello whiskers,
>
> I am working on a simple webchat based on OpenWhisk (full disclosure: it
> is for the book on OpenWhisk for O'Reilly!) and I am using MessageHub.
>
> However, in the documentation for the messaging package I read that
> /whisk.system/messaging/messageHubProduce is deprecated, and that the
> recommended way is to use a persistent connection to Kafka.
>
> This statement puzzles me. My first thought was "why don't they cache
> the connection?", but I checked the code for messageHubProduce.py here:
>
>
> https://github.com/apache/incubator-openwhisk-package-kafka/blob/master/action/messageHubProduce.py
>
> and it looks like the producer is cached, so I wonder whether the
> deprecation of messageHubProduce is still valid.
>
> Furthermore, I wrote an action that sends messages to MessageHub using a
> cached connection, and it apparently works:
>
>
> https://github.com/learning-apache-openwhisk/chapter11-messages/blob/master/src/persistent/main/send.go
>
> So the question is: is caching connections the correct way to send
> messages to Kafka? Can I recommend this practice in the book?
>
>
> --
>   Michele Sciabarra
>   michele@sciabarra.com
>
-- 
Carlos Santana
<cs...@gmail.com>