Posted to dev@pulsar.apache.org by Matteo Merli <mm...@apache.org> on 2017/06/29 20:51:31 UTC

[DISCUSS] PIP-1 Pulsar proxy component

I have created the wiki page with a first proposal (as we discussed
earlier).

https://github.com/apache/incubator-pulsar/wiki/PIP-1

Please add feedback in this thread (or if people prefer we can create a
GitHub issue to have the discussion on the proposal).

Other than the proposal itself, it would be good to have some feedback on
the format as well.

Matteo


--
Matteo Merli
<mm...@apache.org>

Re: [DISCUSS] PIP-1 Pulsar proxy component

Posted by Matteo Merli <ma...@gmail.com>.
On Thu, Jun 29, 2017 at 5:18 PM, Maurice Barnum <ms...@yahoo-inc.com.invalid>
wrote:

> this seems like it should be a solved problem.from a quick look, for
> example, it seems like nginx can do this already.
> https://www.nginx.com/resources/admin-guide/tcp-load-balancing/
>

That would only work for the initial topic lookup.

After that, the client discovers that topic X is served by broker Y, and it
will try to connect to that broker directly.

In the same way, when you connect the second time, the proxy cannot just be
a simple TCP proxy, because you need to specify which particular broker you
want to connect to. So, in my proposal, this is done in the
Connect/Connected phase, so the proxy knows which target it needs to connect
to, and after that it gets out of the way by doing simple buffer forwarding.
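
As a rough illustration of the two phases described above, here is a minimal
Python sketch. It is not the Pulsar wire protocol: the length-prefixed frame
carrying a UTF-8 broker address, and the names read_frame, proxy_session and
open_broker, are all assumptions made up for this example. It also copies
bytes in one direction only, where a real proxy would forward both ways.

```python
import io
import struct


def read_frame(stream):
    """Read one frame: 4-byte big-endian size, then that many bytes (hypothetical format)."""
    header = stream.read(4)
    if len(header) < 4:
        raise EOFError("stream closed before a full frame header arrived")
    (size,) = struct.unpack(">I", header)
    body = stream.read(size)
    if len(body) < size:
        raise EOFError("stream closed in the middle of a frame")
    return body


def proxy_session(client_stream, open_broker):
    """Inspect only the first frame, then copy the remaining bytes without parsing them."""
    # Connect phase: the first frame names the broker the client wants (assumed encoding).
    target = read_frame(client_stream).decode("utf-8")
    broker_stream = open_broker(target)
    # Forwarding phase: the proxy no longer interprets anything it relays.
    while True:
        chunk = client_stream.read(4096)
        if not chunk:
            break
        broker_stream.write(chunk)
    return target


# Usage with in-memory streams standing in for sockets.
brokers = {}


def open_broker(address):
    brokers[address] = io.BytesIO()
    return brokers[address]


name = b"broker-y:6650"
client = io.BytesIO(struct.pack(">I", len(name)) + name + b"opaque bytes")
target = proxy_session(client, open_broker)
```

The point of the sketch is that the proxy needs protocol awareness only for
the first exchange; everything after that is a plain byte copy.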

--
Matteo Merli
<ma...@gmail.com>

Re: [DISCUSS] PIP-1 Pulsar proxy component

Posted by Maurice Barnum <ms...@yahoo-inc.com.INVALID>.
This seems like it should be a solved problem. From a quick look, for example, it seems like nginx can do this already.
https://www.nginx.com/resources/admin-guide/tcp-load-balancing/


On Thursday, June 29, 2017, 4:54:29 PM PDT, Rajan Dhabalia <rd...@apache.org> wrote:

>> or if people prefer we can create a GitHub issue to have the discussion
on the proposal
I think a GitHub issue would be more helpful for discussion, so my vote
goes to a GitHub issue.

Thanks,
Rajan

On Thu, Jun 29, 2017 at 1:51 PM, Matteo Merli <mm...@apache.org> wrote:

> I have created the wiki page with a first proposal (as we discussed
> earlier).
>
> https://github.com/apache/incubator-pulsar/wiki/PIP-1
>
> Please add feedback in this thread (or if people prefer we can create a
> GitHub issue to have the discussion on the proposal).
>
> Other than the proposal itself, it would be good to have some feedback on
> the format as well.
>
> Matteo
>
>
> --
> Matteo Merli
> <mm...@apache.org>
>

Re: [DISCUSS] PIP-1 Pulsar proxy component

Posted by Rajan Dhabalia <rd...@apache.org>.
>> or if people prefer we can create a GitHub issue to have the discussion
on the proposal
I think a GitHub issue would be more helpful for discussion, so my vote
goes to a GitHub issue.

Thanks,
Rajan

On Thu, Jun 29, 2017 at 1:51 PM, Matteo Merli <mm...@apache.org> wrote:

> I have created the wiki page with a first proposal (as we discussed
> earlier).
>
> https://github.com/apache/incubator-pulsar/wiki/PIP-1
>
> Please add feedback in this thread (or if people prefer we can create a
> GitHub issue to have the discussion on the proposal).
>
> Other than the proposal itself, it would be good to have some feedback on
> the format as well.
>
> Matteo
>
>
> --
> Matteo Merli
> <mm...@apache.org>
>

Re: [DISCUSS] PIP-1 Pulsar proxy component

Posted by Matteo Merli <ma...@gmail.com>.
On Fri, Jun 30, 2017 at 5:09 PM, Dave Fisher <da...@comcast.net> wrote:

> One thought is make the proxy into a WebSocket frontend but then pass
> through to the broker using the TCP protocol.
>
> WebSocket would fit a pattern that everyone is used to scaling. It could
> be setup through Tomcat with connections through a VIP.
>

The problem is still related to the stateful nature of the brokers (a topic
is only served by one broker at a particular point in time), so the VIP
cannot just route to any random broker.
So, even if we change the client to always connect through the WebSocket
service, it would still need some way to indicate which broker it needs to
connect to.

Side note: Pulsar already provides a WebSocket service, which is used to
expose an "easier" interface that you can use from whatever language. The
problem is that it's not meant to be super-performant (the exchanges
between client and server happen in JSON and the payloads are serialized in
base64).
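
To give a feel for the size cost of that encoding, here is a small Python
sketch. The JSON object with a single "payload" field is an illustrative
assumption rather than the exact message format of the WebSocket service;
the base64 expansion itself (4 output bytes for every 3 input bytes) is a
property of the encoding.

```python
import base64
import json


def websocket_style_message(payload: bytes) -> bytes:
    """Wrap binary data as base64 text inside a JSON object (illustrative format)."""
    encoded = base64.b64encode(payload).decode("ascii")
    return json.dumps({"payload": encoded}).encode("utf-8")


payload = bytes(range(256)) * 4  # 1024 bytes of arbitrary binary data
message = websocket_style_message(payload)

# base64 alone grows the data by about one third; the JSON wrapper adds more.
base64_size = len(base64.b64encode(payload))
expansion = len(message) / len(payload)
```

On top of the extra bytes, both sides also pay the CPU cost of encoding and
decoding every message.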


Matteo


--
Matteo Merli
<ma...@gmail.com>

Re: [DISCUSS] PIP-1 Pulsar proxy component

Posted by Matteo Merli <ma...@gmail.com>.
I have updated the Wiki page with a section about TLS:
https://github.com/apache/incubator-pulsar/wiki/PIP-1#tls-encryption-and-authentication

Also, I just posted a PR with the implementation:

https://github.com/apache/incubator-pulsar/pull/548



--
Matteo Merli
<ma...@gmail.com>

On Fri, Jun 30, 2017 at 5:09 PM, Dave Fisher <da...@comcast.net> wrote:

>
> > On Jun 30, 2017, at 4:39 PM, Matteo Merli <ma...@gmail.com>
> wrote:
> >
> > On Fri, Jun 30, 2017 at 12:41 PM, Dave Fisher <da...@comcast.net>
> wrote:
> >
> >> So I am clear the problem is having an SSL endpoint that authenticates
> the
> >> client and allows the messages to flow through to the correct broker.
> >>
> >
> >
> > The main problem this proposal is trying to solve is how to expose
> Pulsar,
> > which is a stateful service [1], through a stateless frontend.
>
> One thought is make the proxy into a WebSocket frontend but then pass
> through to the broker using the TCP protocol.
>
> WebSocket would fit a pattern that everyone is used to scaling. It could
> be setup through Tomcat with connections through a VIP.
>
> Think about it.
>
> > The reason to have a stateless frontend, is that it's much easier to give
> > access to this service from outside the current cluster/datacenter. If
> you
> > have a stateless service, you can easily use multiple strategies (VIP,
> DNS,
> > ....) to expose multiple instances that are composing the service and you
> > don't need to connect to specific nodes. Any node, as routed by the
> > VIP/DNS/.. mechanism will work.
> >
> > One example of this is when deploying in a cloud environment, especially
> > with some container orchestration mechanism like kubernetes. If you want
> to
> > connect and publish messages from outside the Kubernetes cluster (or from
> > outside the cloud region alltogether), having the stateless service,
> > exposed to a cloud specific load balancer, makes it a lot easier to
> deploy
> > and to control the access from the outside world.
>
> Yes and this could work with WebSocket.
>
> >
> > For the SSL part, right now in Pulsar you might want to use TLS/SSL for
> few
> > reasons:
> > 1. Transport encryption
> > 2. Broker authentication (validate the broker we're talking to is
> > legitimate)
> > 3. Client authentication (extract the client principal to be used for
> > authorization)
> >
> > All these 3 should continue to work even when a proxy is introduced. In
> > particular, for the client authentication, the proxy is responsible of
> > validating the authentication and then carry the client principal over to
> > the broker.
> >
> >
> >>> Can I assume that if the broker disconnects that the requirement is for
> > the client to reconnect and then at that point get the new broker?
> >
> > Correct, that's the behavior of the client library and it will not
> change.
> > Whenever the connection breaks, the client library internally tries to
> > re-do the topic lookup (because now it might have moved to a different
> > broker) and reconnect, with exponential backoff, until it succeeds.
> >
> >
> > [1]  brokers don't keep state on disk, but still a topic is only assign
> to
> > a single broker at a give point in time.
> >
>
> Great.
>
> Regards,
> Dave
>
> >
> > --
> > Matteo Merli
> > <ma...@gmail.com>
>
>

Re: [DISCUSS] PIP-1 Pulsar proxy component

Posted by Dave Fisher <da...@comcast.net>.
> On Jun 30, 2017, at 4:39 PM, Matteo Merli <ma...@gmail.com> wrote:
> 
> On Fri, Jun 30, 2017 at 12:41 PM, Dave Fisher <da...@comcast.net> wrote:
> 
>> So I am clear the problem is having an SSL endpoint that authenticates the
>> client and allows the messages to flow through to the correct broker.
>> 
> 
> 
> The main problem this proposal is trying to solve is how to expose Pulsar,
> which is a stateful service [1], through a stateless frontend.

One thought is to make the proxy into a WebSocket frontend but then pass through to the broker using the TCP protocol.

WebSocket would fit a pattern that everyone is used to scaling. It could be set up through Tomcat with connections through a VIP.

Think about it.

> The reason to have a stateless frontend, is that it's much easier to give
> access to this service from outside the current cluster/datacenter. If you
> have a stateless service, you can easily use multiple strategies (VIP, DNS,
> ....) to expose multiple instances that are composing the service and you
> don't need to connect to specific nodes. Any node, as routed by the
> VIP/DNS/.. mechanism will work.
> 
> One example of this is when deploying in a cloud environment, especially
> with some container orchestration mechanism like kubernetes. If you want to
> connect and publish messages from outside the Kubernetes cluster (or from
> outside the cloud region alltogether), having the stateless service,
> exposed to a cloud specific load balancer, makes it a lot easier to deploy
> and to control the access from the outside world.

Yes and this could work with WebSocket.

> 
> For the SSL part, right now in Pulsar you might want to use TLS/SSL for few
> reasons:
> 1. Transport encryption
> 2. Broker authentication (validate the broker we're talking to is
> legitimate)
> 3. Client authentication (extract the client principal to be used for
> authorization)
> 
> All these 3 should continue to work even when a proxy is introduced. In
> particular, for the client authentication, the proxy is responsible of
> validating the authentication and then carry the client principal over to
> the broker.
> 
> 
>>> Can I assume that if the broker disconnects that the requirement is for
> the client to reconnect and then at that point get the new broker?
> 
> Correct, that's the behavior of the client library and it will not change.
> Whenever the connection breaks, the client library internally tries to
> re-do the topic lookup (because now it might have moved to a different
> broker) and reconnect, with exponential backoff, until it succeeds.
> 
> 
> [1]  brokers don't keep state on disk, but still a topic is only assign to
> a single broker at a give point in time.
> 

Great.

Regards,
Dave

> 
> --
> Matteo Merli
> <ma...@gmail.com>


Re: [DISCUSS] PIP-1 Pulsar proxy component

Posted by Matteo Merli <ma...@gmail.com>.
On Fri, Jun 30, 2017 at 12:41 PM, Dave Fisher <da...@comcast.net> wrote:

> So I am clear the problem is having an SSL endpoint that authenticates the
> client and allows the messages to flow through to the correct broker.
>


The main problem this proposal is trying to solve is how to expose Pulsar,
which is a stateful service [1], through a stateless frontend.

The reason to have a stateless frontend is that it's much easier to give
access to this service from outside the current cluster/datacenter. If you
have a stateless service, you can easily use multiple strategies (VIP, DNS,
...) to expose the multiple instances that compose the service, and you
don't need to connect to specific nodes. Any node, as routed by the
VIP/DNS/... mechanism, will work.

One example of this is when deploying in a cloud environment, especially
with a container orchestration mechanism like Kubernetes. If you want to
connect and publish messages from outside the Kubernetes cluster (or from
outside the cloud region altogether), having the stateless service exposed
through a cloud-specific load balancer makes it a lot easier to deploy and
to control access from the outside world.

For the SSL part, right now in Pulsar you might want to use TLS/SSL for a
few reasons:
 1. Transport encryption
 2. Broker authentication (validate that the broker we're talking to is
legitimate)
 3. Client authentication (extract the client principal to be used for
authorization)

All three should continue to work even when a proxy is introduced. In
particular, for client authentication, the proxy is responsible for
validating the authentication and then carrying the client principal over to
the broker.
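
The division of work for client authentication could be sketched roughly as
follows in Python. Everything here is hypothetical: the ConnectRequest type,
its field names, and the lookup table that maps a presented credential to a
principal are stand-ins for whatever the actual implementation uses.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ConnectRequest:
    """Hypothetical connect message sent from proxy to broker."""
    target_broker: str
    client_principal: Optional[str] = None


def proxy_connect(target_broker: str, credential: str,
                  trusted: Dict[str, str]) -> ConnectRequest:
    """Validate the client at the proxy, then pass the principal on to the broker."""
    # The proxy, not the broker, decides whether the credential is acceptable.
    principal = trusted.get(credential)
    if principal is None:
        raise PermissionError("client authentication failed at the proxy")
    # The broker receives the already-validated principal and uses it for authorization.
    return ConnectRequest(target_broker=target_broker, client_principal=principal)
```

In this arrangement the broker has to trust the proxy, so the proxy itself
would also need to authenticate to the broker.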


>> Can I assume that if the broker disconnects that the requirement is for
the client to reconnect and then at that point get the new broker?

Correct, that's the behavior of the client library and it will not change.
Whenever the connection breaks, the client library internally tries to
redo the topic lookup (because the topic might now have moved to a different
broker) and reconnect, with exponential backoff, until it succeeds.
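
The retry behavior described above can be sketched like this in Python. It
is only an approximation of the idea: the function names, the delay values,
and the use of ConnectionError are assumptions for the example, not the
client library's actual API or settings.

```python
import time


def reconnect(lookup, connect, topic,
              initial_delay=0.1, max_delay=30.0, sleep=time.sleep):
    """Repeat lookup + connect until it succeeds, doubling the wait after each failure."""
    delay = initial_delay
    while True:
        try:
            # The topic may have moved, so ask again which broker serves it.
            broker = lookup(topic)
            return connect(broker)
        except ConnectionError:
            sleep(delay)
            # Exponential backoff, capped so the wait never grows without bound.
            delay = min(delay * 2, max_delay)
```

Doing the lookup inside the loop is the important part: retrying against the
old broker address would never succeed once the topic has been reassigned.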


[1]  Brokers don't keep state on disk, but a topic is still only assigned to
a single broker at a given point in time.


--
Matteo Merli
<ma...@gmail.com>

Re: [DISCUSS] PIP-1 Pulsar proxy component

Posted by Dave Fisher <da...@comcast.net>.
> On Jun 30, 2017, at 12:12 PM, Matteo Merli <ma...@gmail.com> wrote:
> 
> On Thu, Jun 29, 2017 at 6:33 PM, Dave Fisher <da...@comcast.net> wrote:
> 
>> I mean do you think it would meet the needs of a proxy including SSL?
>> 
>> I'll look into this more as this proxy design intrigues me.
>> 
> 
> 
> So, ZooKeeper is more of a distributed coordination service and it doesn't
> really work as a proxy. We use it in Pulsar to coordinate brokers and
> storage nodes and to store metadata.
> One design trait is that we don't want to expose ZK service to our users,
> since it's a very critical piece of the infrastructure (if ZK is down, the
> Pulsar cluster cannot operate).
> 
> Here there is a high-level diagram that shows where ZK is being used in
> Pulsar
> https://github.com/apache/incubator-pulsar/blob/master/docs/Architecture.md#architecture

Thanks. Good documentation. Until recently I was a Principal Architect and led an Enterprise Architecture Board.

> 
> As Maurice commented, there are many ways to to do HTTP or TCP proxy, from
> nginx to Apache TrafficServer, but these won't work to proxy to stateful
> backend services.

Classic single threading and scalability issues.

> This is a common problem for cloud deployment. For Kafka they have the same
> issue and the solution they offer is to use a REST proxy to expose to the
> outside world (but that has a huge performance penalty, especially if you
> need to guarantee the message ordering).

Yes, a nightmare.

> 
> For the SSL part, unless you have an L4 proxy such as a VIP, the SSL needs
> to be terminated at that layer.

I am familiar with F5s and VIPs and know that these can terminate the SSL for you, but they can also be their own performance nightmares, particularly if you are programming which VIP to send a domain request to.

> I think this fits well anyway for most deployment and has the advantage of
> offloading the SSL portion to the proxy compared to the broker.

So that I am clear: the problem is having an SSL endpoint that authenticates the client and allows the messages to flow through to the correct broker. Can I assume that, if the broker disconnects, the requirement is for the client to reconnect and at that point get the new broker?

Thanks.

Regards,
Dave

> 
> 
> Matteo
> 
> 
> --
> Matteo Merli
> <ma...@gmail.com>


Re: [DISCUSS] PIP-1 Pulsar proxy component

Posted by Matteo Merli <ma...@gmail.com>.
On Thu, Jun 29, 2017 at 6:33 PM, Dave Fisher <da...@comcast.net> wrote:

> I mean do you think it would meet the needs of a proxy including SSL?
>
> I'll look into this more as this proxy design intrigues me.
>


So, ZooKeeper is more of a distributed coordination service and it doesn't
really work as a proxy. We use it in Pulsar to coordinate brokers and
storage nodes and to store metadata.
One design principle is that we don't want to expose the ZK service to our
users, since it's a very critical piece of the infrastructure (if ZK is
down, the Pulsar cluster cannot operate).

Here is a high-level diagram that shows where ZK is used in
Pulsar:
https://github.com/apache/incubator-pulsar/blob/master/docs/Architecture.md#architecture

As Maurice commented, there are many ways to do HTTP or TCP proxying, from
nginx to Apache Traffic Server, but these won't work for proxying to
stateful backend services.
This is a common problem for cloud deployments. Kafka has the same issue,
and the solution offered there is to use a REST proxy to expose it to the
outside world (but that has a huge performance penalty, especially if you
need to guarantee message ordering).

For the SSL part, unless you have an L4 proxy such as a VIP, SSL needs to
be terminated at the proxy layer.
I think this fits well anyway for most deployments and has the advantage of
offloading the SSL work from the broker to the proxy.


Matteo


--
Matteo Merli
<ma...@gmail.com>

Re: [DISCUSS] PIP-1 Pulsar proxy component

Posted by Dave Fisher <da...@comcast.net>.

> On Jun 29, 2017, at 5:19 PM, Matteo Merli <ma...@gmail.com> wrote:
> 
>> On Thu, Jun 29, 2017 at 4:06 PM, Dave Fisher <da...@comcast.net> wrote:
>> 
>> Does Apache Zookeeper meet these needs?
>> 
> 
> Dave, do you mean with respect to SSL?

I'm asking as a developer and not as a mentor.

I mean, do you think it would meet the needs of a proxy, including SSL?

I'll look into this more as this proxy design intrigues me.

Regards,
Dave

> 
> --
> Matteo Merli
> <ma...@gmail.com>


Re: [DISCUSS] PIP-1 Pulsar proxy component

Posted by Matteo Merli <ma...@gmail.com>.
On Thu, Jun 29, 2017 at 4:06 PM, Dave Fisher <da...@comcast.net> wrote:

> Does Apache Zookeeper meet these needs?
>

Dave, do you mean with respect to SSL?

--
Matteo Merli
<ma...@gmail.com>

Re: [DISCUSS] PIP-1 Pulsar proxy component

Posted by Dave Fisher <da...@comcast.net>.
Hi Matteo,

Does Apache ZooKeeper meet these needs?

Regards,
Dave

> On Jun 29, 2017, at 2:33 PM, Matteo Merli <ma...@gmail.com> wrote:
> 
> Good point,
> 
> I think we should support SSL, though it might get terminated at the proxy
> layer.
> 
> Whether the connection between proxy and broker it's encrypted would be
> kind of a separated issue, but in any case we cannot just forward the
> encrypted connection because that would break the SSL certificate
> validation.
> 
> 
> 
> --
> Matteo Merli
> <ma...@gmail.com>
> 
> On Thu, Jun 29, 2017 at 2:25 PM, Joe Francis <jo...@yahoo-inc.com.invalid>
> wrote:
> 
>> SSL will be supported?
>> Joe
>> 
>>    On Thursday, June 29, 2017 1:51 PM, Matteo Merli <mm...@apache.org>
>> wrote:
>> 
>> 
>> I have created the wiki page with a first proposal (as we discussed
>> earlier).
>> 
>> https://github.com/apache/incubator-pulsar/wiki/PIP-1
>> 
>> Please add feedback in this thread (or if people prefer we can create a
>> GitHub issue to have the discussion on the proposal).
>> 
>> Other than the proposal itself, it would be good to have some feedback on
>> the format as well.
>> 
>> Matteo
>> 
>> 
>> --
>> Matteo Merli
>> <mm...@apache.org>
>> 
>> 
>> 
>> 


Re: [DISCUSS] PIP-1 Pulsar proxy component

Posted by Matteo Merli <ma...@gmail.com>.
Good point,

I think we should support SSL, though it might get terminated at the proxy
layer.

Whether the connection between proxy and broker is encrypted would be a
somewhat separate issue, but in any case we cannot just forward the
encrypted connection because that would break the SSL certificate
validation.



--
Matteo Merli
<ma...@gmail.com>

On Thu, Jun 29, 2017 at 2:25 PM, Joe Francis <jo...@yahoo-inc.com.invalid>
wrote:

>  SSL will be supported?
> Joe
>
>     On Thursday, June 29, 2017 1:51 PM, Matteo Merli <mm...@apache.org>
> wrote:
>
>
>  I have created the wiki page with a first proposal (as we discussed
> earlier).
>
> https://github.com/apache/incubator-pulsar/wiki/PIP-1
>
> Please add feedback in this thread (or if people prefer we can create a
> GitHub issue to have the discussion on the proposal).
>
> Other than the proposal itself, it would be good to have some feedback on
> the format as well.
>
> Matteo
>
>
> --
> Matteo Merli
> <mm...@apache.org>
>
>
>
>

Re: [DISCUSS] PIP-1 Pulsar proxy component

Posted by Joe Francis <jo...@yahoo-inc.com.INVALID>.
Will SSL be supported?
Joe 

    On Thursday, June 29, 2017 1:51 PM, Matteo Merli <mm...@apache.org> wrote:
 

 I have created the wiki page with a first proposal (as we discussed
earlier).

https://github.com/apache/incubator-pulsar/wiki/PIP-1

Please add feedback in this thread (or if people prefer we can create a
GitHub issue to have the discussion on the proposal).

Other than the proposal itself, it would be good to have some feedback on
the format as well.

Matteo


--
Matteo Merli
<mm...@apache.org>