Posted to dev@geode.apache.org by Dan Smith <ds...@pivotal.io> on 2017/05/02 00:27:11 UTC

Re: [gemfire-dev] New Client-Server Protocol Proposal

I think any new client driver or server we develop might want to
incorporate function execution at a lower level than region operations like
get and put, etc. We could then easily build operations like GET, PUT,
PUTALL, etc. on top of that by making them functions. The original client
protocol isn't designed like that because it pre-dates function execution.

The current function execution API is a little clunky and needs some work.
But what it does do is provide the fundamental logic to target operations
at members that host certain keys and retry in the case of failure.

The advantage of this approach is that if someone just builds a driver that
only supports function execution and whatever serialization framework is
required to serialize function arguments, they already have an API that
application developers could use to do pretty much anything they wanted to
do on the server. Having a Region object with methods like get and put on
it could just be a little syntactic sugar on top of that.

-Dan
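
As a rough illustration of the approach Dan describes, a GET built on top of Geode's existing FunctionService could look like the sketch below. The function id "GetFunction" and the helper class are hypothetical; only the FunctionService/RegionFunctionContext APIs are existing Geode.

import java.util.Collections;
import java.util.List;
import org.apache.geode.cache.Region;
import org.apache.geode.cache.execute.Function;
import org.apache.geode.cache.execute.FunctionContext;
import org.apache.geode.cache.execute.FunctionService;
import org.apache.geode.cache.execute.RegionFunctionContext;
import org.apache.geode.cache.execute.ResultCollector;

// Server side: GET expressed as a function; the filter routes it to the member hosting the key.
public class GetFunction implements Function {
  public void execute(FunctionContext context) {
    RegionFunctionContext rfc = (RegionFunctionContext) context;
    Object key = rfc.getFilter().iterator().next();    // single key passed as the filter
    Object value = rfc.getDataSet().get(key);           // local read on the hosting member
    rfc.getResultSender().lastResult(value);
  }

  public String getId() {
    return "GetFunction";                               // hypothetical function id
  }
}

// Client side: Region.get(key) becomes syntactic sugar over a targeted function call.
class RegionSugar {
  static Object get(Region<Object, Object> region, Object key) {
    ResultCollector<?, ?> rc = FunctionService.onRegion(region)
        .withFilter(Collections.singleton(key))
        .execute("GetFunction");
    return ((List<?>) rc.getResult()).get(0);
  }
}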

On Fri, Apr 28, 2017 at 2:49 PM, Udo Kohlmeyer <uk...@pivotal.io>
wrote:

> Hi there Geode community,
>
> The new Client-Server protocol proposal is available for review.
>
> It can be viewed and commented on https://cwiki.apache.org/confl
> uence/display/GEODE/New+Client+Server+Protocol
>
> --Udo
>
>

Re: [gemfire-dev] New Client-Server Protocol Proposal

Posted by Hitesh Khamesra <hi...@yahoo.com.INVALID>.
Good point, Dan!! That needs to be documented.


      From: Dan Smith <ds...@pivotal.io>
 To: dev@geode.apache.org 
 Sent: Wednesday, May 3, 2017 5:31 PM
 Subject: Re: [gemfire-dev] New Client-Server Protocol Proposal
   
Okay .... but how do I, as an implementer of a driver, know what messages
need an event id and which don't? It seems like maybe this belongs with
those message types, rather than in a generic header. Or maybe you need to
start organizing messages into classes - e.g. messages that change state and
messages that don't - and abstracting out the commonality.

It's also not clear exactly what the event id should be set to. When do I
change the sequence id? Does it have to be monotonically increasing? What
should the uniqueId be?

-Dan

On Wed, May 3, 2017 at 5:07 PM, Udo Kohlmeyer <uk...@pivotal.io> wrote:

> Correct,
>
> I did miss that. @Dan, if you look at https://cwiki.apache.org/confl
> uence/display/GEODE/Message+Structure+and+Definition#Messa
> geStructureandDefinition-MetaDataforRequests specifies how we provide
> EventId information.
>
>
>
> On 5/3/17 09:53, Bruce Schuchardt wrote:
>
>> I believe Hitesh put EventId in the metadata section.
>>
>> On 5/2/2017 at 2:22 PM, Udo Kohlmeyer wrote:
>>
>>> We are considering the function service, but again, this should not
>>> detract from the proposed message specification proposal.
>>>
>>> You are also correct in your observation of list of error codes not
>>> being complete nor exhaustive. Maybe the first page needs to highlight that
>>> this is a proposal and does not contain all the error codes that we could
>>> per api.
>>>
>>> As for the EventId, we will look into this and update the document
>>> accordingly.
>>>
>>> --Udo
>>>
>>>
>>> On 5/2/17 13:42, Dan Smith wrote:
>>>
>>>> I guess the value of building other messages on top of the function
>>>> service mostly comes into play when we start talking about smarter clients
>>>> that can do single hop. At that point it's really nice to have have a layer
>>>> that lets us send a message to a single primary, or all of the members that
>>>> host a region etc. It is also nice that right now if I add new function
>>>> that functionality becomes available to gfsh, REST, Java, and C++
>>>> developers automatically.
>>>>
>>>> I do agree that the new protocol could build in these concepts, and
>>>> doesn't necessarily have to use function execution to achieve the same
>>>> results. But do at least consider whether new developers will want to add
>>>> new functionality to the server via functions or via your this new
>>>> protocol. If it's harder to use the new protocol than to write a new
>>>> function and invoke it from the client, then I think we've done something
>>>> wrong.
>>>>
>>>>
>>>> A couple of other comments, now that I've looked a little more:
>>>>
>>>> 1) The list of error codes <https://cwiki.apache.org/conf
>>>> luence/display/GEODE/RegionAPI#RegionAPI-ErrorCodeDefinitions> seems
>>>> really incomplete. It looks like we've picked a few of the possible
>>>> exceptions geode could throw and assigned them integer ids? What is the
>>>> rational for the exceptions that are included here vs. other exceptions?
>>>> Also, not all messages would need to return these error codes.
>>>>
>>>> 2) The existing protocol has some functionality even for basic puts
>>>> that is not represented here. Client generate an event id that is
>>>> associated with the put and send that to the server. These event ids are
>>>> used to guarantee that if a client does put (A, 0) followed by put (A, 1),
>>>> the resulting value will always be 1, even if the client timed out and
>>>> retried put (A, 0). The event id prevents the lingered put that timed out
>>>> on the server from affecting the state. I'm not saying the new protocol has
>>>> to support this sort of behavior, but you might want to consider whether
>>>> the current protocol should specify anything about how events are retried.
>>>>
>>>> -Dan
>>>>
>>>
>>>
>>>
>>
>

   

Re: [gemfire-dev] New Client-Server Protocol Proposal

Posted by Dan Smith <ds...@pivotal.io>.
Okay .... but how do I, as an implementer of a driver, know what messages
need an event id and which don't? It seems like maybe this belongs with
those message types, rather than in a generic header. Or maybe you need to
start organizing messages into classes - e.g. messages that change state and
messages that don't - and abstracting out the commonality.

It's also not clear exactly what the event id should be set to. When do I
change the sequence id? Does it have to be monotonically increasing? What
should the uniqueId be?

-Dan

On Wed, May 3, 2017 at 5:07 PM, Udo Kohlmeyer <uk...@pivotal.io> wrote:

> Correct,
>
> I did miss that. @Dan, if you look at https://cwiki.apache.org/confl
> uence/display/GEODE/Message+Structure+and+Definition#Messa
> geStructureandDefinition-MetaDataforRequests specifies how we provide
> EventId information.
>
>
>
> On 5/3/17 09:53, Bruce Schuchardt wrote:
>
>> I believe Hitesh put EventId in the metadata section.
>>
>> On 5/2/2017 at 2:22 PM, Udo Kohlmeyer wrote:
>>
>>> We are considering the function service, but again, this should not
>>> detract from the proposed message specification proposal.
>>>
>>> You are also correct in your observation of list of error codes not
>>> being complete nor exhaustive. Maybe the first page needs to highlight that
>>> this is a proposal and does not contain all the error codes that we could
>>> per api.
>>>
>>> As for the EventId, we will look into this and update the document
>>> accordingly.
>>>
>>> --Udo
>>>
>>>
>>> On 5/2/17 13:42, Dan Smith wrote:
>>>
>>>> I guess the value of building other messages on top of the function
>>>> service mostly comes into play when we start talking about smarter clients
>>>> that can do single hop. At that point it's really nice to have have a layer
>>>> that lets us send a message to a single primary, or all of the members that
>>>> host a region etc. It is also nice that right now if I add new function
>>>> that functionality becomes available to gfsh, REST, Java, and C++
>>>> developers automatically.
>>>>
>>>> I do agree that the new protocol could build in these concepts, and
>>>> doesn't necessarily have to use function execution to achieve the same
>>>> results. But do at least consider whether new developers will want to add
>>>> new functionality to the server via functions or via your this new
>>>> protocol. If it's harder to use the new protocol than to write a new
>>>> function and invoke it from the client, then I think we've done something
>>>> wrong.
>>>>
>>>>
>>>> A couple of other comments, now that I've looked a little more:
>>>>
>>>> 1) The list of error codes <https://cwiki.apache.org/conf
>>>> luence/display/GEODE/RegionAPI#RegionAPI-ErrorCodeDefinitions> seems
>>>> really incomplete. It looks like we've picked a few of the possible
>>>> exceptions geode could throw and assigned them integer ids? What is the
>>>> rational for the exceptions that are included here vs. other exceptions?
>>>> Also, not all messages would need to return these error codes.
>>>>
>>>> 2) The existing protocol has some functionality even for basic puts
>>>> that is not represented here. Client generate an event id that is
>>>> associated with the put and send that to the server. These event ids are
>>>> used to guarantee that if a client does put (A, 0) followed by put (A, 1),
>>>> the resulting value will always be 1, even if the client timed out and
>>>> retried put (A, 0). The event id prevents the lingered put that timed out
>>>> on the server from affecting the state. I'm not saying the new protocol has
>>>> to support this sort of behavior, but you might want to consider whether
>>>> the current protocol should specify anything about how events are retried.
>>>>
>>>> -Dan
>>>>
>>>
>>>
>>>
>>
>

Re: [gemfire-dev] New Client-Server Protocol Proposal

Posted by Udo Kohlmeyer <uk...@pivotal.io>.
Correct,

I did miss that. @Dan, if you look at 
https://cwiki.apache.org/confluence/display/GEODE/Message+Structure+and+Definition#MessageStructureandDefinition-MetaDataforRequests 
you'll see it specifies how we provide EventId information.


On 5/3/17 09:53, Bruce Schuchardt wrote:
> I believe Hitesh put EventId in the metadata section.
>
> On 5/2/2017 at 2:22 PM, Udo Kohlmeyer wrote:
>> We are considering the function service, but again, this should not 
>> detract from the proposed message specification proposal.
>>
>> You are also correct in your observation of list of error codes not 
>> being complete nor exhaustive. Maybe the first page needs to 
>> highlight that this is a proposal and does not contain all the error 
>> codes that we could per api.
>>
>> As for the EventId, we will look into this and update the document 
>> accordingly.
>>
>> --Udo
>>
>>
>> On 5/2/17 13:42, Dan Smith wrote:
>>> I guess the value of building other messages on top of the function 
>>> service mostly comes into play when we start talking about smarter 
>>> clients that can do single hop. At that point it's really nice to 
>>> have have a layer that lets us send a message to a single primary, 
>>> or all of the members that host a region etc. It is also nice that 
>>> right now if I add new function that functionality becomes available 
>>> to gfsh, REST, Java, and C++ developers automatically.
>>>
>>> I do agree that the new protocol could build in these concepts, and 
>>> doesn't necessarily have to use function execution to achieve the 
>>> same results. But do at least consider whether new developers will 
>>> want to add new functionality to the server via functions or via 
>>> your this new protocol. If it's harder to use the new protocol than 
>>> to write a new function and invoke it from the client, then I think 
>>> we've done something wrong.
>>>
>>>
>>> A couple of other comments, now that I've looked a little more:
>>>
>>> 1) The list of error codes 
>>> <https://cwiki.apache.org/confluence/display/GEODE/RegionAPI#RegionAPI-ErrorCodeDefinitions> 
>>> seems really incomplete. It looks like we've picked a few of the 
>>> possible exceptions geode could throw and assigned them integer ids? 
>>> What is the rational for the exceptions that are included here vs. 
>>> other exceptions? Also, not all messages would need to return these 
>>> error codes.
>>>
>>> 2) The existing protocol has some functionality even for basic puts 
>>> that is not represented here. Client generate an event id that is 
>>> associated with the put and send that to the server. These event ids 
>>> are used to guarantee that if a client does put (A, 0) followed by 
>>> put (A, 1), the resulting value will always be 1, even if the client 
>>> timed out and retried put (A, 0). The event id prevents the lingered 
>>> put that timed out on the server from affecting the state. I'm not 
>>> saying the new protocol has to support this sort of behavior, but 
>>> you might want to consider whether the current protocol should 
>>> specify anything about how events are retried.
>>>
>>> -Dan
>>
>>
>


Re: [gemfire-dev] New Client-Server Protocol Proposal

Posted by Bruce Schuchardt <bs...@pivotal.io>.
I believe Hitesh put EventId in the metadata section.

On 5/2/2017 at 2:22 PM, Udo Kohlmeyer wrote:
> We are considering the function service, but again, this should not 
> detract from the proposed message specification proposal.
>
> You are also correct in your observation of list of error codes not 
> being complete nor exhaustive. Maybe the first page needs to highlight 
> that this is a proposal and does not contain all the error codes that 
> we could per api.
>
> As for the EventId, we will look into this and update the document 
> accordingly.
>
> --Udo
>
>
> On 5/2/17 13:42, Dan Smith wrote:
>> I guess the value of building other messages on top of the function 
>> service mostly comes into play when we start talking about smarter 
>> clients that can do single hop. At that point it's really nice to 
>> have have a layer that lets us send a message to a single primary, or 
>> all of the members that host a region etc. It is also nice that right 
>> now if I add new function that functionality becomes available to 
>> gfsh, REST, Java, and C++ developers automatically.
>>
>> I do agree that the new protocol could build in these concepts, and 
>> doesn't necessarily have to use function execution to achieve the 
>> same results. But do at least consider whether new developers will 
>> want to add new functionality to the server via functions or via your 
>> this new protocol. If it's harder to use the new protocol than to 
>> write a new function and invoke it from the client, then I think 
>> we've done something wrong.
>>
>>
>> A couple of other comments, now that I've looked a little more:
>>
>> 1) The list of error codes 
>> <https://cwiki.apache.org/confluence/display/GEODE/RegionAPI#RegionAPI-ErrorCodeDefinitions> 
>> seems really incomplete. It looks like we've picked a few of the 
>> possible exceptions geode could throw and assigned them integer ids? 
>> What is the rational for the exceptions that are included here vs. 
>> other exceptions? Also, not all messages would need to return these 
>> error codes.
>>
>> 2) The existing protocol has some functionality even for basic puts 
>> that is not represented here. Client generate an event id that is 
>> associated with the put and send that to the server. These event ids 
>> are used to guarantee that if a client does put (A, 0) followed by 
>> put (A, 1), the resulting value will always be 1, even if the client 
>> timed out and retried put (A, 0). The event id prevents the lingered 
>> put that timed out on the server from affecting the state. I'm not 
>> saying the new protocol has to support this sort of behavior, but you 
>> might want to consider whether the current protocol should specify 
>> anything about how events are retried.
>>
>> -Dan
>
>


Re: [gemfire-dev] New Client-Server Protocol Proposal

Posted by Udo Kohlmeyer <uk...@pivotal.io>.
We are considering the function service, but again, this should not 
detract from the proposed message specification proposal.

You are also correct in your observation that the list of error codes is 
neither complete nor exhaustive. Maybe the first page needs to highlight 
that this is a proposal and does not contain all the error codes that we 
could return per API.

As for the EventId, we will look into this and update the document 
accordingly.

--Udo


On 5/2/17 13:42, Dan Smith wrote:
> I guess the value of building other messages on top of the function 
> service mostly comes into play when we start talking about smarter 
> clients that can do single hop. At that point it's really nice to have 
> have a layer that lets us send a message to a single primary, or all 
> of the members that host a region etc. It is also nice that right now 
> if I add new function that functionality becomes available to gfsh, 
> REST, Java, and C++ developers automatically.
>
> I do agree that the new protocol could build in these concepts, and 
> doesn't necessarily have to use function execution to achieve the same 
> results. But do at least consider whether new developers will want to 
> add new functionality to the server via functions or via your this new 
> protocol. If it's harder to use the new protocol than to write a new 
> function and invoke it from the client, then I think we've done 
> something wrong.
>
>
> A couple of other comments, now that I've looked a little more:
>
> 1) The list of error codes 
> <https://cwiki.apache.org/confluence/display/GEODE/RegionAPI#RegionAPI-ErrorCodeDefinitions> 
> seems really incomplete. It looks like we've picked a few of the 
> possible exceptions geode could throw and assigned them integer ids? 
> What is the rational for the exceptions that are included here vs. 
> other exceptions? Also, not all messages would need to return these 
> error codes.
>
> 2) The existing protocol has some functionality even for basic puts 
> that is not represented here. Client generate an event id that is 
> associated with the put and send that to the server. These event ids 
> are used to guarantee that if a client does put (A, 0) followed by put 
> (A, 1), the resulting value will always be 1, even if the client timed 
> out and retried put (A, 0). The event id prevents the lingered put 
> that timed out on the server from affecting the state. I'm not saying 
> the new protocol has to support this sort of behavior, but you might 
> want to consider whether the current protocol should specify anything 
> about how events are retried.
>
> -Dan


Re: New Client-Server Protocol Proposal

Posted by Hitesh Khamesra <hi...@yahoo.com.INVALID>.
(Just format change)
Here are a few things we need to consider:

1. The key, value, and callback arg may need to be interpreted as JSON-to-PDX.
2. A client calls the "get/getall" API and wants the return value as JSON, but the value was serialized as PDX.
3. This behavior should be optional, with no overhead for others if possible.
4. The "putAll" API can have mixed value types (JSON, numbers, etc.). Dan raised this, and it may be worth considering.

Thus my initial thought was that the client should indicate this feature at the message level (metadata), saying: convert the PDX value to JSON or vice versa.

Any thoughts?

Thanks,
Hitesh
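
A hedged sketch of what that per-message hint might look like on the server side, assuming a hypothetical ValueFormat metadata field and using Geode's existing JSONFormatter for the PDX/JSON conversion:

import org.apache.geode.pdx.JSONFormatter;
import org.apache.geode.pdx.PdxInstance;

// Hypothetical flag carried in the request metadata: how the client wants values represented.
enum ValueFormat { SERVER_DEFAULT, JSON }

class ValueConversion {
  // On a get/getAll response: convert the stored PDX value only if the client asked for JSON.
  static Object toClientFormat(Object storedValue, ValueFormat requested) {
    if (requested == ValueFormat.JSON && storedValue instanceof PdxInstance) {
      return JSONFormatter.toJSON((PdxInstance) storedValue);  // PDX -> JSON string
    }
    return storedValue;                                        // no overhead for other clients
  }

  // On a put/putAll request: convert an incoming JSON document to PDX before storing it.
  static Object fromClientFormat(Object incomingValue, ValueFormat declared) {
    if (declared == ValueFormat.JSON && incomingValue instanceof String) {
      return JSONFormatter.fromJSON((String) incomingValue);   // JSON -> PdxInstance
    }
    return incomingValue;
  }
}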
      From: Hitesh Khamesra <hi...@yahoo.com.INVALID>
 To: "dev@geode.apache.org" <de...@geode.apache.org> 
 Sent: Wednesday, May 3, 2017 10:01 AM
 Subject: Re: [gemfire-dev] New Client-Server Protocol Proposal
   
Here are a few things we need to consider:
1. The key, value, and callback arg may need to be interpreted as JSON-to-PDX.
2. A client calls the "get/getall" API and wants the return value as JSON, but the value was serialized as PDX.
3. This behavior should be optional, with no overhead for others if possible.
4. The "putAll" API can have mixed value types (JSON, numbers, etc.). Dan raised this, and it may be worth considering.
Thus my initial thought was that the client should indicate this feature at the message level (metadata), saying: convert the PDX value to JSON or vice versa.
Any thoughts?
Thanks,
Hitesh


      From: Jacob Barrett <jb...@pivotal.io>
 To: dev@geode.apache.org 
Cc: Udo Kohlmeyer <uk...@pivotal.io>
 Sent: Tuesday, May 2, 2017 8:11 PM
 Subject: Re: [gemfire-dev] New Client-Server Protocol Proposal
  
I agree completely with Dan. There is no reason to have flags for value encoding type in the message. I would argue that should be part of the value serialization layer. If something was placed in the message layer it should be more generic and allow for an unrestricted set of encodings by some ID.

Object {
variant ID codec;
byte[] payload;
}


-Jake


Sent from my iPhone

> On May 2, 2017, at 1:42 PM, Dan Smith <ds...@pivotal.io> wrote:
> 
> I guess the value of building other messages on top of the function service
> mostly comes into play when we start talking about smarter clients that can
> do single hop. At that point it's really nice to have have a layer that
> lets us send a message to a single primary, or all of the members that host
> a region etc. It is also nice that right now if I add new function that
> functionality becomes available to gfsh, REST, Java, and C++ developers
> automatically.
> 
> I do agree that the new protocol could build in these concepts, and doesn't
> necessarily have to use function execution to achieve the same results. But
> do at least consider whether new developers will want to add new
> functionality to the server via functions or via your this new protocol. If
> it's harder to use the new protocol than to write a new function and invoke
> it from the client, then I think we've done something wrong.
> 
> 
> A couple of other comments, now that I've looked a little more:
> 
> 1) The list of error codes
> <https://cwiki.apache.org/confluence/display/GEODE/RegionAPI#RegionAPI-ErrorCodeDefinitions>
> seems really incomplete. It looks like we've picked a few of the possible
> exceptions geode could throw and assigned them integer ids? What is the
> rational for the exceptions that are included here vs. other exceptions?
> Also, not all messages would need to return these error codes.
> 
> 2) The existing protocol has some functionality even for basic puts that is
> not represented here. Client generate an event id that is associated with
> the put and send that to the server. These event ids are used to guarantee
> that if a client does put (A, 0) followed by put (A, 1), the resulting
> value will always be 1, even if the client timed out and retried put (A,
> 0). The event id prevents the lingered put that timed out on the server
> from affecting the state. I'm not saying the new protocol has to support
> this sort of behavior, but you might want to consider whether the current
> protocol should specify anything about how events are retried.
> 
> -Dan

  

   

Re: [gemfire-dev] New Client-Server Protocol Proposal

Posted by Bruce Schuchardt <bs...@pivotal.io>.
I agree with Dan that the spec will need to deal with the effects of 
retrying an operation.

Le 5/3/2017 à 10:58 AM, Hitesh Khamesra a écrit :
> (Sorry: one more attempt to format this message)
>
> Here are the few things we need to consider..
>
>
> 1. key, value, callbackarg can be required to interpret as JSON-to-pdx.
> 2. client calls "get/getall" api and want return value as JSON. Value was serialized as pdx.
> 3. This behavior should be optional, if possible no overhead for others.
> 4. "putAll api" can have mixed type values(JSON and numbers). Dan raised about this. And may be worth to consider it.
>
> Thus my initial thought was client should indicate this feature at message level(metadata), saying, convert pdx value to json or vice-versa.
>
>
> Any thoughts?
> Thanks.
> HItesh
>
>
>
> ________________________________
> From: Hitesh Khamesra <hi...@yahoo.com.INVALID>
> To: "dev@geode.apache.org" <de...@geode.apache.org>
> Sent: Wednesday, May 3, 2017 10:01 AM
> Subject: Re: [gemfire-dev] New Client-Server Protocol Proposal
>
>
>
> Here are the few things we need to consider..
> 1. key, value, callbackarg can be required to interpret as JSON-to-pdx2. client calls "get/getall" api and want return value as JSON. Value was serialized as pdx.3. This behavior should be optional, if possible no overhead for others.4. "putAll api" can have mixed type values(JSON and numbers). Dan raised about this. And may be worth to consider it.
> Thus my initial thought was client should indicate this feature at message level(metadata), saying, convert pdx value to json or vice-versa.
> Any thoughts?
> Thanks.HItesh
>
>
>        From: Jacob Barrett <jb...@pivotal.io>
>
> To: dev@geode.apache.org
> Cc: Udo Kohlmeyer <uk...@pivotal.io>
> Sent: Tuesday, May 2, 2017 8:11 PM
> Subject: Re: [gemfire-dev] New Client-Server Protocol Proposal
>    
> I agree completely with Dan. There is no reason to have flags for value encoding type in the message. I would argue that should be part of the value serialization layer. If something was placed in the message layer it should be more generic and allow for an unrestricted set of encodings by some ID.
>
> Object {
> variant ID codec;
> byte[] payload;
> }
>
>
> -Jake
>
>
> Sent from my iPhone
>
>> On May 2, 2017, at 1:42 PM, Dan Smith <ds...@pivotal.io> wrote:
>>
>> I guess the value of building other messages on top of the function service
>> mostly comes into play when we start talking about smarter clients that can
>> do single hop. At that point it's really nice to have have a layer that
>> lets us send a message to a single primary, or all of the members that host
>> a region etc. It is also nice that right now if I add new function that
>> functionality becomes available to gfsh, REST, Java, and C++ developers
>> automatically.
>>
>> I do agree that the new protocol could build in these concepts, and doesn't
>> necessarily have to use function execution to achieve the same results. But
>> do at least consider whether new developers will want to add new
>> functionality to the server via functions or via your this new protocol. If
>> it's harder to use the new protocol than to write a new function and invoke
>> it from the client, then I think we've done something wrong.
>>
>>
>> A couple of other comments, now that I've looked a little more:
>>
>> 1) The list of error codes
>> <https://cwiki.apache.org/confluence/display/GEODE/RegionAPI#RegionAPI-ErrorCodeDefinitions>
>> seems really incomplete. It looks like we've picked a few of the possible
>> exceptions geode could throw and assigned them integer ids? What is the
>> rational for the exceptions that are included here vs. other exceptions?
>> Also, not all messages would need to return these error codes.
>>
>> 2) The existing protocol has some functionality even for basic puts that is
>> not represented here. Client generate an event id that is associated with
>> the put and send that to the server. These event ids are used to guarantee
>> that if a client does put (A, 0) followed by put (A, 1), the resulting
>> value will always be 1, even if the client timed out and retried put (A,
>> 0). The event id prevents the lingered put that timed out on the server
>> from affecting the state. I'm not saying the new protocol has to support
>> this sort of behavior, but you might want to consider whether the current
>> protocol should specify anything about how events are retried.
>>
>> -Dan


Re: [gemfire-dev] New Client-Server Protocol Proposal

Posted by Hitesh Khamesra <hi...@yahoo.com.INVALID>.
(Sorry: one more attempt to format this message)

Here are a few things we need to consider:


1. The key, value, and callback arg may need to be interpreted as JSON-to-PDX.
2. A client calls the "get/getall" API and wants the return value as JSON, but the value was serialized as PDX.
3. This behavior should be optional, with no overhead for others if possible.
4. The "putAll" API can have mixed value types (JSON, numbers, etc.). Dan raised this, and it may be worth considering.

Thus my initial thought was that the client should indicate this feature at the message level (metadata), saying: convert the PDX value to JSON or vice versa.


Any thoughts?
Thanks,
Hitesh



________________________________
From: Hitesh Khamesra <hi...@yahoo.com.INVALID>
To: "dev@geode.apache.org" <de...@geode.apache.org> 
Sent: Wednesday, May 3, 2017 10:01 AM
Subject: Re: [gemfire-dev] New Client-Server Protocol Proposal



Here are a few things we need to consider:
1. The key, value, and callback arg may need to be interpreted as JSON-to-PDX.
2. A client calls the "get/getall" API and wants the return value as JSON, but the value was serialized as PDX.
3. This behavior should be optional, with no overhead for others if possible.
4. The "putAll" API can have mixed value types (JSON, numbers, etc.). Dan raised this, and it may be worth considering.
Thus my initial thought was that the client should indicate this feature at the message level (metadata), saying: convert the PDX value to JSON or vice versa.
Any thoughts?
Thanks,
Hitesh


      From: Jacob Barrett <jb...@pivotal.io>

To: dev@geode.apache.org 
Cc: Udo Kohlmeyer <uk...@pivotal.io>
Sent: Tuesday, May 2, 2017 8:11 PM
Subject: Re: [gemfire-dev] New Client-Server Protocol Proposal
  
I agree completely with Dan. There is no reason to have flags for value encoding type in the message. I would argue that should be part of the value serialization layer. If something was placed in the message layer it should be more generic and allow for an unrestricted set of encodings by some ID.

Object {
variant ID codec;
byte[] payload;
}


-Jake


Sent from my iPhone

> On May 2, 2017, at 1:42 PM, Dan Smith <ds...@pivotal.io> wrote:
> 
> I guess the value of building other messages on top of the function service
> mostly comes into play when we start talking about smarter clients that can
> do single hop. At that point it's really nice to have have a layer that
> lets us send a message to a single primary, or all of the members that host
> a region etc. It is also nice that right now if I add new function that
> functionality becomes available to gfsh, REST, Java, and C++ developers
> automatically.
> 
> I do agree that the new protocol could build in these concepts, and doesn't
> necessarily have to use function execution to achieve the same results. But
> do at least consider whether new developers will want to add new
> functionality to the server via functions or via your this new protocol. If
> it's harder to use the new protocol than to write a new function and invoke
> it from the client, then I think we've done something wrong.
> 
> 
> A couple of other comments, now that I've looked a little more:
> 
> 1) The list of error codes
> <https://cwiki.apache.org/confluence/display/GEODE/RegionAPI#RegionAPI-ErrorCodeDefinitions>
> seems really incomplete. It looks like we've picked a few of the possible
> exceptions geode could throw and assigned them integer ids? What is the
> rational for the exceptions that are included here vs. other exceptions?
> Also, not all messages would need to return these error codes.
> 
> 2) The existing protocol has some functionality even for basic puts that is
> not represented here. Client generate an event id that is associated with
> the put and send that to the server. These event ids are used to guarantee
> that if a client does put (A, 0) followed by put (A, 1), the resulting
> value will always be 1, even if the client timed out and retried put (A,
> 0). The event id prevents the lingered put that timed out on the server
> from affecting the state. I'm not saying the new protocol has to support
> this sort of behavior, but you might want to consider whether the current
> protocol should specify anything about how events are retried.
> 
> -Dan

Re: [gemfire-dev] New Client-Server Protocol Proposal

Posted by Hitesh Khamesra <hi...@yahoo.com.INVALID>.
Here are a few things we need to consider:
1. The key, value, and callback arg may need to be interpreted as JSON-to-PDX.
2. A client calls the "get/getall" API and wants the return value as JSON, but the value was serialized as PDX.
3. This behavior should be optional, with no overhead for others if possible.
4. The "putAll" API can have mixed value types (JSON, numbers, etc.). Dan raised this, and it may be worth considering.
Thus my initial thought was that the client should indicate this feature at the message level (metadata), saying: convert the PDX value to JSON or vice versa.
Any thoughts?
Thanks,
Hitesh


      From: Jacob Barrett <jb...@pivotal.io>
 To: dev@geode.apache.org 
Cc: Udo Kohlmeyer <uk...@pivotal.io>
 Sent: Tuesday, May 2, 2017 8:11 PM
 Subject: Re: [gemfire-dev] New Client-Server Protocol Proposal
   
I agree completely with Dan. There is no reason to have flags for value encoding type in the message. I would argue that should be part of the value serialization layer. If something was placed in the message layer it should be more generic and allow for an unrestricted set of encodings by some ID.

Object {
variant ID codec;
byte[] payload;
}


-Jake


Sent from my iPhone

> On May 2, 2017, at 1:42 PM, Dan Smith <ds...@pivotal.io> wrote:
> 
> I guess the value of building other messages on top of the function service
> mostly comes into play when we start talking about smarter clients that can
> do single hop. At that point it's really nice to have have a layer that
> lets us send a message to a single primary, or all of the members that host
> a region etc. It is also nice that right now if I add new function that
> functionality becomes available to gfsh, REST, Java, and C++ developers
> automatically.
> 
> I do agree that the new protocol could build in these concepts, and doesn't
> necessarily have to use function execution to achieve the same results. But
> do at least consider whether new developers will want to add new
> functionality to the server via functions or via your this new protocol. If
> it's harder to use the new protocol than to write a new function and invoke
> it from the client, then I think we've done something wrong.
> 
> 
> A couple of other comments, now that I've looked a little more:
> 
> 1) The list of error codes
> <https://cwiki.apache.org/confluence/display/GEODE/RegionAPI#RegionAPI-ErrorCodeDefinitions>
> seems really incomplete. It looks like we've picked a few of the possible
> exceptions geode could throw and assigned them integer ids? What is the
> rational for the exceptions that are included here vs. other exceptions?
> Also, not all messages would need to return these error codes.
> 
> 2) The existing protocol has some functionality even for basic puts that is
> not represented here. Client generate an event id that is associated with
> the put and send that to the server. These event ids are used to guarantee
> that if a client does put (A, 0) followed by put (A, 1), the resulting
> value will always be 1, even if the client timed out and retried put (A,
> 0). The event id prevents the lingered put that timed out on the server
> from affecting the state. I'm not saying the new protocol has to support
> this sort of behavior, but you might want to consider whether the current
> protocol should specify anything about how events are retried.
> 
> -Dan

   

Re: [gemfire-dev] New Client-Server Protocol Proposal

Posted by Hitesh Khamesra <hi...@yahoo.com.INVALID>.
We have a version at the API (put, get, etc.) level: https://cwiki.apache.org/confluence/display/GEODE/Message+Structure+and+Definition#MessageStructureandDefinition-RequestHeader.
The client will connect to the GemFire server by sending the version "byte" first. That can be used for message serialization.
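
A minimal sketch of that negotiation, assuming the first byte on a new connection selects the protocol the client speaks (the constant and class names here are hypothetical):

import java.io.DataInputStream;
import java.io.IOException;
import java.net.Socket;

class ProtocolSelector {
  static final byte NEW_PROTOCOL_VERSION_1 = 110;   // hypothetical magic/version byte

  // Server side: the first byte decides which message handler owns this socket.
  static void accept(Socket socket) throws IOException {
    DataInputStream in = new DataInputStream(socket.getInputStream());
    byte first = in.readByte();
    if (first == NEW_PROTOCOL_VERSION_1) {
      // hand the socket to the new message-based protocol handler
    } else {
      // treat it as the existing client/server protocol (or reject it)
    }
  }
}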

      From: Michael Stolz <ms...@pivotal.io>
 To: dev@geode.apache.org 
Cc: Udo Kohlmeyer <uk...@pivotal.io>
 Sent: Wednesday, May 3, 2017 8:55 AM
 Subject: Re: [gemfire-dev] New Client-Server Protocol Proposal
   
I'm not seeing any mention of versioning of the serialization protocol.
Versioning is critical to be able to support change over time. We must
version each layer of serialization. The transport message needs versions,
the payload serialization needs versions.

--
Mike Stolz
Principal Engineer, GemFire Product Manager
Mobile: +1-631-835-4771

On Tue, May 2, 2017 at 8:11 PM, Jacob Barrett <jb...@pivotal.io> wrote:

> I agree completely with Dan. There is no reason to have flags for value
> encoding type in the message. I would argue that should be part of the
> value serialization layer. If something was placed in the message layer it
> should be more generic and allow for an unrestricted set of encodings by
> some ID.
>
> Object {
> variant ID codec;
> byte[] payload;
> }
>
>
> -Jake
>
>
> Sent from my iPhone
>
> > On May 2, 2017, at 1:42 PM, Dan Smith <ds...@pivotal.io> wrote:
> >
> > I guess the value of building other messages on top of the function
> service
> > mostly comes into play when we start talking about smarter clients that
> can
> > do single hop. At that point it's really nice to have have a layer that
> > lets us send a message to a single primary, or all of the members that
> host
> > a region etc. It is also nice that right now if I add new function that
> > functionality becomes available to gfsh, REST, Java, and C++ developers
> > automatically.
> >
> > I do agree that the new protocol could build in these concepts, and
> doesn't
> > necessarily have to use function execution to achieve the same results.
> But
> > do at least consider whether new developers will want to add new
> > functionality to the server via functions or via your this new protocol.
> If
> > it's harder to use the new protocol than to write a new function and
> invoke
> > it from the client, then I think we've done something wrong.
> >
> >
> > A couple of other comments, now that I've looked a little more:
> >
> > 1) The list of error codes
> > <https://cwiki.apache.org/confluence/display/GEODE/RegionAPI#RegionAPI-
> ErrorCodeDefinitions>
> > seems really incomplete. It looks like we've picked a few of the possible
> > exceptions geode could throw and assigned them integer ids? What is the
> > rational for the exceptions that are included here vs. other exceptions?
> > Also, not all messages would need to return these error codes.
> >
> > 2) The existing protocol has some functionality even for basic puts that
> is
> > not represented here. Client generate an event id that is associated with
> > the put and send that to the server. These event ids are used to
> guarantee
> > that if a client does put (A, 0) followed by put (A, 1), the resulting
> > value will always be 1, even if the client timed out and retried put (A,
> > 0). The event id prevents the lingered put that timed out on the server
> > from affecting the state. I'm not saying the new protocol has to support
> > this sort of behavior, but you might want to consider whether the current
> > protocol should specify anything about how events are retried.
> >
> > -Dan
>


   

Re: [gemfire-dev] New Client-Server Protocol Proposal

Posted by Michael Stolz <ms...@pivotal.io>.
I'm not seeing any mention of versioning of the serialization protocol.
Versioning is critical to be able to support change over time. We must
version each layer of serialization. The transport message needs versions,
the payload serialization needs versions.

--
Mike Stolz
Principal Engineer, GemFire Product Manager
Mobile: +1-631-835-4771
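
As an illustration of Mike's point (not the proposed wire format), a header that versions the transport framing and the payload serialization independently could be as simple as:

import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical header: each serialization layer carries its own version so they can evolve separately.
class VersionedHeader {
  final short transportVersion;      // version of the message/framing layer
  final short payloadVersion;        // version of the value/payload serialization
  final int messageType;
  final int payloadLength;

  VersionedHeader(short transportVersion, short payloadVersion, int messageType, int payloadLength) {
    this.transportVersion = transportVersion;
    this.payloadVersion = payloadVersion;
    this.messageType = messageType;
    this.payloadLength = payloadLength;
  }

  void writeTo(DataOutputStream out) throws IOException {
    out.writeShort(transportVersion);
    out.writeShort(payloadVersion);
    out.writeInt(messageType);
    out.writeInt(payloadLength);
  }
}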

On Tue, May 2, 2017 at 8:11 PM, Jacob Barrett <jb...@pivotal.io> wrote:

> I agree completely with Dan. There is no reason to have flags for value
> encoding type in the message. I would argue that should be part of the
> value serialization layer. If something was placed in the message layer it
> should be more generic and allow for an unrestricted set of encodings by
> some ID.
>
> Object {
> variant ID codec;
> byte[] payload;
> }
>
>
> -Jake
>
>
> Sent from my iPhone
>
> > On May 2, 2017, at 1:42 PM, Dan Smith <ds...@pivotal.io> wrote:
> >
> > I guess the value of building other messages on top of the function
> service
> > mostly comes into play when we start talking about smarter clients that
> can
> > do single hop. At that point it's really nice to have have a layer that
> > lets us send a message to a single primary, or all of the members that
> host
> > a region etc. It is also nice that right now if I add new function that
> > functionality becomes available to gfsh, REST, Java, and C++ developers
> > automatically.
> >
> > I do agree that the new protocol could build in these concepts, and
> doesn't
> > necessarily have to use function execution to achieve the same results.
> But
> > do at least consider whether new developers will want to add new
> > functionality to the server via functions or via your this new protocol.
> If
> > it's harder to use the new protocol than to write a new function and
> invoke
> > it from the client, then I think we've done something wrong.
> >
> >
> > A couple of other comments, now that I've looked a little more:
> >
> > 1) The list of error codes
> > <https://cwiki.apache.org/confluence/display/GEODE/RegionAPI#RegionAPI-
> ErrorCodeDefinitions>
> > seems really incomplete. It looks like we've picked a few of the possible
> > exceptions geode could throw and assigned them integer ids? What is the
> > rational for the exceptions that are included here vs. other exceptions?
> > Also, not all messages would need to return these error codes.
> >
> > 2) The existing protocol has some functionality even for basic puts that
> is
> > not represented here. Client generate an event id that is associated with
> > the put and send that to the server. These event ids are used to
> guarantee
> > that if a client does put (A, 0) followed by put (A, 1), the resulting
> > value will always be 1, even if the client timed out and retried put (A,
> > 0). The event id prevents the lingered put that timed out on the server
> > from affecting the state. I'm not saying the new protocol has to support
> > this sort of behavior, but you might want to consider whether the current
> > protocol should specify anything about how events are retried.
> >
> > -Dan
>

Re: [gemfire-dev] New Client-Server Protocol Proposal

Posted by Jacob Barrett <jb...@pivotal.io>.
I agree completely with Dan. There is no reason to have flags for value encoding type in the message. I would argue that should be part of the value serialization layer. If something was placed in the message layer it should be more generic and allow for an unrestricted set of encodings by some ID.

Object {
variant ID codec;
byte[] payload;
}
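
In Java terms, the wrapper Jake sketches might look like the following; the codec ids are opaque numbers registered out of band, so the message layer never has to understand the payload (names are illustrative, not from the proposal):

// An encoded value: an unrestricted codec id plus the bytes produced by that codec (illustrative names).
class EncodedValue {
  final int codecId;     // e.g. an id registered for PDX, JSON, raw bytes, ...
  final byte[] payload;  // opaque to the message layer

  EncodedValue(int codecId, byte[] payload) {
    this.codecId = codecId;
    this.payload = payload;
  }
}

// The value serialization layer resolves the codec by id; the message layer just carries the bytes.
interface ValueCodec {
  int id();
  byte[] encode(Object value);
  Object decode(byte[] payload);
}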


-Jake


Sent from my iPhone

> On May 2, 2017, at 1:42 PM, Dan Smith <ds...@pivotal.io> wrote:
> 
> I guess the value of building other messages on top of the function service
> mostly comes into play when we start talking about smarter clients that can
> do single hop. At that point it's really nice to have have a layer that
> lets us send a message to a single primary, or all of the members that host
> a region etc. It is also nice that right now if I add new function that
> functionality becomes available to gfsh, REST, Java, and C++ developers
> automatically.
> 
> I do agree that the new protocol could build in these concepts, and doesn't
> necessarily have to use function execution to achieve the same results. But
> do at least consider whether new developers will want to add new
> functionality to the server via functions or via your this new protocol. If
> it's harder to use the new protocol than to write a new function and invoke
> it from the client, then I think we've done something wrong.
> 
> 
> A couple of other comments, now that I've looked a little more:
> 
> 1) The list of error codes
> <https://cwiki.apache.org/confluence/display/GEODE/RegionAPI#RegionAPI-ErrorCodeDefinitions>
> seems really incomplete. It looks like we've picked a few of the possible
> exceptions geode could throw and assigned them integer ids? What is the
> rational for the exceptions that are included here vs. other exceptions?
> Also, not all messages would need to return these error codes.
> 
> 2) The existing protocol has some functionality even for basic puts that is
> not represented here. Client generate an event id that is associated with
> the put and send that to the server. These event ids are used to guarantee
> that if a client does put (A, 0) followed by put (A, 1), the resulting
> value will always be 1, even if the client timed out and retried put (A,
> 0). The event id prevents the lingered put that timed out on the server
> from affecting the state. I'm not saying the new protocol has to support
> this sort of behavior, but you might want to consider whether the current
> protocol should specify anything about how events are retried.
> 
> -Dan

Re: [gemfire-dev] New Client-Server Protocol Proposal

Posted by Dan Smith <ds...@pivotal.io>.
I guess the value of building other messages on top of the function service
mostly comes into play when we start talking about smarter clients that can
do single hop. At that point it's really nice to have a layer that
lets us send a message to a single primary, or all of the members that host
a region, etc. It is also nice that right now, if I add a new function, that
functionality becomes available to gfsh, REST, Java, and C++ developers
automatically.

I do agree that the new protocol could build in these concepts, and doesn't
necessarily have to use function execution to achieve the same results. But
do at least consider whether new developers will want to add new
functionality to the server via functions or via this new protocol. If
it's harder to use the new protocol than to write a new function and invoke
it from the client, then I think we've done something wrong.


A couple of other comments, now that I've looked a little more:

1) The list of error codes
<https://cwiki.apache.org/confluence/display/GEODE/RegionAPI#RegionAPI-ErrorCodeDefinitions>
seems really incomplete. It looks like we've picked a few of the possible
exceptions geode could throw and assigned them integer ids? What is the
rationale for the exceptions that are included here vs. other exceptions?
Also, not all messages would need to return these error codes.

2) The existing protocol has some functionality even for basic puts that is
not represented here. Clients generate an event id that is associated with
the put and send that to the server. These event ids are used to guarantee
that if a client does put (A, 0) followed by put (A, 1), the resulting
value will always be 1, even if the client timed out and retried put (A,
0). The event id prevents the lingering put that timed out on the server
from affecting the state. I'm not saying the new protocol has to support
this sort of behavior, but you might want to consider whether the current
protocol should specify anything about how events are retried.

-Dan
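
To make the retry scenario concrete, here is a hedged sketch of the de-duplication an event id enables on the server. It follows the existing EventID shape (member id, thread id, sequence id) but is illustrative only, not the server's actual implementation:

import java.util.HashMap;
import java.util.Map;

// The id a client stamps on each state-changing operation (sketch).
class EventId {
  final String memberId;   // the client member
  final long threadId;     // the client thread
  final long sequenceId;   // increases with each new operation on that thread

  EventId(String memberId, long threadId, long sequenceId) {
    this.memberId = memberId;
    this.threadId = threadId;
    this.sequenceId = sequenceId;
  }
}

// Server side: a retried put(A, 0) arriving after put(A, 1) is recognized as stale and dropped.
class EventTracker {
  private final Map<String, Long> highestSeen = new HashMap<>();

  synchronized boolean isDuplicate(EventId id) {
    String source = id.memberId + ":" + id.threadId;
    Long last = highestSeen.get(source);
    if (last != null && id.sequenceId <= last) {
      return true;                         // a lingering retry of an older operation
    }
    highestSeen.put(source, id.sequenceId);
    return false;
  }
}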

Re: [gemfire-dev] New Client-Server Protocol Proposal

Posted by Michael Stolz <ms...@pivotal.io>.
The TCP fragmentation is fine for what it is, but it is *not* paging, and
paging has long been something that we have wanted to get around to.

--
Mike Stolz
Principal Engineer, GemFire Product Manager
Mobile: +1-631-835-4771

On Wed, May 3, 2017 at 1:33 PM, Galen M O'Sullivan <go...@pivotal.io>
wrote:

> On Tue, May 2, 2017 at 11:52 AM, Hitesh Khamesra <
> hiteshk25@yahoo.com.invalid> wrote:
>
> > Absolutely its a implementation detail.
> >
> This doesn't answer Dan's comment. Do you think fragmentation should be
> taken care of by the TCP layer or the protocol should deal with it
> specifically?
>

Re: [gemfire-dev] New Client-Server Protocol Proposal

Posted by William Markito Oliveira <wi...@gmail.com>.
+1 for that as well
On Thu, May 4, 2017 at 5:21 PM Dan Smith <ds...@pivotal.io> wrote:

> >
> > I wouldn't tackle that at this layer. I would suggest adding a layer
> > between the message and TCP that creates channels that are opaque to the
> > message layer above. The message layer wouldn't know if it was talking to
> > multiple sockets to the client or single socket with multiple channels.
> >
>
> ++1 on that!
>

Re: [gemfire-dev] New Client-Server Protocol Proposal

Posted by Dan Smith <ds...@pivotal.io>.
>
> I wouldn't tackle that at this layer. I would suggest adding a layer
> between the message and TCP that creates channels that are opaque to the
> message layer above. The message layer wouldn't know if it was talking to
> multiple sockets to the client or single socket with multiple channels.
>

++1 on that!

Re: [gemfire-dev] New Client-Server Protocol Proposal

Posted by Jacob Barrett <jb...@pivotal.io>.
On Thu, May 4, 2017 at 2:14 PM, Jacob Barrett <jb...@pivotal.io> wrote:

> > > One benefit of messageHeader with chunk is that, it gives us ability to
> > > write different messages(multiplexing) on same socket. And if thread is
> > > ready it can write its message. Though we are not there yet, but that
> will
> > > require for single socket async architecture.
> > >
> >
> > I wouldn't tackle that at this layer. I would suggest adding a layer
> > between the message and TCP that creates channels that are opaque to the
> > message layer above. The message layer wouldn't know if it was talking to
> > multiple sockets to the client or single socket with multiple channels.
>
> Though we haven't really discussed it explicitly, the new protocol is
> adding the correlationID in the expectation that at some point it will be
> possible to execute requests out-of-order. One alternative to correlationID
> if we had multiple channels would be to use one channel per message or set
> of (ordered) messages. This would be a bit expensive if we used separate
> sockets for each, but if we had channels built into the protocol, it would
> be fine. The other use of correlationID might be for retries or to check up
> on a message after some issue leading to a disconnect. However, we have the
> EventID for that.
>
> As far as I know, we don't have the async, out-of-order functionality yet.
> I believe that in the current protocol, messages are ordered and
> synchronous -- they only happen in the order they're sent, and each
> operation blocks for the previous one to finish (though you could use
> multiple connections to get similar functionality).
>


My discussion here was around the concept of interleaving chunks of
individual messages, not out-of-order responses to individual messages. From
the end user's perspective, though, whether we use correlation IDs or
ordered request/response over channels is irrelevant. At that level the API
should be asynchronous anyway. Underneath, we can decide whether a single
channel/socket has out-of-order messaging (not to be confused with
out-of-order partial message interleaving) or a correlation ID.

Interleaving meaning
[Req1.1][Req2.1][Res2.1][Req1.2][Res1.1][Res2.2][Res1.2]
Correlation and part index required

Out of order meaning:
[Req1][Req2][Res2][Res1]
Correlation ID required

In order over channel:
1: [Req.1]              [Req.2][Res.1]       [Res.2]
2:        [Req.1][Res.1]              [Res.2]
No correlation or ID required. Each request pulls a channel from a pool.
Naive approach is sockets, advanced is channel sublayer on the stack.

I really think In Order Over Channels makes life easier. Maybe a hybrid approach
exists with Correlation IDs and Channels, but really you can solve the same
problem with just channels, just more of them. At a Channel level it would
look like synchronous request/response.

-Jake
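
For the correlation-id alternative, a sketch of how a client might match out-of-order responses back to pending requests (purely illustrative; nothing here is prescribed by the proposal):

import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Client side (illustrative): one socket, responses may arrive in any order; the correlation
// id ties each response frame back to the request that produced it.
class CorrelatedRequests {
  interface FrameWriter {
    void write(int correlationId, byte[] body);
  }

  private final AtomicInteger nextId = new AtomicInteger();
  private final Map<Integer, CompletableFuture<byte[]>> pending = new ConcurrentHashMap<>();

  CompletableFuture<byte[]> send(byte[] requestBody, FrameWriter writer) {
    int correlationId = nextId.incrementAndGet();
    CompletableFuture<byte[]> response = new CompletableFuture<>();
    pending.put(correlationId, response);
    writer.write(correlationId, requestBody);     // frame and send the request
    return response;                              // the caller's API stays asynchronous
  }

  // Called by the socket reader thread for every response frame it decodes.
  void onResponse(int correlationId, byte[] responseBody) {
    CompletableFuture<byte[]> response = pending.remove(correlationId);
    if (response != null) {
      response.complete(responseBody);
    }
  }
}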

Re: [gemfire-dev] New Client-Server Protocol Proposal

Posted by Galen M O'Sullivan <go...@pivotal.io>.
I think we should be presenting the current proposal as the API and message
structure, as would be laid out in an IDL. This way, we can experiment with
Protobuf, Thrift serialization, &c. for message structure without having to
exactly specify the message structure. This will make designing a protocol
easier and allow us to leverage existing serialization libraries. If we
ever get into a totally and manually specified binary protocol, charts of
binary representations will be useful, but for now I think using a library
that takes care of serialization for us will make our lives much easier.
I've made some changes (removing type data, removing RequestHeader and
ResponseHeader) that should make the wiki pages clearer.


On Thu, May 4, 2017 at 2:14 PM, Jacob Barrett <jb...@pivotal.io> wrote:
> > One benefit of messageHeader with chunk is that, it gives us ability to
> > write different messages(multiplexing) on same socket. And if thread is
> > ready it can write its message. Though we are not there yet, but that
will
> > require for single socket async architecture.
> >
>
> I wouldn't tackle that at this layer. I would suggest adding a layer
> between the message and TCP that creates channels that are opaque to the
> message layer above. The message layer wouldn't know if it was talking to
> multiple sockets to the client or single socket with multiple channels.

Though we haven't really discussed it explicitly, the new protocol is
adding the correlationID in the expectation that at some point it will be
possible to execute requests out-of-order. One alternative to correlationID
if we had multiple channels would be to use one channel per message or set
of (ordered) messages. This would be a bit expensive if we used separate
sockets for each, but if we had channels built into the protocol, it would
be fine. The other use of correlationID might be for retries or to check up
on a message after some issue leading to a disconnect. However, we have the
EventID for that.

As far as I know, we don't have the async, out-of-order functionality yet.
I believe that in the current protocol, messages are ordered and
synchronous -- they only happen in the order they're sent, and each
operation blocks for the previous one to finish (though you could use
multiple connections to get similar functionality).

Galen

On Mon, May 8, 2017 at 2:55 PM, Ernest Burghardt <eb...@pivotal.io>
wrote:

> +1 William, an even better example - that kind of representation will make
> it so much better/easier for geode users to implement against, regardless
> of language.
>
> On Mon, May 8, 2017 at 2:48 PM, William Markito Oliveira <
> william.markito@gmail.com> wrote:
>
> > +1
> >
> > I think I've shared this before, but Kafka also has good (tabular)
> > representation for messages on their protocol.
> >
> > - https://kafka.apache.org/protocol#protocol_messages
> > - https://kafka.apache.org/protocol#protocol_message_sets
> >
> > On Mon, May 8, 2017 at 4:44 PM, Ernest Burghardt <eb...@pivotal.io>
> > wrote:
> >
> > > Hello Geodes!
> > >
> > > Good discussion on what/how the messages will be/handled once a
> > connection
> > > is established.
> > >
> > > +1 to a simple initial handshake to establish version/supported
> features
> > > that client/server will be communicating.
> > >
> > > From what I've seen so far in the proposal it is missing a definition
> for
> > > the "connection"/disconnect messages.
> > > - expected to see it here:
> > > https://cwiki.apache.org/confluence/display/GEODE/Generic+System+API
> > >
> > > From a protocol perspective, this is currently a pain point for the
> > > geode-native library.
> > >
> > > As Jake mentioned previously, having messages that are class-like and
> > have
> > > a singular job helps client developers by having an explicit protocol
> to
> > > follow.
> > >
> > >
> > > The basic case a developer is going to exercise is to
> connect/disconnect.
> > > How to do this should be straightforward from the start.
> > >
> > > Geode probably does not need a 7 Layer OSI stack, but it might make
> sense
> > > to have a couple layers:
> > >
> > > 1 - transport  (network socket)
> > > 2 - protocol   (version/features)
> > > 3 - messaging (do cluster work)
> > >
> > > e.g.
> > > client library opens a socket to the server (layer 1 - check)
> > > client/server perform handshake and the connection is OPEN (layer 2 -
> > > check)
> > > pipe is open for business, client/server do work freely (layer 3 -
> check)
> > >
> > > When this is sorted out I think a couple simple sequence or activity
> > > diagrams would be very helpful to the visual-spatial folks in the
> > > community.
> > >
> > >
> > > Best,
> > > Ernie
> > >
> > > ps.  one consideration for message definition might be to use a more
> > > tabular presentation of messages followed by any
> > > definitions/cross-referencing... this is an example from a CDMA
> > > protocol I have worked with in the past
> > >
> > > Location assignment message
> > >
> > > Field            Length (bits)
> > > MessageID        8
> > > TransactionID    8
> > > LocationType     8
> > > LocationLength   8
> > > LocationValue    8 × LocationLength
> > >
> > > 1. MessageID       The access network shall set this field to 0x05.
> > > 2. TransactionID   The access network shall increment this value for
> > >    each new LocationAssignment message sent.
> > > 3. LocationType    The access network shall set this field to the
> > >    type of the location as specified in Table
> > >
> > > On Mon, May 8, 2017 at 7:48 AM, Jacob Barrett <jb...@pivotal.io>
> > wrote:
> > >
> > > > On Fri, May 5, 2017 at 2:09 PM Hitesh Khamesra <hi...@yahoo.com>
> > > > wrote:
> > > >
> > > > >
> > > > > 0. In first phase we are not doing chunking/fragmentation. And even
> > > this
> > > > > will be option for client.(
> > > > > https://cwiki.apache.org/confluence/display/GEODE/
> > > Message+Structure+and+
> > > > Definition#MessageStructureandDefinition-Protocolnegotiation
> > > > > )
> > > > >
> > > >
> > > > I highly suggest initial handshake be more relaxed than specific
> > "version
> > > > number" or flags. Consider sending objects that indicate support for
> > > > features or even a list of feature IDs. At connect server can send
> list
> > > of
> > > > feature IDs to the client. The client can respond with a set of
> feature
> > > IDs
> > > > it supports as well as any metadata associated with them, say default
> > set
> > > > of supported encodings.
> > > >
> > > >
> > > > > 1. Are you refereeing websocket/spdy? But I think we are talking
> > almost
> > > > > same thing, may be push isPartialMessage flag with chunk
> > > length(Anthony's
> > > > > example below) ?
> > > > >
> > > >
> > > > I am not sure what you mean here but if you are talking about
> layering
> > a
> > > > channel protocol handler then I guess yes. The point is that each of
> > > these
> > > > behaviors should be encapsulated in specific layers and not
> intermixed
> > > with
> > > > the message.
> > > >
> > > >
> > > > > 2. That's the part of the problem. Even if you need to serialize
> the
> > > > > "String", you need to write length first and then need to write
> > > > serialized
> > > > > utf bytes. We can implement chunked input stream and can
> de-serialize
> > > the
> > > > > object as it is coming (DataSerializable.fromData(ChunkedStream)).
> > > > >
> > > >
> > > > Right, and in this case the length is never the length of the string,
> > it
> > > is
> > > > the length of the byte encoding of the string. This is not known
> until
> > > the
> > > > encoding is complete. So by chunking we can write the length of
> smaller
> > > > buffers (from buffer pools) as the length of that sequence of bytes,
> > the
> > > > last chunk terminated with length 0. Each of those chunks can be
> based
> > > to a
> > > > UTF-8 to UTF-16 transcoder to create the String.
> > > >
> > > > -Jake
> > > >
> > >
> >
> >
> >
> > --
> > ~/William
> >
>

Re: [gemfire-dev] New Client-Server Protocol Proposal

Posted by Ernest Burghardt <eb...@pivotal.io>.
+1 William, an even better example - that kind of representation will make
it so much better/easier for geode users to implement against, regardless
of language.

On Mon, May 8, 2017 at 2:48 PM, William Markito Oliveira <
william.markito@gmail.com> wrote:

> +1
>
> I think I've shared this before, but Kafka also has good (tabular)
> representation for messages on their protocol.
>
> - https://kafka.apache.org/protocol#protocol_messages
> - https://kafka.apache.org/protocol#protocol_message_sets
>
> On Mon, May 8, 2017 at 4:44 PM, Ernest Burghardt <eb...@pivotal.io>
> wrote:
>
> > Hello Geodes!
> >
> > Good discussion on what/how the messages will be/handled once a
> connection
> > is established.
> >
> > +1 to a simple initial handshake to establish version/supported features
> > that client/server will be communicating.
> >
> > From what I've seen so far in the proposal it is missing a definition for
> > the "connection"/disconnect messages.
> > - expected to see it here:
> > https://cwiki.apache.org/confluence/display/GEODE/Generic+System+API
> >
> > From a protocol perspective, this is currently a pain point for the
> > geode-native library.
> >
> > As Jake mentioned previously, having messages that are class-like and
> have
> > a singular job helps client developers by having an explicit protocol to
> > follow.
> >
> >
> > The basic case a developer is going to exercise is to connect/disconnect.
> > How to do this should be straightforward from the start.
> >
> > Geode probably does not need a 7 Layer OSI stack, but it might make sense
> > to have a couple layers:
> >
> > 1 - transport  (network socket)
> > 2 - protocol   (version/features)
> > 3 - messaging (do cluster work)
> >
> > e.g.
> > client library opens a socket to the server (layer 1 - check)
> > client/server perform handshake and the connection is OPEN (layer 2 -
> > check)
> > pipe is open for business, client/server do work freely (layer 3 - check)
> >
> > When this is sorted out I think a couple simple sequence or activity
> > diagrams would be very helpful to the visual-spatial folks in the
> > community.
> >
> >
> > Best,
> > Ernie
> >
> > ps.  one consideration for message definition might be to use a more
> > tabular presentation of messages followed by any
> > definitions/cross-referencing... this is an example from a CDMA
> protocol I
> > have worked with in the past
> >
> > Location assignment message
> >
> > Field            Length (bits)
> > MessageID        8
> > TransactionID    8
> > LocationType     8
> > LocationLength   8
> > LocationValue    8 × LocationLength
> >
> >    1. MessageID        The access network shall set this field to 0x05.
> >    2. TransactionID    The access network shall increment this value for
> >       each new LocationAssignment message sent.
> >    3. LocationType     The access network shall set this field to the
> >       type of the location as specified in Table
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > On Mon, May 8, 2017 at 7:48 AM, Jacob Barrett <jb...@pivotal.io>
> wrote:
> >
> > > On Fri, May 5, 2017 at 2:09 PM Hitesh Khamesra <hi...@yahoo.com>
> > > wrote:
> > >
> > > >
> > > > 0. In first phase we are not doing chunking/fragmentation. And even
> > this
> > > > will be option for client.(
> > > > https://cwiki.apache.org/confluence/display/GEODE/
> > Message+Structure+and+
> > > Definition#MessageStructureandDefinition-Protocolnegotiation
> > > > )
> > > >
> > >
> > > I highly suggest initial handshake be more relaxed than specific
> "version
> > > number" or flags. Consider sending objects that indicate support for
> > > features or even a list of feature IDs. At connect server can send list
> > of
> > > feature IDs to the client. The client can respond with a set of feature
> > IDs
> > > it supports as well as any metadata associated with them, say default
> set
> > > of supported encodings.
> > >
> > >
> > > > 1. Are you refereeing websocket/spdy? But I think we are talking
> almost
> > > > same thing, may be push isPartialMessage flag with chunk
> > length(Anthony's
> > > > example below) ?
> > > >
> > >
> > > I am not sure what you mean here but if you are talking about layering
> a
> > > channel protocol handler then I guess yes. The point is that each of
> > these
> > > behaviors should be encapsulated in specific layers and not intermixed
> > with
> > > the message.
> > >
> > >
> > > > 2. That's the part of the problem. Even if you need to serialize the
> > > > "String", you need to write length first and then need to write
> > > serialized
> > > > utf bytes. We can implement chunked input stream and can de-serialize
> > the
> > > > object as it is coming (DataSerializable.fromData(ChunkedStream)).
> > > >
> > >
> > > Right, and in this case the length is never the length of the string,
> it
> > is
> > > the length of the byte encoding of the string. This is not known until
> > the
> > > encoding is complete. So by chunking we can write the length of smaller
> > > buffers (from buffer pools) as the length of that sequence of bytes,
> the
> > > last chunk terminated with length 0. Each of those chunks can be based
> > to a
> > > UTF-8 to UTF-16 transcoder to create the String.
> > >
> > > -Jake
> > >
> >
>
>
>
> --
> ~/William
>

Re: [gemfire-dev] New Client-Server Protocol Proposal

Posted by William Markito Oliveira <wi...@gmail.com>.
+1

I think I've shared this before, but Kafka also has good (tabular)
representation for messages on their protocol.

- https://kafka.apache.org/protocol#protocol_messages
- https://kafka.apache.org/protocol#protocol_message_sets

On Mon, May 8, 2017 at 4:44 PM, Ernest Burghardt <eb...@pivotal.io>
wrote:

> Hello Geodes!
>
> Good discussion on what/how the messages will be/handled once a connection
> is established.
>
> +1 to a simple initial handshake to establish version/supported features
> that client/server will be communicating.
>
> From what I've seen so far in the proposal it is missing a definition for
> the "connection"/disconnect messages.
> - expected to see it here:
> https://cwiki.apache.org/confluence/display/GEODE/Generic+System+API
>
> From a protocol perspective, this is currently a pain point for the
> geode-native library.
>
> As Jake mentioned previously, having messages that are class-like and have
> a singular job helps client developers by having an explicit protocol to
> follow.
>
>
> The basic case a developer is going to exercise is to connect/disconnect.
> How to do this should be straightforward from the start.
>
> Geode probably does not need a 7 Layer OSI stack, but it might make sense
> to have a couple layers:
>
> 1 - transport  (network socket)
> 2 - protocol   (version/features)
> 3 - messaging (do cluster work)
>
> e.g.
> client library opens a socket to the server (layer 1 - check)
> client/server perform handshake and the connection is OPEN (layer 2 -
> check)
> pipe is open for business, client/server do work freely (layer 3 - check)
>
> When this is sorted out I think a couple simple sequence or activity
> diagrams would be very helpful to the visual-spatial folks in the
> community.
>
>
> Best,
> Ernie
>
> ps.  one consideration for message definition might be to use a more
> tabular presentation of messages followed by any
> definitions/cross-referencing... this is an example from a CDMA protocol I
> have worked with in the past
>
> Location assignment message
>
> Field            Length (bits)
> MessageID        8
> TransactionID    8
> LocationType     8
> LocationLength   8
> LocationValue    8 × LocationLength
>
>    1. MessageID        The access network shall set this field to 0x05.
>    2. TransactionID    The access network shall increment this value for
>       each new LocationAssignment message sent.
>    3. LocationType     The access network shall set this field to the
>       type of the location as specified in Table
>
>
>
>
>
>
>
>
>
>
>
>
> On Mon, May 8, 2017 at 7:48 AM, Jacob Barrett <jb...@pivotal.io> wrote:
>
> > On Fri, May 5, 2017 at 2:09 PM Hitesh Khamesra <hi...@yahoo.com>
> > wrote:
> >
> > >
> > > 0. In first phase we are not doing chunking/fragmentation. And even
> this
> > > will be option for client.(
> > > https://cwiki.apache.org/confluence/display/GEODE/
> Message+Structure+and+
> > Definition#MessageStructureandDefinition-Protocolnegotiation
> > > )
> > >
> >
> > I highly suggest initial handshake be more relaxed than specific "version
> > number" or flags. Consider sending objects that indicate support for
> > features or even a list of feature IDs. At connect server can send list
> of
> > feature IDs to the client. The client can respond with a set of feature
> IDs
> > it supports as well as any metadata associated with them, say default set
> > of supported encodings.
> >
> >
> > > 1. Are you refereeing websocket/spdy? But I think we are talking almost
> > > same thing, may be push isPartialMessage flag with chunk
> length(Anthony's
> > > example below) ?
> > >
> >
> > I am not sure what you mean here but if you are talking about layering a
> > channel protocol handler then I guess yes. The point is that each of
> these
> > behaviors should be encapsulated in specific layers and not intermixed
> with
> > the message.
> >
> >
> > > 2. That's the part of the problem. Even if you need to serialize the
> > > "String", you need to write length first and then need to write
> > serialized
> > > utf bytes. We can implement chunked input stream and can de-serialize
> the
> > > object as it is coming (DataSerializable.fromData(ChunkedStream)).
> > >
> >
> > Right, and in this case the length is never the length of the string, it
> is
> > the length of the byte encoding of the string. This is not known until
> the
> > encoding is complete. So by chunking we can write the length of smaller
> > buffers (from buffer pools) as the length of that sequence of bytes, the
> > last chunk terminated with length 0. Each of those chunks can be based
> to a
> > UTF-8 to UTF-16 transcoder to create the String.
> >
> > -Jake
> >
>



-- 
~/William

Re: [gemfire-dev] New Client-Server Protocol Proposal

Posted by Ernest Burghardt <eb...@pivotal.io>.
Hello Geodes!

Good discussion on what/how the messages will be/handled once a connection
is established.

+1 to a simple initial handshake to establish version/supported features
that client/server will be communicating.

From what I've seen so far, the proposal is missing a definition for
the connect/disconnect messages.
- expected to see it here:
https://cwiki.apache.org/confluence/display/GEODE/Generic+System+API

From a protocol perspective, this is currently a pain point for the
geode-native library.

As Jake mentioned previously, having messages that are class-like and have
a singular job helps client developers by having an explicit protocol to
follow.


The basic case a developer is going to exercise is to connect/disconnect.
How to do this should be straightforward from the start.

Geode probably does not need a 7 Layer OSI stack, but it might make sense
to have a couple layers:

1 - transport  (network socket)
2 - protocol   (version/features)
3 - messaging (do cluster work)

e.g.
client library opens a socket to the server (layer 1 - check)
client/server perform handshake and the connection is OPEN (layer 2 - check)
pipe is open for business, client/server do work freely (layer 3 - check)
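
Something like the following (all names here are illustrative only, nothing in
it is part of the proposal) is the kind of separation I mean:

import java.io.IOException;
import java.nio.ByteBuffer;

/** Layer 1: transport - moves opaque bytes, knows nothing about messages. */
interface Transport extends AutoCloseable {
  void write(ByteBuffer buffer) throws IOException;
  int read(ByteBuffer buffer) throws IOException;
  void close() throws IOException;
}

/** Layer 2: protocol - version/feature negotiation done once per connection. */
interface Handshake {
  /** Returns the features both sides agreed on, or fails if incompatible. */
  NegotiatedFeatures negotiate(Transport transport) throws IOException;
}

/** Result of a successful handshake: the connection is now OPEN. */
interface NegotiatedFeatures {
  boolean supports(int featureId);
}

/** Layer 3: messaging - request/response work against the cluster. */
interface MessageChannel extends AutoCloseable {
  void send(byte[] encodedMessage) throws IOException;
  byte[] receive() throws IOException;
  void close() throws IOException;
}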

When this is sorted out I think a couple simple sequence or activity
diagrams would be very helpful to the visual-spatial folks in the community.


Best,
Ernie

ps.  one consideration for message definition might be to use a more
tabular presentation of messages followed by any
definitions/cross-referencing... this is an example from a CDMA protocol I
have worked with in the past

Location assignment message

Field            Length (bits)
MessageID        8
TransactionID    8
LocationType     8
LocationLength   8
LocationValue    8 × LocationLength

   1. MessageID        The access network shall set this field to 0x05.
   2. TransactionID    The access network shall increment this value for
      each new LocationAssignment message sent.
   3. LocationType     The access network shall set this field to the
      type of the location as specified in Table












On Mon, May 8, 2017 at 7:48 AM, Jacob Barrett <jb...@pivotal.io> wrote:

> On Fri, May 5, 2017 at 2:09 PM Hitesh Khamesra <hi...@yahoo.com>
> wrote:
>
> >
> > 0. In first phase we are not doing chunking/fragmentation. And even this
> > will be option for client.(
> > https://cwiki.apache.org/confluence/display/GEODE/Message+Structure+and+
> Definition#MessageStructureandDefinition-Protocolnegotiation
> > )
> >
>
> I highly suggest initial handshake be more relaxed than specific "version
> number" or flags. Consider sending objects that indicate support for
> features or even a list of feature IDs. At connect server can send list of
> feature IDs to the client. The client can respond with a set of feature IDs
> it supports as well as any metadata associated with them, say default set
> of supported encodings.
>
>
> > 1. Are you refereeing websocket/spdy? But I think we are talking almost
> > same thing, may be push isPartialMessage flag with chunk length(Anthony's
> > example below) ?
> >
>
> I am not sure what you mean here but if you are talking about layering a
> channel protocol handler then I guess yes. The point is that each of these
> behaviors should be encapsulated in specific layers and not intermixed with
> the message.
>
>
> > 2. That's the part of the problem. Even if you need to serialize the
> > "String", you need to write length first and then need to write
> serialized
> > utf bytes. We can implement chunked input stream and can de-serialize the
> > object as it is coming (DataSerializable.fromData(ChunkedStream)).
> >
>
> Right, and in this case the length is never the length of the string, it is
> the length of the byte encoding of the string. This is not known until the
> encoding is complete. So by chunking we can write the length of smaller
> buffers (from buffer pools) as the length of that sequence of bytes, the
> last chunk terminated with length 0. Each of those chunks can be based to a
> UTF-8 to UTF-16 transcoder to create the String.
>
> -Jake
>

Re: [gemfire-dev] New Client-Server Protocol Proposal

Posted by Jacob Barrett <jb...@pivotal.io>.
On Fri, May 5, 2017 at 2:09 PM Hitesh Khamesra <hi...@yahoo.com> wrote:

>
> 0. In first phase we are not doing chunking/fragmentation. And even this
> will be option for client.(
> https://cwiki.apache.org/confluence/display/GEODE/Message+Structure+and+Definition#MessageStructureandDefinition-Protocolnegotiation
> )
>

I highly suggest the initial handshake be more relaxed than a specific "version
number" or flags. Consider sending objects that indicate support for
features, or even a list of feature IDs. At connect the server can send a list of
feature IDs to the client. The client can respond with the set of feature IDs
it supports, as well as any metadata associated with them, say a default set
of supported encodings.
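
As a rough sketch only (the framing here is an assumption, not a proposed
format), the client side of such an exchange could look like:

import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

public final class FeatureHandshake {

  /** Client side: read the server's advertised feature IDs, reply with the common subset. */
  public static Set<Integer> negotiate(DataInputStream in, DataOutputStream out,
                                       Set<Integer> clientFeatures) throws IOException {
    // Server -> client: a count followed by that many feature IDs.
    int count = in.readInt();
    Set<Integer> serverFeatures = new HashSet<>();
    for (int i = 0; i < count; i++) {
      serverFeatures.add(in.readInt());
    }

    // Client -> server: the subset of those features this client supports.
    Set<Integer> agreed = new HashSet<>(serverFeatures);
    agreed.retainAll(clientFeatures);
    out.writeInt(agreed.size());
    for (int featureId : agreed) {
      out.writeInt(featureId);
    }
    out.flush();
    return agreed;                        // both sides now share this feature set
  }
}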


> 1. Are you refereeing websocket/spdy? But I think we are talking almost
> same thing, may be push isPartialMessage flag with chunk length(Anthony's
> example below) ?
>

I am not sure what you mean here but if you are talking about layering a
channel protocol handler then I guess yes. The point is that each of these
behaviors should be encapsulated in specific layers and not intermixed with
the message.


> 2. That's the part of the problem. Even if you need to serialize the
> "String", you need to write length first and then need to write serialized
> utf bytes. We can implement chunked input stream and can de-serialize the
> object as it is coming (DataSerializable.fromData(ChunkedStream)).
>

Right, and in this case the length is never the length of the string; it is
the length of the byte encoding of the string. This is not known until the
encoding is complete. So by chunking we can write the length of smaller
buffers (from buffer pools) as the length of that sequence of bytes, with the
last chunk terminated by length 0. Each of those chunks can be passed to a
UTF-8 to UTF-16 transcoder to create the String.
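
A minimal sketch of that encoding side, assuming a plain DataOutputStream and a
fixed chunk size (the framing is illustrative, not the proposed wire format):

import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public final class ChunkedStringEncoder {

  /** Writes value as UTF-8 in length-prefixed chunks, terminated by a 0-length chunk. */
  public static void write(String value, DataOutputStream out, int chunkSize) throws IOException {
    CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
    CharBuffer chars = CharBuffer.wrap(value);
    ByteBuffer chunk = ByteBuffer.allocate(chunkSize);  // in practice, taken from a buffer pool

    while (true) {
      CoderResult result = encoder.encode(chars, chunk, true);
      if (result.isError()) {
        result.throwException();                        // malformed/unmappable input
      }
      if (chunk.position() > 0) {
        out.writeInt(chunk.position());                 // length of this chunk
        out.write(chunk.array(), 0, chunk.position());  // the chunk's bytes
        chunk.clear();
      }
      if (result.isUnderflow()) {                       // all characters consumed
        break;
      }
    }
    out.writeInt(0);                                    // 0-length chunk terminates the value
    out.flush();
  }
}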

-Jake

Re: [gemfire-dev] New Client-Server Protocol Proposal

Posted by Hitesh Khamesra <hi...@yahoo.com.INVALID>.
0. In the first phase we are not doing chunking/fragmentation. And even this will be optional for the client. (https://cwiki.apache.org/confluence/display/GEODE/Message+Structure+and+Definition#MessageStructureandDefinition-Protocolnegotiation)
1. Are you referring to websocket/spdy? But I think we are talking about almost the same thing, maybe push an isPartialMessage flag with a chunk length (Anthony's example below)?
2. That's part of the problem. Even if you only need to serialize a "String", you need to write the length first and then write the serialized UTF bytes. We can implement a chunked input stream and de-serialize the object as it is coming (DataSerializable.fromData(ChunkedStream)).
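
A minimal sketch of such a chunked input stream, assuming length-prefixed
chunks terminated by a 0-length chunk (this class is illustrative, not an
existing Geode class):

import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Presents a sequence of length-prefixed chunks (terminated by a 0-length chunk)
 * as one continuous stream, so a deserializer can consume the value as it
 * arrives instead of buffering the whole thing first.
 */
final class ChunkedInputStream extends InputStream {
  private final DataInputStream in;
  private int remainingInChunk;
  private boolean done;

  ChunkedInputStream(DataInputStream in) {
    this.in = in;
  }

  @Override
  public int read() throws IOException {
    if (!ensureChunk()) {
      return -1;                          // 0-length chunk reached: end of this value
    }
    remainingInChunk--;
    return in.read();
  }

  private boolean ensureChunk() throws IOException {
    if (remainingInChunk == 0 && !done) {
      remainingInChunk = in.readInt();    // next chunk's length
      if (remainingInChunk == 0) {
        done = true;
      }
    }
    return !done;
  }
}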




      From: Jacob Barrett <jb...@pivotal.io>
 To: dev@geode.apache.org; Hitesh Khamesra <hi...@yahoo.com> 
Cc: Anthony Baker <ab...@pivotal.io>
 Sent: Friday, May 5, 2017 7:29 AM
 Subject: Re: [gemfire-dev] New Client-Server Protocol Proposal
   

On Thu, May 4, 2017 at 2:52 PM Hitesh Khamesra <hi...@yahoo.com.invalid> wrote:

Basically, thread/layer should not hold any resources while serializing the object or chunk.  We should be able to see this flow (ms1-chunk1, msg2-chunk1, ms1-chunk2, msg3-chunk, msg2-chunk2, so on ...)


Correct, but putting that in the message layer is not appropriate. The simple solution is that the multiple channels can be achieved with multiple sockets. The later optimization is to add a channel multiplexer layer between the message and socket layers. 
If we put it in the message layer, not only does it for the message to tackle something it shouldn't be concerned with, reassembling itself, but it also forces all implementors to tackle this logic up front. By layering we can release without, implementors aren't forced into understanding the logic, and later we can release the layers and the client can negotiate.
 
On other pdx note: to de-serialize the pdx we need length of serialized bytes, so that we can read field offset from serialized stream, and then can read field value. Though, I can imagine with the help of pdxType, we can interpret serialized stream.


Yes, so today PDX serialization would be no worse, the PDX serializer would have to buffer, but other may not have to. The length of the buffered PDX could be used as the first chunk length and complete in single chunk. Although, I suspect that amortized overhead of splitting the chunks  will be nil anyway. 
The point is that the message encoding of values should NOT have any unbounded length fields and require long or many buffers to complete serialization. By chunking you can accomplish this by not needing to buffer the whole stream, just small (say 1k), chunks at a time to get the chunk length. 
Buffers == Latency
-Jake


   

Re: [gemfire-dev] New Client-Server Protocol Proposal

Posted by Jacob Barrett <jb...@pivotal.io>.
I would leave it for a later optimization.

Sent from my iPhone

> On May 5, 2017, at 9:08 AM, Bruce Schuchardt <bs...@pivotal.io> wrote:
> 
> This is very similar to how peer-to-peer messaging is performed in Geode.  Messages are serialized to a stream that knows how to optimally "chunk" the bytes into fixed-size packets.  On the receiving side these are fed into a similar input stream for deserialization.  The message only contains information about the operation it represents.
> 
> Why don't we do something similar for the new client/server protocol?
> 
>> Le 5/5/2017 à 7:28 AM, Jacob Barrett a écrit :
>> On Thu, May 4, 2017 at 2:52 PM Hitesh Khamesra <hi...@yahoo.com.invalid>
>> wrote:
>> 
>>> Basically, thread/layer should not hold any resources while serializing
>>> the object or chunk.  We should be able to see this flow (ms1-chunk1,
>>> msg2-chunk1, ms1-chunk2, msg3-chunk, msg2-chunk2, so on ...)
>>> 
>> Correct, but putting that in the message layer is not appropriate. The
>> simple solution is that the multiple channels can be achieved with multiple
>> sockets. The later optimization is to add a channel multiplexer layer
>> between the message and socket layers.
>> 
>> If we put it in the message layer, not only does it for the message to
>> tackle something it shouldn't be concerned with, reassembling itself, but
>> it also forces all implementors to tackle this logic up front. By layering
>> we can release without, implementors aren't forced into understanding the
>> logic, and later we can release the layers and the client can negotiate.
>> 
>> 
>> 
>>> On other pdx note: to de-serialize the pdx we need length of serialized
>>> bytes, so that we can read field offset from serialized stream, and then
>>> can read field value. Though, I can imagine with the help of pdxType, we
>>> can interpret serialized stream.
>>> 
>> Yes, so today PDX serialization would be no worse, the PDX serializer would
>> have to buffer, but other may not have to. The length of the buffered PDX
>> could be used as the first chunk length and complete in single chunk.
>> Although, I suspect that amortized overhead of splitting the chunks  will
>> be nil anyway.
>> 
>> The point is that the message encoding of values should NOT have any
>> unbounded length fields and require long or many buffers to complete
>> serialization. By chunking you can accomplish this by not needing to buffer
>> the whole stream, just small (say 1k), chunks at a time to get the chunk
>> length.
>> 
>> Buffers == Latency
>> 
>> -Jake
>> 
> 

Re: [gemfire-dev] New Client-Server Protocol Proposal

Posted by Jacob Barrett <jb...@pivotal.io>.
It does! 

Both fragmenting and multiple channels as multiple sockets. 

Sent from my iPhone

> On May 5, 2017, at 10:33 AM, Galen M O'Sullivan <go...@pivotal.io> wrote:
> 
> I think TCP does exactly this for us.
> 
> On Fri, May 5, 2017 at 9:08 AM, Bruce Schuchardt <bs...@pivotal.io>
> wrote:
> 
>> This is very similar to how peer-to-peer messaging is performed in Geode.
>> Messages are serialized to a stream that knows how to optimally "chunk" the
>> bytes into fixed-size packets.  On the receiving side these are fed into a
>> similar input stream for deserialization.  The message only contains
>> information about the operation it represents.
>> 
>> Why don't we do something similar for the new client/server protocol?
>> 
>> 
>>> Le 5/5/2017 à 7:28 AM, Jacob Barrett a écrit :
>>> 
>>> On Thu, May 4, 2017 at 2:52 PM Hitesh Khamesra
>>> <hi...@yahoo.com.invalid>
>>> wrote:
>>> 
>>> Basically, thread/layer should not hold any resources while serializing
>>>> the object or chunk.  We should be able to see this flow (ms1-chunk1,
>>>> msg2-chunk1, ms1-chunk2, msg3-chunk, msg2-chunk2, so on ...)
>>>> 
>>>> Correct, but putting that in the message layer is not appropriate. The
>>> simple solution is that the multiple channels can be achieved with
>>> multiple
>>> sockets. The later optimization is to add a channel multiplexer layer
>>> between the message and socket layers.
>>> 
>>> If we put it in the message layer, not only does it for the message to
>>> tackle something it shouldn't be concerned with, reassembling itself, but
>>> it also forces all implementors to tackle this logic up front. By layering
>>> we can release without, implementors aren't forced into understanding the
>>> logic, and later we can release the layers and the client can negotiate.
>>> 
>>> 
>>> 
>>> On other pdx note: to de-serialize the pdx we need length of serialized
>>>> bytes, so that we can read field offset from serialized stream, and then
>>>> can read field value. Though, I can imagine with the help of pdxType, we
>>>> can interpret serialized stream.
>>>> 
>>>> Yes, so today PDX serialization would be no worse, the PDX serializer
>>> would
>>> have to buffer, but other may not have to. The length of the buffered PDX
>>> could be used as the first chunk length and complete in single chunk.
>>> Although, I suspect that amortized overhead of splitting the chunks  will
>>> be nil anyway.
>>> 
>>> The point is that the message encoding of values should NOT have any
>>> unbounded length fields and require long or many buffers to complete
>>> serialization. By chunking you can accomplish this by not needing to
>>> buffer
>>> the whole stream, just small (say 1k), chunks at a time to get the chunk
>>> length.
>>> 
>>> Buffers == Latency
>>> 
>>> -Jake
>>> 
>>> 
>> 

Re: [gemfire-dev] New Client-Server Protocol Proposal

Posted by Jacob Barrett <jb...@pivotal.io>.
In either case you packetize (serialize the message protocol) to buffers (fixed sizes and pooled) and flush buffers to the socket, preferably using an async socket framework to do all the heavy lifting for you.


Sent from my iPhone

> On May 5, 2017, at 11:07 AM, Bruce Schuchardt <bs...@pivotal.io> wrote:
> 
> Yes, of course it does but we don't serialize directly to a socket output stream because it's slow.  I agree that this could be left out and added later as an optimization.
> 
>> Le 5/5/2017 à 10:33 AM, Galen M O'Sullivan a écrit :
>> I think TCP does exactly this for us.
>> 
>> On Fri, May 5, 2017 at 9:08 AM, Bruce Schuchardt <bs...@pivotal.io>
>> wrote:
>> 
>>> This is very similar to how peer-to-peer messaging is performed in Geode.
>>> Messages are serialized to a stream that knows how to optimally "chunk" the
>>> bytes into fixed-size packets.  On the receiving side these are fed into a
>>> similar input stream for deserialization.  The message only contains
>>> information about the operation it represents.
>>> 
>>> Why don't we do something similar for the new client/server protocol?
>>> 
>>> 
>>>> Le 5/5/2017 à 7:28 AM, Jacob Barrett a écrit :
>>>> 
>>>> On Thu, May 4, 2017 at 2:52 PM Hitesh Khamesra
>>>> <hi...@yahoo.com.invalid>
>>>> wrote:
>>>> 
>>>> Basically, thread/layer should not hold any resources while serializing
>>>>> the object or chunk.  We should be able to see this flow (ms1-chunk1,
>>>>> msg2-chunk1, ms1-chunk2, msg3-chunk, msg2-chunk2, so on ...)
>>>>> 
>>>>> Correct, but putting that in the message layer is not appropriate. The
>>>> simple solution is that the multiple channels can be achieved with
>>>> multiple
>>>> sockets. The later optimization is to add a channel multiplexer layer
>>>> between the message and socket layers.
>>>> 
>>>> If we put it in the message layer, not only does it for the message to
>>>> tackle something it shouldn't be concerned with, reassembling itself, but
>>>> it also forces all implementors to tackle this logic up front. By layering
>>>> we can release without, implementors aren't forced into understanding the
>>>> logic, and later we can release the layers and the client can negotiate.
>>>> 
>>>> 
>>>> 
>>>> On other pdx note: to de-serialize the pdx we need length of serialized
>>>>> bytes, so that we can read field offset from serialized stream, and then
>>>>> can read field value. Though, I can imagine with the help of pdxType, we
>>>>> can interpret serialized stream.
>>>>> 
>>>>> Yes, so today PDX serialization would be no worse, the PDX serializer
>>>> would
>>>> have to buffer, but other may not have to. The length of the buffered PDX
>>>> could be used as the first chunk length and complete in single chunk.
>>>> Although, I suspect that amortized overhead of splitting the chunks  will
>>>> be nil anyway.
>>>> 
>>>> The point is that the message encoding of values should NOT have any
>>>> unbounded length fields and require long or many buffers to complete
>>>> serialization. By chunking you can accomplish this by not needing to
>>>> buffer
>>>> the whole stream, just small (say 1k), chunks at a time to get the chunk
>>>> length.
>>>> 
>>>> Buffers == Latency
>>>> 
>>>> -Jake
>>>> 
>>>> 
> 

Re: [gemfire-dev] New Client-Server Protocol Proposal

Posted by Bruce Schuchardt <bs...@pivotal.io>.
Yes, of course it does but we don't serialize directly to a socket 
output stream because it's slow.  I agree that this could be left out 
and added later as an optimization.

Le 5/5/2017 à 10:33 AM, Galen M O'Sullivan a écrit :
> I think TCP does exactly this for us.
>
> On Fri, May 5, 2017 at 9:08 AM, Bruce Schuchardt <bs...@pivotal.io>
> wrote:
>
>> This is very similar to how peer-to-peer messaging is performed in Geode.
>> Messages are serialized to a stream that knows how to optimally "chunk" the
>> bytes into fixed-size packets.  On the receiving side these are fed into a
>> similar input stream for deserialization.  The message only contains
>> information about the operation it represents.
>>
>> Why don't we do something similar for the new client/server protocol?
>>
>>
>> Le 5/5/2017 à 7:28 AM, Jacob Barrett a écrit :
>>
>>> On Thu, May 4, 2017 at 2:52 PM Hitesh Khamesra
>>> <hi...@yahoo.com.invalid>
>>> wrote:
>>>
>>> Basically, thread/layer should not hold any resources while serializing
>>>> the object or chunk.  We should be able to see this flow (ms1-chunk1,
>>>> msg2-chunk1, ms1-chunk2, msg3-chunk, msg2-chunk2, so on ...)
>>>>
>>>> Correct, but putting that in the message layer is not appropriate. The
>>> simple solution is that the multiple channels can be achieved with
>>> multiple
>>> sockets. The later optimization is to add a channel multiplexer layer
>>> between the message and socket layers.
>>>
>>> If we put it in the message layer, not only does it for the message to
>>> tackle something it shouldn't be concerned with, reassembling itself, but
>>> it also forces all implementors to tackle this logic up front. By layering
>>> we can release without, implementors aren't forced into understanding the
>>> logic, and later we can release the layers and the client can negotiate.
>>>
>>>
>>>
>>> On other pdx note: to de-serialize the pdx we need length of serialized
>>>> bytes, so that we can read field offset from serialized stream, and then
>>>> can read field value. Though, I can imagine with the help of pdxType, we
>>>> can interpret serialized stream.
>>>>
>>>> Yes, so today PDX serialization would be no worse, the PDX serializer
>>> would
>>> have to buffer, but other may not have to. The length of the buffered PDX
>>> could be used as the first chunk length and complete in single chunk.
>>> Although, I suspect that amortized overhead of splitting the chunks  will
>>> be nil anyway.
>>>
>>> The point is that the message encoding of values should NOT have any
>>> unbounded length fields and require long or many buffers to complete
>>> serialization. By chunking you can accomplish this by not needing to
>>> buffer
>>> the whole stream, just small (say 1k), chunks at a time to get the chunk
>>> length.
>>>
>>> Buffers == Latency
>>>
>>> -Jake
>>>
>>>


Re: [gemfire-dev] New Client-Server Protocol Proposal

Posted by Galen M O'Sullivan <go...@pivotal.io>.
I think TCP does exactly this for us.

On Fri, May 5, 2017 at 9:08 AM, Bruce Schuchardt <bs...@pivotal.io>
wrote:

> This is very similar to how peer-to-peer messaging is performed in Geode.
> Messages are serialized to a stream that knows how to optimally "chunk" the
> bytes into fixed-size packets.  On the receiving side these are fed into a
> similar input stream for deserialization.  The message only contains
> information about the operation it represents.
>
> Why don't we do something similar for the new client/server protocol?
>
>
> Le 5/5/2017 à 7:28 AM, Jacob Barrett a écrit :
>
>> On Thu, May 4, 2017 at 2:52 PM Hitesh Khamesra
>> <hi...@yahoo.com.invalid>
>> wrote:
>>
>> Basically, thread/layer should not hold any resources while serializing
>>> the object or chunk.  We should be able to see this flow (ms1-chunk1,
>>> msg2-chunk1, ms1-chunk2, msg3-chunk, msg2-chunk2, so on ...)
>>>
>>> Correct, but putting that in the message layer is not appropriate. The
>> simple solution is that the multiple channels can be achieved with
>> multiple
>> sockets. The later optimization is to add a channel multiplexer layer
>> between the message and socket layers.
>>
>> If we put it in the message layer, not only does it for the message to
>> tackle something it shouldn't be concerned with, reassembling itself, but
>> it also forces all implementors to tackle this logic up front. By layering
>> we can release without, implementors aren't forced into understanding the
>> logic, and later we can release the layers and the client can negotiate.
>>
>>
>>
>> On other pdx note: to de-serialize the pdx we need length of serialized
>>> bytes, so that we can read field offset from serialized stream, and then
>>> can read field value. Though, I can imagine with the help of pdxType, we
>>> can interpret serialized stream.
>>>
>>> Yes, so today PDX serialization would be no worse, the PDX serializer
>> would
>> have to buffer, but other may not have to. The length of the buffered PDX
>> could be used as the first chunk length and complete in single chunk.
>> Although, I suspect that amortized overhead of splitting the chunks  will
>> be nil anyway.
>>
>> The point is that the message encoding of values should NOT have any
>> unbounded length fields and require long or many buffers to complete
>> serialization. By chunking you can accomplish this by not needing to
>> buffer
>> the whole stream, just small (say 1k), chunks at a time to get the chunk
>> length.
>>
>> Buffers == Latency
>>
>> -Jake
>>
>>
>

Re: [gemfire-dev] New Client-Server Protocol Proposal

Posted by Bruce Schuchardt <bs...@pivotal.io>.
This is very similar to how peer-to-peer messaging is performed in 
Geode.  Messages are serialized to a stream that knows how to optimally 
"chunk" the bytes into fixed-size packets.  On the receiving side these 
are fed into a similar input stream for deserialization.  The message 
only contains information about the operation it represents.

Why don't we do something similar for the new client/server protocol?

Le 5/5/2017 à 7:28 AM, Jacob Barrett a écrit :
> On Thu, May 4, 2017 at 2:52 PM Hitesh Khamesra <hi...@yahoo.com.invalid>
> wrote:
>
>> Basically, thread/layer should not hold any resources while serializing
>> the object or chunk.  We should be able to see this flow (ms1-chunk1,
>> msg2-chunk1, ms1-chunk2, msg3-chunk, msg2-chunk2, so on ...)
>>
> Correct, but putting that in the message layer is not appropriate. The
> simple solution is that the multiple channels can be achieved with multiple
> sockets. The later optimization is to add a channel multiplexer layer
> between the message and socket layers.
>
> If we put it in the message layer, not only does it for the message to
> tackle something it shouldn't be concerned with, reassembling itself, but
> it also forces all implementors to tackle this logic up front. By layering
> we can release without, implementors aren't forced into understanding the
> logic, and later we can release the layers and the client can negotiate.
>
>
>
>> On other pdx note: to de-serialize the pdx we need length of serialized
>> bytes, so that we can read field offset from serialized stream, and then
>> can read field value. Though, I can imagine with the help of pdxType, we
>> can interpret serialized stream.
>>
> Yes, so today PDX serialization would be no worse, the PDX serializer would
> have to buffer, but other may not have to. The length of the buffered PDX
> could be used as the first chunk length and complete in single chunk.
> Although, I suspect that amortized overhead of splitting the chunks  will
> be nil anyway.
>
> The point is that the message encoding of values should NOT have any
> unbounded length fields and require long or many buffers to complete
> serialization. By chunking you can accomplish this by not needing to buffer
> the whole stream, just small (say 1k), chunks at a time to get the chunk
> length.
>
> Buffers == Latency
>
> -Jake
>


Re: [gemfire-dev] New Client-Server Protocol Proposal

Posted by Jacob Barrett <jb...@pivotal.io>.
On Thu, May 4, 2017 at 2:52 PM Hitesh Khamesra <hi...@yahoo.com.invalid>
wrote:

> Basically, thread/layer should not hold any resources while serializing
> the object or chunk.  We should be able to see this flow (ms1-chunk1,
> msg2-chunk1, ms1-chunk2, msg3-chunk, msg2-chunk2, so on ...)
>

Correct, but putting that in the message layer is not appropriate. The
simple solution is that the multiple channels can be achieved with multiple
sockets. The later optimization is to add a channel multiplexer layer
between the message and socket layers.

If we put it in the message layer, not only does it force the message to
tackle something it shouldn't be concerned with, reassembling itself, but
it also forces all implementors to tackle this logic up front. By layering
we can release without it, implementors aren't forced into understanding the
logic, and later we can release the layers and the client can negotiate.



> On other pdx note: to de-serialize the pdx we need length of serialized
> bytes, so that we can read field offset from serialized stream, and then
> can read field value. Though, I can imagine with the help of pdxType, we
> can interpret serialized stream.
>

Yes, so today PDX serialization would be no worse; the PDX serializer would
have to buffer, but others may not have to. The length of the buffered PDX
could be used as the first chunk length and complete in a single chunk.
Although, I suspect the amortized overhead of splitting the chunks will
be nil anyway.

The point is that the message encoding of values should NOT have any
unbounded length fields and require long or many buffers to complete
serialization. By chunking you can accomplish this by not needing to buffer
the whole stream, just small (say 1k), chunks at a time to get the chunk
length.

Buffers == Latency

-Jake

Re: [gemfire-dev] New Client-Server Protocol Proposal

Posted by Hitesh Khamesra <hi...@yahoo.com.INVALID>.
Basically, the thread/layer should not hold any resources while serializing the object or chunk.  We should be able to see this flow (msg1-chunk1, msg2-chunk1, msg1-chunk2, msg3-chunk1, msg2-chunk2, and so on ...)



On another PDX note: to de-serialize the pdx we need the length of the serialized bytes, so that we can read the field offset from the serialized stream, and then read the field value. Though, I can imagine that with the help of the pdxType, we can interpret the serialized stream.





________________________________
From: Jacob Barrett <jb...@pivotal.io>
To: dev@geode.apache.org; Hitesh Khamesra <hi...@yahoo.com> 
Cc: Anthony Baker <ab...@pivotal.io>
Sent: Thursday, May 4, 2017 2:14 PM
Subject: Re: [gemfire-dev] New Client-Server Protocol Proposal



> One benefit of messageHeader with chunk is that, it gives us ability to
> write different messages(multiplexing) on same socket. And if thread is
> ready it can write its message. Though we are not there yet, but that will
> require for single socket async architecture.
>

I wouldn't tackle that at this layer. I would suggest adding a layer
between the message and TCP that creates channels that are opaque to the
message layer above. The message layer wouldn't know if it was talking to
multiple sockets to the client or single socket with multiple channels.


-Jake

Re: [gemfire-dev] New Client-Server Protocol Proposal

Posted by Jacob Barrett <jb...@pivotal.io>.
> One benefit of messageHeader with chunk is that, it gives us ability to
> write different messages(multiplexing) on same socket. And if thread is
> ready it can write its message. Though we are not there yet, but that will
> require for single socket async architecture.
>

I wouldn't tackle that at this layer. I would suggest adding a layer
between the message and TCP that creates channels that are opaque to the
message layer above. The message layer wouldn't know if it was talking to
multiple sockets to the client or single socket with multiple channels.
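
As an illustration only (nothing here is part of the proposal), such a layer
could frame every write with a channel id and a length, and demultiplex on the
other side:

import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** One hypothetical frame format: [channelId][length][payload bytes]. */
final class ChannelMux {

  /** Writer side: the message layer hands over bytes for a logical channel. */
  static void writeFrame(DataOutputStream out, int channelId, byte[] payload) throws IOException {
    out.writeInt(channelId);
    out.writeInt(payload.length);
    out.write(payload);
    out.flush();
  }

  /** Reader side: read one frame and hand it to whatever owns that channel. */
  static void readFrame(DataInputStream in, ChannelSink sink) throws IOException {
    int channelId = in.readInt();
    byte[] payload = new byte[in.readInt()];
    in.readFully(payload);
    sink.deliver(channelId, payload);
  }

  interface ChannelSink {
    void deliver(int channelId, byte[] payload);
  }
}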

-Jake

Re: [gemfire-dev] New Client-Server Protocol Proposal

Posted by Hitesh Khamesra <hi...@yahoo.com.INVALID>.
>>> a. Now these two chunks will go continuous. 

>>They would appear continuous to the object serialization layer.

One benefit of a messageHeader with each chunk is that it gives us the ability to write different messages (multiplexing) on the same socket. And if a thread is ready it can write its message. Though we are not there yet, that will be required for a single-socket async architecture.




________________________________
From: Jacob Barrett <jb...@pivotal.io>
To: dev@geode.apache.org 
Cc: Anthony Baker <ab...@pivotal.io>
Sent: Thursday, May 4, 2017 12:48 PM
Subject: Re: [gemfire-dev] New Client-Server Protocol Proposal





Sent from my iPhone

> On May 4, 2017, at 12:03 PM, Hitesh Khamesra <hi...@yahoo.com.INVALID> wrote:
> 
> And len 0 would indicate end of the message? 
> 
> 
> a. Now these two chunks will go continuous. 

They would appear continuous to the object serialization layer.



> 
> 
> b. If its PDX encoded then pdx header(1byte:pdxid 4byte:len 4byte:typeId) requires size of all pdx serialized bytes. So we know "size" of data upfront here. 

We could define the InputSource for the Value part such that InputSource.getLength() could return a known length or -1 if length is unknown. If length is reasonable then the object could be encoded with a single chuck of size InputSource.getLength() followed by a 0 chunk. 

Clients are likely dealing with domain objects where the serialize length is not known until serialization is complete. This would require buffering to get the length. Buffering adds heap pressure and latency.

-Jake

Re: [gemfire-dev] New Client-Server Protocol Proposal

Posted by Jacob Barrett <jb...@pivotal.io>.

Sent from my iPhone

> On May 4, 2017, at 12:03 PM, Hitesh Khamesra <hi...@yahoo.com.INVALID> wrote:
> 
> And len 0 would indicate end of the message? 
> 
> 
> a. Now these two chunks will go continuous. 

They would appear continuous to the object serialization layer.


> 
> 
> b. If its PDX encoded then pdx header(1byte:pdxid 4byte:len 4byte:typeId) requires size of all pdx serialized bytes. So we know "size" of data upfront here. 

We could define the InputSource for the Value part such that InputSource.getLength() could return a known length or -1 if the length is unknown. If the length is reasonable then the object could be encoded with a single chunk of size InputSource.getLength() followed by a 0 chunk. 

Clients are likely dealing with domain objects where the serialized length is not known until serialization is complete. This would require buffering to get the length. Buffering adds heap pressure and latency.

-Jake


Re: [gemfire-dev] New Client-Server Protocol Proposal

Posted by Hitesh Khamesra <hi...@yahoo.com.INVALID>.
And len 0 would indicate end of the message? 


a. Now these two chunks will go continuous. 


b. If it's PDX encoded then the pdx header (1 byte: pdxid, 4 bytes: len, 4 bytes: typeId) requires the size of all the pdx serialized bytes. So we know the "size" of the data upfront here. 


c. Let's say the region value is just a long byte[]: then we have the "size" to send the message.


So in both cases we know the "size" of the serialized bytes (payload). So possibly we don't need to chunk that message and can let TCP take care of it?


It seems we should walk through some more use cases to understand this better.



Thanks.
Hitesh



________________________________
From: Anthony Baker <ab...@pivotal.io>
To: Hitesh Khamesra <hi...@yahoo.com> 
Cc: "dev@geode.apache.org" <de...@geode.apache.org>
Sent: Thursday, May 4, 2017 11:20 AM
Subject: Re: [gemfire-dev] New Client-Server Protocol Proposal



There would be one Message containing a single MessageHeader and a single MessageBody.  A PDX EncodedValue containing 1242 bytes that are chunked would look something like this:

PDX 1000 byte[1000] 242 byte[242] 0


Anthony



> On May 4, 2017, at 10:38 AM, Hitesh Khamesra <hi...@yahoo.com> wrote:
> 
> Hi Anthony:
> 
> Help me to understand data chunking here?
> 
>>> bytes => arbitrary byte[] that can be chunked
> 
> Message => MessageHeader MessageBody
> 
> So lets say we want to send long byte[] into two chunks, then we will send two messages? And other side will combine those two messages using "correlationId" ?
> 
> Thanks.
> HItesh
> 
> 
> 
> 
> ________________________________
> From: Anthony Baker <ab...@pivotal.io>
> To: dev@geode.apache.org 
> Cc: Hitesh Khamesra <hi...@yahoo.com>
> Sent: Wednesday, May 3, 2017 5:42 PM
> Subject: Re: [gemfire-dev] New Client-Server Protocol Proposal
> 
> 
> 
> 
>> On May 3, 2017, at 1:33 PM, Galen M O'Sullivan <go...@pivotal.io> wrote:
>> 
>> On Tue, May 2, 2017 at 11:52 AM, Hitesh Khamesra <
>> hiteshk25@yahoo.com.invalid> wrote:
>> 
>>> Absolutely its a implementation detail.
>>> 
>> This doesn't answer Dan's comment. Do you think fragmentation should be
>> taken care of by the TCP layer or the protocol should deal with it
>> specifically?
> 
> There’s some really good feedback and discussion in this thread!  Here are a few thoughts:
> 
> 1) Optional metadata should be used for fields that are generally applicable across all messages.  If a metadata field is required or only applies to a small set of messages, it should become part of a message definition.  Of course there’s some grey area here.
> 
> 2) I think we should pull out the message fragmentation support to avoid some significant complexity.  We can later add a fragmentation / envelope layer on top without disrupting the current proposal.  I do think we should add the capability for chunking data (more on that below).
> 
> 3) I did not find any discussion of message pipelining (queuing multiple requests on a socket without waiting for a response) or out-of-order responses.  What is the plan for these capabilities and how will that affect consistency?  What about retries?
> 
> 4) Following is an alternative definition with these characteristics:
> 
> - Serialized data can either be primitive or encoded values.  Encoded values are chunked as needed to break up large objects into a series of smaller parts.
> - Because values can be chunked, the size field is removed.  This allows the message to be streamed to the socket incrementally.
> - The apiVersion is removed because we can just define a new body type with a new apiId (e.g. GetRequest2 with apiId = 1292).
> - The GetRequest tells the server what kind of encoding the client is able to understand.
> - The metadata map is not used for fields that belong in the message body.  I think it’s much easier to write a spec without if statements :-)
> 
> Message => MessageHeader MessageBody
> 
> MessageHeader => correlationId metadata
>    correlationId => integer
>    metadata => count (key value)*
>        count => integer
>        key => string
>        value => string
> 
> MessageBody => apiId body
>    apiId => integer
>    body => (see specific definitions)
> 
> GetRequest => 0 acceptEncoding key
>    0 => the API id
>    acceptEncoding => (define some encodings for byte[], JSON, PDX, *, etc)
>    key => EncodedValue
> 
> GetResponse => 1 value
>    1 => the API id
>    value => EncodedValue
> 
> PutRequest => 2 eventId key value
>    2 => the API id
>    eventId => clientId threadId sequenceId
>        clientId => string
>        threadId => integer
>        sequenceId => integer
>    key => EncodedValue
>    value => EncodedValue
> 
> EncodedValue => encoding (boolean | integer | number | string | ((length bytes)* 0))
>    encoding => (define some encodings for byte[], JSON, PDX, *, etc)
>    boolean => TRUE or FALSE
>    integer => a signed integer value
>    number => a decimal value corresponding to IEEE 754
>    string => UTF-8 text
>    bytes => arbitrary byte[] that can be chunked
> 
> 
> Anthony

Re: [gemfire-dev] New Client-Server Protocol Proposal

Posted by Anthony Baker <ab...@pivotal.io>.
There would be one Message containing a single MessageHeader and a single MessageBody.  A PDX EncodedValue containing 1242 bytes that are chunked would look something like this:

PDX 1000 byte[1000] 242 byte[242] 0
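
A small sketch of a reader for such a chunked EncodedValue, assuming the
encoding id and each chunk length are plain integers (the real encodings aren't
pinned down here):

import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;

final class EncodedValueReader {

  /** Reads an encoding id, then (length, bytes) pairs until a 0-length chunk. */
  static byte[] readChunkedValue(DataInputStream in) throws IOException {
    int encoding = in.readInt();            // e.g. the id assigned to PDX (unused in this sketch)
    ByteArrayOutputStream value = new ByteArrayOutputStream();
    int length;
    while ((length = in.readInt()) != 0) {  // 1000, then 242, then 0
      byte[] chunk = new byte[length];
      in.readFully(chunk);
      value.write(chunk);
    }
    return value.toByteArray();             // 1242 bytes of PDX-encoded data
  }
}

A streaming reader could of course hand each chunk straight to the deserializer
instead of collecting them like this.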


Anthony


> On May 4, 2017, at 10:38 AM, Hitesh Khamesra <hi...@yahoo.com> wrote:
> 
> Hi Anthony:
> 
> Help me to understand data chunking here?
> 
>>> bytes => arbitrary byte[] that can be chunked
> 
> Message => MessageHeader MessageBody
> 
> So lets say we want to send long byte[] into two chunks, then we will send two messages? And other side will combine those two messages using "correlationId" ?
> 
> Thanks.
> HItesh
> 
> 
> 
> 
> ________________________________
> From: Anthony Baker <ab...@pivotal.io>
> To: dev@geode.apache.org 
> Cc: Hitesh Khamesra <hi...@yahoo.com>
> Sent: Wednesday, May 3, 2017 5:42 PM
> Subject: Re: [gemfire-dev] New Client-Server Protocol Proposal
> 
> 
> 
> 
>> On May 3, 2017, at 1:33 PM, Galen M O'Sullivan <go...@pivotal.io> wrote:
>> 
>> On Tue, May 2, 2017 at 11:52 AM, Hitesh Khamesra <
>> hiteshk25@yahoo.com.invalid> wrote:
>> 
>>> Absolutely its a implementation detail.
>>> 
>> This doesn't answer Dan's comment. Do you think fragmentation should be
>> taken care of by the TCP layer or the protocol should deal with it
>> specifically?
> 
> There’s some really good feedback and discussion in this thread!  Here are a few thoughts:
> 
> 1) Optional metadata should be used for fields that are generally applicable across all messages.  If a metadata field is required or only applies to a small set of messages, it should become part of a message definition.  Of course there’s some grey area here.
> 
> 2) I think we should pull out the message fragmentation support to avoid some significant complexity.  We can later add a fragmentation / envelope layer on top without disrupting the current proposal.  I do think we should add the capability for chunking data (more on that below).
> 
> 3) I did not find any discussion of message pipelining (queuing multiple requests on a socket without waiting for a response) or out-of-order responses.  What is the plan for these capabilities and how will that affect consistency?  What about retries?
> 
> 4) Following is an alternative definition with these characteristics:
> 
> - Serialized data can either be primitive or encoded values.  Encoded values are chunked as needed to break up large objects into a series of smaller parts.
> - Because values can be chunked, the size field is removed.  This allows the message to be streamed to the socket incrementally.
> - The apiVersion is removed because we can just define a new body type with a new apiId (e.g. GetRequest2 with apiId = 1292).
> - The GetRequest tells the server what kind of encoding the client is able to understand.
> - The metadata map is not used for fields that belong in the message body.  I think it’s much easier to write a spec without if statements :-)
> 
> Message => MessageHeader MessageBody
> 
> MessageHeader => correlationId metadata
>    correlationId => integer
>    metadata => count (key value)*
>        count => integer
>        key => string
>        value => string
> 
> MessageBody => apiId body
>    apiId => integer
>    body => (see specific definitions)
> 
> GetRequest => 0 acceptEncoding key
>    0 => the API id
>    acceptEncoding => (define some encodings for byte[], JSON, PDX, *, etc)
>    key => EncodedValue
> 
> GetResponse => 1 value
>    1 => the API id
>    value => EncodedValue
> 
> PutRequest => 2 eventId key value
>    2 => the API id
>    eventId => clientId threadId sequenceId
>        clientId => string
>        threadId => integer
>        sequenceId => integer
>    key => EncodedValue
>    value => EncodedValue
> 
> EncodedValue => encoding (boolean | integer | number | string | ((length bytes)* 0))
>    encoding => (define some encodings for byte[], JSON, PDX, *, etc)
>    boolean => TRUE or FALSE
>    integer => a signed integer value
>    number => a decimal value corresponding to IEEE 754
>    string => UTF-8 text
>    bytes => arbitrary byte[] that can be chunked
> 
> 
> Anthony


Re: [gemfire-dev] New Client-Server Protocol Proposal

Posted by Hitesh Khamesra <hi...@yahoo.com.INVALID>.
Hi Anthony:

Help me to understand data chunking here?

>> bytes => arbitrary byte[] that can be chunked

Message => MessageHeader MessageBody

So let's say we want to send a long byte[] in two chunks; then we will send two messages? And the other side will combine those two messages using the "correlationId"?

Thanks.
HItesh




________________________________
From: Anthony Baker <ab...@pivotal.io>
To: dev@geode.apache.org 
Cc: Hitesh Khamesra <hi...@yahoo.com>
Sent: Wednesday, May 3, 2017 5:42 PM
Subject: Re: [gemfire-dev] New Client-Server Protocol Proposal




> On May 3, 2017, at 1:33 PM, Galen M O'Sullivan <go...@pivotal.io> wrote:
> 
> On Tue, May 2, 2017 at 11:52 AM, Hitesh Khamesra <
> hiteshk25@yahoo.com.invalid> wrote:
> 
>> Absolutely its a implementation detail.
>> 
> This doesn't answer Dan's comment. Do you think fragmentation should be
> taken care of by the TCP layer or the protocol should deal with it
> specifically?

There’s some really good feedback and discussion in this thread!  Here are a few thoughts:

1) Optional metadata should be used for fields that are generally applicable across all messages.  If a metadata field is required or only applies to a small set of messages, it should become part of a message definition.  Of course there’s some grey area here.

2) I think we should pull out the message fragmentation support to avoid some significant complexity.  We can later add a fragmentation / envelope layer on top without disrupting the current proposal.  I do think we should add the capability for chunking data (more on that below).

3) I did not find any discussion of message pipelining (queuing multiple requests on a socket without waiting for a response) or out-of-order responses.  What is the plan for these capabilities and how will that affect consistency?  What about retries?

4) Following is an alternative definition with these characteristics:

- Serialized data can either be primitive or encoded values.  Encoded values are chunked as needed to break up large objects into a series of smaller parts.
- Because values can be chunked, the size field is removed.  This allows the message to be streamed to the socket incrementally.
- The apiVersion is removed because we can just define a new body type with a new apiId (e.g. GetRequest2 with apiId = 1292).
- The GetRequest tells the server what kind of encoding the client is able to understand.
- The metadata map is not used for fields that belong in the message body.  I think it’s much easier to write a spec without if statements :-)

Message => MessageHeader MessageBody

MessageHeader => correlationId metadata
    correlationId => integer
    metadata => count (key value)*
        count => integer
        key => string
        value => string

MessageBody => apiId body
    apiId => integer
    body => (see specific definitions)

GetRequest => 0 acceptEncoding key
    0 => the API id
    acceptEncoding => (define some encodings for byte[], JSON, PDX, *, etc)
    key => EncodedValue

GetResponse => 1 value
    1 => the API id
    value => EncodedValue

PutRequest => 2 eventId key value
    2 => the API id
    eventId => clientId threadId sequenceId
        clientId => string
        threadId => integer
        sequenceId => integer
    key => EncodedValue
    value => EncodedValue

EncodedValue => encoding (boolean | integer | number | string | ((length bytes)* 0))
    encoding => (define some encodings for byte[], JSON, PDX, *, etc)
    boolean => TRUE or FALSE
    integer => a signed integer value
    number => a decimal value corresponding to IEEE 754
    string => UTF-8 text
    bytes => arbitrary byte[] that can be chunked


Anthony

Re: [gemfire-dev] New Client-Server Protocol Proposal

Posted by Bruce Schuchardt <bs...@pivotal.io>.
It's becoming clear that the document needs a section on EventIds.

EventIds aren't opaque to the server.  They are comparable objects and 
are used by the cache to prevent replay of older (operation eventId < 
recorded eventId) operations on the cache and on subscription queues.  
They are also used to prevent sending operations originating from a 
client back to that client in its subscription queue.

A thread's sequenceId should be incremented for each operation sent to 
the server.

In my opinion EventIds are optional for clients and only need to be 
implemented if clients are going to retry operations.  If a client 
doesn't send an EventId to the server one will be generated on the 
server for the operation.
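
For what it's worth, a minimal client-side sketch of those rules (one sequenceId counter per thread, incremented once per operation). All class and field names here are invented for illustration and are not part of the proposal:

import java.util.concurrent.atomic.AtomicLong;

// Hypothetical client-side EventId generation: one clientId per connected
// process, one threadId per client thread, and a per-thread sequenceId that
// is incremented for each operation sent to the server.
public final class EventIdGenerator {

    public record EventId(String clientId, long threadId, long sequenceId) {}

    private final String clientId;
    private final AtomicLong nextThreadId = new AtomicLong();
    private final ThreadLocal<Long> threadId =
        ThreadLocal.withInitial(nextThreadId::getAndIncrement);
    private final ThreadLocal<Long> sequenceId = ThreadLocal.withInitial(() -> 0L);

    public EventIdGenerator(String clientId) {
        this.clientId = clientId;   // e.g. a UUID chosen when the connection is opened
    }

    // Returns the EventId triple to attach to the next operation on this thread.
    public EventId next() {
        long seq = sequenceId.get();
        sequenceId.set(seq + 1);
        return new EventId(clientId, threadId.get(), seq);
    }
}

A client that never retries could skip all of this and let the server generate EventIds, per the above.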

On 5/4/2017 at 8:46 AM, Jacob Barrett wrote:
> The eventId is really just a once token, right? Meaning that it's rather
> opaque to the server and intended to keep the server from replaying a
> request that the client may have retried even though it was actually
> successful. If it is opaque to the server then why encode all these specific
> identifiers? Seems to me it could be optional, for one, and could simply be a
> variant int or byte[]. The server just needs to stash the once tokens and
> make sure it doesn't get a duplicate on this client stream.


Re: [gemfire-dev] New Client-Server Protocol Proposal

Posted by Jacob Barrett <jb...@pivotal.io>.
+1

On Wed, May 3, 2017 at 5:43 PM Anthony Baker <ab...@pivotal.io> wrote:

>
> 2) I think we should pull out the message fragmentation support to avoid
> some significant complexity.  We can later add a fragmentation / envelope
> layer on top without disrupting the current proposal.  I do think we should
> add the capability for chunking data (more on that below).
>

+10 Like any good engineering practice, we need to keep objects well
encapsulated and focused on their singular task. A message would not be
represented as an object of "partials" but as a whole message object, so why
treat it any differently when serialized? The layer below it can chunk it
if necessary. Initially there is nothing between the message layer (the lowest
level in our stack) and the TCP socket. TCP will fragment as needed, and the
full message is delivered up the stack. If in the future we want to
multichannel/interleave/pipeline (whatever you want to call it), we can
negotiate the support with the client and inject a layer between the
message and TCP layers that identifies unique streams of data channels. In
the interim, the naive approach to multiple channels is to open a second
socket. The important thing is that the message layer doesn't know
and doesn't care.


> 4) Following is an alternative definition with these characteristics:
>
> - Serialized data can either be primitive or encoded values.  Encoded
> values are chunked as needed to break up large objects into a series of
> smaller parts.
>
+1

> - Because values can be chunked, the size field is removed.  This allows
> the message to be streamed to the socket incrementally.
>
+1

> - The apiVersion is removed because we can just define a new body type
> with a new apiId (e.g. GetRequest2 with apiId = 1292).
>
+1 Think of the message as a class: you don't want a class that has
more than a single personality. If the first argument to your class
(version) is the personality, then you need to think about a new class. You
don't want the writer of the protocol to have to deduce the personality of
the object based on an argument and then have to decide which fields are
required, optional, or obsolete. By making a new message you strongly type
the messages both in definition and in implementation.


> - The GetRequest tells the server what kind of encoding the client is able
> to understand.
>
+1 I would suggest that a default ordered list be established at the initial
handshake. If a list is not provided at handshake, then ALL encodings are
supported. Then, on individual request messages, if a list of encodings is
given it overrides the negotiated list for that single request. If no list is
provided on the request, the handshake-negotiated list is assumed. If a
value being returned is not encoded in any of the encodings listed, then it
is transcoded to the highest-priority encoding for which a transcoder is
available between the source and destination encodings.
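
A rough sketch of that resolution logic, with invented names and a placeholder transcoder check (not part of the proposal):

import java.util.List;

// Sketch of resolving which encodings apply to a single request, assuming a
// handshake-negotiated default list and an optional per-request override.
final class EncodingNegotiation {

    // A missing or empty handshake list means the client accepts all encodings.
    static List<String> effectiveEncodings(List<String> handshakeList,
                                           List<String> requestList) {
        if (requestList != null && !requestList.isEmpty()) {
            return requestList;                  // per-request override wins
        }
        if (handshakeList == null || handshakeList.isEmpty()) {
            return List.of("*");                 // nothing negotiated: all supported
        }
        return handshakeList;                    // fall back to the handshake default
    }

    // Pick the highest-priority accepted encoding the value can be delivered in.
    static String chooseEncoding(String storedEncoding, List<String> accepted) {
        for (String candidate : accepted) {
            if ("*".equals(candidate) || candidate.equals(storedEncoding)) {
                return storedEncoding;           // deliverable as stored
            }
            if (canTranscode(storedEncoding, candidate)) {
                return candidate;                // transcode to this encoding
            }
        }
        throw new IllegalStateException("No acceptable encoding for " + storedEncoding);
    }

    private static boolean canTranscode(String from, String to) {
        // Placeholder: a real server would consult its registered transcoders.
        return "PDX".equals(from) && "JSON".equals(to);
    }
}

Usage would be: resolve the effective list once per request, then pick the encoding when writing the response value.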

GetRequest => 0 acceptEncoding key
>     0 => the API id
>     acceptEncoding => (define some encodings for byte[], JSON, PDX, *, etc)
>     key => EncodedValue
>
Change: acceptedEncodings => encodingId*
Would it make sense to make 'key' a 'key+', or do GetAllRequest and
GetAllResponse vary that much from GetRequest and GetResponse?

PutRequest => 2 eventId key value
>     2 => the API id
>     eventId => clientId threadId sequenceId
>         clientId => string
>         threadId => integer
>         sequenceId => integer
>     key => EncodedValue
>     value => EncodedValue
>

The eventId is really just a once token, right? Meaning that it's rather
opaque to the server and intended to keep the server from replaying a
request that the client may have retried even though it was actually
successful. If it is opaque to the server then why encode all these specific
identifiers? Seems to me it could be optional, for one, and could simply be a
variant int or byte[]. The server just needs to stash the once tokens and
make sure it doesn't get a duplicate on this client stream.

-Jake

Re: [gemfire-dev] New Client-Server Protocol Proposal

Posted by Dan Smith <ds...@pivotal.io>.
+1 to what Anthony has laid out! I think this is a better way to handle
value encodings, and it's also better to be putting message-specific
details like the eventId with those messages.

I do wonder whether this proposal actually needs metadata headers at all?
What will eventually go in there?

-Dan

There’s some really good feedback and discussion in this thread!  Here are
> a few thoughts:
>
> 1) Optional metadata should be used for fields that are generally
> applicable across all messages.  If a metadata field is required or only
> applies to a small set of messages, it should become part of a message
> definition.  Of course there’s some grey area here.
>
> 2) I think we should pull out the message fragmentation support to avoid
> some significant complexity.  We can later add a fragmentation / envelope
> layer on top without disrupting the current proposal.  I do think we should
> add the capability for chunking data (more on that below).
>
> 3) I did not find any discussion of message pipelining (queuing multiple
> requests on a socket without waiting for a response) or out-of-order
> responses.  What is the plan for these capabilities and how will that
> affect consistency?  What about retries?
>
> 4) Following is an alternative definition with these characteristics:
>
> - Serialized data can either be primitive or encoded values.  Encoded
> values are chunked as needed to break up large objects into a series of
> smaller parts.
> - Because values can be chunked, the size field is removed.  This allows
> the message to be streamed to the socket incrementally.
> - The apiVersion is removed because we can just define a new body type
> with a new apiId (e.g. GetRequest2 with apiId = 1292).
> - The GetRequest tells the server what kind of encoding the client is able
> to understand.
> - The metadata map is not used for fields that belong in the message
> body.  I think it’s much easier to write a spec without if statements :-)
>
> Message => MessageHeader MessageBody
>
> MessageHeader => correlationId metadata
>     correlationId => integer
>     metadata => count (key value)*
>         count => integer
>         key => string
>         value => string
>
> MessageBody => apiId body
>     apiId => integer
>     body => (see specific definitions)
>
> GetRequest => 0 acceptEncoding key
>     0 => the API id
>     acceptEncoding => (define some encodings for byte[], JSON, PDX, *, etc)
>     key => EncodedValue
>
> GetResponse => 1 value
>     1 => the API id
>     value => EncodedValue
>
> PutRequest => 2 eventId key value
>     2 => the API id
>     eventId => clientId threadId sequenceId
>         clientId => string
>         threadId => integer
>         sequenceId => integer
>     key => EncodedValue
>     value => EncodedValue
>
> EncodedValue => encoding (boolean | integer | number | string | ((length
> bytes)* 0))
>     encoding => (define some encodings for byte[], JSON, PDX, *, etc)
>     boolean => TRUE or FALSE
>     integer => a signed integer value
>     number => a decimal value corresponding to IEEE 754
>     string => UTF-8 text
>     bytes => arbitrary byte[] that can be chunked
>
>
> Anthony
>
>

Re: [gemfire-dev] New Client-Server Protocol Proposal

Posted by Anthony Baker <ab...@pivotal.io>.
> On May 3, 2017, at 1:33 PM, Galen M O'Sullivan <go...@pivotal.io> wrote:
> 
> On Tue, May 2, 2017 at 11:52 AM, Hitesh Khamesra <
> hiteshk25@yahoo.com.invalid> wrote:
> 
>> Absolutely its a implementation detail.
>> 
> This doesn't answer Dan's comment. Do you think fragmentation should be
> taken care of by the TCP layer or the protocol should deal with it
> specifically?

There’s some really good feedback and discussion in this thread!  Here are a few thoughts:

1) Optional metadata should be used for fields that are generally applicable across all messages.  If a metadata field is required or only applies to a small set of messages, it should become part of a message definition.  Of course there’s some grey area here.

2) I think we should pull out the message fragmentation support to avoid some significant complexity.  We can later add a fragmentation / envelope layer on top without disrupting the current proposal.  I do think we should add the capability for chunking data (more on that below).

3) I did not find any discussion of message pipelining (queuing multiple requests on a socket without waiting for a response) or out-of-order responses.  What is the plan for these capabilities and how will that affect consistency?  What about retries?

4) Following is an alternative definition with these characteristics:

- Serialized data can either be primitive or encoded values.  Encoded values are chunked as needed to break up large objects into a series of smaller parts.
- Because values can be chunked, the size field is removed.  This allows the message to be streamed to the socket incrementally.
- The apiVersion is removed because we can just define a new body type with a new apiId (e.g. GetRequest2 with apiId = 1292).
- The GetRequest tells the server what kind of encoding the client is able to understand.
- The metadata map is not used for fields that belong in the message body.  I think it’s much easier to write a spec without if statements :-)

Message => MessageHeader MessageBody

MessageHeader => correlationId metadata
    correlationId => integer
    metadata => count (key value)*
        count => integer
        key => string
        value => string

MessageBody => apiId body
    apiId => integer
    body => (see specific definitions)

GetRequest => 0 acceptEncoding key
    0 => the API id
    acceptEncoding => (define some encodings for byte[], JSON, PDX, *, etc)
    key => EncodedValue

GetResponse => 1 value
    1 => the API id
    value => EncodedValue

PutRequest => 2 eventId key value
    2 => the API id
    eventId => clientId threadId sequenceId
        clientId => string
        threadId => integer
        sequenceId => integer
    key => EncodedValue
    value => EncodedValue

EncodedValue => encoding (boolean | integer | number | string | ((length bytes)* 0))
    encoding => (define some encodings for byte[], JSON, PDX, *, etc)
    boolean => TRUE or FALSE
    integer => a signed integer value
    number => a decimal value corresponding to IEEE 754
    string => UTF-8 text
    bytes => arbitrary byte[] that can be chunked


Anthony
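
To illustrate the ((length bytes)* 0) chunking production in the EncodedValue definition above, a rough writer-side sketch follows. The 4-byte length prefix, the 8 KB chunk size, and all names are assumptions for illustration, not part of the proposal:

import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Rough sketch of streaming one EncodedValue's bytes as chunks: each chunk is
// a length prefix followed by that many bytes, and a zero length terminates
// the value. The exact integer encoding (fixed vs varint) is an open detail.
final class ChunkedValueWriter {
    private static final int CHUNK_SIZE = 8 * 1024;

    static void writeChunked(int encodingId, InputStream value, DataOutputStream out)
            throws IOException {
        out.writeInt(encodingId);                    // encoding => ...
        byte[] buffer = new byte[CHUNK_SIZE];
        int read;
        while ((read = value.read(buffer)) != -1) {
            if (read == 0) {
                continue;                            // defensive: nothing read yet
            }
            out.writeInt(read);                      // length
            out.write(buffer, 0, read);              // bytes
        }
        out.writeInt(0);                             // 0 terminates the value
    }
}

As I read the grammar, a reader would loop the same way (read a length, then that many bytes, until a zero length), so a large byte[] rides inside a single MessageBody rather than being split across multiple Messages.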


Re: [gemfire-dev] New Client-Server Protocol Proposal

Posted by Galen M O'Sullivan <go...@pivotal.io>.
On Tue, May 2, 2017 at 11:52 AM, Hitesh Khamesra <
hiteshk25@yahoo.com.invalid> wrote:

> Absolutely, it's an implementation detail.
>
This doesn't answer Dan's comment. Do you think fragmentation should be
taken care of by the TCP layer or the protocol should deal with it
specifically?

Re: [gemfire-dev] New Client-Server Protocol Proposal

Posted by Hitesh Khamesra <hi...@yahoo.com.INVALID>.
Absolutely, it's an implementation detail.
JSON: Surely we can consider a ValueHeader. But then every client (and message) needs to send that. Using metadata, it's optional.


      From: Dan Smith <ds...@pivotal.io>
 To: Udo Kohlmeyer <uk...@pivotal.io> 
Cc: dev@geode.apache.org
 Sent: Tuesday, May 2, 2017 11:39 AM
 Subject: Re: [gemfire-dev] New Client-Server Protocol Proposal
   
> IsPartialMessage: This flag gives us the ability to send a partial message
> without serializing the whole key-value (request). Let's say I execute a
> function at the server and the function just returns an "arraylist of object",
> and the serialized size of that "arraylist of object" is quite big (> 2 GB).
>

My point about these fields is that it really seems like this stuff should
be handled by different layers. Ideally you would have a fragmentation
layer that is invisible to people writing specific messages, so that
messages are automatically fragmented if they get too large. Think about how
a TCP socket works - you just write data and it is automatically
fragmented. Or are you expecting each individual message type to have its
own way of doing fragmentation, but it should set this header down in your
protocol layer? That seems really messy.

JSON: this is a feature we want to introduce, where a client can send a JSON
> string and we want to save that JSON string into PDX.


Same thing here, JSON support sounds great, but having a header field of
JSON_KEY seems like a hacky way to do that. It seems like that might belong
in your ValueHeader.




On Tue, May 2, 2017 at 10:20 AM, Udo Kohlmeyer <uk...@pivotal.io>
wrote:

> Hey Dan,
>
> Imo, having a standardized, versioned definition for GET, PUT, PUTALL,
> etc. message, that is encoded/decoded in a manner that multiple clients
> (written in many other languages) can encode/decode these messages, is
> paramount.
>
> Having the standardized operational messages(GET,PUT,etc.) transported
> using the function service vs a more direct operation handler, that is
> another discussion and is something that should be investigated.
>
> My immediate concerns regarding "normal" operations over the function
> service are:
>
>    1. I don't believe the current function service is "stream" enabled,
>    and would require some potential rework for subscription-based operations
>    2. Can the function service handle the extra load?
>    3. Is the function service "lean" enough to sustain acceptable
>    throughput? The current client/server protocol averages around
>    40,000-50,000 messages/second.
>    4. There are some messages that are passed between the client <->
>    locator. Given that the function service is "server" specific, this
>    approach would not work for locators, where a different transport mechanism
>    is required. (but this is not a show stopper if function service proves to
>    be viable)
>    5. How much effort would be required to make the "old" function
>    service, handle the new messages, ensuring that the current behavior is
>    preserved.
>
> As per a previous discussion we had, I believe that the "function-like"
> behavior (retry, HA, write vs read optimized) can be incorporated into the
> processing layer on the server. In that way all messages can benefit from
> that behavior. In addition to this, having a single mechanism that will
> handle messages, retry, HA, and read/write optimizations is preferable to
> having a few "bespoke" implementations. So either approach (new message
> handling or the function service) would be acceptable.
>
> "*The advantage of this approach is that if someone just builds a driver
> that only supports function execution and whatever serialization framework
> is required to serialize function arguments, they already have an API that
> application developers could use to do pretty much anything they wanted to
> do on the server. Having a Region object with methods like get and put on
> it could just be a little syntatic sugar on top of that.*"
>
> It can be argued that having a standard client/server message, with
> standardized encoding/decoding, is the same as using function execution.
> Both require a little syntactic sugar to add new functionality to an
> already standardized message.
>
> --Udo
> On 5/1/17 17:27, Dan Smith wrote:
>
> I think any new client driver or server we develop might want to
> incorporate function execution at lower level than region operations like
> get and put, etc. We could then easily build operations like GET, PUT,
> PUTALL, etc. on top of that by making them functions. The original client
> protocol isn't designed like that because it pre-dates function execution.
>
> The current function execution API is a little clunky and needs some work.
> But what it does do is provide the fundamental logic to target operations
> at members that host certain keys and retry in the case of failure.
>
> The advantage of this approach is that if someone just builds a driver
> that only supports function execution and whatever serialization framework
> is required to serialize function arguments, they already have an API that
> application developers could use to do pretty much anything they wanted to
> do on the server. Having a Region object with methods like get and put on
> it could just be a little syntatic sugar on top of that.
>
> -Dan
>
> On Fri, Apr 28, 2017 at 2:49 PM, Udo Kohlmeyer <uk...@pivotal.io>
> wrote:
>
>> Hi there Geode community,
>>
>> The new Client-Server protocol proposal is available for review.
>>
>> It can be viewed and commented on https://cwiki.apache.org/confl
>> uence/display/GEODE/New+Client+Server+Protocol
>>
>> --Udo
>>
>>
>
>


   

Re: [gemfire-dev] New Client-Server Protocol Proposal

Posted by Dan Smith <ds...@pivotal.io>.
> IsPartialMessage: This flag gives us the ability to send a partial message
> without serializing the whole key-value (request). Let's say I execute a
> function at the server and the function just returns an "arraylist of object",
> and the serialized size of that "arraylist of object" is quite big (> 2 GB).
>

My point about these fields is that it really seems like this stuff should
be handled by different layers. Ideally you would have a fragmentation
layer that is invisible to people writing specific messages, so that
messages are automatically fragmented if they get too large. Think about how
a TCP socket works - you just write data and it is automatically
fragmented. Or are you expecting each individual message type to have its
own way of doing fragmentation, but it should set this header down in your
protocol layer? That seems really messy.
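
For illustration only, such a fragmentation layer could be a stream wrapper below the message writer, so message code never sees it. Everything here (names, framing, sizes) is an assumption, not part of the proposal:

import java.io.IOException;
import java.io.OutputStream;

// Illustrative only: a fragmenting stream that sits below the message layer.
// Message writers just write bytes; this wrapper splits them into framed
// fragments (length prefix + payload) without the message code knowing.
final class FragmentingOutputStream extends OutputStream {
    private static final int MAX_FRAGMENT = 64 * 1024;
    private final OutputStream socketOut;
    private final byte[] buffer = new byte[MAX_FRAGMENT];
    private int used;

    FragmentingOutputStream(OutputStream socketOut) {
        this.socketOut = socketOut;
    }

    @Override
    public void write(int b) throws IOException {
        buffer[used++] = (byte) b;
        if (used == MAX_FRAGMENT) {
            flushFragment();
        }
    }

    @Override
    public void flush() throws IOException {
        flushFragment();                        // emit whatever is buffered
        socketOut.flush();
    }

    private void flushFragment() throws IOException {
        if (used == 0) {
            return;
        }
        socketOut.write(new byte[] {            // 4-byte big-endian length prefix
            (byte) (used >>> 24), (byte) (used >>> 16),
            (byte) (used >>> 8), (byte) used});
        socketOut.write(buffer, 0, used);
        used = 0;
    }
}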

JSON: this is a feature we want to introduce, where a client can send a JSON
> string and we want to save that JSON string into PDX.


Same thing here, JSON support sounds great, but having a header field of
JSON_KEY seems like a hacky way to do that. It seems like that might belong
in your ValueHeader.
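
One way to picture that is JSON as just another encoding id inside the encoded value rather than a metadata flag. A hypothetical sketch (the enum, ids, and helper are invented, not from the proposal):

import java.nio.charset.StandardCharsets;

// Hypothetical: JSON is just another value encoding id, so a client can send
// a JSON document as the value payload and the server can choose to store it
// as PDX; no special JSON_KEY metadata flag is needed.
enum ValueEncoding {
    RAW_BYTES(1), PDX(2), JSON(3);

    final int id;

    ValueEncoding(int id) {
        this.id = id;
    }
}

final class EncodedValues {
    // Encoding id byte followed by the UTF-8 payload; framing is illustrative only.
    static byte[] json(String document) {
        byte[] payload = document.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[1 + payload.length];
        out[0] = (byte) ValueEncoding.JSON.id;
        System.arraycopy(payload, 0, out, 1, payload.length);
        return out;
    }
}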




On Tue, May 2, 2017 at 10:20 AM, Udo Kohlmeyer <uk...@pivotal.io>
wrote:

> Hey Dan,
>
> Imo, having a standardized, versioned definition for GET, PUT, PUTALL,
> etc. message, that is encoded/decoded in a manner that multiple clients
> (written in many other languages) can encode/decode these messages, is
> paramount.
>
> Having the standardized operational messages(GET,PUT,etc.) transported
> using the function service vs a more direct operation handler, that is
> another discussion and is something that should be investigated.
>
> My immediate concerns regarding "normal" operations over the function
> service are:
>
>    1. I don't believe the current function service is "stream" enabled,
>    and would require some potential rework for subscription-based operations
>    2. Can the function service handle the extra load?
>    3. Is the function service "lean" enough to sustain acceptable
>    throughput? The current client/server protocol averages around
>    40,000-50,000 messages/second.
>    4. There are some messages that are passed between the client <->
>    locator. Given that the function service is "server" specific, this
>    approach would not work for locators, where a different transport mechanism
>    is required. (but this is not a show stopper if function service proves to
>    be viable)
>    5. How much effort would be required to make the "old" function
>    service, handle the new messages, ensuring that the current behavior is
>    preserved.
>
> As per a previous discussion we had, I believe that the "function-like"
> behavior (retry, HA, write vs read optimized) can be incorporated into the
> processing layer on the server. In that way all messages can benefit from
> that behavior. In addition to this, having a single mechanism that will
> handle messages, retry, HA, and read/write optimizations is preferable to
> having a few "bespoke" implementations. So either approach (new message
> handling or the function service) would be acceptable.
>
> "*The advantage of this approach is that if someone just builds a driver
> that only supports function execution and whatever serialization framework
> is required to serialize function arguments, they already have an API that
> application developers could use to do pretty much anything they wanted to
> do on the server. Having a Region object with methods like get and put on
> it could just be a little syntatic sugar on top of that.*"
>
> It can be argued that having a standard client/server message, with
> standardized encoding/decoding, is the same as using function execution.
> Both require a little syntactic sugar to add new functionality to an
> already standardized message.
>
> --Udo
> On 5/1/17 17:27, Dan Smith wrote:
>
> I think any new client driver or server we develop might want to
> incorporate function execution at lower level than region operations like
> get and put, etc. We could then easily build operations like GET, PUT,
> PUTALL, etc. on top of that by making them functions. The original client
> protocol isn't designed like that because it pre-dates function execution.
>
> The current function execution API is a little clunky and needs some work.
> But what it does do is provide the fundamental logic to target operations
> at members that host certain keys and retry in the case of failure.
>
> The advantage of this approach is that if someone just builds a driver
> that only supports function execution and whatever serialization framework
> is required to serialize function arguments, they already have an API that
> application developers could use to do pretty much anything they wanted to
> do on the server. Having a Region object with methods like get and put on
> it could just be a little syntatic sugar on top of that.
>
> -Dan
>
> On Fri, Apr 28, 2017 at 2:49 PM, Udo Kohlmeyer <uk...@pivotal.io>
> wrote:
>
>> Hi there Geode community,
>>
>> The new Client-Server protocol proposal is available for review.
>>
>> It can be viewed and commented on https://cwiki.apache.org/confl
>> uence/display/GEODE/New+Client+Server+Protocol
>>
>> --Udo
>>
>>
>
>

Re: [gemfire-dev] New Client-Server Protocol Proposal

Posted by Udo Kohlmeyer <uk...@pivotal.io>.
Hey Dan,

Imo, having a standardized, versioned definition for GET, PUT, PUTALL, 
etc. message, that is encoded/decoded in a manner that multiple clients 
(written in many other languages) can encode/decode these messages, is 
paramount.

Having the standardized operational messages(GET,PUT,etc.) transported 
using the function service vs a more direct operation handler, that is 
another discussion and is something that should be investigated.

My immediate concerns regarding "normal" operations over the function 
service are:

 1. I don't believe the current function service is "stream" enabled,
    and would require some potential rework for subscription-based
    operations
 2. Can the function service handle the extra load?
 3. Is the function service "lean" enough to sustain acceptable
    throughput? The current client/server protocol averages around
    40,000-50,000 messages/second.
 4. There are some messages that are passed between the client <->
    locator. Given that the function service is "server" specific, this
    approach would not work for locators, where a different transport
    mechanism is required. (but this is not a show stopper if function
    service proves to be viable)
 5. How much effort would be required to make the "old" function
    service, handle the new messages, ensuring that the current behavior
    is preserved.

As per a previous discussion we had, I believe that the "function-like" 
behavior (retry, HA, write vs read optimized) can be incorporated into the 
processing layer on the server. In that way all messages can benefit 
from that behavior. In addition to this, having a single mechanism 
that will handle messages, retry, HA, and read/write optimizations is 
preferable to having a few "bespoke" implementations. So either approach 
(new message handling or the function service) would be acceptable.

"/The advantage of this approach is that if someone just builds a driver 
that only supports function execution and whatever serialization 
framework is required to serialize function arguments, they already have 
an API that application developers could use to do pretty much anything 
they wanted to do on the server. Having a Region object with methods 
like get and put on it could just be a little syntatic sugar on top of 
that./"

It can be argued that having a standard client/server message, with 
standardized encoding/decoding, is the same as using function execution. 
Both require a little syntactic sugar to add new functionality to an 
already standardized message.

--Udo

On 5/1/17 17:27, Dan Smith wrote:
> I think any new client driver or server we develop might want to 
> incorporate function execution at lower level than region operations 
> like get and put, etc. We could then easily build operations like GET, 
> PUT, PUTALL, etc. on top of that by making them functions. The 
> original client protocol isn't designed like that because it pre-dates 
> function execution.
>
> The current function execution API is a little clunky and needs some 
> work. But what it does do is provide the fundamental logic to target 
> operations at members that host certain keys and retry in the case of 
> failure.
>
> The advantage of this approach is that if someone just builds a driver 
> that only supports function execution and whatever serialization 
> framework is required to serialize function arguments, they already 
> have an API that application developers could use to do pretty much 
> anything they wanted to do on the server. Having a Region object with 
> methods like get and put on it could just be a little syntatic sugar 
> on top of that.
>
> -Dan
>
> On Fri, Apr 28, 2017 at 2:49 PM, Udo Kohlmeyer <ukohlmeyer@pivotal.io 
> <ma...@pivotal.io>> wrote:
>
>     Hi there Geode community,
>
>     The new Client-Server protocol proposal is available for review.
>
>     It can be viewed and commented on
>     https://cwiki.apache.org/confluence/display/GEODE/New+Client+Server+Protocol
>     <https://cwiki.apache.org/confluence/display/GEODE/New+Client+Server+Protocol>
>
>     --Udo
>
>


Re: [gemfire-dev] New Client-Server Protocol Proposal

Posted by Anthony Baker <ab...@pivotal.io>.
I think the downside of having a single generic message type is that we lose “type safety” and some efficiency.  The message definition would essentially become:

String functionName;
byte[][] args;

It’s a little more challenging for an author of a Geode Driver to fill in the args correctly compared to calling specific methods on a generated stub.  Also, if the argument data types are fixed in the message definition we can apply efficient encoding techniques automatically (e.g. varint, zigzag, optional).
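
To make that contrast concrete, the two driver styles might compare roughly as follows; every type and method name below is invented for illustration and is not an existing Geode API:

import java.nio.charset.StandardCharsets;

// Invented API, purely to illustrate the trade-off; none of these types exist
// in Geode or in the proposal.
interface GenericDriver {
    byte[] executeFunction(String functionName, byte[][] args);
}

record GetRequest(String regionName, String key) {}
record GetResponse(byte[] value) {}

interface TypedDriver {
    GetResponse get(GetRequest request);   // imagined as generated from the message spec
}

final class DriverStyles {
    static GetResponse compare(GenericDriver generic, TypedDriver typed) {
        // Generic style: the caller packs args by hand; nothing checks that the
        // region name belongs in slot 0 or that the key is really a string.
        byte[][] args = {
            "customers".getBytes(StandardCharsets.UTF_8),
            "key-42".getBytes(StandardCharsets.UTF_8)
        };
        byte[] raw = generic.executeFunction("GET", args);

        // Typed style: argument types are fixed by the message definition, so the
        // wire encoding (varint, zigzag, optional fields) can be generated and checked.
        return typed.get(new GetRequest("customers", "key-42"));
    }
}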

I also wonder about the code path efficiency for functions vs get / put.  That would be an interesting test.


Anthony

> On May 1, 2017, at 5:27 PM, Dan Smith <ds...@pivotal.io> wrote:
> 
> I think any new client driver or server we develop might want to
> incorporate function execution at lower level than region operations like
> get and put, etc. We could then easily build operations like GET, PUT,
> PUTALL, etc. on top of that by making them functions. The original client
> protocol isn't designed like that because it pre-dates function execution.
> 
> The current function execution API is a little clunky and needs some work.
> But what it does do is provide the fundamental logic to target operations
> at members that host certain keys and retry in the case of failure.
> 
> The advantage of this approach is that if someone just builds a driver that
> only supports function execution and whatever serialization framework is
> required to serialize function arguments, they already have an API that
> application developers could use to do pretty much anything they wanted to
> do on the server. Having a Region object with methods like get and put on
> it could just be a little syntatic sugar on top of that.
> 
> -Dan
> 
> On Fri, Apr 28, 2017 at 2:49 PM, Udo Kohlmeyer <uk...@pivotal.io>
> wrote:
> 
>> Hi there Geode community,
>> 
>> The new Client-Server protocol proposal is available for review.
>> 
>> It can be viewed and commented on https://cwiki.apache.org/confl
>> uence/display/GEODE/New+Client+Server+Protocol
>> 
>> --Udo
>> 
>>