Posted to dev@tomcat.apache.org by Mitch Claborn <mi...@claborn.net> on 2018/09/21 17:02:03 UTC

Potential change to DeltaManager

Please forgive me if this is the incorrect place or format for 
discussing this. I'm new to trying to develop for Tomcat.

I'm developing a patch for DeltaManager and I'd like to discuss with the 
developers here whether it could be considered for inclusion in the base 
code. Please see the details below and comment.

Problem: When one node sends the "all sessions" message to another node 
that is starting up, I often hit errors where one of the sessions fails 
to deserialize. This causes all the remaining sessions in that chunk 
(sendAllSessionsSize) to be lost by the receiver. The underlying session 
problems are entirely application problems, but until I can track them 
down and fix them I need a way to limit the impact to just the one 
session that is in error. I could set sendAllSessionsSize="1", but with 
many thousands of sessions at any given time that would take a LONG time 
to transmit.

Change details:

 1. Update
    org.apache.catalina.ha.session.DeltaManager.deserializeSessions(byte[])
    and
    org.apache.catalina.ha.session.DeltaSession.doReadObject(ObjectInput)
    to produce a more detailed error message when a session is in
    error.  The new error message includes the session's index in the
    list of sessions, the session ID, and the last field or attribute
    that the reader attempted to read.
 2. Introduce new XML attribute verifySerializedSessions for DeltaManager.
 3. If verifySerializedSessions="true",
    org.apache.catalina.ha.session.DeltaManager.serializeSessions(Session[])
    will first serialize each session then immediately deserialize it.
    If all is good, send the session as usual.  If any errors are
    encountered, create and send a dummy session with a known session ID
    instead. (This keeps the session count, which has already been put
    in the output stream, correct for the receiving node.)
 4. Update
    org.apache.catalina.ha.session.DeltaManager.deserializeSessions(byte[])
    to discard any received session that has the known dummy session ID.
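
As a rough illustration of items 3 and 4, here is a minimal, self-contained
Java sketch of the verify-then-substitute idea. It is not the patch and does
not use Tomcat's classes: SketchSession, BrokenOnRead, VerifiedSessionTransfer
and the DUMMY_ID value are all hypothetical names, and each session is written
as its own byte[] purely to keep the example short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a session; this is not Tomcat's DeltaSession.
class SketchSession implements Serializable {
    private static final long serialVersionUID = 1L;
    final String id;
    final Object attribute;

    SketchSession(String id, Object attribute) {
        this.id = id;
        this.attribute = attribute;
    }
}

// Attribute that serializes without complaint but always fails when read back.
class BrokenOnRead implements Serializable {
    private static final long serialVersionUID = 1L;

    private void readObject(ObjectInputStream in) throws IOException {
        throw new IOException("simulated corrupt attribute");
    }
}

class VerifiedSessionTransfer {
    // Assumed placeholder ID; the thread does not say which value the patch uses.
    static final String DUMMY_ID = "DUMMY-SESSION";

    static byte[] toBytes(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    static Object fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return ois.readObject();
        }
    }

    // Sender side: write the count first, then one verified entry per session.
    static byte[] serializeAll(List<SketchSession> sessions) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeInt(sessions.size());
            for (SketchSession s : sessions) {
                byte[] blob;
                try {
                    blob = toBytes(s);
                    fromBytes(blob); // round trip; throws if the session cannot be read back
                } catch (IOException | ClassNotFoundException | RuntimeException e) {
                    // Substitute a placeholder so the count already written stays correct.
                    blob = toBytes(new SketchSession(DUMMY_ID, null));
                }
                oos.writeObject(blob);
            }
        }
        return bos.toByteArray();
    }

    // Receiver side: read every entry and drop any that carry the placeholder ID.
    static List<String> deserializeAll(byte[] data) throws IOException, ClassNotFoundException {
        List<String> ids = new ArrayList<>();
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            int count = ois.readInt();
            for (int i = 0; i < count; i++) {
                SketchSession s = (SketchSession) fromBytes((byte[]) ois.readObject());
                if (!DUMMY_ID.equals(s.id)) {
                    ids.add(s.id);
                }
            }
        }
        return ids;
    }
}
```

In this sketch, sending three sessions where the second cannot be read back
leaves the receiver with the first and third. The placeholder exists only so
that the count written at the start of the stream still matches what follows.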

-- 

Mitch


Re: Potential change to DeltaManager

Posted by Mitch Claborn <mi...@claborn.net>.
FYI: I've created the Bugzilla request and submitted the patch there.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62773


Mitch

On 9/27/18 5:49 PM, Mark Thomas wrote:
> Mitch,
> 
> First some general comments.
> 
> Projects at the ASF generally operate using lazy consensus meaning if 
> no-one objects after a reasonable amount of time (72 hours is a good 
> starting point for reasonable) then assume you have agreement to 
> proceed. Note that it is ApacheCon NA this week so a number of the 
> committers may be distracted and/or travelling.
> 
> It sounds like a good next step would be to create a Bugzilla 
> enhancement request and attach your patch.
> 
> Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


Re: Potential change to DeltaManager

Posted by Mark Thomas <ma...@apache.org>.
Mitch,

First some general comments.

Projects at the ASF generally operate using lazy consensus, meaning that 
if no-one objects after a reasonable amount of time (72 hours is a good 
starting point for reasonable) you can assume you have agreement to 
proceed. Note that it is ApacheCon NA this week, so a number of the 
committers may be distracted and/or travelling.

It sounds like a good next step would be to create a Bugzilla 
enhancement request and attach your patch.

Mark


On 27/09/2018 11:41, Mitch Claborn wrote:
> Any further thoughts or comments on this? I think my patch is ready for 
> prime time now.
> 
> 
> Mitch


Re: Potential change to DeltaManager

Posted by Mitch Claborn <mi...@claborn.net>.
Any further thoughts or comments on this? I think my patch is ready for 
prime time now.


Mitch

On 09/22/2018 11:23 AM, Mitch Claborn wrote:
> See below for answers to your questions.
> 
> Status update: I've been running my patch in production for about 16 
> hours with no problems. I've restarted each Tomcat (3) once and had no 
> problems, but also detected no errors, either on send or receive. I have 
> some code that I used in dev to force an error on a specific combination 
> of session attribute name and value.  I'm going to put that in prod so 
> that I can test how it behaves with a large volume of sessions and at 
> least one error.
> 
> 
> Mitch


Re: Potential change to DeltaManager

Posted by Mitch Claborn <mi...@claborn.net>.
See below for answers to your questions.

Status update: I've been running my patch in production for about 16 
hours with no problems. I've restarted each Tomcat (3) once and had no 
problems, but also detected no errors, either on send or receive. I have 
some code that I used in dev to force an error on a specific combination 
of session attribute name and value.  I'm going to put that in prod so 
that I can test how it behaves with a large volume of sessions and at 
least one error.


Mitch

On 09/21/2018 05:00 PM, Mark Thomas wrote:
> On 21/09/18 18:02, Mitch Claborn wrote:
>> Please forgive me if this is the incorrect place or format for
>> discussing this. I'm new to trying to develop for Tomcat.
> 
> This is the right place. Welcome to the Tomcat community.
> 
>> I'm developing a patch for DeltaManager and I'd like to discuss with you
>> developers if it could be considered for inclusion in the base code.
>> Please see details below and comment.
> 
> Will do. Please note that session replication is not an area I am
> particularly familiar with so if some of my comments are a little
> off-base I apologise.
> 
>> Problem: When the "all sessions" message is sent from one node to
>> another, when the receiving node is first starting up, I often run into
>> various errors with one of the sessions and it fails to deserialize.
>> This causes all the remaining sessions in that chunk
>> (sendAllSessionsSize) to be lost by the receiver.
> 
> Oops.
> 
>> The problem with the
>> sessions is totally an application problem, but until I can figure those
>> problems out and solve them I need a way to limit the impact of these
>> problems to just the one session that is in error. I could set
>> sendAllSessionsSize="1" but that would take a LONG time to transmit, and
>> we have many thousands of sessions at any given time.
> 
> That seems like a reasonable problem to try and solve.
> 
>> Change details:
>>
>> 1. Update
>>     org.apache.catalina.ha.session.DeltaManager.deserializeSessions(byte[])
>>     and
>>     org.apache.catalina.ha.session.DeltaSession.doReadObject(ObjectInput)
>>     to produce a more detailed error message when a session is in
>>     error.  New error message includes: the session index in the list of
>>     sessions, the session ID, the last field or attribute that was
>>     attempted to be read.
> 
> I'm not sure how useful the index will be but the other information
> makes sense to me.

The index gives me an indication of how many sessions were discarded 
because of the error.
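
A minimal sketch of why the index helps, assuming a hypothetical wire format
(a count, then an ID and one attribute per session) rather than
DeltaManager's real one. ChunkReader, ChunkReadReport and UnreadableAttribute
are made-up names. The sketch treats the rest of the stream as unusable once
one session fails, so everything from that index to the end of the chunk
counts as lost:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Serializes without complaint but always fails when read back.
class UnreadableAttribute implements Serializable {
    private static final long serialVersionUID = 1L;

    private void readObject(ObjectInputStream in) throws IOException {
        throw new IOException("simulated unreadable attribute");
    }
}

// Outcome of reading one chunk.
class ChunkReadReport {
    final List<String> restoredIds = new ArrayList<>();
    String errorMessage; // stays null when the whole chunk reads cleanly
    int lostCount;       // sessions dropped, including the one that failed
}

class ChunkReader {
    // Hypothetical format: session count, then (id, attribute) for each session.
    static byte[] writeChunk(String[] ids, Object[] attributes) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeInt(ids.length);
            for (int i = 0; i < ids.length; i++) {
                oos.writeUTF(ids[i]);
                oos.writeObject(attributes[i]);
            }
        }
        return bos.toByteArray();
    }

    static ChunkReadReport readChunk(byte[] data) throws IOException {
        ChunkReadReport report = new ChunkReadReport();
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            int count = ois.readInt();
            for (int i = 0; i < count; i++) {
                String id = null;
                try {
                    id = ois.readUTF();
                    ois.readObject();
                    report.restoredIds.add(id);
                } catch (IOException | ClassNotFoundException e) {
                    // Index and count together show how much of the chunk was dropped.
                    report.lostCount = count - i;
                    report.errorMessage = "session " + i + " of " + count
                            + " (id=" + id + ") failed: " + e.getMessage();
                    break;
                }
            }
        }
        return report;
    }
}
```

For a chunk of three sessions where the second one fails, the report shows
one restored session and two lost, which is the figure the index provides.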

> 
>> 2. Introduce new XML attribute verifySerializedSessions for DeltaManager.
> 
> Why would a user not want to enable this feature? The performance hit of
> the additional deserialization on send?

That is the only reason I can think of.

> 
>> 3. If verifySerializedSessions="true",
>>     org.apache.catalina.ha.session.DeltaManager.serializeSessions(Session[])
>>     will first serialize each session then immediately deserialize it.
>>     If all is good, send the session as usual.  If any errors are
>>     encountered, create and send a dummy session with a known session ID
>>     instead. (This keeps the session count, which has already been put
>>     in the output stream, correct for the receiving node.)
> 
> Ah. Is the issue that serialization works but deserialization does not?
> That seems a little odd. Can you give an example of how this might go
> wrong? I am trying to understand the root cause(s) of the problem to
> determine if the proposed solution is appropriate. I thought
> DeltaSession simply skipped over attributes that it could not deserialize.

DeltaSession does skip attributes that are not serializable. I've had 
three identifiable errors, none of which I could reproduce at will.

1. A session with a Vector<Long> that might have contained nulls.  This 
should not be an issue, but I fixed my code to eliminate nulls in that 
Vector, since they should not be there anyway.

2. In some of my own objects where I do my own serialization with JSON, 
there were some fields that I don't serialize that were not marked 
transient that should have been. Some of those embedded objects were 
thus serialized by the native serialization and caused some problems. I 
fixed those.

3. In another of my objects that I serialize with JSON, the JSON string 
in the serialized session was obviously corrupted and was not a valid 
JSON hash.  I went over the serialization code with a fine-tooth comb 
and it appears to be correct. That same code works hundreds of thousands 
of times a day without error.
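
Error 2 can be illustrated with a small sketch. The class names are
hypothetical, and the helper is deliberately made non-serializable so that the
failure is visible at write time; the real embedded objects may have
misbehaved in some other way. The point is only that a field not marked
transient is handed to native Java serialization along with the rest of the
object:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical helper that is deliberately NOT Serializable.
class RequestScopedCache {
    int hits;
}

// The cache field is not transient, so native serialization tries to write it.
class LeakyAttribute implements Serializable {
    private static final long serialVersionUID = 1L;
    String json = "{\"user\":42}";
    RequestScopedCache cache = new RequestScopedCache();
}

// Same shape, but the cache field is transient and is skipped on write.
class SafeAttribute implements Serializable {
    private static final long serialVersionUID = 1L;
    String json = "{\"user\":42}";
    transient RequestScopedCache cache = new RequestScopedCache();
}

class TransientDemo {
    // Returns true if native Java serialization accepts the object.
    static boolean serializes(Object o) {
        try (ObjectOutputStream oos = new ObjectOutputStream(new ByteArrayOutputStream())) {
            oos.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Here TransientDemo.serializes(new LeakyAttribute()) returns false and
TransientDemo.serializes(new SafeAttribute()) returns true.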

Especially in the case of #3, I suspect that there might be a 
concurrency issue - a session being modified in one request while it is 
being serialized in another.

FYI, bordering on TMI: I just recently switched to DeltaManager from a 
custom session sharing solution where I was doing my own persistence to 
a database, with no in-memory storage. Concurrency was not an issue in 
that setup because each request received an independent copy of the 
session content. I could have had concurrency issues all along and not 
known it.


> 
>> 4. Update
>>     org.apache.catalina.ha.session.DeltaManager.deserializeSessions(byte[])
>>     to discard any received session that has the known dummy session ID.
> 
> This certainly looks like a problem that needs solving. I don't see any
> obvious issues with the approach taken but I would like a better
> understanding of the root causes of the deserialization failures as I am
> wondering if there are alternative solutions that are worth considering.

Understood. My goal with this patch is to a) limit the negative effects 
of a serialization/deserialization error, and b) give more information 
about those errors so that the application can be fixed.

> 
> Mark
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
> For additional commands, e-mail: dev-help@tomcat.apache.org
> 
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


Re: Potential change to DeltaManager

Posted by Mark Thomas <ma...@apache.org>.
On 21/09/18 18:02, Mitch Claborn wrote:
> Please forgive me if this is the incorrect place or format for
> discussing this. I'm new to trying to develop for Tomcat.

This is the right place. Welcome to the Tomcat community.

> I'm developing a patch for DeltaManager and I'd like to discuss with you
> developers if it could be considered for inclusion in the base code.
> Please see details below and comment.

Will do. Please note that session replication is not an area I am
particularly familiar with so if some of my comments are a little
off-base I apologise.

> Problem: When the "all sessions" message is sent from one node to
> another, when the receiving node is first starting up, I often run into
> various errors with one of the sessions and it fails to deserialize.
> This causes all the remaining sessions in that chunk
> (sendAllSessionsSize) to be lost by the receiver.

Oops.

> The problem with the
> sessions is totally an application problem, but until I can figure those
> problems out and solve them I need a way to limit the impact of these
> problems to just the one session that is in error. I could set
> sendAllSessionsSize="1" but that would take a LONG time to transmit, and
> we have many thousands of sessions at any given time.

That seems like a reasonable problem to try and solve.

> Change details:
> 
> 1. Update
>    org.apache.catalina.ha.session.DeltaManager.deserializeSessions(byte[])
>    and
>    org.apache.catalina.ha.session.DeltaSession.doReadObject(ObjectInput)
>    to produce a more detailed error message when a session is in
>    error.  New error message includes: the session index in the list of
>    sessions, the session ID, the last field or attribute that was
>    attempted to be read.

I'm not sure how useful the index will be but the other information
makes sense to me.

> 2. Introduce new XML attribute verifySerializedSessions for DeltaManager.

Why would a user not want to enable this feature? The performance hit of
the additional deserialization on send?

> 3. If verifySerializedSessions="true",
>    org.apache.catalina.ha.session.DeltaManager.serializeSessions(Session[])
>    will first serialize each session then immediately deserialize it.
>    If all is good, send the session as usual.  If any errors are
>    encountered, create and send a dummy session with a known session ID
>    instead. (This keeps the session count, which has already been put
>    in the output stream, correct for the receiving node.)

Ah. Is the issue that serialization works but deserialization does not?
That seems a little odd. Can you give an example of how this might go
wrong? I am trying to understand the root cause(s) of the problem to
determine if the proposed solution is appropriate. I thought
DeltaSession simply skipped over attributes that it could not deserialize.

> 4. Update
>    org.apache.catalina.ha.session.DeltaManager.deserializeSessions(byte[])
>    to discard any received session that has the known dummy session ID.

This certainly looks like a problem that needs solving. I don't see any
obvious issues with the approach taken but I would like a better
understanding of the root causes of the deserialization failures as I am
wondering if there are alternative solutions that are worth considering.

Mark
