Posted to dev@tomcat.apache.org by Mitch Claborn <mi...@claborn.net> on 2018/09/21 17:02:03 UTC
Potential change to DeltaManager
Please forgive me if this is the incorrect place or format for
discussing this. I'm new to trying to develop for Tomcat.
I'm developing a patch for DeltaManager and I'd like to discuss with you
developers if it could be considered for inclusion in the base code.
Please see details below and comment.
Problem: When the "all sessions" message is sent from one node to
another, when the receiving node is first starting up, I often run into
various errors with one of the sessions and it fails to deserialize.
This causes all the remaining sessions in that chunk
(sendAllSessionsSize) to be lost by the receiver. The problem with the
sessions is totally an application problem, but until I can figure those
problems out and solve them I need a way to limit the impact of these
problems to just the one session that is in error. I could set
sendAllSessionsSize="1" but that would take a LONG time to transmit, and
we have many thousands of sessions at any given time.
Change details:
1. Update
org.apache.catalina.ha.session.DeltaManager.deserializeSessions(byte[])
and
org.apache.catalina.ha.session.DeltaSession.doReadObject(ObjectInput)
to produce a more detailed error message when a session is in
error. New error message includes: the session index in the list of
sessions, the session ID, the last field or attribute that was
attempted to be read.
2. Introduce new XML attribute verifySerializedSessions for DeltaManager.
3. If verifySerializedSessions="true",
org.apache.catalina.ha.session.DeltaManager.serializeSessions(Session[])
will first serialize each session then immediately deserialize it.
If all is good, send the session as usual. If any errors are
encountered, create and send a dummy session with a known session ID
instead. (This keeps the session count, which has already been put
in the output stream, correct for the receiving node.)
4. Update
org.apache.catalina.ha.session.DeltaManager.deserializeSessions(byte[])
to discard any received session that has the known dummy session ID.
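
A minimal sketch of steps 3 and 4, for illustration only. It does not use Tomcat's classes: SimpleSession, FailsOnRead, PLACEHOLDER_ID and the length-prefixed framing are all assumptions made for the example. The point it shows is that replacing a failing session with a placeholder keeps the session count that was already written to the stream valid, and the receiver simply drops the placeholder.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

// Illustrative only: none of these types are Tomcat classes.
final class VerifiedSessionTransfer {
    // Hypothetical reserved ID marking a session that failed verification.
    static final String PLACEHOLDER_ID = "__UNSERIALIZABLE_SESSION__";

    // Minimal stand-in for a session: an ID plus an attribute map.
    static final class SimpleSession implements Serializable {
        private static final long serialVersionUID = 1L;
        final String id;
        final HashMap<String, Object> attributes = new HashMap<>();
        SimpleSession(String id) { this.id = id; }
    }

    // Helper for demonstration: serializes cleanly but always fails when read back.
    static final class FailsOnRead implements Serializable {
        private static final long serialVersionUID = 1L;
        private void readObject(ObjectInputStream in) throws IOException {
            throw new InvalidObjectException("simulated corrupt attribute");
        }
    }

    static byte[] toBytes(SimpleSession session) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(buf)) {
            oos.writeObject(session);
        }
        return buf.toByteArray();
    }

    static SimpleSession fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (SimpleSession) ois.readObject();
        }
    }

    // Sender side: the count is written first, so every slot must be filled.
    static byte[] serializeAll(List<SimpleSession> sessions) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(out)) {
            dos.writeInt(sessions.size());
            for (SimpleSession session : sessions) {
                byte[] bytes;
                try {
                    bytes = toBytes(session);
                    fromBytes(bytes); // verify: read back what was just written
                } catch (IOException | ClassNotFoundException | RuntimeException e) {
                    // Send a placeholder so the declared count stays correct.
                    bytes = toBytes(new SimpleSession(PLACEHOLDER_ID));
                }
                dos.writeInt(bytes.length);
                dos.write(bytes);
            }
        }
        return out.toByteArray();
    }

    // Receiver side: discard any session carrying the placeholder ID.
    static List<SimpleSession> deserializeAll(byte[] data)
            throws IOException, ClassNotFoundException {
        List<SimpleSession> result = new ArrayList<>();
        try (DataInputStream dis = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = dis.readInt();
            for (int i = 0; i < count; i++) {
                byte[] bytes = new byte[dis.readInt()];
                dis.readFully(bytes);
                SimpleSession session = fromBytes(bytes);
                if (!PLACEHOLDER_ID.equals(session.id)) {
                    result.add(session);
                }
            }
        }
        return result;
    }
}
```

With three sessions where the middle one holds a FailsOnRead attribute, the stream still declares three entries and the receiver ends up with the two good sessions. The cost is one extra deserialization per session on the sending node.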
--
Mitch
Re: Potential change to DeltaManager
Posted by Mitch Claborn <mi...@claborn.net>.
FYI: I've created the Bugzilla request and submitted the patch there.
https://bz.apache.org/bugzilla/show_bug.cgi?id=62773
Mitch
On 9/27/18 5:49 PM, Mark Thomas wrote:
> Mitch,
>
> First some general comments.
>
> Projects at the ASF generally operate using lazy consensus meaning if
> no-one objects after a reasonable amount of time (72 hours is a good
> starting point for reasonable) then assume you have agreement to
> proceed. Note that it is ApacheCon NA this week so a number of the
> committers may be distracted and/or travelling.
>
> It sounds like a good next step would be to create a Bugzilla
> enhancement request and attach your patch.
>
> Mark
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org
Re: Potential change to DeltaManager
Posted by Mark Thomas <ma...@apache.org>.
Mitch,
First some general comments.
Projects at the ASF generally operate using lazy consensus meaning if
no-one objects after a reasonable amount of time (72 hours is a good
starting point for reasonable) then assume you have agreement to
proceed. Note that it is ApacheCon NA this week so a number of the
committers may be distracted and/or travelling.
It sounds like a good next step would be to create a Bugzilla
enhancement request and attach your patch.
Mark
On 27/09/2018 11:41, Mitch Claborn wrote:
> Any further thoughts or comments on this? I think my patch is ready for
> prime time now.
>
>
> Mitch
>
> On 09/22/2018 11:23 AM, Mitch Claborn wrote:
>> See below for answers to your questions.
>>
>> Status update: I've been running my patch in production for about 16
>> hours with no problems. I've restarted each Tomcat (3) once and had no
>> problems, but also detected no errors, either on send or receive. I
>> have some code that I used in dev to force an error on a specific
>> combination of session attribute name and value. I'm going to put
>> that in prod so that I can test how it behaves with a large volume of
>> sessions and at least one error.
>>
>>
>> Mitch
>>
>> On 09/21/2018 05:00 PM, Mark Thomas wrote:
>>> On 21/09/18 18:02, Mitch Claborn wrote:
>>>> Please forgive me if this is the incorrect place or format for
>>>> discussing this. I'm new to trying to develop for Tomcat.
>>>
>>> This is the right place. Welcome to the Tomcat community.
>>>
>>>> I'm developing a patch for DeltaManager and I'd like to discuss with
>>>> you
>>>> developers if it could be considered for inclusion in the base code.
>>>> Please see details below and comment.
>>>
>>> Will do. Please note that session replication is not an area I am
>>> particularly familiar with so if some of my comments are a little
>>> off-base I apologise.
>>>
>>>> Problem: When the "all sessions" message is sent from one node to
>>>> another, when the receiving node is first starting up, I often run into
>>>> various errors with one of the sessions and it fails to deserialize.
>>>> This causes all the remaining sessions in that chunk
>>>> (sendAllSessionsSize) to be lost by the receiver.
>>>
>>> Oops.
>>>
>>>> The problem with the
>>>> sessions is totally an application problem, but until I can figure
>>>> those
>>>> problems out and solve them I need a way to limit the impact of these
>>>> problems to just the one session that is in error. I could set
>>>> sendAllSessionsSize="1" but that would take a LONG time to transmit,
>>>> and
>>>> we have many thousands of sessions at any given time.
>>>
>>> That seems like a reasonable problem to try and solve.
>>>
>>>> Change details:
>>>>
>>>> 1. Update
>>>> org.apache.catalina.ha.session.DeltaManager.deserializeSessions(byte[])
>>>> and
>>>> org.apache.catalina.ha.session.DeltaSession.doReadObject(ObjectInput)
>>>> to produce a more detailed error message when a session is in
>>>> error. New error message includes: the session index in the
>>>> list of
>>>> sessions, the session ID, the last field or attribute that was
>>>> attempted to be read.
>>>
>>> I'm not sure how useful the index will be but the other information
>>> makes sense to me.
>>
>> The index gives me an indication of how many sessions were discarded
>> because of the error.
>>
>>>
>>>> 2. Introduce new XML attribute verifySerializedSessions for
>>>> DeltaManager.
>>>
>>> Why would a user not want to enable this feature? The performance hit of
>>> the additional deserialization on send?
>>
>> That is the only reason I can think of.
>>
>>>
>>>> 3. If verifySerializedSessions="true",
>>>> org.apache.catalina.ha.session.DeltaManager.serializeSessions(Session[])
>>>>
>>>> will first serialize each session then immediately deserialize it.
>>>> If all is good, send the session as usual. If any errors are
>>>> encountered, create and send a dummy session with a known
>>>> session ID
>>>> instead. (This keeps the session count, which has already been put
>>>> in the output stream, correct for the receiving node.)
>>>
>>> Ah. Is the issue that serialization works but deserialization does not?
>>> That seems a little odd. Can you give an example of how this might go
>>> wrong? I am trying to understand the root cause(s) of the problem to
>>> determine if the proposed solution is appropriate. I thought
>>> DeltaSession simply skipped over attributes that it could not
>>> deserialize.
>>
>> DeltaSession does skip attributes that are not serializable. I've had
>> three identifiable errors, none of which I could reproduce at will.
>>
>> 1. A session with a Vector<Long> that might have contained nulls.
>> This should not be an issue, but I fixed my code to eliminate nulls in
>> that Vector, since they should not be there anyway.
>>
>> 2. In some of my own objects where I do my own serialization with
>> JSON, there were some fields that I don't serialize that were not
>> marked transient that should have been. Some of those embedded objects
>> were thus serialized by the native serialization and caused some
>> problems. I fixed those.
>>
>> 3. In another of my objects that I serialize with JSON, the JSON
>> string in the serialized session was obviously corrupted and was not a
>> valid JSON hash. I went over the serialization code with a fine-tooth
>> comb and it appears to be correct. That same code works hundreds of
>> thousands of times a day without error.
>>
>> Especially in the case of #3, I suspect that there might be a
>> concurrency issue - a session being modified in one request while it
>> is being serialized in another.
>>
>> FYI, bordering on TMI: I just recently switched to DeltaManager from a
>> custom session sharing solution where I was doing my own persistence
>> to a database, with no in-memory storage. Concurrency was not an issue
>> in that setup because each request received an independent copy of the
>> session content. I could have had concurrency issues all along and not
>> known it.
>>
>>
>>>
>>>> 4. Update
>>>> org.apache.catalina.ha.session.DeltaManager.deserializeSessions(byte[])
>>>> to discard any received session that has the known dummy session
>>>> ID.
>>>
>>> This certainly looks like a problem that needs solving. I don't see any
>>> obvious issues with the approach taken but I would like a better
>>> understanding of the root causes of the deserialization failures as I am
>>> wondering if there are alternative solutions that are worth considering.
>>
>> Understood. My goal with this patch is a) limit the negative effects
>> of a serialization/deserialization error, and b) give more information
>> about those errors so that the application can be fixed.
>>
>>>
>>> Mark
Re: Potential change to DeltaManager
Posted by Mitch Claborn <mi...@claborn.net>.
Any further thoughts or comments on this? I think my patch is ready for
prime time now.
Mitch
On 09/22/2018 11:23 AM, Mitch Claborn wrote:
> See below for answers to your questions.
>
> Status update: I've been running my patch in production for about 16
> hours with no problems. I've restarted each Tomcat (3) once and had no
> problems, but also detected no errors, either on send or receive. I have
> some code that I used in dev to force an error on a specific combination
> of session attribute name and value. I'm going to put that in prod so
> that I can test how it behaves with a large volume of sessions and at
> least one error.
>
>
> Mitch
>
> On 09/21/2018 05:00 PM, Mark Thomas wrote:
>> On 21/09/18 18:02, Mitch Claborn wrote:
>>> Please forgive me if this is the incorrect place or format for
>>> discussing this. I'm new to trying to develop for Tomcat.
>>
>> This is the right place. Welcome to the Tomcat community.
>>
>>> I'm developing a patch for DeltaManager and I'd like to discuss with you
>>> developers if it could be considered for inclusion in the base code.
>>> Please see details below and comment.
>>
>> Will do. Please note that session replication is not an area I am
>> particularly familiar with so if some of my comments are a little
>> off-base I apologise.
>>
>>> Problem: When the "all sessions" message is sent from one node to
>>> another, when the receiving node is first starting up, I often run into
>>> various errors with one of the sessions and it fails to deserialize.
>>> This causes all the remaining sessions in that chunk
>>> (sendAllSessionsSize) to be lost by the receiver.
>>
>> Oops.
>>
>>> The problem with the
>>> sessions is totally an application problem, but until I can figure those
>>> problems out and solve them I need a way to limit the impact of these
>>> problems to just the one session that is in error. I could set
>>> sendAllSessionsSize="1" but that would take a LONG time to transmit, and
>>> we have many thousands of sessions at any given time.
>>
>> That seems like a reasonable problem to try and solve.
>>
>>> Change details:
>>>
>>> 1. Update
>>>
>>> org.apache.catalina.ha.session.DeltaManager.deserializeSessions(byte[])
>>> and
>>>
>>> org.apache.catalina.ha.session.DeltaSession.doReadObject(ObjectInput)
>>> to produce a more detailed error message when a session is in
>>> error. New error message includes: the session index in the list of
>>> sessions, the session ID, the last field or attribute that was
>>> attempted to be read.
>>
>> I'm not sure how useful the index will be but the other information
>> makes sense to me.
>
> The index gives me an indication of how many sessions were discarded
> because of the error.
>
>>
>>> 2. Introduce new XML attribute verifySerializedSessions for
>>> DeltaManager.
>>
>> Why would a user not want to enable this feature? The performance hit of
>> the additional deserialization on send?
>
> That is the only reason I can think of.
>
>>
>>> 3. If verifySerializedSessions="true",
>>>
>>> org.apache.catalina.ha.session.DeltaManager.serializeSessions(Session[])
>>> will first serialize each session then immediately deserialize it.
>>> If all is good, send the session as usual. If any errors are
>>> encountered, create and send a dummy session with a known session ID
>>> instead. (This keeps the session count, which has already been put
>>> in the output stream, correct for the receiving node.)
>>
>> Ah. Is the issue that serialization works but deserialization does not?
>> That seems a little odd. Can you give an example of how this might go
>> wrong? I am trying to understand the root cause(s) of the problem to
>> determine if the proposed solution is appropriate. I thought
>> DeltaSession simply skipped over attributes that it could not
>> deserialize.
>
> DeltaSession does skip attributes that are not serializable. I've had
> three identifiable errors, none of which I could reproduce at will.
>
> 1. A session with a Vector<Long> that might have contained nulls. This
> should not be an issue, but I fixed my code to eliminate nulls in that
> Vector, since they should not be there anyway.
>
> 2. In some of my own objects where I do my own serialization with JSON,
> there were some fields that I don't serialize that were not marked
> transient that should have been. Some of those embedded objects were
> thus serialized by the native serialization and caused some problems. I
> fixed those.
>
> 3. In another of my objects that I serialize with JSON, the JSON string
> in the serialized session was obviously corrupted and was not a valid
> JSON hash. I went over the serialization code with a fine-tooth comb
> and it appears to be correct. That same code works hundreds of thousands
> of times a day without error.
>
> Especially in the case of #3, I suspect that there might be a
> concurrency issue - a session being modified in one request while it is
> being serialized in another.
>
> FYI, bordering on TMI: I just recently switched to DeltaManager from a
> custom session sharing solution where I was doing my own persistence to
> a database, with no in-memory storage. Concurrency was not an issue in
> that setup because each request received an independent copy of the
> session content. I could have had concurrency issues all along and not
> known it.
>
>
>>
>>> 4. Update
>>>
>>> org.apache.catalina.ha.session.DeltaManager.deserializeSessions(byte[])
>>> to discard any received session that has the known dummy session ID.
>>
>> This certainly looks like a problem that needs solving. I don't see any
>> obvious issues with the approach taken but I would like a better
>> understanding of the root causes of the deserialization failures as I am
>> wondering if there are alternative solutions that are worth considering.
>
> Understood. My goal with this patch is a) limit the negative effects of
> a serialization/deserialization error, and b) give more information
> about those errors so that the application can be fixed.
>
>>
>> Mark
Re: Potential change to DeltaManager
Posted by Mitch Claborn <mi...@claborn.net>.
See below for answers to your questions.
Status update: I've been running my patch in production for about 16
hours with no problems. I've restarted each Tomcat (3) once and had no
problems, but also detected no errors, either on send or receive. I have
some code that I used in dev to force an error on a specific combination
of session attribute name and value. I'm going to put that in prod so
that I can test how it behaves with a large volume of sessions and at
least one error.
Mitch
On 09/21/2018 05:00 PM, Mark Thomas wrote:
> On 21/09/18 18:02, Mitch Claborn wrote:
>> Please forgive me if this is the incorrect place or format for
>> discussing this. I'm new to trying to develop for Tomcat.
>
> This is the right place. Welcome to the Tomcat community.
>
>> I'm developing a patch for DeltaManager and I'd like to discuss with you
>> developers if it could be considered for inclusion in the base code.
>> Please see details below and comment.
>
> Will do. Please note that session replication is not an area I am
> particularly familiar with so if some of my comments are a little
> off-base I apologise.
>
>> Problem: When the "all sessions" message is sent from one node to
>> another, when the receiving node is first starting up, I often run into
>> various errors with one of the sessions and it fails to deserialize.
>> This causes all the remaining sessions in that chunk
>> (sendAllSessionsSize) to be lost by the receiver.
>
> Oops.
>
>> The problem with the
>> sessions is totally an application problem, but until I can figure those
>> problems out and solve them I need a way to limit the impact of these
>> problems to just the one session that is in error. I could set
>> sendAllSessionsSize="1" but that would take a LONG time to transmit, and
>> we have many thousands of sessions at any given time.
>
> That seems like a reasonable problem to try and solve.
>
>> Change details:
>>
>> 1. Update
>> org.apache.catalina.ha.session.DeltaManager.deserializeSessions(byte[])
>> and
>> org.apache.catalina.ha.session.DeltaSession.doReadObject(ObjectInput)
>> to produce a more detailed error message when a session is in
>> error. New error message includes: the session index in the list of
>> sessions, the session ID, the last field or attribute that was
>> attempted to be read.
>
> I'm not sure how useful the index will be but the other information
> makes sense to me.
The index gives me an indication of how many sessions were discarded
because of the error.
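
As a rough illustration of the kind of message described in change 1, the sketch below tracks the session index, session ID and last attribute name while reading, and reports them when a read fails. The wire layout, the class names and the message wording are assumptions for the example, not what DeltaManager or DeltaSession actually do.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative only: a simplified stream layout, not Tomcat's.
final class SessionReadDiagnostics {
    // Helper for demonstration: serializes cleanly but always fails when read back.
    static final class FailsOnRead implements Serializable {
        private static final long serialVersionUID = 1L;
        private void readObject(ObjectInputStream in) throws IOException {
            throw new InvalidObjectException("simulated corrupt attribute");
        }
    }

    // Layout: count, then per session: id, attribute count, (name, value) pairs.
    static byte[] writeAll(Map<String, Map<String, Object>> sessions) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(buf)) {
            oos.writeInt(sessions.size());
            for (Map.Entry<String, Map<String, Object>> session : sessions.entrySet()) {
                oos.writeUTF(session.getKey());
                oos.writeInt(session.getValue().size());
                for (Map.Entry<String, Object> attr : session.getValue().entrySet()) {
                    oos.writeUTF(attr.getKey());
                    oos.writeObject(attr.getValue());
                }
            }
        }
        return buf.toByteArray();
    }

    // Returns the IDs of the sessions read; on failure, reports where it stopped.
    static List<String> readAll(byte[] data) throws IOException {
        List<String> loaded = new ArrayList<>();
        int total = 0;
        int index = 0;
        String id = "(not read)";
        String lastAttribute = "(none)";
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            total = ois.readInt();
            for (index = 0; index < total; index++) {
                id = "(not read)";
                lastAttribute = "(none)";
                id = ois.readUTF();
                int attributeCount = ois.readInt();
                for (int i = 0; i < attributeCount; i++) {
                    lastAttribute = ois.readUTF();
                    ois.readObject(); // value discarded; only the ID is kept here
                }
                loaded.add(id);
            }
        } catch (IOException | ClassNotFoundException | RuntimeException e) {
            // total - index = the failing session plus everything after it.
            throw new IOException(String.format(
                    "Failed reading session %d of %d (id=%s, last attribute=%s); "
                            + "%d session(s) not loaded",
                    index + 1, total, id, lastAttribute, total - index), e);
        }
        return loaded;
    }
}
```

The index is what makes the "sessions not loaded" figure possible: the ID and attribute name identify the bad session, while the index shows how much of the chunk was lost with it.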
>
>> 2. Introduce new XML attribute verifySerializedSessions for DeltaManager.
>
> Why would a user not want to enable this feature? The performance hit of
> the additional deserialization on send?
That is the only reason I can think of.
>
>> 3. If verifySerializedSessions="true",
>> org.apache.catalina.ha.session.DeltaManager.serializeSessions(Session[])
>> will first serialize each session then immediately deserialize it.
>> If all is good, send the session as usual. If any errors are
>> encountered, create and send a dummy session with a known session ID
>> instead. (This keeps the session count, which has already been put
>> in the output stream, correct for the receiving node.)
>
> Ah. Is the issue that serialization works but deserialization does not?
> That seems a little odd. Can you give an example of how this might go
> wrong? I am trying to understand the root cause(s) of the problem to
> determine if the proposed solution is appropriate. I thought
> DeltaSession simply skipped over attributes that it could not deserialize.
DeltaSession does skip attributes that are not serializable. I've had
three identifiable errors, none of which I could reproduce at will.
1. A session with a Vector<Long> that might have contained nulls. This
should not be an issue, but I fixed my code to eliminate nulls in that
Vector, since they should not be there anyway.
2. In some of my own objects where I do my own serialization with JSON,
there were some fields that I don't serialize that were not marked
transient that should have been. Some of those embedded objects were
thus serialized by the native serialization and caused some problems. I
fixed those.
3. In another of my objects that I serialize with JSON, the JSON string
in the serialized session was obviously corrupted and was not a valid
JSON hash. I went over the serialization code with a fine-tooth comb
and it appears to be correct. That same code works hundreds of thousands
of times a day without error.
Especially in the case of #3, I suspect that there might be a
concurrency issue - a session being modified in one request while it is
being serialized in another.
FYI, bordering on TMI: I just recently switched to DeltaManager from a
custom session sharing solution where I was doing my own persistence to
a database, with no in-memory storage. Concurrency was not an issue in
that setup because each request received an independent copy of the
session content. I could have had concurrency issues all along and not
known it.
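
To illustrate the second case above (fields that should have been marked transient), here is a small sketch. The class and field names are hypothetical and the actual application objects are not shown in this thread; it only demonstrates the standard Java behaviour that a non-transient field is written by native serialization while a transient one is skipped and comes back as null.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Illustrative only: hypothetical application classes.
final class TransientFieldDemo {
    // An embedded helper object that was never meant to be replicated.
    static final class Helper implements Serializable {
        private static final long serialVersionUID = 1L;
        String state = "request-scoped data";
    }

    // helper is not transient, so native serialization writes it too.
    static final class Leaky implements Serializable {
        private static final long serialVersionUID = 1L;
        String json = "{\"items\":[]}";
        Helper helper = new Helper();
    }

    // helper is transient, so only the JSON string travels.
    static final class Fixed implements Serializable {
        private static final long serialVersionUID = 1L;
        String json = "{\"items\":[]}";
        transient Helper helper = new Helper();
    }

    // Serialize to a byte array and read the copy straight back.
    static <T extends Serializable> T roundTrip(T value)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(buf)) {
            oos.writeObject(value);
        }
        try (ObjectInputStream ois =
                new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray()))) {
            @SuppressWarnings("unchecked")
            T copy = (T) ois.readObject();
            return copy;
        }
    }
}
```

After a round trip, the Leaky copy still carries its Helper, whereas the Fixed copy has a null helper and an intact json string. Field initializers are not re-run when a Serializable class is deserialized, so code that relies on a transient field has to rebuild it after the object is read.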
>
>> 4. Update
>> org.apache.catalina.ha.session.DeltaManager.deserializeSessions(byte[])
>> to discard any received session that has the known dummy session ID.
>
> This certainly looks like a problem that needs solving. I don't see any
> obvious issues with the approach taken but I would like a better
> understanding of the root causes of the deserialization failures as I am
> wondering if there are alternative solutions that are worth considering.
Understood. My goal with this patch is a) limit the negative effects of
a serialization/deserialization error, and b) give more information
about those errors so that the application can be fixed.
>
> Mark
Re: Potential change to DeltaManager
Posted by Mark Thomas <ma...@apache.org>.
On 21/09/18 18:02, Mitch Claborn wrote:
> Please forgive me if this is the incorrect place or format for
> discussing this. I'm new to trying to develop for Tomcat.
This is the right place. Welcome to the Tomcat community.
> I'm developing a patch for DeltaManager and I'd like to discuss with you
> developers if it could be considered for inclusion in the base code.
> Please see details below and comment.
Will do. Please note that session replication is not an area I am
particularly familiar with so if some of my comments are a little
off-base I apologise.
> Problem: When the "all sessions" message is sent from one node to
> another, when the receiving node is first starting up, I often run into
> various errors with one of the sessions and it fails to deserialize.
> This causes all the remaining sessions in that chunk
> (sendAllSessionsSize) to be lost by the receiver.
Oops.
> The problem with the
> sessions is totally an application problem, but until I can figure those
> problems out and solve them I need a way to limit the impact of these
> problems to just the one session that is in error. I could set
> sendAllSessionsSize="1" but that would take a LONG time to transmit, and
> we have many thousands of sessions at any given time.
That seems like a reasonable problem to try and solve.
> Change details:
>
> 1. Update
> org.apache.catalina.ha.session.DeltaManager.deserializeSessions(byte[])
> and
> org.apache.catalina.ha.session.DeltaSession.doReadObject(ObjectInput)
> to produce a more detailed error message when a session is in
> error. New error message includes: the session index in the list of
> sessions, the session ID, the last field or attribute that was
> attempted to be read.
I'm not sure how useful the index will be but the other information
makes sense to me.
> 2. Introduce new XML attribute verifySerializedSessions for DeltaManager.
Why would a user not want to enable this feature? The performance hit of
the additional deserialization on send?
> 3. If verifySerializedSessions="true",
> org.apache.catalina.ha.session.DeltaManager.serializeSessions(Session[])
> will first serialize each session then immediately deserialize it.
> If all is good, send the session as usual. If any errors are
> encountered, create and send a dummy session with a known session ID
> instead. (This keeps the session count, which has already been put
> in the output stream, correct for the receiving node.)
Ah. Is the issue that serialization works but deserialization does not?
That seems a little odd. Can you give an example of how this might go
wrong? I am trying to understand the root cause(s) of the problem to
determine if the proposed solution is appropriate. I thought
DeltaSession simply skipped over attributes that it could not deserialize.
> 4. Update
> org.apache.catalina.ha.session.DeltaManager.deserializeSessions(byte[])
> to discard any received session that has the known dummy session ID.
This certainly looks like a problem that needs solving. I don't see any
obvious issues with the approach taken but I would like a better
understanding of the root causes of the deserialization failures as I am
wondering if there are alternative solutions that are worth considering.
Mark