You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@tomee.apache.org by David Blevins <da...@gmail.com> on 2019/06/07 00:35:54 UTC

ThreadContext set-to-list tweak

Doychin, thank you for so many code improvements!  It's very excellent to have your eyes on the code and see so many good commits.

Consider this a very late PR review :)  I did one tweak I wanted to run by you to see what thoughts you had.  I switched this from a Set back to a List:

 - https://github.com/apache/tomee/commit/4273365ef2902191e6a1fe025daa8877cbefc193#diff-cebe7eb85461eb2f286e293fea91a020

I don't think this is the resting state of the code, probably we need to do one of two things.

The hesitation is effectively on executing code in hash order.  My experience is that it isn't random enough to prevent code from truly becoming order-independent.  Some of the worst bugs I've wasted hours or days on turned out to be code that people reported as working most the time and would "randomly" fail.  In the end they would often turn out to be due to hashes and code that was written to be order independent, but actually wasn't.

One of the worst was a "bug" in CXF where if two classes had the same @Path("/foo") at the top, one would "randomly" be chosen based on hash order because the classes were in a hashset.  In practice the order was incredibly stable for many restarts (jvm instances) in a row and the "right" service was selected.  I could easily get 4 restarts in a row and see no issues, then the fifth "bam" it would "break."

Overall it took a 50 to 100 restarts with a debugger attached and about 4 solid hours to find the issue.  I could only reproduce the issue 50% of that time, so 2 of those hours were wasted.  In the 50% where I could reproduce the issue, getting nearer to the source of the problem was much harder because I wouldn't know if that time the issue was going to show up.  You never knew if the time spent stepping through code was going to be for nothing as many that jvm instance wouldn't have the issue.

In the end, I decided there were three bugs and the worst was the last one:

 1. The user had a bug because you can't put @Path("/foo") on one class and the exact @Path("/foo") on another class
 2. CXF had a bug because it did not tell the user you cannot do this and ran the incorrect code anyway
 3. CXF had another bug because it was written to execute code potentially critical code in a random order

In the end *I* had zero bugs but was the guy losing 4 hours.  The reason I decided #3 was the worst was because of this:

 3. If the execution order had been stable, there would not have been a "bug".  Whatever behavior the user saw (good or bad) would have been 100% consistent and reliable.  It would have always worked or never worked.

     -if always: If the user liked the behavior, I'd have never been called.
     -if never: If the user didn't like the behavior, I would have been able to reproduce it every time and find the issue likely 3x faster.

Effectively the randomness became a both a bug-cost-applifier and bug-obfuscator.  Since hashcodes (and therefore the execution order) are stable for the duration of the vm and you don't restart that often, it can be weeks or months before you notice any issues.

The realization I came to is that code should never be executed in hash order, it should always be a stable order.  It will either work or not work on the first try.

From that perspective, the code in either set or list form is ultimately not stable and needs fixing.  You were smart to see the list as a lie, because it is fed from static initializers so ultimately dependent on class-loading order which isn't guaranteed.

What we should do is add a guarantee in the order.  The fix for CXF (which I never made unfortunately) would have been to first sort the list of JAX-RS classes by class name, then select one by the path.  So then if the code "worked" and you pushed it into production it would always work.  It might break again in the future, but only due to code change.

We should probably do something like that in this code given how critical it is.  If order is unimportant, then the outside code (the ThreadContextListeners) probably don't mind if the ThreadContext sorts the list using some ordering of it's choosing so it will always at least be stable.

So on ThreadContext I changed the set back to a list, but did not implement something that would re-sort the list each time a listener is added or removed.  I wanted to first share the above and see if this is something you might want to hack on together.

Let me know if you do, because I think it would be incredibly fun to hack on one of the most important classes in the entire codebase with you. :)

On the sorting, we could probably use classname to determine order.  The trick would be the thread-safety.  We would need to rethink how we do that.  Sorting the list while another thread might be modifying it would just lead to an unstable sort.  Maybe an AtomicReference or something.  Dunno.

Anyway, huge email.  Sorry for the book :)  Love to hear your thoughts! 


-- 
David Blevins
http://twitter.com/dblevins
http://www.tomitribe.com

Re: ThreadContext set-to-list tweak

Posted by Doychin Bondzhev <do...@dsoft-bg.com>.

Can you test with this:

Use ListOrderedSet from commons-collections and initialize it with 
CopyOnWriteArraySet and CopyOnWriteArrayList instances.

It will preserve the add order and will make sure that there is only one 
instance of each listener.


On 7.6.2019 �. 3:35, David Blevins wrote:
> Doychin, thank you for so many code improvements!  It's very excellent to have your eyes on the code and see so many good commits.
>
> Consider this a very late PR review :)  I did one tweak I wanted to run by you to see what thoughts you had.  I switched this from a Set back to a List:
>
>   - https://github.com/apache/tomee/commit/4273365ef2902191e6a1fe025daa8877cbefc193#diff-cebe7eb85461eb2f286e293fea91a020
>
> I don't think this is the resting state of the code, probably we need to do one of two things.
>
> The hesitation is effectively on executing code in hash order.  My experience is that it isn't random enough to prevent code from truly becoming order-independent.  Some of the worst bugs I've wasted hours or days on turned out to be code that people reported as working most the time and would "randomly" fail.  In the end they would often turn out to be due to hashes and code that was written to be order independent, but actually wasn't.
>
> One of the worst was a "bug" in CXF where if two classes had the same @Path("/foo") at the top, one would "randomly" be chosen based on hash order because the classes were in a hashset.  In practice the order was incredibly stable for many restarts (jvm instances) in a row and the "right" service was selected.  I could easily get 4 restarts in a row and see no issues, then the fifth "bam" it would "break."
>
> Overall it took a 50 to 100 restarts with a debugger attached and about 4 solid hours to find the issue.  I could only reproduce the issue 50% of that time, so 2 of those hours were wasted.  In the 50% where I could reproduce the issue, getting nearer to the source of the problem was much harder because I wouldn't know if that time the issue was going to show up.  You never knew if the time spent stepping through code was going to be for nothing as many that jvm instance wouldn't have the issue.
>
> In the end, I decided there were three bugs and the worst was the last one:
>
>   1. The user had a bug because you can't put @Path("/foo") on one class and the exact @Path("/foo") on another class
>   2. CXF had a bug because it did not tell the user you cannot do this and ran the incorrect code anyway
>   3. CXF had another bug because it was written to execute code potentially critical code in a random order
>
> In the end *I* had zero bugs but was the guy losing 4 hours.  The reason I decided #3 was the worst was because of this:
>
>   3. If the execution order had been stable, there would not have been a "bug".  Whatever behavior the user saw (good or bad) would have been 100% consistent and reliable.  It would have always worked or never worked.
>
>       -if always: If the user liked the behavior, I'd have never been called.
>       -if never: If the user didn't like the behavior, I would have been able to reproduce it every time and find the issue likely 3x faster.
>
> Effectively the randomness became a both a bug-cost-applifier and bug-obfuscator.  Since hashcodes (and therefore the execution order) are stable for the duration of the vm and you don't restart that often, it can be weeks or months before you notice any issues.
>
> The realization I came to is that code should never be executed in hash order, it should always be a stable order.  It will either work or not work on the first try.
>
>  From that perspective, the code in either set or list form is ultimately not stable and needs fixing.  You were smart to see the list as a lie, because it is fed from static initializers so ultimately dependent on class-loading order which isn't guaranteed.
>
> What we should do is add a guarantee in the order.  The fix for CXF (which I never made unfortunately) would have been to first sort the list of JAX-RS classes by class name, then select one by the path.  So then if the code "worked" and you pushed it into production it would always work.  It might break again in the future, but only due to code change.
>
> We should probably do something like that in this code given how critical it is.  If order is unimportant, then the outside code (the ThreadContextListeners) probably don't mind if the ThreadContext sorts the list using some ordering of it's choosing so it will always at least be stable.
>
> So on ThreadContext I changed the set back to a list, but did not implement something that would re-sort the list each time a listener is added or removed.  I wanted to first share the above and see if this is something you might want to hack on together.
>
> Let me know if you do, because I think it would be incredibly fun to hack on one of the most important classes in the entire codebase with you. :)
>
> On the sorting, we could probably use classname to determine order.  The trick would be the thread-safety.  We would need to rethink how we do that.  Sorting the list while another thread might be modifying it would just lead to an unstable sort.  Maybe an AtomicReference or something.  Dunno.
>
> Anyway, huge email.  Sorry for the book :)  Love to hear your thoughts!
>
>

-- 
Doychin Bondzhev
dSoft-Bulgaria Ltd.
PowerPro - billing & provisioning solution for Service providers
http://www.dsoft-bg.com/
Mobile: +359888243116

Re: ThreadContext set-to-list tweak

Posted by Doychin Bondzhev <do...@dsoft-bg.com>.

I hope this can wait few days? I just need to finish something during 
the next few days then I can propose a solution will execute the 
listeners in predictable order (based on their addition to the 
collection) and keep the Set behavior so it will not accept same 
listener more then once. That was the main reason I changed that 
collection to be Set.

On 7.6.2019 г. 3:35, David Blevins wrote:
> Doychin, thank you for so many code improvements!  It's very excellent to have your eyes on the code and see so many good commits.
>
> Consider this a very late PR review :)  I did one tweak I wanted to run by you to see what thoughts you had.  I switched this from a Set back to a List:
>
>   - https://github.com/apache/tomee/commit/4273365ef2902191e6a1fe025daa8877cbefc193#diff-cebe7eb85461eb2f286e293fea91a020
>
> I don't think this is the resting state of the code, probably we need to do one of two things.
>
> The hesitation is effectively on executing code in hash order.  My experience is that it isn't random enough to prevent code from truly becoming order-independent.  Some of the worst bugs I've wasted hours or days on turned out to be code that people reported as working most the time and would "randomly" fail.  In the end they would often turn out to be due to hashes and code that was written to be order independent, but actually wasn't.
>
> One of the worst was a "bug" in CXF where if two classes had the same @Path("/foo") at the top, one would "randomly" be chosen based on hash order because the classes were in a hashset.  In practice the order was incredibly stable for many restarts (jvm instances) in a row and the "right" service was selected.  I could easily get 4 restarts in a row and see no issues, then the fifth "bam" it would "break."
>
> Overall it took a 50 to 100 restarts with a debugger attached and about 4 solid hours to find the issue.  I could only reproduce the issue 50% of that time, so 2 of those hours were wasted.  In the 50% where I could reproduce the issue, getting nearer to the source of the problem was much harder because I wouldn't know if that time the issue was going to show up.  You never knew if the time spent stepping through code was going to be for nothing as many that jvm instance wouldn't have the issue.
>
> In the end, I decided there were three bugs and the worst was the last one:
>
>   1. The user had a bug because you can't put @Path("/foo") on one class and the exact @Path("/foo") on another class
>   2. CXF had a bug because it did not tell the user you cannot do this and ran the incorrect code anyway
>   3. CXF had another bug because it was written to execute code potentially critical code in a random order
>
> In the end *I* had zero bugs but was the guy losing 4 hours.  The reason I decided #3 was the worst was because of this:
>
>   3. If the execution order had been stable, there would not have been a "bug".  Whatever behavior the user saw (good or bad) would have been 100% consistent and reliable.  It would have always worked or never worked.
>
>       -if always: If the user liked the behavior, I'd have never been called.
>       -if never: If the user didn't like the behavior, I would have been able to reproduce it every time and find the issue likely 3x faster.
>
> Effectively the randomness became a both a bug-cost-applifier and bug-obfuscator.  Since hashcodes (and therefore the execution order) are stable for the duration of the vm and you don't restart that often, it can be weeks or months before you notice any issues.
>
> The realization I came to is that code should never be executed in hash order, it should always be a stable order.  It will either work or not work on the first try.
>
>  From that perspective, the code in either set or list form is ultimately not stable and needs fixing.  You were smart to see the list as a lie, because it is fed from static initializers so ultimately dependent on class-loading order which isn't guaranteed.
>
> What we should do is add a guarantee in the order.  The fix for CXF (which I never made unfortunately) would have been to first sort the list of JAX-RS classes by class name, then select one by the path.  So then if the code "worked" and you pushed it into production it would always work.  It might break again in the future, but only due to code change.
>
> We should probably do something like that in this code given how critical it is.  If order is unimportant, then the outside code (the ThreadContextListeners) probably don't mind if the ThreadContext sorts the list using some ordering of it's choosing so it will always at least be stable.
>
> So on ThreadContext I changed the set back to a list, but did not implement something that would re-sort the list each time a listener is added or removed.  I wanted to first share the above and see if this is something you might want to hack on together.
>
> Let me know if you do, because I think it would be incredibly fun to hack on one of the most important classes in the entire codebase with you. :)
>
> On the sorting, we could probably use classname to determine order.  The trick would be the thread-safety.  We would need to rethink how we do that.  Sorting the list while another thread might be modifying it would just lead to an unstable sort.  Maybe an AtomicReference or something.  Dunno.
>
> Anyway, huge email.  Sorry for the book :)  Love to hear your thoughts!
>
>

-- 
Doychin Bondzhev
dSoft-Bulgaria Ltd.
PowerPro - billing & provisioning solution for Service providers
http://www.dsoft-bg.com/
Mobile: +359888243116