
Posted to dev@tomcat.apache.org by Mark Thomas <ma...@apache.org> on 2022/08/23 20:42:55 UTC

[DISCUSS] MessageBytes refactoring

Hi all,

I've been looking at a fix for bug 66196 [1]. My ideas so far have revolved 
around MessageBytes, but the solutions are being made more complex by the 
current behaviour of MessageBytes in some cases.

For example (I'm using strings in place of byte[] and char[] to keep it 
simple):

mb.setBytes("aaa");
mb.setChars("bbb");
mb.toBytes();

mb.getByteChunk() returns "aaa" whereas I'd expect it to be "bbb".
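To make the example concrete, here is a deliberately simplified Java model of the behaviour described above. It is hypothetical and is not Tomcat's MessageBytes code. The class name `LegacyMessageBytesModel` is invented, plain byte[] and char[] fields stand in for the ByteChunk and CharChunk, and it assumes (as the example implies) that setChars() leaves the old bytes in place and that toBytes() treats any existing byte value as already converted.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified model of the behaviour described above; this is
// NOT Tomcat's MessageBytes. Separate byte and char slots stand in for the
// ByteChunk and CharChunk, and "type" records which one is current.
class LegacyMessageBytesModel {
    static final int T_NULL = 0;
    static final int T_BYTES = 1;
    static final int T_CHARS = 2;

    private int type = T_NULL;
    private byte[] bytes; // stands in for the ByteChunk
    private char[] chars; // stands in for the CharChunk

    void setBytes(byte[] b) {
        bytes = b;
        type = T_BYTES;
    }

    // Assumption: setting chars only switches the type; the old bytes stay.
    void setChars(char[] c) {
        chars = c;
        type = T_CHARS;
    }

    // Assumption: an existing byte value is treated as "already converted",
    // so the current char value is never re-encoded.
    void toBytes() {
        if (type == T_NULL) {
            return;
        }
        if (bytes != null) {
            type = T_BYTES;
            return;
        }
        bytes = new String(chars).getBytes(StandardCharsets.ISO_8859_1);
        type = T_BYTES;
    }

    byte[] getByteChunk() {
        return bytes;
    }

    public static void main(String[] args) {
        LegacyMessageBytesModel mb = new LegacyMessageBytesModel();
        mb.setBytes("aaa".getBytes(StandardCharsets.ISO_8859_1));
        mb.setChars("bbb".toCharArray());
        mb.toBytes();
        // prints "aaa": the stale byte value survives the conversion
        System.out.println(new String(mb.getByteChunk(), StandardCharsets.ISO_8859_1));
    }
}
```

Under these assumptions the three calls leave "aaa" in the byte slot, because nothing ever invalidates it after setChars().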

I'd like to refactor MessageBytes so it always behaves as if it has a 
single current value regardless of whether that value was set as a 
String, byte[] or char[]. If a get() method is called for a different 
type, conversion occurs on demand.
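A minimal sketch of the proposed "single current value" semantics, under stated assumptions: the class `SingleValueMessageBytes` and its methods are hypothetical, ISO-8859-1 is assumed for every byte/char conversion, and plain arrays stand in for the chunks. Each setter discards all other representations; the getters derive and cache the requested form on demand.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the proposed semantics, not Tomcat's implementation.
// Whichever setter ran last defines the single current value; other forms are
// derived lazily and cached until the next set. Assumes ISO-8859-1 throughout.
class SingleValueMessageBytes {
    private enum Type { NONE, BYTES, CHARS, STRING }

    private static final Charset CHARSET = StandardCharsets.ISO_8859_1;

    private Type type = Type.NONE;
    private byte[] bytes;
    private char[] chars;
    private String str;

    // Every setter clears all cached forms first, so no stale value can survive.
    void setBytes(byte[] b) { reset(); bytes = b; type = Type.BYTES; }
    void setChars(char[] c) { reset(); chars = c; type = Type.CHARS; }
    void setString(String s) { reset(); str = s; type = Type.STRING; }

    private void reset() {
        bytes = null;
        chars = null;
        str = null;
        type = Type.NONE;
    }

    String getString() {
        if (str == null && type != Type.NONE) {
            str = (type == Type.BYTES) ? new String(bytes, CHARSET) : new String(chars);
        }
        return str;
    }

    // The returned arrays are the cached values themselves; a caller that
    // writes to them directly is responsible for keeping things in sync.
    byte[] getBytes() {
        if (bytes == null && type != Type.NONE) {
            bytes = getString().getBytes(CHARSET);
        }
        return bytes;
    }

    char[] getChars() {
        if (chars == null && type != Type.NONE) {
            chars = getString().toCharArray();
        }
        return chars;
    }

    public static void main(String[] args) {
        SingleValueMessageBytes mb = new SingleValueMessageBytes();
        mb.setBytes("aaa".getBytes(CHARSET));
        mb.setChars("bbb".toCharArray());
        System.out.println(new String(mb.getBytes(), CHARSET)); // prints "bbb"
    }
}
```

Caching the derived form keeps repeated get calls cheap, while clearing everything in the setters is what guarantees a single consistent value.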

I'm reasonably confident that changing MessageBytes to always have a 
single, consistent value will also enable a few useful optimizations, 
particularly around the ISO-8859-1 String to byte conversions that are 
used heavily for HTTP response headers.
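The post does not spell out which optimizations it has in mind. One plausible reading (an assumption, not something stated in the thread) is that when a header value is held only as a String, it can be turned straight into ISO-8859-1 bytes with String.getBytes() instead of going through a Charset encoder and a ByteBuffer. The hypothetical class `Latin1HeaderBytes` below shows both paths and checks that they agree for a Latin-1 value.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration (not from Tomcat): two ways to turn a header
// value held as a String into ISO-8859-1 bytes.
class Latin1HeaderBytes {

    // General path: run the Charset's encoder, then copy the result out of
    // the returned ByteBuffer.
    static byte[] viaEncoder(String value) {
        ByteBuffer bb = StandardCharsets.ISO_8859_1.encode(value);
        byte[] out = new byte[bb.remaining()];
        bb.get(out);
        return out;
    }

    // Direct path: let String do the conversion. For strings containing only
    // Latin-1 characters, JDK 9+ (compact strings) can typically do this with
    // little more than an array copy.
    static byte[] viaGetBytes(String value) {
        return value.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String value = "attachment; filename=caf\u00e9.txt";
        // prints "true": both paths produce the same bytes for Latin-1 input
        System.out.println(java.util.Arrays.equals(viaEncoder(value), viaGetBytes(value)));
    }
}
```

Any real gain would need measuring; the sketch only shows that the direct path yields identical bytes for Latin-1 header values.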

Note: As is currently the case, a caller that writes to the ByteChunk or 
CharChunk directly is expected to take responsibility for keeping the 
values in sync or dealing with the consequences.

Thoughts?

Mark


[1] https://bz.apache.org/bugzilla/show_bug.cgi?id=66196

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org

Re: [DISCUSS] MessageBytes refactoring

Posted by Han Li <li...@apache.org>.


> On 24 Aug 2022, at 16:16, Mark Thomas <ma...@apache.org> wrote:
> 
> On 24/08/2022 09:08, Rémy Maucherat wrote:
>> On Tue, Aug 23, 2022 at 10:43 PM Mark Thomas <ma...@apache.org> wrote:
>>> Thoughts?
>> Well, this is a bit risky obviously but you can attempt it.
> 
> Fair point.
> 
> On my first pass I found that the RewriteValve was accessing the internals directly. That case looked to be manageable. I agree the risk is that this is happening in other places that don't get spotted.
> 
> One option would be to refactor 10.1.x but delay the back-port to see if any regressions emerge.

Are there any sub-tasks I can do? I would be happy to help!

Han


Re: [DISCUSS] MessageBytes refactoring

Posted by Mark Thomas <ma...@apache.org>.

On 24/08/2022 09:08, Rémy Maucherat wrote:
> On Tue, Aug 23, 2022 at 10:43 PM Mark Thomas <ma...@apache.org> wrote:
>> Thoughts?
> 
> Well, this is a bit risky obviously but you can attempt it.

Fair point.

On my first pass I found that the RewriteValve was accessing the 
internals directly. That case looked to be manageable. I agree the risk 
is that this is happening in other places that don't get spotted.

One option would be to refactor 10.1.x but delay the back-port to see if 
any regressions emerge.

Mark


Re: [DISCUSS] MessageBytes refactoring

Posted by Rémy Maucherat <re...@apache.org>.

On Tue, Aug 23, 2022 at 10:43 PM Mark Thomas <ma...@apache.org> wrote:
> Thoughts?

Well, this is a bit risky obviously but you can attempt it.

Rémy
