Posted to dev@harmony.apache.org by Vladimir Strigun <vs...@gmail.com> on 2006/04/26 12:20:44 UTC

[classlib] charset decoding

On 4/26/06, Andrew Zhang <zh...@gmail.com> wrote:
> Hi Vladimir,
>
> I think we both understand the spec :)
>
> Here I propose two solutions:
>
> 1. Set the ByteBuffer's position to its limit, and store the remaining
> bytes in the decoder. Invokers don't need to care about the position of
> "in". If the result is UNDERFLOW and there is further input, they just pass
> the new input to the decoder. It's a typical streaming decoder.  That's
> what Harmony does currently.
>
> 2. The decoder doesn't store the remaining bytes. The position of "in" is
> set to indicate the remaining bytes. The invoker must take care to generate
> a new input ByteBuffer for the next invocation.  The RI acts this way.
>
> Both are acceptable.
>
> However, I think solution 1 is more reasonable.
>
> "Is it possible to store bytes in decoder, support streaming decoding, and,
> at the same time sets correct position in input buffer after each operation?
> "
>
> Yes.  In fact, your patch will make Harmony act as the description above.
>
> However, I don't think it solves the problem. The invoker may be even more
> confused and may think:
>
> "Is the remaining byte sequence maintained in the decoder or in 'in'? Shall
> I get the remaining bytes from 'in' and pass them to the next invocation?"
>
> Anyway, we need a decision on this compatibility issue.
> By the way, Vladimir, does solution one cause any problem on other classlib
> implementations?
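
A minimal sketch of solution 2 seen from the invoker's side, assuming a decoder that leaves an incomplete trailing byte sequence in the input buffer (which is how the thread describes the RI). The class name CarryOverDecode, the buffer sizes and the chunk boundaries are illustrative assumptions, not Harmony code. The invoker carries the unconsumed bytes into the next invocation with compact():

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

class CarryOverDecode {
    /** Decodes the chunks in order, carrying undecoded trailing bytes over with compact(). */
    static String decodeChunks(byte[][] chunks) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        ByteBuffer in = ByteBuffer.allocate(16);   // small on purpose; sizes are arbitrary
        CharBuffer out = CharBuffer.allocate(64);
        for (int i = 0; i < chunks.length; i++) {
            in.put(chunks[i]);                     // append after any carried-over bytes
            in.flip();                             // switch the buffer to reading mode
            boolean last = (i == chunks.length - 1);
            CoderResult result = decoder.decode(in, out, last);
            if (result.isError()) {
                throw new IllegalStateException(result.toString());
            }
            in.compact();                          // keep whatever the decoder did not consume
        }
        decoder.flush(out);
        out.flip();
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] euro = "\u20ac".getBytes(StandardCharsets.UTF_8); // three bytes in UTF-8
        // The euro sign is split across all three chunks.
        byte[][] chunks = { {'a', euro[0]}, {euro[1]}, {euro[2], 'b'} };
        System.out.println(decodeChunks(chunks).equals("a\u20acb"));
    }
}
```

With solution 1 the compact() call would be unnecessary, because the decoder itself would hold the leftover bytes; the price is that the position of "in" no longer tells the invoker how much was really decoded.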


I am trying to find the best solution for Harmony-166, and while preparing
the fix I found the current issue.

Method test_read from tests.api.java.io.InputStreamReaderTest failed,
and the first fix I created used in.available(), but I agree with
Mikhail that it is possibly not the best solution.

In the current implementation of InputStreamReader, the endOfInput variable
is set to true if the reader reads fewer than 8192 bytes. When
InputStreamReader tries to read one character from
LimitedByteArrayInputStream (a ByteArrayInputStream that only returns
a single byte per read), true is passed to the charset decoder as the
endOfInput parameter, even though we still have further input from
LimitedByteArrayInputStream.

Two methods from tests.api.java.io.OutputStreamWriterTest also failed
because of the read() method implementation of InputStreamReader.

IMO, we have two ways to fix Harmony-166:

1. Don't change the behavior of CharsetDecoder, and use the in.available()
method to compute endOfInput. If in.available() > 0, pass false to the
decoder, read one additional byte from the input stream and try to decode
again. If the decoding operation can decode the remaining portion plus the
additional byte to a character, stop here; otherwise read another byte
and retry. fillBuf will be invoked on the next invocation of
read(), and the next portion of 8192 bytes will be read from the input
stream.

2. Change the behavior of CharsetDecoder and use
bytes.remaining() to compute endOfInput. In this case, we always
pass false on the first invocation of the decode method. The rest of the
algorithm is almost the same as above, but we stop the loop once all bytes
are decoded successfully (i.e. if bytes.hasRemaining() is false).

What do you think about it?
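
The effect of passing endOfInput = true too early can be sketched with the standard java.nio.charset API. This is only an illustration under assumptions: the class name EndOfInputDemo and the choice of a two-byte UTF-8 character are mine, and the decoder uses its default REPORT action for malformed input. Only the lead byte of the character is made visible to the decoder, which is roughly what happens when the stream hands over a single byte per read:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

class EndOfInputDemo {
    /** Decodes only the first byte of a two-byte UTF-8 sequence with the given flag. */
    static CoderResult decodeFirstByteOnly(boolean endOfInput) {
        byte[] encoded = "\u00e9".getBytes(StandardCharsets.UTF_8); // two bytes in UTF-8
        ByteBuffer in = ByteBuffer.wrap(encoded, 0, 1);  // only the lead byte is visible
        CharBuffer out = CharBuffer.allocate(4);
        return StandardCharsets.UTF_8.newDecoder().decode(in, out, endOfInput);
    }

    public static void main(String[] args) {
        // false: the decoder asks for more input instead of failing
        System.out.println("endOfInput=false -> " + decodeFirstByteOnly(false));
        // true: the dangling lead byte is treated as malformed input
        System.out.println("endOfInput=true  -> " + decodeFirstByteOnly(true));
    }
}
```

With false the result is an underflow, so the caller may supply the second byte later; with true the same byte is reported as malformed, which is the kind of failure a reader runs into when it declares end of input while the stream still has data.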


Thanks.
Vladimir.

> Any comment?
>
> Thanks !
>
>
> Vladimir Strigun commented on HARMONY-410:
> ------------------------------------------
>
> Hi Richard,
>
> Thanks for the clarification. I agree that streaming decode is a good thing,
> but I'd like to explain my understanding of the specification :)
> According to the specification of CharsetDecoder, a decoding operation should
> proceed by the following steps:
> "
> 2. Invoke the decode method zero or more times, as long as additional input
> may be available, passing false for the endOfInput argument and filling the
> input buffer and flushing the output buffer between invocations;
>
> 3. Invoke the decode method one final time, passing true for the endOfInput
> argument; and then
> "
> spec also says:
> "The buffers' positions will be advanced to reflect the bytes read and the
> characters written, but their marks and limits will not be modified"
>
> I understand these sentences in the following way:
> invoke decode with endOfInput = false if you have additional input, then
> fill the buffer (i.e. add some additional input to the buffer), invoke decode
> again and pass the correct endOfInput parameter depending on the
> availability of input.
>
> The example you provided is very useful and, of course, the 1st option looks
> better, but what I'm suggesting here is to reflect the actually processed
> bytes in the input. After the first invocation of decode, not all bytes have
> actually been processed, i.e. almost all bytes are processed, but some are
> stored in the decoder for a further operation.
> Is it possible to store bytes in the decoder, support streaming decoding,
> and, at the same time, set the correct position in the input buffer after
> each operation?
>
> Thanks.
> Vladimir.
>
> > method decode(ByteBuffer, CharBuffer, boolean) should set correct position in ByteBuffer
> > ----------------------------------------------------------------------------------------
> >
> >          Key: HARMONY-410
> >          URL: http://issues.apache.org/jira/browse/HARMONY-410
> >      Project: Harmony
> >         Type: Bug
>
> >   Components: Classlib
> >     Reporter: Vladimir Strigun
> >     Assignee: Mikhail Loenko
> >     Priority: Minor
> >  Attachments: Harmony-410_patch.txt, harmony-410_test.txt
> >
> > When the ByteBuffer contains an incomplete sequence of bytes for successful
> > decoding, the position in the ByteBuffer should be set to the last
> > successfully decoded byte. I will attach a testcase and patch soon.
>
> --
> Andrew Zhang
> China Software Development Lab, IBM
>
>

---------------------------------------------------------------------
Terms of use : http://incubator.apache.org/harmony/mailing.html
To unsubscribe, e-mail: harmony-dev-unsubscribe@incubator.apache.org
For additional commands, e-mail: harmony-dev-help@incubator.apache.org


Re: [classlib] charset decoding

Posted by Mikhail Loenko <ml...@gmail.com>.
2006/4/26, Paulex Yang <pa...@gmail.com>:
> Andrew Zhang wrote:
> >
> > Mikhail Loenko wrote:
> >> 2006/4/26, Andrew Zhang <zh...@gmail.com>:
> >>> Vladimir Strigun wrote:
> >>>> On 4/26/06, Andrew Zhang <zh...@gmail.com> wrote:
> >>>> I try to find best solution for Harmony-166 and during fix preparation
> >>>> I've found current issue.
> >>>>
> >>>> Method test_read from tests.api.java.io.InputStreamReaderTest failed
> >>>> and first fix I've created was with in.available() but I agree with
> >>>> Mikhail that it's possible not the best solution.
> >>> why?
> >>>
> >>> Yes, the spec says "The available method for class InputStream always
> >>> returns 0.". But it also says "This method should be overridden by
> >>> subclasses.". :)
> >>>
> >>> Here's the available description:
> >>> "Returns the number of bytes that can be read (or skipped over) from
> >>> this input stream without blocking by the next caller of a method for
> >>> this input stream."
> >>>
> >>> If someone writes a subclass of InputStream whose available() returns 0
> >>> even when some data is available, then he should bear the
> >>> consequences of cheating or contradicting the spec.
> >
> >>
> >> What if someone is just unable to say whether bytes are available?
> >> For example, he reads something from a hardware module and
> >> the only way to know whether there are bytes is to try reading?
> >>
> >
> > I'm a little confused.
> >
> > How does solution 2 handle this problem?
> I agree with Mikhail that available() is not reliable.  You can conclude
> that the end of the stream has been reached if and only if a read method
> returns -1.
>
> I took a quick look at Harmony-166 and its patches (it seems to be the
> reason Harmony-410 was raised). I think there should be an easier way to
> fix this bug without modifying CharsetDecoder; please see below for
> details.
>
> The fillBuf() of InputStreamReader should look like:
>
>    private void fillBuf() throws IOException {
>        chars.clear();
>        int read = 0;
>        // if we haven't reached the end of stream, and cannot decode
>        // even one character, go on to read and decode
>        while(read != -1 && chars.position() == 0){
>            try {
>                read = in.read(bytes.array());
>            } catch (IOException e) {
>                chars.limit(0);
>                throw e;
>            }
>            boolean endOfInput = false;
>            if(read == -1){
>                // if we have reached the end of stream, try to decode
>                // one last time
>                bytes.limit(0);
>                endOfInput = true;
>            }else{
>                // if we read some bytes, try to decode them
>                bytes.limit(read);
>                read = 0;
>            }
>            decoder.decode(bytes, chars, endOfInput);
>            bytes.clear();
>        }
>        chars.flip();
>    }
>
> Vladimir's test case for Harmony-166 passes with this change.
>
> If no one objects, I'll attach a patch based on this. Comments?

+1 from me

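
The two points made in the quoted discussion (a stream may hand over a single byte per read, and available() may legitimately say nothing useful) can be tried against any InputStreamReader with a sketch like the one below. OneByteStream is a hypothetical stand-in for the LimitedByteArrayInputStream used by the Harmony tests, whose source is not shown in this thread; the rest uses only standard java.io calls.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class OneBytePerReadDemo {
    /** Hypothetical stand-in for the test's LimitedByteArrayInputStream. */
    static class OneByteStream extends ByteArrayInputStream {
        OneByteStream(byte[] data) { super(data); }

        @Override
        public synchronized int read(byte[] b, int off, int len) {
            return super.read(b, off, Math.min(len, 1)); // at most one byte per call
        }

        @Override
        public synchronized int available() {
            return 0; // pretend we cannot tell, as a hardware-backed stream might
        }
    }

    /** Reads everything through an InputStreamReader, one character at a time. */
    static String readAll(byte[] data) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new OneByteStream(data), StandardCharsets.UTF_8)) {
            int c;
            while ((c = r.read()) != -1) {   // -1 is the only end-of-stream signal used
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "h\u00e9llo \u20ac";   // contains two- and three-byte UTF-8 sequences
        System.out.println(readAll(text.getBytes(StandardCharsets.UTF_8)).equals(text));
    }
}
```

A reader that relies on available() to decide endOfInput would stop early on such a stream; a reader that only trusts the -1 return value reads it to the end.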
> >
> >> Thanks,
> >> Mikhail
> >>
> >>>> In current implementation of InputStreamReader endOfInput variable
> >>>> sets to true if reader can't read less that 8192 bytes. When
> >>>> InputStreamReader try to read one character from
> >>>> LimitedByteArrayInputStream (A ByteArrayInputStream that only returns
> >>>> a single byte per read) true as a parameter passed to charset decoder,
> >>>> nevertheless we still have further input from
> >>>> LimitedByteArrayInputStream.
> >>>>
> >>>> 2 methods from tests.api.java.io.OutputStreamWriterTest also failed
> >>>> because of read() method implementation of InputStreamReader.
> >>>>
> >>>> IMO, we have 2 ways to fix Harmony-166 :
> >>>>
> >>>> 1. Don't change the behavior of CharsetDecoder, and use in.available()
> >>>> method for endOfInput variable. If in.available() > 0 pass false to
> >>>> decoder, read additional one byte from input stream and try to decode
> >>>> again. If decoding operation can decode remaining portion + one
> >>>> additional byte to character, stop here, else try to read addition
> >>>> byte again. Function fillBuf will invoke with the next invocation of
> >>>> read() method and next portion of 8192 bytes will be read from input
> >>>> stream.
> >>>>
> >>> It's reasonable.
> >>> After all, streaming decode is useful and helpful to users.
> >>> IMO, streaming decode is a highlight compared with some RI :)
> >>>
> >>>> 2. Change the behavior of CharsetDecoder and operate with
> >>>> bytes.remaining() for calculating endOfInput. In this case, we always
> >>>> pass false with first invocation of decode method. Algorithm further
> >>>> is almost the same as above, but we stop the cycle if all bytes
> >>>> decoded successfully (i.e if bytes.hasRemaining() is false).
> >
> > Vladimir, Could you please give the detail code or pseudo code of
> > these two solutions?
> >
> > Thanks!
> >
> >>>>
> >>>> What do you think about it?
> >>>>
> >>>>
> >>>> Thanks.
> >>>> Vladimir.
> >>> Thanks!
> >
> >
> >
> >
> >
>
>
> --
> Paulex Yang
> China Software Development Lab
> IBM
>
>
>
>
>



Re: [classlib] charset decoding

Posted by Vladimir Strigun <vs...@gmail.com>.
On 4/28/06, Paulex Yang <pa...@gmail.com> wrote:
> Richard Liang wrote:
> > Paulex Yang wrote:
> >> I see, the OutputStreamWriterTest does fail. It seems there's some
> >> problem with how CharsetDecoder handles a UTF-8 byte stream, but
> >> InputStreamReader itself should work because it passes all the
> >> other decoding tests. I'll keep studying it.
> >>
> >> I also realized there is a bug in my former proposal: it cannot
> >> support a non-blocking InputStream, so the revised version is as
> >> below:
> >>    private void fillBuf() throws IOException {
> >>        chars.clear();
> >>        int read = 0;
> >>        do{
> >>            try {
> >>                read = in.read(bytes.array());
> >>            } catch (IOException e) {
> >>                chars.limit(0);
> >>                throw e;
> >>            }
> >>            boolean endOfInput = false;
> >>            if(read == -1){
> >>                bytes.limit(0);
> >>                endOfInput = true;
> >>            }else{
> >>                bytes.limit(read);
> >>            }
> >>            decoder.decode(bytes, chars, endOfInput);
> >>            bytes.clear();
> >>        }while(read > 0 && chars.position() == 0);
> >>        // The main difference from the prior version is checking read > 0
> >>        // instead of read != -1, so that an InputStreamReader based on a
> >>        // non-blocking InputStream can return immediately.
> >>        chars.flip();
> >>    }
> >>
> > Yes. It IS a bug of ICU4JNI. I have submitted a bug [1] for ICU and
> > have proposed a fix for it.
> >
> > [1] http://bugs.icu-project.org/cgi-bin/icu-bugs/incoming?findid=5180
> >
>
> Great! Thank you, Richard. Vladimir, how about I attach patch for
> InputStreamReader to Harmony-166 at first? At least, it can make
> InputStreamReaderTest pass.

Yes, of course. Thank you!

Vladimir.

> Richard, have you tried the OutputStreamWriterTest with your patch for
> ICU and my modification to InputStreamReader?
>
> --
> Paulex Yang
> China Software Development Lab
> IBM
>
>
>
>
>



Re: [classlib] charset decoding

Posted by Paulex Yang <pa...@gmail.com>.
Richard Liang wrote:
> Paulex Yang wrote:
>> I see, the OutputStreamWriterTest does fail. It seems there's some
>> problem with how CharsetDecoder handles a UTF-8 byte stream, but
>> InputStreamReader itself should work because it passes all the
>> other decoding tests. I'll keep studying it.
>>
>> I also realized there is a bug in my former proposal: it cannot
>> support a non-blocking InputStream, so the revised version is as
>> below:
>>    private void fillBuf() throws IOException {
>>        chars.clear();
>>        int read = 0;
>>        do{
>>            try {
>>                read = in.read(bytes.array());
>>            } catch (IOException e) {
>>                chars.limit(0);
>>                throw e;
>>            }
>>            boolean endOfInput = false;
>>            if(read == -1){
>>                bytes.limit(0);
>>                endOfInput = true;
>>            }else{
>>                bytes.limit(read);
>>            }
>>            decoder.decode(bytes, chars, endOfInput);
>>            bytes.clear();
>>        }while(read > 0 && chars.position() == 0);
>>        // The main difference from the prior version is checking read > 0
>>        // instead of read != -1, so that an InputStreamReader based on a
>>        // non-blocking InputStream can return immediately.
>>        chars.flip();
>>    }  
>>
> Yes. It IS a bug of ICU4JNI. I have submitted a bug [1] for ICU and 
> have proposed a fix for it.
>
> [1] http://bugs.icu-project.org/cgi-bin/icu-bugs/incoming?findid=5180
>

Great! Thank you, Richard. Vladimir, how about I attach the patch for
InputStreamReader to Harmony-166 first? At least it will make
InputStreamReaderTest pass.

Richard, have you tried the OutputStreamWriterTest with your patch for 
ICU and my modification to InputStreamReader?

-- 
Paulex Yang
China Software Development Lab
IBM





Re: [classlib] charset decoding

Posted by Richard Liang <ri...@gmail.com>.
Paulex Yang wrote:
> Vladimir Strigun wrote:
>> On 4/26/06, Paulex Yang <pa...@gmail.com> wrote:
>>  
>>> Andrew Zhang wrote:
>>>    
>>>> Mikhail Loenko wrote:
>>>>      
>>>>> 2006/4/26, Andrew Zhang <zh...@gmail.com>:
>>>>>        
>>>>>> Vladimir Strigun wrote:
>>>>>>          
>>>>>>> On 4/26/06, Andrew Zhang <zh...@gmail.com> wrote:
>>>>>>> I try to find best solution for Harmony-166 and during fix 
>>>>>>> preparation
>>>>>>> I've found current issue.
>>>>>>>
>>>>>>> Method test_read from tests.api.java.io.InputStreamReaderTest 
>>>>>>> failed
>>>>>>> and first fix I've created was with in.available() but I agree with
>>>>>>> Mikhail that it's possible not the best solution.
>>>>>>>             
>>>>>> why?
>>>>>>
>>>>>> Yes, the spec says "The available method for class InputStream 
>>>>>> always
>>>>>> returns 0.". But it also says "This method should be overridden by
>>>>>> subclasses.". :)
>>>>>>
>>>>>> Here's the available description:
>>>>>> "Returns the number of bytes that can be read (or skipped over) from
>>>>>> this input stream without blocking by the next caller of a method 
>>>>>> for
>>>>>> this input stream."
>>>>>>
>>>>>> If someone writes a subclass of InputStream that available returns 0
>>>>>> even if there are some data available, then he should take the
>>>>>> result of
>>>>>> cheating or contradicting with spec.
>>>>>>           
>>>>> What if someone just unable to say if the bytes are available?
>>>>> For example, he reads something from a hardware module and
>>>>> the only way to know whether there are bytes is try reading?
>>>>>
>>>>>         
>>>> I'm a little confused.
>>>>
>>>> How does solution 2 handle this problem?
>>>>       
>>> I agree with Mikhail that available is not reliable.  You can judge the
>>> stream end has been reached if and only if you got -1 returned from 
>>> read
>>> methods.
>>>
>>> I did a quick view of Harmony-166 and patches(seems it's the root of 
>>> why
>>> Harmony-410 is raised), I think there should be some easier way to fix
>>> this bug,  while the CharsetDecoder doesn't need to be modified, pls.
>>> see below for details.
>>>
>>> The fillBuf() of InputStreamReader should be looks like:
>>>
>>>    private void fillBuf() throws IOException {
>>>        chars.clear();
>>>        int read = 0;
>>>        // if we haven't reached the end of stream, and cannot decode
>>>        // even one character, go on to read and decode
>>>        while(read != -1 && chars.position() == 0){
>>>            try {
>>>                read = in.read(bytes.array());
>>>            } catch (IOException e) {
>>>                chars.limit(0);
>>>                throw e;
>>>            }
>>>            boolean endOfInput = false;
>>>            if(read == -1){
>>>                // if we have reached the end of stream, try to decode
>>>                // one last time
>>>                bytes.limit(0);
>>>                endOfInput = true;
>>>            }else{
>>>                //if we read some bytes, try to decode them
>>>                bytes.limit(read);
>>>                read = 0;
>>>            }
>>>            decoder.decode(bytes, chars, endOfInput);
>>>            bytes.clear();
>>>        }
>>>        chars.flip();
>>>    }
>>>
>>> I've got a pass to run Vladimir's test case for Harmony-166.
>>>
>>> If no one objections, I'll attach patch based on this. comments?
>>>     
>>
>> Paulex, unfortunately I can't agree with your patch.
>> InputStreamReaderTest passed, but OutputStreamWriterTest still failed.
>> Could you please try to run test I have mentioned?
>>   
> I see, the OutputStreamWriterTest does fail. It seems there's some
> problem with how CharsetDecoder handles a UTF-8 byte stream, but
> InputStreamReader itself should work because it passes all the
> other decoding tests. I'll keep studying it.
>
> I also realized there is a bug in my former proposal: it cannot
> support a non-blocking InputStream, so the revised version is as
> below:
>    private void fillBuf() throws IOException {
>        chars.clear();
>        int read = 0;
>        do{
>            try {
>                read = in.read(bytes.array());
>            } catch (IOException e) {
>                chars.limit(0);
>                throw e;
>            }
>            boolean endOfInput = false;
>            if(read == -1){
>                bytes.limit(0);
>                endOfInput = true;
>            }else{
>                bytes.limit(read);
>            }
>            decoder.decode(bytes, chars, endOfInput);
>            bytes.clear();
>        }while(read > 0 && chars.position() == 0);
>        // The main difference from the prior version is checking read > 0
>        // instead of read != -1, so that an InputStreamReader based on a
>        // non-blocking InputStream can return immediately.
>        chars.flip();
>    }
>> Thanks.
>> Vladimir.
>>
>>  
>>>>> Thanks,
>>>>> Mikhail
>>>>>
>>>>>        
>>>>>>> In current implementation of InputStreamReader endOfInput variable
>>>>>>> sets to true if reader can't read less that 8192 bytes. When
>>>>>>> InputStreamReader try to read one character from
>>>>>>> LimitedByteArrayInputStream (A ByteArrayInputStream that only 
>>>>>>> returns
>>>>>>> a single byte per read) true as a parameter passed to charset 
>>>>>>> decoder,
>>>>>>> nevertheless we still have further input from
>>>>>>> LimitedByteArrayInputStream.
>>>>>>>
>>>>>>> 2 methods from tests.api.java.io.OutputStreamWriterTest also failed
>>>>>>> because of read() method implementation of InputStreamReader.
>>>>>>>
>>>>>>> IMO, we have 2 ways to fix Harmony-166 :
>>>>>>>
>>>>>>> 1. Don't change the behavior of CharsetDecoder, and use 
>>>>>>> in.available()
>>>>>>> method for endOfInput variable. If in.available() > 0 pass false to
>>>>>>> decoder, read additional one byte from input stream and try to 
>>>>>>> decode
>>>>>>> again. If decoding operation can decode remaining portion + one
>>>>>>> additional byte to character, stop here, else try to read addition
>>>>>>> byte again. Function fillBuf will invoke with the next 
>>>>>>> invocation of
>>>>>>> read() method and next portion of 8192 bytes will be read from 
>>>>>>> input
>>>>>>> stream.
>>>>>>>
>>>>>>>             
>>>>>> It's reasonable.
>>>>>> After all, streaming decode is useful and helpful to users.
>>>>>> IMO, streaming decode is a highlight compared with some RI :)
>>>>>>
>>>>>>          
>>>>>>> 2. Change the behavior of CharsetDecoder and operate with
>>>>>>> bytes.remaining() for calculating endOfInput. In this case, we 
>>>>>>> always
>>>>>>> pass false with first invocation of decode method. Algorithm 
>>>>>>> further
>>>>>>> is almost the same as above, but we stop the cycle if all bytes
>>>>>>> decoded successfully (i.e if bytes.hasRemaining() is false).
>>>>>>>             
>>>> Vladimir, Could you please give the detail code or pseudo code of
>>>> these two solutions?
>>>>
>>>> Thanks!
>>>>
>>>>      
>>>>>>> What do you think about it?
>>>>>>>
>>>>>>>
>>>>>>> Thanks.
>>>>>>> Vladimir.
>>>>>>>             
>>>>>> Thanks!
>>>>>>           
>>>>
>>>>
>>>>
>>>>       
>>> -- 
>>> Paulex Yang
>>> China Software Development Lab
>>> IBM
>>>
>>>
>>>
>>>
>>>
>>>     
>>
>>
>>
>>   
>
>
Yes. It IS a bug in ICU4JNI. I have filed a bug report [1] with ICU and have
proposed a fix for it.

[1] http://bugs.icu-project.org/cgi-bin/icu-bugs/incoming?findid=5180

-- 
Richard Liang
China Software Development Lab, IBM 





Re: [classlib] charset decoding

Posted by Paulex Yang <pa...@gmail.com>.
Vladimir Strigun wrote:
> On 4/26/06, Paulex Yang <pa...@gmail.com> wrote:
>   
>> Andrew Zhang wrote:
>>     
>>> Mikhail Loenko wrote:
>>>       
>>>> 2006/4/26, Andrew Zhang <zh...@gmail.com>:
>>>>         
>>>>> Vladimir Strigun wrote:
>>>>>           
>>>>>> On 4/26/06, Andrew Zhang <zh...@gmail.com> wrote:
>>>>>> I try to find best solution for Harmony-166 and during fix preparation
>>>>>> I've found current issue.
>>>>>>
>>>>>> Method test_read from tests.api.java.io.InputStreamReaderTest failed
>>>>>> and first fix I've created was with in.available() but I agree with
>>>>>> Mikhail that it's possible not the best solution.
>>>>>>             
>>>>> why?
>>>>>
>>>>> Yes, the spec says "The available method for class InputStream always
>>>>> returns 0.". But it also says "This method should be overridden by
>>>>> subclasses.". :)
>>>>>
>>>>> Here's the available description:
>>>>> "Returns the number of bytes that can be read (or skipped over) from
>>>>> this input stream without blocking by the next caller of a method for
>>>>> this input stream."
>>>>>
>>>>> If someone writes a subclass of InputStream that available returns 0
>>>>> even if there are some data available, then he should take the
>>>>> result of
>>>>> cheating or contradicting with spec.
>>>>>           
>>>> What if someone just unable to say if the bytes are available?
>>>> For example, he reads something from a hardware module and
>>>> the only way to know whether there are bytes is try reading?
>>>>
>>>>         
>>> I'm a little confused.
>>>
>>> How does solution 2 handle this problem?
>>>       
>> I agree with Mikhail that available is not reliable.  You can judge the
>> stream end has been reached if and only if you got -1 returned from read
>> methods.
>>
>> I did a quick view of Harmony-166 and patches(seems it's the root of why
>> Harmony-410 is raised), I think there should be some easier way to fix
>> this bug,  while the CharsetDecoder doesn't need to be modified, pls.
>> see below for details.
>>
>> The fillBuf() of InputStreamReader should be looks like:
>>
>>    private void fillBuf() throws IOException {
>>        chars.clear();
>>        int read = 0;
>>        // if we haven't reached the end of stream, and cannot decode
>>        // even one character, go on to read and decode
>>        while(read != -1 && chars.position() == 0){
>>            try {
>>                read = in.read(bytes.array());
>>            } catch (IOException e) {
>>                chars.limit(0);
>>                throw e;
>>            }
>>            boolean endOfInput = false;
>>            if(read == -1){
>>                // if we have reached the end of stream, try to decode
>>                // one last time
>>                bytes.limit(0);
>>                endOfInput = true;
>>            }else{
>>                //if we read some bytes, try to decode them
>>                bytes.limit(read);
>>                read = 0;
>>            }
>>            decoder.decode(bytes, chars, endOfInput);
>>            bytes.clear();
>>        }
>>        chars.flip();
>>    }
>>
>> I've got a pass to run Vladimir's test case for Harmony-166.
>>
>> If no one objections, I'll attach patch based on this. comments?
>>     
>
> Paulex, unfortunately I can't agree with your patch.
> InputStreamReaderTest passed, but OutputStreamWriterTest still failed.
> Could you please try to run test I have mentioned?
>   
I see, the OutputStreamWriterTest does fail. It seems there's some problem
with how CharsetDecoder handles a UTF-8 byte stream, but
InputStreamReader itself should work because it passes all the other
decoding tests. I'll keep studying it.

I also realized there is a bug in my former proposal: it cannot
support a non-blocking InputStream, so the revised version is as below:
    private void fillBuf() throws IOException {
        chars.clear();
        int read = 0;
        do{
            try {
                read = in.read(bytes.array());
            } catch (IOException e) {
                chars.limit(0);
                throw e;
            }
            boolean endOfInput = false;
            if(read == -1){
                bytes.limit(0);
                endOfInput = true;
            }else{
                bytes.limit(read);
            }
            decoder.decode(bytes, chars, endOfInput);
            bytes.clear();
        }while(read > 0 && chars.position() == 0);
        // The main difference from the prior version is checking read > 0
        // instead of read != -1, so that an InputStreamReader based on a
        // non-blocking InputStream can return immediately.
        chars.flip();
    }
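
For readers who want to run the loop outside Harmony, here is a self-contained sketch of the same do/while shape. It is not the proposed patch: the class FillBufSketch, the helper stream Trickle, and the readAll method are hypothetical, and the sketch assumes a decoder that leaves incomplete trailing bytes in the input buffer, so it uses compact() where the code above uses clear(). Like the original, it ignores the CoderResult returned by decode.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

class FillBufSketch {
    /** Hypothetical stream that returns at most one byte per read call. */
    static class Trickle extends ByteArrayInputStream {
        Trickle(byte[] data) { super(data); }

        @Override
        public synchronized int read(byte[] b, int off, int len) {
            return super.read(b, off, Math.min(len, 1));
        }
    }

    private final InputStream in;
    private final CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
    private final ByteBuffer bytes = ByteBuffer.allocate(8192);
    private final CharBuffer chars = CharBuffer.allocate(8192);
    private boolean eof;

    FillBufSketch(InputStream in) { this.in = in; }

    /** One refill; returns the characters decoded by it (possibly none). */
    String fillBuf() throws IOException {
        chars.clear();
        int read;
        do {
            // Read behind any bytes carried over from the previous pass.
            read = in.read(bytes.array(), bytes.position(), bytes.remaining());
            if (read == -1) {
                eof = true;                       // only -1 marks the end of the stream
            } else {
                bytes.position(bytes.position() + read);
            }
            bytes.flip();
            decoder.decode(bytes, chars, eof);    // endOfInput is true only after -1
            if (eof) {
                decoder.flush(chars);
            }
            bytes.compact();                      // keep bytes of an incomplete character
        } while (read > 0 && chars.position() == 0);
        chars.flip();
        return chars.toString();
    }

    /** Drains the stream; must not be used with a stream that keeps returning 0. */
    static String readAll(InputStream in) {
        try {
            FillBufSketch sketch = new FillBufSketch(in);
            StringBuilder sb = new StringBuilder();
            while (!sketch.eof) {
                sb.append(sketch.fillBuf());
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "a\u20acb".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(new Trickle(data)).equals("a\u20acb"));
    }
}
```

The loop condition is the point of interest: read > 0 keeps reading while a multi-byte character is still incomplete, stops on -1, and also stops when a non-blocking stream reports 0 bytes, leaving it to the caller to decide what an empty refill means.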
> Thanks.
> Vladimir.
>
>   
>>>> Thanks,
>>>> Mikhail
>>>>
>>>>         
>>>>>> In current implementation of InputStreamReader endOfInput variable
>>>>>> sets to true if reader can't read less that 8192 bytes. When
>>>>>> InputStreamReader try to read one character from
>>>>>> LimitedByteArrayInputStream (A ByteArrayInputStream that only returns
>>>>>> a single byte per read) true as a parameter passed to charset decoder,
>>>>>> nevertheless we still have further input from
>>>>>> LimitedByteArrayInputStream.
>>>>>>
>>>>>> 2 methods from tests.api.java.io.OutputStreamWriterTest also failed
>>>>>> because of read() method implementation of InputStreamReader.
>>>>>>
>>>>>> IMO, we have 2 ways to fix Harmony-166 :
>>>>>>
>>>>>> 1. Don't change the behavior of CharsetDecoder, and use in.available()
>>>>>> method for endOfInput variable. If in.available() > 0 pass false to
>>>>>> decoder, read additional one byte from input stream and try to decode
>>>>>> again. If decoding operation can decode remaining portion + one
>>>>>> additional byte to character, stop here, else try to read addition
>>>>>> byte again. Function fillBuf will invoke with the next invocation of
>>>>>> read() method and next portion of 8192 bytes will be read from input
>>>>>> stream.
>>>>>>
>>>>>>             
>>>>> It's reasonable.
>>>>> After all, streaming decode is useful and helpful to users.
>>>>> IMO, streaming decode is a highlight compared with some RI :)
>>>>>
>>>>>           
>>>>>> 2. Change the behavior of CharsetDecoder and operate with
>>>>>> bytes.remaining() for calculating endOfInput. In this case, we always
>>>>>> pass false with first invocation of decode method. Algorithm further
>>>>>> is almost the same as above, but we stop the cycle if all bytes
>>>>>> decoded successfully (i.e if bytes.hasRemaining() is false).
>>>>>>             
>>> Vladimir, Could you please give the detail code or pseudo code of
>>> these two solutions?
>>>
>>> Thanks!
>>>
>>>       
>>>>>> What do you think about it?
>>>>>>
>>>>>>
>>>>>> Thanks.
>>>>>> Vladimir.
>>>>>>             
>>>>> Thanks!
>>>>>           
>>>
>>> ---------------------------------------------------------------------
>>> Terms of use : http://incubator.apache.org/harmony/mailing.html
>>> To unsubscribe, e-mail: harmony-dev-unsubscribe@incubator.apache.org
>>> For additional commands, e-mail: harmony-dev-help@incubator.apache.org
>>>
>>>
>>>       
>> --
>> Paulex Yang
>> China Software Development Lab
>> IBM
>>
>>
>>
>>
>>
>>     
>
>
>
>   


-- 
Paulex Yang
China Software Development Lab
IBM





Re: [classlib] charset decoding

Posted by Vladimir Strigun <vs...@gmail.com>.
On 4/26/06, Paulex Yang <pa...@gmail.com> wrote:
> Andrew Zhang wrote:
> >
> > Mikhail Loenko wrote:
> >> 2006/4/26, Andrew Zhang <zh...@gmail.com>:
> >>> Vladimir Strigun wrote:
> >>>> On 4/26/06, Andrew Zhang <zh...@gmail.com> wrote:
> >>>> I try to find best solution for Harmony-166 and during fix preparation
> >>>> I've found current issue.
> >>>>
> >>>> Method test_read from tests.api.java.io.InputStreamReaderTest failed
> >>>> and first fix I've created was with in.available() but I agree with
> >>>> Mikhail that it's possible not the best solution.
> >>> why?
> >>>
> >>> Yes, the spec says "The available method for class InputStream always
> >>> returns 0.". But it also says "This method should be overridden by
> >>> subclasses.". :)
> >>>
> >>> Here's the available description:
> >>> "Returns the number of bytes that can be read (or skipped over) from
> >>> this input stream without blocking by the next caller of a method for
> >>> this input stream."
> >>>
> >>> If someone writes a subclass of InputStream that available returns 0
> >>> even if there are some data available, then he should take the
> >>> result of
> >>> cheating or contradicting with spec.
> >
> >>
> >> What if someone just unable to say if the bytes are available?
> >> For example, he reads something from a hardware module and
> >> the only way to know whether there are bytes is try reading?
> >>
> >
> > I'm a little confused.
> >
> > How does solution 2 handle this problem?
> I agree with Mikhail that available() is not reliable.  You can conclude
> that the end of the stream has been reached if and only if a read method
> returns -1.
>
> I took a quick look at Harmony-166 and its patches (it seems to be the
> reason Harmony-410 was raised). I think there is an easier way to fix
> this bug that does not require modifying CharsetDecoder; please see
> below for details.
>
> The fillBuf() of InputStreamReader should look like:
>
>    private void fillBuf() throws IOException {
>        chars.clear();
>        int read = 0;
>        // if we haven't reached the end of stream and cannot decode
>        // even one character, go on to read and decode
>        while(read != -1 && chars.position() == 0){
>            try {
>                read = in.read(bytes.array());
>            } catch (IOException e) {
>                chars.limit(0);
>                throw e;
>            }
>            boolean endOfInput = false;
>            if(read == -1){
>                // if we have reached the end of stream, try to decode
>                // one last time
>                bytes.limit(0);
>                endOfInput = true;
>            }else{
>                //if we read some bytes, try to decode them
>                bytes.limit(read);
>                read = 0;
>            }
>            decoder.decode(bytes, chars, endOfInput);
>            bytes.clear();
>        }
>        chars.flip();
>    }
>
> Vladimir's test case for Harmony-166 passes with this change.
>
> If no one objects, I'll attach a patch based on this. Comments?

Paulex, unfortunately I can't agree with your patch.
InputStreamReaderTest passes, but OutputStreamWriterTest still fails.
Could you please try running the test I mentioned?

Thanks.
Vladimir.

> >
> >> Thanks,
> >> Mikhail
> >>
> >>>> In current implementation of InputStreamReader endOfInput variable
> >>>> sets to true if reader can't read less that 8192 bytes. When
> >>>> InputStreamReader try to read one character from
> >>>> LimitedByteArrayInputStream (A ByteArrayInputStream that only returns
> >>>> a single byte per read) true as a parameter passed to charset decoder,
> >>>> nevertheless we still have further input from
> >>>> LimitedByteArrayInputStream.
> >>>>
> >>>> 2 methods from tests.api.java.io.OutputStreamWriterTest also failed
> >>>> because of read() method implementation of InputStreamReader.
> >>>>
> >>>> IMO, we have 2 ways to fix Harmony-166 :
> >>>>
> >>>> 1. Don't change the behavior of CharsetDecoder, and use in.available()
> >>>> method for endOfInput variable. If in.available() > 0 pass false to
> >>>> decoder, read additional one byte from input stream and try to decode
> >>>> again. If decoding operation can decode remaining portion + one
> >>>> additional byte to character, stop here, else try to read addition
> >>>> byte again. Function fillBuf will invoke with the next invocation of
> >>>> read() method and next portion of 8192 bytes will be read from input
> >>>> stream.
> >>>>
> >>> It's reasonable.
> >>> After all, streaming decode is useful and helpful to users.
> >>> IMO, streaming decode is a highlight compared with some RI :)
> >>>
> >>>> 2. Change the behavior of CharsetDecoder and operate with
> >>>> bytes.remaining() for calculating endOfInput. In this case, we always
> >>>> pass false with first invocation of decode method. Algorithm further
> >>>> is almost the same as above, but we stop the cycle if all bytes
> >>>> decoded successfully (i.e if bytes.hasRemaining() is false).
> >
> > Vladimir, Could you please give the detail code or pseudo code of
> > these two solutions?
> >
> > Thanks!
> >
> >>>>
> >>>> What do you think about it?
> >>>>
> >>>>
> >>>> Thanks.
> >>>> Vladimir.
> >>> Thanks!
> >
> >
> >
> >
> >
>
>
> --
> Paulex Yang
> China Software Development Lab
> IBM
>
>
>
>
>



Re: [classlib] charset decoding

Posted by Paulex Yang <pa...@gmail.com>.
Andrew Zhang wrote:
>
> Mikhail Loenko wrote:
>> 2006/4/26, Andrew Zhang <zh...@gmail.com>:
>>> Vladimir Strigun wrote:
>>>> On 4/26/06, Andrew Zhang <zh...@gmail.com> wrote:
>>>> I try to find best solution for Harmony-166 and during fix preparation
>>>> I've found current issue.
>>>>
>>>> Method test_read from tests.api.java.io.InputStreamReaderTest failed
>>>> and first fix I've created was with in.available() but I agree with
>>>> Mikhail that it's possible not the best solution.
>>> why?
>>>
>>> Yes, the spec says "The available method for class InputStream always
>>> returns 0.". But it also says "This method should be overridden by
>>> subclasses.". :)
>>>
>>> Here's the available description:
>>> "Returns the number of bytes that can be read (or skipped over) from
>>> this input stream without blocking by the next caller of a method for
>>> this input stream."
>>>
>>> If someone writes a subclass of InputStream that available returns 0
>>> even if there are some data available, then he should take the 
>>> result of
>>> cheating or contradicting with spec.
>
>>
>> What if someone just unable to say if the bytes are available?
>> For example, he reads something from a hardware module and
>> the only way to know whether there are bytes is try reading?
>>
>
> I'm a little confused.
>
> How does solution 2 handle this problem?
I agree with Mikhail that available() is not reliable.  You can conclude 
that the end of the stream has been reached if and only if a read method 
returns -1.
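
A minimal sketch of why a -1 from read(), rather than available(), is the
dependable end-of-stream signal. OpaqueStream is a hypothetical class
invented for this illustration; it inherits InputStream.available(), which
always returns 0, yet it still delivers every byte.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class AvailableDemo {
    /** Hypothetical stream that cannot report readiness in advance. */
    static class OpaqueStream extends InputStream {
        private final InputStream delegate;

        OpaqueStream(byte[] data) {
            this.delegate = new ByteArrayInputStream(data);
        }

        @Override
        public int read() throws IOException {
            return delegate.read();
        }
        // available() is NOT overridden, so the InputStream default (0) applies.
    }

    /** Counts bytes by reading until read() returns -1; available() is ignored. */
    static int drain(InputStream in) throws IOException {
        int count = 0;
        while (in.read() != -1) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new OpaqueStream(new byte[] {1, 2, 3});
        System.out.println("available() = " + in.available());
        System.out.println("bytes read  = " + drain(in));
    }
}
```

A reader that treated available() == 0 as "no further input" would pass
endOfInput = true too early for a stream like this one.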

I took a quick look at Harmony-166 and its patches (it seems to be the 
reason Harmony-410 was raised). I think there is an easier way to fix 
this bug that does not require modifying CharsetDecoder; please see 
below for details.

The fillBuf() of InputStreamReader should look like:

    private void fillBuf() throws IOException {
        chars.clear();
        int read = 0;
        // if we haven't reached the end of stream and cannot decode
        // even one character, go on to read and decode
        while(read != -1 && chars.position() == 0){
            try {
                read = in.read(bytes.array());
            } catch (IOException e) {
                chars.limit(0);
                throw e;
            }
            boolean endOfInput = false;
            if(read == -1){
                // if we have reached the end of stream, try to decode
                // one last time
                bytes.limit(0);
                endOfInput = true;
            }else{
                //if we read some bytes, try to decode them
                bytes.limit(read);
                read = 0;
            }
            decoder.decode(bytes, chars, endOfInput);
            bytes.clear();
        }
        chars.flip();
    }

Vladimir's test case for Harmony-166 passes with this change.

If no one objects, I'll attach a patch based on this. Comments?
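
For comparison, a minimal sketch of a caller-side streaming loop written only
against the public java.nio.charset API. It assumes the RI-style contract in
which decode() leaves the bytes of an incomplete sequence in the input buffer.
It is not the Harmony fillBuf() implementation, and the class and method names
are hypothetical. The stream is read one byte at a time (as
LimitedByteArrayInputStream would deliver it), endOfInput becomes true only
after read() returns -1, and compact() carries undecoded bytes over to the
next call.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public class OneByteDecodeDemo {
    /** Decodes UTF-8 from a stream, fetching a single byte per read() call. */
    static String decodeAll(InputStream in) throws IOException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        ByteBuffer bytes = ByteBuffer.allocate(16);
        CharBuffer chars = CharBuffer.allocate(16);
        StringBuilder out = new StringBuilder();
        boolean endOfInput = false;

        while (!endOfInput) {
            int b = in.read();
            if (b == -1) {
                endOfInput = true;      // only -1 marks the end of the stream
            } else {
                bytes.put((byte) b);
            }
            bytes.flip();
            CoderResult result = decoder.decode(bytes, chars, endOfInput);
            if (result.isError()) {
                result.throwException(); // malformed or unmappable input
            }
            bytes.compact();            // keep any incomplete sequence for the next call
            drain(chars, out);
        }
        decoder.flush(chars);
        drain(chars, out);
        return out.toString();
    }

    /** Moves the decoded characters into the builder and resets the buffer. */
    private static void drain(CharBuffer chars, StringBuilder out) {
        chars.flip();
        out.append(chars);
        chars.clear();
    }

    public static void main(String[] args) throws IOException {
        String text = "h\u00e9llo \u20ac";   // contains 2-byte and 3-byte sequences
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(text.equals(decodeAll(new ByteArrayInputStream(utf8))));
    }
}
```

Because endOfInput stays false until the stream is exhausted, a multi-byte
character split across reads is never reported as malformed midway through.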
>
>> Thanks,
>> Mikhail
>>
>>>> In current implementation of InputStreamReader endOfInput variable
>>>> sets to true if reader can't read less that 8192 bytes. When
>>>> InputStreamReader try to read one character from
>>>> LimitedByteArrayInputStream (A ByteArrayInputStream that only returns
>>>> a single byte per read) true as a parameter passed to charset decoder,
>>>> nevertheless we still have further input from
>>>> LimitedByteArrayInputStream.
>>>>
>>>> 2 methods from tests.api.java.io.OutputStreamWriterTest also failed
>>>> because of read() method implementation of InputStreamReader.
>>>>
>>>> IMO, we have 2 ways to fix Harmony-166 :
>>>>
>>>> 1. Don't change the behavior of CharsetDecoder, and use in.available()
>>>> method for endOfInput variable. If in.available() > 0 pass false to
>>>> decoder, read additional one byte from input stream and try to decode
>>>> again. If decoding operation can decode remaining portion + one
>>>> additional byte to character, stop here, else try to read addition
>>>> byte again. Function fillBuf will invoke with the next invocation of
>>>> read() method and next portion of 8192 bytes will be read from input
>>>> stream.
>>>>
>>> It's reasonable.
>>> After all, streaming decode is useful and helpful to users.
>>> IMO, streaming decode is a highlight compared with some RI :)
>>>
>>>> 2. Change the behavior of CharsetDecoder and operate with
>>>> bytes.remaining() for calculating endOfInput. In this case, we always
>>>> pass false with first invocation of decode method. Algorithm further
>>>> is almost the same as above, but we stop the cycle if all bytes
>>>> decoded successfully (i.e if bytes.hasRemaining() is false).
>
> Vladimir, Could you please give the detail code or pseudo code of 
> these two solutions?
>
> Thanks!
>
>>>>
>>>> What do you think about it?
>>>>
>>>>
>>>> Thanks.
>>>> Vladimir.
>>> Thanks!
>
>
>
>
>


-- 
Paulex Yang
China Software Development Lab
IBM





Re: [classlib] charset decoding

Posted by Andrew Zhang <zh...@gmail.com>.
Mikhail Loenko wrote:
> 2006/4/26, Andrew Zhang <zh...@gmail.com>:
>> Vladimir Strigun wrote:
>>> On 4/26/06, Andrew Zhang <zh...@gmail.com> wrote:
>>> I try to find best solution for Harmony-166 and during fix preparation
>>> I've found current issue.
>>>
>>> Method test_read from tests.api.java.io.InputStreamReaderTest failed
>>> and first fix I've created was with in.available() but I agree with
>>> Mikhail that it's possible not the best solution.
>> why?
>>
>> Yes, the spec says "The available method for class InputStream always
>> returns 0.". But it also says "This method should be overridden by
>> subclasses.". :)
>>
>> Here's the available description:
>> "Returns the number of bytes that can be read (or skipped over) from
>> this input stream without blocking by the next caller of a method for
>> this input stream."
>>
>> If someone writes a subclass of InputStream that available returns 0
>> even if there are some data available, then he should take the result of
>> cheating or contradicting with spec.

> 
> What if someone just unable to say if the bytes are available?
> For example, he reads something from a hardware module and
> the only way to know whether there are bytes is try reading?
> 

I'm a little confused.

How does solution 2 handle this problem?

> Thanks,
> Mikhail
> 
>>> In current implementation of InputStreamReader endOfInput variable
>>> sets to true if reader can't read less that 8192 bytes. When
>>> InputStreamReader try to read one character from
>>> LimitedByteArrayInputStream (A ByteArrayInputStream that only returns
>>> a single byte per read) true as a parameter passed to charset decoder,
>>> nevertheless we still have further input from
>>> LimitedByteArrayInputStream.
>>>
>>> 2 methods from tests.api.java.io.OutputStreamWriterTest also failed
>>> because of read() method implementation of InputStreamReader.
>>>
>>> IMO, we have 2 ways to fix Harmony-166 :
>>>
>>> 1. Don't change the behavior of CharsetDecoder, and use in.available()
>>> method for endOfInput variable. If in.available() > 0 pass false to
>>> decoder, read additional one byte from input stream and try to decode
>>> again. If decoding operation can decode remaining portion + one
>>> additional byte to character, stop here, else try to read addition
>>> byte again. Function fillBuf will invoke with the next invocation of
>>> read() method and next portion of 8192 bytes will be read from input
>>> stream.
>>>
>> It's reasonable.
>> After all, streaming decode is useful and helpful to users.
>> IMO, streaming decode is a highlight compared with some RI :)
>>
>>> 2. Change the behavior of CharsetDecoder and operate with
>>> bytes.remaining() for calculating endOfInput. In this case, we always
>>> pass false with first invocation of decode method. Algorithm further
>>> is almost the same as above, but we stop the cycle if all bytes
>>> decoded successfully (i.e if bytes.hasRemaining() is false).

Vladimir, Could you please give the detail code or pseudo code of these 
two solutions?

Thanks!

>>>
>>> What do you think about it?
>>>
>>>
>>> Thanks.
>>> Vladimir.
>> Thanks!





Re: [classlib] charset decoding

Posted by Mikhail Loenko <ml...@gmail.com>.
2006/4/26, Andrew Zhang <zh...@gmail.com>:
> Vladimir Strigun wrote:
> > On 4/26/06, Andrew Zhang <zh...@gmail.com> wrote:
>
> >
> > I try to find best solution for Harmony-166 and during fix preparation
> > I've found current issue.
> >
> > Method test_read from tests.api.java.io.InputStreamReaderTest failed
> > and first fix I've created was with in.available() but I agree with
> > Mikhail that it's possible not the best solution.
> why?
>
> Yes, the spec says "The available method for class InputStream always
> returns 0.". But it also says "This method should be overridden by
> subclasses.". :)
>
> Here's the available description:
> "Returns the number of bytes that can be read (or skipped over) from
> this input stream without blocking by the next caller of a method for
> this input stream."
>
> If someone writes a subclass of InputStream that available returns 0
> even if there are some data available, then he should take the result of
> cheating or contradicting with spec.

What if someone is simply unable to tell whether bytes are available?
For example, they read from a hardware module and the only way to
know whether there are bytes is to try reading.

Thanks,
Mikhail

> >
> > In current implementation of InputStreamReader endOfInput variable
> > sets to true if reader can't read less that 8192 bytes. When
> > InputStreamReader try to read one character from
> > LimitedByteArrayInputStream (A ByteArrayInputStream that only returns
> > a single byte per read) true as a parameter passed to charset decoder,
> > nevertheless we still have further input from
> > LimitedByteArrayInputStream.
> >
> > 2 methods from tests.api.java.io.OutputStreamWriterTest also failed
> > because of read() method implementation of InputStreamReader.
> >
> > IMO, we have 2 ways to fix Harmony-166 :
> >
> > 1. Don't change the behavior of CharsetDecoder, and use in.available()
> > method for endOfInput variable. If in.available() > 0 pass false to
> > decoder, read additional one byte from input stream and try to decode
> > again. If decoding operation can decode remaining portion + one
> > additional byte to character, stop here, else try to read addition
> > byte again. Function fillBuf will invoke with the next invocation of
> > read() method and next portion of 8192 bytes will be read from input
> > stream.
> >
> It's reasonable.
> After all, streaming decode is useful and helpful to users.
> IMO, streaming decode is a highlight compared with some RI :)
>
> > 2. Change the behavior of CharsetDecoder and operate with
> > bytes.remaining() for calculating endOfInput. In this case, we always
> > pass false with first invocation of decode method. Algorithm further
> > is almost the same as above, but we stop the cycle if all bytes
> > decoded successfully (i.e if bytes.hasRemaining() is false).
> >
> > What do you think about it?
> >
> >
> > Thanks.
> > Vladimir.
>
> Thanks!
>
> --
> Andrew Zhang
> China Software Development Lab, IBM
>
> >
> >> Any comment?
> >>
> >> Thanks !
> >>
> >>
> >> Vladimir Strigun commented on HARMONY-410:
> >> ------------------------------------------
> >>
> >> Hi Richard,
> >>
> >> Thanks for the clarification, I agree that streaming decode is good thing,
> >> but I'd like to explain my understanding of specification :)
> >> According to specification of CharsetDecoder decoding operation should
> >> process by the following steps:
> >> "
> >> 2. Invoke the decode method zero or more times, as long as additional input
> >> may be available, passing false for the endOfInput argument and filling the
> >> input buffer and flushing the output buffer between invocations;
> >>
> >> 3. Invoke the decode method one final time, passing true for the endOfInput
> >> argument; and then
> >> "
> >> spec also says:
> >> "The buffers' positions will be advanced to reflect the bytes read and the
> >> characters written, but their marks and limits will not be modified"
> >>
> >> I understand these sentences in the next way:
> >> invoke decode with endOfInput = false if you have additional input, then
> >> fill the buffer (i.e. add to buffer some additional input), invoke decode
> >> again and pass correct endOfInput parameter dependent of availability of
> >> input.
> >>
> >> Example you provided is very useful and, of course, 1st option looks better,
> >> but what I'm suggest here is to reflect actual processed bytes in input.
> >> After first invocation of decode, not all bytes processed actually, i.e.
> >> almost all bytes processed, but some stored in decoder to further operation.
> >> Is it possible to store bytes in decoder, support streaming decoding, and,
> >> at the same time sets correct position in input buffer after each operation?
> >>
> >> Thanks.
> >> Vladimir.
> >>
> >>> method decode(ByteBuffer, CharBuffer, boolean) should set correct position
> >> in ByteBuffer
> >> ----------------------------------------------------------------------------------------
> >>>          Key: HARMONY-410
> >>>          URL: http://issues.apache.org/jira/browse/HARMONY-410
> >>>      Project: Harmony
> >>>         Type: Bug
> >>>   Components: Classlib
> >>>     Reporter: Vladimir Strigun
> >>>     Assignee: Mikhail Loenko
> >>>     Priority: Minor
> >>>  Attachments: Harmony-410_patch.txt, harmony-410_test.txt
> >>>
> >>> When ByteBuffer contain incomplete sequence of bytes for successful
> >> decoding, position in ByteBuffer should be set to latest successful byte. I
> >> will attach testcase and patch soon.
> >>
> >> --
> >> This message is automatically generated by JIRA.
> >> -
> >> If you think it was sent incorrectly contact one of the administrators:
> >>  http://issues.apache.org/jira/secure/Administrators.jspa
> >> -
> >> For more information on JIRA, see:
> >>  http://www.atlassian.com/software/jira
> >>
> >>
> >>
> >>
> >>
> >> --
> >> Andrew Zhang
> >> China Software Development Lab, IBM
> >>
> >>
> >
> >
> >
>
>
>
>
>



Re: [classlib] charset decoding

Posted by Andrew Zhang <zh...@gmail.com>.
Vladimir Strigun wrote:
> On 4/26/06, Andrew Zhang <zh...@gmail.com> wrote:

> 
> I try to find best solution for Harmony-166 and during fix preparation
> I've found current issue.
> 
> Method test_read from tests.api.java.io.InputStreamReaderTest failed
> and first fix I've created was with in.available() but I agree with
> Mikhail that it's possible not the best solution.
why?

Yes, the spec says "The available method for class InputStream always 
returns 0.". But it also says "This method should be overridden by 
subclasses.". :)

Here's the available description:
"Returns the number of bytes that can be read (or skipped over) from 
this input stream without blocking by the next caller of a method for 
this input stream."

If someone writes a subclass of InputStream whose available() returns 0 
even when data is available, then they must accept the consequences of 
contradicting the spec.
> 
> In current implementation of InputStreamReader endOfInput variable
> sets to true if reader can't read less that 8192 bytes. When
> InputStreamReader try to read one character from
> LimitedByteArrayInputStream (A ByteArrayInputStream that only returns
> a single byte per read) true as a parameter passed to charset decoder,
> nevertheless we still have further input from
> LimitedByteArrayInputStream.
> 
> 2 methods from tests.api.java.io.OutputStreamWriterTest also failed
> because of read() method implementation of InputStreamReader.
> 
> IMO, we have 2 ways to fix Harmony-166 :
> 
> 1. Don't change the behavior of CharsetDecoder, and use in.available()
> method for endOfInput variable. If in.available() > 0 pass false to
> decoder, read additional one byte from input stream and try to decode
> again. If decoding operation can decode remaining portion + one
> additional byte to character, stop here, else try to read addition
> byte again. Function fillBuf will invoke with the next invocation of
> read() method and next portion of 8192 bytes will be read from input
> stream.
> 
It's reasonable.
After all, streaming decode is useful and helpful to users.
IMO, streaming decode is a highlight compared with some RI :)

> 2. Change the behavior of CharsetDecoder and operate with
> bytes.remaining() for calculating endOfInput. In this case, we always
> pass false with first invocation of decode method. Algorithm further
> is almost the same as above, but we stop the cycle if all bytes
> decoded successfully (i.e if bytes.hasRemaining() is false).
> 
> What do you think about it?
> 
> 
> Thanks.
> Vladimir.

Thanks!

--
Andrew Zhang
China Software Development Lab, IBM

> 
>> Any comment?
>>
>> Thanks !
>>
>>
>> Vladimir Strigun commented on HARMONY-410:
>> ------------------------------------------
>>
>> Hi Richard,
>>
>> Thanks for the clarification, I agree that streaming decode is good thing,
>> but I'd like to explain my understanding of specification :)
>> According to specification of CharsetDecoder decoding operation should
>> process by the following steps:
>> "
>> 2. Invoke the decode method zero or more times, as long as additional input
>> may be available, passing false for the endOfInput argument and filling the
>> input buffer and flushing the output buffer between invocations;
>>
>> 3. Invoke the decode method one final time, passing true for the endOfInput
>> argument; and then
>> "
>> spec also says:
>> "The buffers' positions will be advanced to reflect the bytes read and the
>> characters written, but their marks and limits will not be modified"
>>
>> I understand these sentences in the next way:
>> invoke decode with endOfInput = false if you have additional input, then
>> fill the buffer (i.e. add to buffer some additional input), invoke decode
>> again and pass correct endOfInput parameter dependent of availability of
>> input.
>>
>> Example you provided is very useful and, of course, 1st option looks better,
>> but what I'm suggest here is to reflect actual processed bytes in input.
>> After first invocation of decode, not all bytes processed actually, i.e.
>> almost all bytes processed, but some stored in decoder to further operation.
>> Is it possible to store bytes in decoder, support streaming decoding, and,
>> at the same time sets correct position in input buffer after each operation?
>>
>> Thanks.
>> Vladimir.
>>
>>> method decode(ByteBuffer, CharBuffer, boolean) should set correct position
>> in ByteBuffer
>> ----------------------------------------------------------------------------------------
>>>          Key: HARMONY-410
>>>          URL: http://issues.apache.org/jira/browse/HARMONY-410
>>>      Project: Harmony
>>>         Type: Bug
>>>   Components: Classlib
>>>     Reporter: Vladimir Strigun
>>>     Assignee: Mikhail Loenko
>>>     Priority: Minor
>>>  Attachments: Harmony-410_patch.txt, harmony-410_test.txt
>>>
>>> When ByteBuffer contain incomplete sequence of bytes for successful
>> decoding, position in ByteBuffer should be set to latest successful byte. I
>> will attach testcase and patch soon.
>>
>> --
>> This message is automatically generated by JIRA.
>> -
>> If you think it was sent incorrectly contact one of the administrators:
>>  http://issues.apache.org/jira/secure/Administrators.jspa
>> -
>> For more information on JIRA, see:
>>  http://www.atlassian.com/software/jira
>>
>>
>>
>>
>>
>> --
>> Andrew Zhang
>> China Software Development Lab, IBM
>>
>>
> 
> 
> 


