Posted to dev@apr.apache.org by Curt Arnold <ca...@apache.org> on 2005/04/06 22:06:02 UTC

Re: apr-iconv 1.0

All my previous discussion was speculative on apr-iconv's behavior.  I 
was in the process of integrating use of apr_xlate to support 
specifying the encoding of log files but had not gotten to the point of 
testing at the time of the discussion.

I have run into several issues when using apr_xlate delegating to (a 
hacked) apr-iconv which I did not encounter when delegating to iconv.  
I don't believe the issues are related to my hacking of the load 
process, but that is always a possibility.  At this point, my plans are 
to use Win32 API calls to support a small set of encodings on Windows 
and apr_xlate on other platforms.

The issues that I ran into:

1. apr_iconv dereferences inbytes_left without checking for NULL

From the doc comments from apr_xlate_conv_buffer:

  * To correctly terminate the output buffer for some multi-byte
  * character set encodings, a final call must be made to this function
  * after the complete input string has been converted, passing
  * the inbuf and inbytes_left parameters as NULL.  (Note that this
  * mode only works from version 1.1.0 onwards)

If the final call is made as suggested, apr_iconv will cause a null 
pointer exception.  GNU iconv does not.
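
For illustration, here is a minimal sketch of the final-call convention described in the doc comment, written against POSIX iconv directly rather than apr_xlate (the message above reports that GNU iconv accepts the NULL arguments). The function name convert_and_flush and the UTF-8 to UCS-4LE pairing are assumptions chosen for the example; they do not come from the thread. For a stateless target like UCS-4LE the final call writes nothing, but it is the same call that would emit a closing shift sequence for a stateful encoding.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert a NUL-terminated UTF-8 string to UCS-4LE,
 * then make the final call with NULL input pointers.
 * Returns the number of output bytes written, or -1 on any failure. */
long convert_and_flush(const char *utf8, char *out, size_t out_size)
{
    iconv_t cd = iconv_open("UCS-4LE", "UTF-8");
    if (cd == (iconv_t)-1)
        return -1;

    char *in = (char *)utf8;      /* iconv takes char **, input is not modified */
    size_t in_left = strlen(utf8);
    char *out_ptr = out;
    size_t out_left = out_size;
    long result = -1;

    if (iconv(cd, &in, &in_left, &out_ptr, &out_left) != (size_t)-1) {
        /* Final call: inbuf and inbytesleft are NULL, so the converter
         * only writes whatever is needed to terminate the output. */
        if (iconv(cd, NULL, NULL, &out_ptr, &out_left) != (size_t)-1)
            result = (long)(out_size - out_left);
    }

    iconv_close(cd);
    return result;
}
```

A converter that wants to support this mode has to test the input pointers for NULL before reading through them, which is the check reported missing in issue 1.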

2. apr_iconv does not have a WCHAR_T encoding.

3. apr_xlate_open(&convset, APR_LOCALE_CHARSET, ...) fails for common 
code pages (like 1252) on Windows

This could be an artifact of my hacking.  When this is attempted on my 
machine, the code determines the current code page (in my case 1252, 
Western European) and creates a corresponding encoding name like 
Cp1252.  However, the corresponding encoding module is called 
"windows-1252" not "Cp1252".
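
As a rough sketch of the kind of aliasing that would bridge this gap, the helper below rewrites a "CpNNNN" name to "windows-NNNN". The function name alias_codepage_name is hypothetical and is not part of APR or apr-iconv. Limiting the mapping to code pages 1250 through 1258 is also an assumption, based on the registered "windows-125x" charset names; whether apr-iconv ships a module for each of them is not something the thread establishes.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: map a "CpNNNN" style name (case-insensitive prefix)
 * to "windows-NNNN" for code pages 1250..1258.
 * Returns out on success, or NULL when no alias applies or out is too small. */
const char *alias_codepage_name(const char *name, char *out, size_t out_size)
{
    if (name == NULL || out == NULL)
        return NULL;
    if (tolower((unsigned char)name[0]) != 'c' ||
        tolower((unsigned char)name[1]) != 'p' ||
        !isdigit((unsigned char)name[2]))
        return NULL;

    char *end = NULL;
    unsigned long cp = strtoul(name + 2, &end, 10);
    if (*end != '\0' || cp < 1250 || cp > 1258)
        return NULL;

    int n = snprintf(out, out_size, "windows-%lu", cp);
    if (n < 0 || (size_t)n >= out_size)
        return NULL;
    return out;
}
```

A caller would try the original name first and fall back to the alias only if opening the converter fails, so names that already resolve are left alone.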

The last two may be artifacts of my hacking.  I didn't see any code 
that appeared to alias Cp1252 or WCHAR_T to an available encoding, but 
maybe I missed something.  However, it was enough to shake any 
confidence I had in the approach I was using.


Re: apr-iconv 1.0

Posted by Curt Arnold <ca...@apache.org>.
On Apr 7, 2005, at 10:54 AM, William A. Rowe, Jr. wrote:

> At 03:06 PM 4/6/2005, Curt Arnold wrote:
>> The issues that I ran into:
>>
>> 1. apr_iconv dereferences inbytes_left without checking for NULL
>>
>> From the doc comments from apr_xlate_conv_buffer:
>>
>> If the final call is made as suggested, apr_iconv will cause a null 
>> pointer exception.  GNU iconv does not.
>
> I'll look.  My concern is that this suggested doc change is part
> of a gnu libiconv-ism, which would break on FreeBSD.  But I need
> to look.

I wasn't suggesting a doc change.  The doc is right, and making the 
final call to apr_xlate(&conv, NULL, NULL) is the proper thing to do, 
though it is only essential for some fairly obscure encodings.  However, 
if you do the right thing and apr_xlate is using apr_iconv, it will fault.


>
>> 2. apr_iconv does not have a WCHAR_T encoding.
>
> Isn't wchar_t the preference, from ANSI/C99 headers?

The encoding and width of the wchar_t type are platform-dependent.  GNU 
iconv has an encoding named "WCHAR_T" that can be used to convert, for 
example, from a wchar_t* to some other encoding.  Without a "WCHAR_T" 
encoding, my code needs to know that wchar_t* is UTF-16LE on Win32, 
UCS-4 on some other platform, etc., and use the appropriate encoding 
name.
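
To make that concrete, here is a minimal sketch of the platform knowledge the calling code has to carry when there is no "WCHAR_T" encoding. The helper name wchar_encoding_name is hypothetical. The sketch also assumes that wchar_t holds Unicode code units (UTF-16 when it is 2 bytes wide, UCS-4 when it is 4), which the C standard does not guarantee, and the encoding names it returns are ones a converter may or may not recognize.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Hypothetical helper: guess an explicit encoding name for wchar_t.
 * ASSUMPTION: wchar_t stores Unicode code units in host byte order.
 * Returns NULL for widths this sketch does not cover. */
const char *wchar_encoding_name(void)
{
    int le = host_is_little_endian();

    switch (sizeof(wchar_t)) {
    case 2:
        return le ? "UTF-16LE" : "UTF-16BE";
    case 4:
        return le ? "UCS-4LE" : "UCS-4BE";
    default:
        return NULL;
    }
}
```

With a "WCHAR_T" encoding available, all of this collapses to passing that one name to the converter, which is the convenience being asked for.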


>
>> 3. apr_xlate_open(&convset, APR_LOCALE_CHARSET, ...) fails for common 
>> code pages (like 1252) on Windows
>>
>> This could be an artifact of my hacking.  When this is attempted on 
>> my machine, the code determines the current code page (in my case 
>> 1252, Western European) and creates a corresponding encoding name 
>> like Cp1252.  However, the corresponding encoding module is called 
>> "windows-1252" not "Cp1252".
>>

If you are really interested in tracking these down, I can write the 
test cases.  However, just the number of issues I immediately ran 
into was enough to make me rethink my plan to use apr_xlate for 
encoding services on all platforms.


Re: apr-iconv 1.0

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
At 03:06 PM 4/6/2005, Curt Arnold wrote:
>The issues that I ran into:
>
>1. apr_iconv dereferences inbytes_left without checking for NULL
>
>From the doc comments from apr_xlate_conv_buffer:
>
>If the final call is made as suggested, apr_iconv will cause a null pointer exception.  GNU iconv does not.

I'll look.  My concern is that this suggested doc change is part
of a gnu libiconv-ism, which would break on FreeBSD.  But I need
to look.  

>2. apr_iconv does not have a WCHAR_T encoding.

Isn't wchar_t the preference, from ANSI/C99 headers?

>3. apr_xlate_open(&convset, APR_LOCALE_CHARSET, ...) fails for common code pages (like 1252) on Windows
>
>This could be an artifact of my hacking.  When this is attempted on my machine, the code determines the current code page (in my case 1252, Western European) and creates a corresponding encoding name like Cp1252.  However, the corresponding encoding module is called "windows-1252" not "Cp1252".

I can look.

Bill