You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Stefan Küng <to...@gmail.com> on 2005/10/12 17:18:51 UTC

[PATCH]: Trying to get rid of iconv dependency on windows

Hi,

This patch introduces a new compile time define NOICONV which if set, 
will do the UTF8 string conversions with the Windows API's instead of 
using apr_iconv.

This might not be useful for the CL client, but other Subversion clients 
which use Subversion only as a library can reduce the size of the 
shipped binaries almost in half. Also, it's a little bit faster than 
iconv (not much: an 'svn st' of the Subversion trunk working copy is 
about 15ms faster).

One thing remains though: the build script always includes apr_iconv, 
even if it isn't specified with --with-apr-iconv. gen-make.py even 
errors out with a stacktrace if apr-iconv isn't found. I tried to go 
through those python scripts, but I don't know python at all and I just 
can't make any sense of all those scripts. Maybe someone else can step 
in for this.

I hope I got the patch right (code style, log comment).
Just remember: please comment first on the patch content, then on the style.

Stefan

-- 
        ___
   oo  // \\      "De Chelonian Mobile"
  (_,\/ \_/ \     TortoiseSVN
    \ \_/_\_/>    The coolest Interface to (Sub)Version Control
    /_/   \_\     http://tortoisesvn.tigris.org

Re: [PATCH]: Trying to get rid of iconv dependency on windows

Posted by Malcolm Rowe <ma...@farside.org.uk>.
On Thu, Oct 13, 2005 at 05:27:55PM +0200, Stefan Küng wrote:
> >One quick question: If I remember correctly, the Win32 functions will fill
> >up an output buffer completely if the buffer is too small, without a null
> >terminator. I'm not familiar with the APR functions - do they do the same?
> 
> The behaviour if the function fails is undefined, i.e. the content of 
> the output buffer. But I don't see a problem: if the function fails, the 
> output isn't used anyway.

Ok, I see that now (it wasn't clear by just looking at the diff). I was
confused by 'return success and set *outbytes_left to zero' being the
(undocumented, as far as I can see) return condition that indicated that
the output buffer wasn't large enough. I was also a little confused by
the names of the formal parameters.

But I see you inherited both of those from APR anyway.

Regards,
Malcolm

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH]: Trying to get rid of iconv dependency on windows

Posted by Stefan Küng <to...@gmail.com>.
Malcolm Rowe wrote:
> On Wed, Oct 12, 2005 at 09:18:42PM +0200, Stefan Küng wrote:
> 
>>And I don't think it's an argument to prevent this patch from going in, 
>>if you can't even tell under *what* circumstances these functions are 
>>broken.
>>
> 
> 
> You're right, sorry, that was a pretty pointless post, wasn't it? (I
> blame the cold I've got at the moment).
> 
> One quick question: If I remember correctly, the Win32 functions will fill
> up an output buffer completely if the buffer is too small, without a null
> terminator. I'm not familiar with the APR functions - do they do the same?

The behaviour if the function fails is undefined, i.e. the content of 
the output buffer. But I don't see a problem: if the function fails, the 
output isn't used anyway.

Stefan

-- 
        ___
   oo  // \\      "De Chelonian Mobile"
  (_,\/ \_/ \     TortoiseSVN
    \ \_/_\_/>    The coolest Interface to (Sub)Version Control
    /_/   \_\     http://tortoisesvn.tigris.org

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH]: Trying to get rid of iconv dependency on windows

Posted by Malcolm Rowe <ma...@farside.org.uk>.
On Wed, Oct 12, 2005 at 09:18:42PM +0200, Stefan Küng wrote:
> And I don't think it's an argument to prevent this patch from going in, 
> if you can't even tell under *what* circumstances these functions are 
> broken.
> 

You're right, sorry, that was a pretty pointless post, wasn't it? (I
blame the cold I've got at the moment).

One quick question: If I remember correctly, the Win32 functions will fill
up an output buffer completely if the buffer is too small, without a null
terminator. I'm not familiar with the APR functions - do they do the same?

Regards,
Malcolm

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH]: Trying to get rid of iconv dependency on windows

Posted by Stefan Küng <to...@gmail.com>.
Malcolm Rowe wrote:
> On Wed, Oct 12, 2005 at 07:18:51PM +0200, Stefan Küng wrote:
> 
>>This patch introduces a new compile time define NOICONV which if set, 
>>will do the UTF8 string conversions with the Windows API's instead of 
>>using apr_iconv.
>>
> 
> 
> Be careful. I'm sure I remember finding out that the Windows functions
> are completely broken in some circumstances, although for the life of me
> I can't remember what those are, other than it had something to do with
> UTF-8. It may well be restricted to CJK, Thai, or Vietnamese codepages,
> and/or be to do with composite characters, in which case it might not
> be a problem.

Well, a google search with "multibytetowidechar broken utf8" brings up 
only four entries in the group search. I think that's enough (at least 
for me) to assume that it works.

And I don't think it's an argument to prevent this patch from going in, 
if you can't even tell under *what* circumstances these functions are 
broken.

If you can show us that those API's are really broken, then that would 
be another matter, but like this? Also remember that e.g. TSVN uses 
those API's exclusively to convert from/to UTF8 when passing/getting 
params from the Subversion API.

Stefan


-- 
        ___
   oo  // \\      "De Chelonian Mobile"
  (_,\/ \_/ \     TortoiseSVN
    \ \_/_\_/>    The coolest Interface to (Sub)Version Control
    /_/   \_\     http://tortoisesvn.tigris.org

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH]: Trying to get rid of iconv dependency on windows

Posted by Malcolm Rowe <ma...@farside.org.uk>.
On Wed, Oct 12, 2005 at 07:18:51PM +0200, Stefan Küng wrote:
> This patch introduces a new compile time define NOICONV which if set, 
> will do the UTF8 string conversions with the Windows API's instead of 
> using apr_iconv.
> 

Be careful. I'm sure I remember finding out that the Windows functions
are completely broken in some circumstances, although for the life of me
I can't remember what those are, other than it had something to do with
UTF-8. It may well be restricted to CJK, Thai, or Vietnamese codepages,
and/or be to do with composite characters, in which case it might not
be a problem.

Regards,
Malcolm

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH]: Trying to get rid of iconv dependency on windows

Posted by Julian Foad <ju...@btopenworld.com>.
Stefan Küng wrote:
> 
> I hope I got the patch right (code style, log comment).
> Just remember: please comment first on the patch content, then on the 
> style.

Heh :-)  I think we are learning to do that, but it's wise of you to remind us 
from time to time!

- Julian

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH]: Trying to get rid of iconv dependency on windows

Posted by Branko Čibej <br...@xbc.nu>.
Ivan Zhakov wrote:

>On 10/14/05, Stefan Küng <to...@gmail.com> wrote:
>  
>
>>[[[
>>Compile time switch to use the Windows API's instead of apr_iconv to do
>>the UTF-8 conversions.
>>
>>* subversion/libsvn_subr/utf.c
>>  (win_xlate_t)          : New struct to use instead of apr_xlate_t.
>>  (win_xlate_conv_buffer): New function, replacing apr_xlate_conv_buffer
>>                           but using the Windows API instead of iconv.
>>  (get_xlate_handle_node): Instead of calling apr_xlate_open fill in the
>>                           handle struct with Windows API information.
>>  (convert_to_stringbuf):  Check if the source string has zero length
>>                           before allocating memory.
>>  (convert_to_stringbuf):  Call win_xlate_conv_buffer instead of
>>                           apr_xlate_conv_buffer.
>>]]]
>>    
>>
>I am +1 on idea remove iconv dependency on Windows, but have question
>on implementation. What for you use xlate_handle stuff? Why you didn't
>create helper and call it directly from svn_utf_stringbuf_to_utf8(),
>svn_utf_string_to_utf8(), svn_utf_cstring_to_utf8()
>svn_utf_cstring_to_utf8_ex(), svn_utf_cstring_from_utf8(),
>svn_utf_cstring_from_utf8_ex() without using xlate_handle stuff?
>
That change would be a lot more invasive...

> As I
>understand xlate handle needed to avoid often icon DLLs
>loading/unloading, but Windows already cares about this.
>
>Anyway win_xlate_t should store codepage indentifier instead of
>codepage strings. And move it's conversion to get_xlate_handle_node()
>instead of win_xlate_conv_buffer().
>  
>
Anyway... before proceeding any further with this patch, I'd like to see 
some definite answers about the recent report that Windows translation 
functions aren't reliable for all encodings. If that's true, repllacing 
apr_iconv would be taking a step backwards (and would therefore generate 
a -1 from me).

-- Brane


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH]: Trying to get rid of iconv dependency on windows

Posted by Ivan Zhakov <ch...@gmail.com>.
On 10/14/05, Stefan Küng <to...@gmail.com> wrote:
> [[[
> Compile time switch to use the Windows API's instead of apr_iconv to do
> the UTF-8 conversions.
>
> * subversion/libsvn_subr/utf.c
>   (win_xlate_t)          : New struct to use instead of apr_xlate_t.
>   (win_xlate_conv_buffer): New function, replacing apr_xlate_conv_buffer
>                            but using the Windows API instead of iconv.
>   (get_xlate_handle_node): Instead of calling apr_xlate_open fill in the
>                            handle struct with Windows API information.
>   (convert_to_stringbuf):  Check if the source string has zero length
>                            before allocating memory.
>   (convert_to_stringbuf):  Call win_xlate_conv_buffer instead of
>                            apr_xlate_conv_buffer.
> ]]]
I am +1 on idea remove iconv dependency on Windows, but have question
on implementation. What for you use xlate_handle stuff? Why you didn't
create helper and call it directly from svn_utf_stringbuf_to_utf8(),
svn_utf_string_to_utf8(), svn_utf_cstring_to_utf8()
svn_utf_cstring_to_utf8_ex(), svn_utf_cstring_from_utf8(),
svn_utf_cstring_from_utf8_ex() without using xlate_handle stuff? As I
understand xlate handle needed to avoid often icon DLLs
loading/unloading, but Windows already cares about this.

Anyway win_xlate_t should store codepage indentifier instead of
codepage strings. And move it's conversion to get_xlate_handle_node()
instead of win_xlate_conv_buffer().

--
Ivan Zhakov

Re: [PATCH]: Trying to get rid of iconv dependency on windows

Posted by Stefan Küng <to...@gmail.com>.
Garrett Rooney wrote:

> This solves the problem, but we generally try to avoid using
> malloc/free unless we absolutely have to (i.e. we're passing something
> to third party code that expects to free it later, or we get something
> from third party code that it expects us to free).  To do this with
> pools you should be storing a subpool in win_xlate_t, use that to
> apr_palloc your memory, then clear the pool when you're done with the
> memory.

Point taken.
New patch attached.

Stefan

-- 
        ___
   oo  // \\      "De Chelonian Mobile"
  (_,\/ \_/ \     TortoiseSVN
    \ \_/_\_/>    The coolest Interface to (Sub)Version Control
    /_/   \_\     http://tortoisesvn.tigris.org

Re: [PATCH]: Trying to get rid of iconv dependency on windows

Posted by Garrett Rooney <ro...@electricjellyfish.net>.
On 10/13/05, Stefan Küng <to...@gmail.com> wrote:
> Garrett Rooney wrote:
> > One comment that occurred to me as I read over this patch...
> >
> >
> >>+typedef struct win_xlate_t {
> >>+  apr_pool_t *pool;
> >>+  char *frompage;
> >>+  char *topage;
> >>+  char *sbcs_table;
> >>+} win_xlate_t;
> >
> >
> > Storing a pool in this structure throws up a red flag for me...
> >
> [snip]
> > And this is why.  Won't we be allocating that buffer every time we
> > call this function, without ever clearing that pool?  If we really
> > need to allocate that buffer each time, we also need to be sure we're
> > clearing that pool when we're done with it, which means that
> > convset->pool really should be a specially created subpool just for
> > that purpose.
>
> Good point. Revised patch attached.

This solves the problem, but we generally try to avoid using
malloc/free unless we absolutely have to (i.e. we're passing something
to third party code that expects to free it later, or we get something
from third party code that it expects us to free).  To do this with
pools you should be storing a subpool in win_xlate_t, use that to
apr_palloc your memory, then clear the pool when you're done with the
memory.

-garrett

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org


Re: [PATCH]: Trying to get rid of iconv dependency on windows

Posted by Stefan Küng <to...@gmail.com>.
Garrett Rooney wrote:
> One comment that occurred to me as I read over this patch...
> 
> 
>>+typedef struct win_xlate_t {
>>+  apr_pool_t *pool;
>>+  char *frompage;
>>+  char *topage;
>>+  char *sbcs_table;
>>+} win_xlate_t;
> 
> 
> Storing a pool in this structure throws up a red flag for me...
> 
[snip]
> And this is why.  Won't we be allocating that buffer every time we
> call this function, without ever clearing that pool?  If we really
> need to allocate that buffer each time, we also need to be sure we're
> clearing that pool when we're done with it, which means that
> convset->pool really should be a specially created subpool just for
> that purpose.

Good point. Revised patch attached.

Stefan

-- 
        ___
   oo  // \\      "De Chelonian Mobile"
  (_,\/ \_/ \     TortoiseSVN
    \ \_/_\_/>    The coolest Interface to (Sub)Version Control
    /_/   \_\     http://tortoisesvn.tigris.org

Re: [PATCH]: Trying to get rid of iconv dependency on windows

Posted by Garrett Rooney <ro...@electricjellyfish.net>.
One comment that occurred to me as I read over this patch...

> +typedef struct win_xlate_t {
> +  apr_pool_t *pool;
> +  char *frompage;
> +  char *topage;
> +  char *sbcs_table;
> +} win_xlate_t;

Storing a pool in this structure throws up a red flag for me...

> +static apr_status_t win_xlate_conv_buffer(win_xlate_t *convset,
> +                                          const char * inbuf,
> +                                          apr_size_t *inbytes_left,
> +                                          char *outbuf,
> +                                          apr_size_t *outbytes_left)
> +{
> +  svn_boolean_t fromUTF8 = FALSE;
> +  int requiredlen = 0;
> +  wchar_t * widebuffer = 0;
> +  apr_size_t convertedchars = 0;
> +  UINT CP = CP_ACP;
> +
> +  if (convset->frompage)
> +  {
> +    if (apr_strnatcmp (convset->frompage, "UTF-8") == 0)
> +      fromUTF8 = TRUE;
> +       else
> +      CP = atoi(convset->frompage+2);
> +  }
> +  if ((fromUTF8)&&(convset->topage))
> +  {
> +    CP = atoi(convset->topage+2);
> +  }
> +  if (CP == 0)
> +    CP = CP_THREAD_ACP;
> +
> +  requiredlen = MultiByteToWideChar(fromUTF8 ? CP_UTF8 : CP, 0, inbuf, -1, 0, 0);
> +  if (requiredlen == 0)
> +    return APR_EINVAL;
> +  widebuffer = apr_pcalloc(convset->pool, requiredlen*sizeof(wchar_t));
> +  if (widebuffer == 0)
> +    return APR_ENOMEM;

And this is why.  Won't we be allocating that buffer every time we
call this function, without ever clearing that pool?  If we really
need to allocate that buffer each time, we also need to be sure we're
clearing that pool when we're done with it, which means that
convset->pool really should be a specially created subpool just for
that purpose.

-garrett

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org


Re: [PATCH]: Trying to get rid of iconv dependency on windows

Posted by Stefan Küng <to...@gmail.com>.
Branko Čibej wrote:
> Stefan Küng wrote:
> 
>> [[[
>> Compile time switch to use the Windows API's instead of apr_iconv to do
>> the UTF-8 conversions.
>>
>> * subversion/libsvn_subr/utf.c
>>  (win_xlate_conv_buffer): New function, replacing apr_xlate_conv_buffer
>>                           but using the Windows API instead of iconv.
>>  (get_xlate_handle_node): Instead of calling apr_xlate_open fill in the
>>                           handle struct with Windows API information.
>>  (convert_to_stringbuf):  Check if the source string has zero length
>>                           before allocating memory.
>>  (convert_to_stringbuf):  Call win_xlate_conv_buffer instead of
>>                           apr_xlate_conv_buffer.
>> ]]]
>> Index: subversion/libsvn_subr/utf.c
>> ===================================================================
>> --- subversion/libsvn_subr/utf.c    (Revision 16653)
>> +++ subversion/libsvn_subr/utf.c    (Arbeitskopie)
>> @@ -125,6 +125,70 @@
>>     }
>> }
>>
>> +#if defined(WIN32) && defined(NOICONV)
>> +/* declare the struct here, so we can use its members */
>> +struct apr_xlate_t {
>> +    apr_pool_t *pool;
>> +    char *frompage;
>> +    char *topage;
>> +    char *sbcs_table;
>> +};
>>  
>>
> We can't redefine apr_xlate_t. You'll have to call it something else.

Ok, revised patch attached.

> [...]
> 
>> +  requiredlen = MultiByteToWideChar(fromUTF8 ? CP_UTF8 : CP, 0, 
>> inbuf, -1, 0, 0);
>>  
>>
> IIRC CP_UTF8 is only supported on Win2k (maybe even WinXP) and later. So 
> the conversion will fail on older versions of Windows.

Actually, CP_UTF8 is available for:
Windows 98/Me, Windows NT 4.0 and later

for Win95, you'd have to link with the MSLU (Microsoft Layer for 
Unicode) dll to get CP_UTF8. But that's then up to the clients using the 
library (if they want to support Win95 at all).

Stefan

-- 
        ___
   oo  // \\      "De Chelonian Mobile"
  (_,\/ \_/ \     TortoiseSVN
    \ \_/_\_/>    The coolest Interface to (Sub)Version Control
    /_/   \_\     http://tortoisesvn.tigris.org

Re: [PATCH]: Trying to get rid of iconv dependency on windows

Posted by Branko Čibej <br...@xbc.nu>.
Stefan Küng wrote:

>[[[
>Compile time switch to use the Windows API's instead of apr_iconv to do
>the UTF-8 conversions.
>
>* subversion/libsvn_subr/utf.c
>  (win_xlate_conv_buffer): New function, replacing apr_xlate_conv_buffer
>                           but using the Windows API instead of iconv.
>  (get_xlate_handle_node): Instead of calling apr_xlate_open fill in the
>                           handle struct with Windows API information.
>  (convert_to_stringbuf):  Check if the source string has zero length
>                           before allocating memory.
>  (convert_to_stringbuf):  Call win_xlate_conv_buffer instead of
>                           apr_xlate_conv_buffer.
>]]]
>Index: subversion/libsvn_subr/utf.c
>===================================================================
>--- subversion/libsvn_subr/utf.c	(Revision 16653)
>+++ subversion/libsvn_subr/utf.c	(Arbeitskopie)
>@@ -125,6 +125,70 @@
>     }
> }
> 
>+#if defined(WIN32) && defined(NOICONV)
>+/* declare the struct here, so we can use its members */
>+struct apr_xlate_t {
>+	apr_pool_t *pool;
>+	char *frompage;
>+	char *topage;
>+	char *sbcs_table;
>+};
>  
>
We can't redefine apr_xlate_t. You'll have to call it something else.

[...]

>+  requiredlen = MultiByteToWideChar(fromUTF8 ? CP_UTF8 : CP, 0, inbuf, -1, 0, 0);
>  
>
IIRC CP_UTF8 is only supported on Win2k (maybe even WinXP) and later. So 
the conversion will fail on older versions of Windows.

-- Brane


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org