Posted to dev@subversion.apache.org by Nicolás Lichtmaier <ni...@reloco.com.ar> on 2004/04/05 12:59:26 UTC

Is it worth having SVN_REVNUM_T_FMT?

The current use of SVN_REVNUM_T_FMT is not compatible with gettext. Why? 
Because xgettext can't process this:

_("Revision: " SVN_REVNUM_T_FMT "%")

At first I thought of creating a function, something like
get_human_readable_revision, but is it worth it? SVN_REVNUM_T_FMT only
expands to "ld", which is a standard C thing. It will probably never
change. Isn't this a case of overengineering?

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Is it worth having SVN_REVNUM_T_FMT?

Posted by Branko Čibej <br...@xbc.nu>.
C.A.T.Magic wrote:

> Branko Čibej wrote:
>
>> That's the solution I like the least, because it's not just a simple
>> search and replace -- at least not in the standard printf syntax -- and
>> it slows things down tremendously. On the other hand, none of the
>> printfs is in a performance-critical part of the code.
>
>
> how much does gettext impact the speed?
> does it use something like strcmp's (slow)
> or hashtables (average) inside, or does it replace
> the strings in the sourcecode with unique id's (fast)?

Gettext uses a string-to-string optimized hash table. When used with
GCC, it can take advantage of constant string keys by using a
pointer-based translation cache; that's one reason why you typically
don't want to parse the format strings before you hand them over to
gettext. You could do it afterwards, though.

So we have three options:

   1. Transform all parameters to strings and use only %s in the format
      strings. This is the easiest short-term solution, and would also
      work for marshaling composite error messages from the server.
   2. Extend APR's formatter and use custom type-specific format codes.
      This looks like the cleanest solution for the long term, but would
      require an APR upgrade (and that also means we'd have to wait for
      the next httpd release).
   3. Use custom type-specific format codes but introduce a post-gettext
      parsing pass that would replace them with APR's #defines before
      they get to printf.

A combination of 3 and 2 would work, too -- do 3 first, then later fall
back to 2 if the available APR version supports it.
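
A minimal sketch of option 3, under stated assumptions: the code %R, the
macro REVNUM_FMT and the function expand_custom_format are hypothetical
names invented for illustration, not existing Subversion or APR API. The
translated string keeps the portable code %R, and a pass that runs after
the gettext lookup rewrites it to the platform-specific conversion before
the string reaches printf.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for SVN_REVNUM_T_FMT; the thread says it is "ld". */
#define REVNUM_FMT "ld"

/* Expand the hypothetical custom code %R in `in` into "%" REVNUM_FMT,
 * writing the result to `out` (capacity `out_size`, including the NUL).
 * "%%" is copied through untouched, so an escaped percent sign followed
 * by 'R' is not misread as a revision code.
 * Returns 0 on success, -1 if `out` is too small.
 * Only a bare "%R" is recognized; flags, widths and positional arguments
 * would need a real format parser, which is why this is not a plain
 * search and replace in general. */
static int expand_custom_format(const char *in, char *out, size_t out_size)
{
    size_t o = 0;

    assert(in != NULL && out != NULL);
    if (out_size == 0)
        return -1;

    while (*in != '\0') {
        const char *piece;
        size_t piece_len;
        size_t consumed;

        if (in[0] == '%' && in[1] == '%') {
            piece = "%%";
            piece_len = 2;
            consumed = 2;
        } else if (in[0] == '%' && in[1] == 'R') {
            piece = "%" REVNUM_FMT;
            piece_len = strlen(piece);
            consumed = 2;
        } else {
            piece = in;
            piece_len = 1;
            consumed = 1;
        }

        if (o + piece_len >= out_size)
            return -1;          /* no room for the piece plus the NUL */
        memcpy(out + o, piece, piece_len);
        o += piece_len;
        in += consumed;
    }
    out[o] = '\0';
    return 0;
}
```

A caller would look up the message first, run the result through
expand_custom_format, and only then hand the expanded string to the
printf-style function, so the msgid seen by xgettext stays the same on
every platform.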

-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/


Re: Is it worth having SVN_REVNUM_T_FMT?

Posted by "C.A.T.Magic" <c....@gmx.at>.
Branko Čibej wrote:

> That's the solution I like the least, because it's not just a simple
> search and replace -- at least not in the standard printf syntax -- and
> it slows things down tremendously. On the other hand, none of the
> printfs is in a performance-critical part of the code.

how much does gettext impact the speed?
does it use something like strcmp's (slow)
or hashtables (average) inside, or does it replace
the strings in the sourcecode with unique id's (fast)?

c.a.t.



Re: Is it worth having SVN_REVNUM_T_FMT?

Posted by Branko Čibej <br...@xbc.nu>.
C.A.T.Magic wrote:

> Branko Čibej wrote:
>
>> C.A.T.Magic wrote:
>>
>>
>>> Branko Cibej wrote:
>>>
>>>
>>>> -- Always use %ld (or %lu), and cast the values. Not good, because
>>>>   we then truncate 64-bit values on 32-bit platforms. Unfortunately
>>>>   there is no portable format string for 64-bit integers.
>>>>
>>>> -- Define our own formats (e.g., %R for revision numbers) and write
>>>>   our own printf format scanner that converts them to the defined
>>>>   macros (urgh!)
>>>
>>>
>>>
>>> why doesn't
>>>     apr_file_printf
>>> do this by itself?
>>> this way you could get rid of all these weird
>>>     apr_file_printf(out, "%" APR_SIZE_T_FMT " bytes were read\n",
>>>                    total_bytes);
>>> constructs at the same time.
>>> I think APR could even be modified for this without creating
>>> incompatibilities with existing applications,
>>> because "%" APR_SIZE_T_FMT " bytes" would simply resolve to
>>> something like "%Z bytes" for all applications.
>>> but i don't know the design decisions for this in APR.
>>
>>
>>
>> These could of course be added to APR, and that would be the nicest
>> solution -- *if* we use APR's vformatter functions exclusively. But
>> that's something we can't require from all clients.
>
>
> maybe the modified APR (or svn) vformatter could support BOTH,

There was obviously never any doubt about that; we can't change APR's
printf rewrite, we can only extend it.

> old AND newstyle formats at the same time, this way other
> applications can continue to use the existing format names,
> e.g. %ld %s AND new ones like %Az (size) %Ai (int32)
> %AI (int64) %As (string) etc
> (assuming the letter 'A' is unused yet - i didn't check that)
> if apr doesn't want to support such a modified function it should
> still be possible to write an svn specific one that performs
> 'search&replace' of its %A specific flags into the APR_SIZE_T_FMT
> macros.

That's the solution I like the least, because it's not just a simple
search and replace -- at least not in the standard printf syntax -- and
it slows things down tremendously. On the other hand, none of the
printfs is in a performance-critical part of the code.

> on the other hand all this feels like reinventing another wheel
> (printf) - i wonder how other applications solve this
> gettext+platform_specific_format problem. I'm also not sure about
> if a handwritten printf will be slower than the system (libc?)
> printf. - btw, since this is opensource, is it possible to
> 'grab' the standard libc printf and modify it for this?

Um. There _is_ no "standard" printf. If you're thinking about the one in
glibc, that's a no-no because of GPL -- and it's likely to be slower
than APR's version, because IIRC it's designed to be extensible.

Oh, well...

-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/


Re: Is it worth having SVN_REVNUM_T_FMT?

Posted by "C.A.T.Magic" <c....@gmx.at>.
Branko Čibej wrote:
> C.A.T.Magic wrote:
> 
> 
>>Branko Cibej wrote:
>>
>>
>>>-- Always use %ld (or %lu), and cast the values. Not good, because
>>>   we then truncate 64-bit values on 32-bit platforms. Unfortunately
>>>   there is no portable format string for 64-bit integers.
>>>
>>>-- Define our own formats (e.g., %R for revision numbers) and write
>>>   our own printf format scanner that converts them to the defined
>>>   macros (urgh!)
>>
>>
>>why doesn't
>>     apr_file_printf
>>do this by itself?
>>this way you could get rid of all these weird
>>     apr_file_printf(out, "%" APR_SIZE_T_FMT " bytes were read\n",
>>                    total_bytes);
>>constructs at the same time.
>>I think APR could even be modified for this without creating
>>incompatibilities with existing applications,
>>because "%" APR_SIZE_T_FMT " bytes" would simply resolve to
>>something like "%Z bytes" for all applications.
>>but i don't know the design decisions for this in APR.
> 
> 
> These could of course be added to APR, and that would be the nicest
> solution -- *if* we use APR's vformatter functions exclusively. But
> that's something we can't require from all clients.

maybe the modified APR (or svn) vformatter could support BOTH,
old AND newstyle formats at the same time, this way other
applications can continue to use the existing format names,
e.g. %ld %s AND new ones like %Az (size) %Ai (int32)
%AI (int64) %As (string) etc
(assuming the letter 'A' is unused yet - i didn't check that)
if apr doesn't want to support such a modified function it should
still be possible to write an svn specific one that performs
'search&replace' of its %A specific flags into the APR_SIZE_T_FMT
macros.
on the other hand all this feels like reinventing another wheel
(printf) - i wonder how other applications solve this 
gettext+platform_specific_format problem. I'm also not sure about
if a handwritten printf will be slower than the system (libc?)
printf. - btw, since this is opensource, is it possible to
'grab' the standard libc printf and modify it for this?

:-)
c.a.t.



Re: Is it worth having SVN_REVNUM_T_FMT?

Posted by Branko Čibej <br...@xbc.nu>.
C.A.T.Magic wrote:

> Branko Cibej wrote:
>
>> -- Always use %ld (or %lu), and cast the values. Not good, because
>>    we then truncate 64-bit values on 32-bit platforms. Unfortunately
>>    there is no portable format string for 64-bit integers.
>>
>> -- Define our own formats (e.g., %R for revision numbers) and write
>>    our own printf format scanner that converts them to the defined
>>    macros (urgh!)
>
>
> why doesn't
>      apr_file_printf
> do this by itself?
> this way you could get rid of all these weird
>      apr_file_printf(out, "%" APR_SIZE_T_FMT " bytes were read\n",
>                     total_bytes);
> constructs at the same time.
> I think APR could even be modified for this without creating
> incompatibilities with existing applications,
> because "%" APR_SIZE_T_FMT " bytes" would simply resolve to
> something like "%Z bytes" for all applications.
> but i don't know the design decisions for this in APR.

These could of course be added to APR, and that would be the nicest
solution -- *if* we use APR's vformatter functions exclusively. But
that's something we can't require from all clients.

Or can we? Personally I'd be quite happy with this restriction.

-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/


Re: Is it worth having SVN_REVNUM_T_FMT?

Posted by "C.A.T.Magic" <c....@gmx.at>.
Branko Cibej wrote:

> Quoting John Peacock <jp...@rowman.com>:
> 
> 
>>Nicolás Lichtmaier wrote:
>>
>>>>>xgettext scans the source files without preprocessing them. And it
>>>>>makes sense because any change in the text would render the
>>>>>translations obsolete.
>>
>>Except that doesn't make sense!  The whole point of preprocessing
>>macros is to change elements of the source code based on compile-time
>>choices.  And that includes string literals.  If xgettext cannot
>>handle that, it is a serious limitation.
>>
>>>It could be done, but IMO format strings should be *constant* when
>>>using gettext. So I see just two solutions, to use a function
>>>format_revision() or just to replace this macro with the proper
>>>"%ld" and move over =).
>>
>>In this case, yes we should probably just hardcode %ld, but the use of
>>macros to provide cross-platform format strings is widespread (some
>>might consider it pervasive).  A better way of scanning the source
>>files (preferably after preprocessing) would be the preferred way to
>>deal with this, rather than cripple the code to fit xgettext's
>>limitations.
> 
> 
> I said before that in order to use gettext, we must get rid of all defined
> format strings (ours and APR's). Scanning the files after preprocessing doesn't
> help, because the format strings may change between platforms, and the last
> thing you want is platform-specific translations.
> 
> There are several ways to do away with defined format strings, none of them are
> very nice:
> 
> -- Always use %ld (or %lu), and cast the values. Not good, because
>    we then truncate 64-bit values on 32-bit platforms. Unfortunately
>    there is no portable format string for 64-bit integers.
> 
> -- Define our own formats (e.g., %R for revision numbers) and write
>    our own printf format scanner that converts them to the defined
>    macros (urgh!)

why doesn't
      apr_file_printf
do this by itself?
this way you could get rid of all these weird
      apr_file_printf(out, "%" APR_SIZE_T_FMT " bytes were read\n",
                     total_bytes);
constructs at the same time.
I think APR could even be modified for this without creating 
incompatibilities with existing applications,
because "%" APR_SIZE_T_FMT " bytes" would simply resolve to
something like "%Z bytes" for all applications.
but i don't know the design decisions for this in APR.

> -- Extend gettext to support macro expansion in the translations,
>    and use the defined macros explicitly (aargh!)
> 
> -- Convert all non-strings to strings internally, and use only %s
>    in the translated format strings.
> 
> I sort of feel that the last option is the closest to being feasible in the
> short term.

a little more work for implementing a special printf will make
your life much easier than to wade through the code and change 
everything a second time after that "short term" :-)
and a later change of the syntax may render 20 language files unusable...

:-)
c.a.t.



Re: Is it worth having SVN_REVNUM_T_FMT?

Posted by Branko Cibej <br...@xbc.nu>.
Quoting John Peacock <jp...@rowman.com>:

> Nicolás Lichtmaier wrote:
>
> >>> xgettext scans the source files without preprocessing them. And it
> >>> makes sense because any change in the text would render the
> >>> translations obsolete.
>
> Except that doesn't make sense!  The whole point of preprocessing
> macros is to change elements of the source code based on compile-time
> choices.  And that includes string literals.  If xgettext cannot
> handle that, it is a serious limitation.
>
> > It could be done, but IMO format strings should be *constant* when
> > using gettext. So I see just two solutions, to use a function
> > format_revision() or just to replace this macro with the proper
> > "%ld" and move over =).
>
> In this case, yes we should probably just hardcode %ld, but the use of
> macros to provide cross-platform format strings is widespread (some
> might consider it pervasive).  A better way of scanning the source
> files (preferably after preprocessing) would be the preferred way to
> deal with this, rather than cripple the code to fit xgettext's
> limitations.

I said before that in order to use gettext, we must get rid of all defined
format strings (ours and APR's). Scanning the files after preprocessing doesn't
help, because the format strings may change between platforms, and the last
thing you want is platform-specific translations.

There are several ways to do away with defined format strings, none of them are
very nice:

-- Always use %ld (or %lu), and cast the values. Not good, because
   we then truncate 64-bit values on 32-bit platforms. Unfortunately
   there is no portable format string for 64-bit integers.

-- Define our own formats (e.g., %R for revision numbers) and write
   our own printf format scanner that converts them to the defined
   macros (urgh!)

-- Extend gettext to support macro expansion in the translations,
   and use the defined macros explicitly (aargh!)

-- Convert all non-strings to strings internally, and use only %s
   in the translated format strings.

I sort of feel that the last option is the closest to being feasible in the
short term.
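
A minimal sketch of that last option for 64-bit values, assuming a C99
<stdint.h> is available (which was not a given on every platform at the
time). The helper name int64_to_string is hypothetical. It builds the
decimal text by hand, so no printf length modifier is involved and nothing
is truncated on 32-bit platforms; the translated format string then only
needs %s.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write the decimal form of `value` into `buf`
 * (capacity `size`, including the NUL) without any printf format.
 * Returns `buf` on success, NULL if the buffer is too small. */
static char *int64_to_string(int64_t value, char *buf, size_t size)
{
    char tmp[21];                       /* sign + 19 digits + NUL */
    char *p = tmp + sizeof tmp - 1;     /* fill from the end backwards */
    size_t len;
    /* Work on the magnitude as unsigned so INT64_MIN does not overflow. */
    uint64_t mag = value < 0 ? (uint64_t)0 - (uint64_t)value
                             : (uint64_t)value;

    assert(buf != NULL);

    *p = '\0';
    do {
        *--p = (char)('0' + (int)(mag % 10));
        mag /= 10;
    } while (mag != 0);
    if (value < 0)
        *--p = '-';

    len = (size_t)(tmp + sizeof tmp - p);   /* text length plus the NUL */
    if (len > size)
        return NULL;
    memcpy(buf, p, len);
    return buf;
}
```

The resulting string would be passed as a %s argument to a message such as
_("Revision: %s"), whose msgid is identical on every platform.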


-- Brane


Re: Is it worth having SVN_REVNUM_T_FMT?

Posted by John Peacock <jp...@rowman.com>.
Nicolás Lichtmaier wrote:

> 
>>> xgettext scans the source files without preprocessing them. And it makes
>>> sense because any change in the text would render the translations
>>> obsolete.

Except that doesn't make sense!  The whole point of preprocessing macros is to 
change elements of the source code based on compile-time choices.  And that 
includes string literals.  If xgettext cannot handle that, it is a serious 
limitation.

> It could be done, but IMO format strings should be *constant* when using 
> gettext. So I see just two solutions, to use a function format_revision() 
> or just to replace this macro with the proper "%ld" and move over =).

In this case, yes we should probably just hardcode %ld, but the use of macros to
provide cross-platform format strings is widespread (some might consider it
pervasive).  A better way of scanning the source files (preferably after
preprocessing) would be the preferred way to deal with this, rather than cripple
the code to fit xgettext's limitations.

John

-- 
John Peacock
Director of Information Research and Technology
Rowman & Littlefield Publishing Group
4501 Forbes Boulevard
Suite H
Lanham, MD  20706
301-459-3366 x.5010
fax 301-429-5748


Re: Is it worth having SVN_REVNUM_T_FMT?

Posted by Nicolás Lichtmaier <ni...@reloco.com.ar>.
>>xgettext scans the source files without preprocessing them. And it makes
>>sense because any change in the text would render the translations
>>obsolete.
>>    
>>
>Oh, I see... 
>
>How about writing a small wrapper script that does this special preprocessing 
>and passing that output to xgettext ? This would of course be a bad hack, 
>especially since it wouldn't catch other #defines.
>
>Or even write a wrapper script that calls the cpp preprocessor on a source 
>file and feed the preprocessed source to xgettext ? I'm not sure whether this 
>could result in unwanted extra strings or something like that...
>  
>

It could be done, but IMO format strings should be *constant* when using
gettext. So I see just two solutions, to use a function format_revision()
or just to replace this macro with the proper "%ld" and move over =).
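
A rough sketch of the format_revision() idea, with assumptions spelled out:
revnum_t and REVNUM_T_FMT are hypothetical stand-ins for the Subversion
type and macro, _() is stubbed as an identity macro so the example builds
without libintl, and the function signatures are invented for illustration.
The platform-specific macro is confined to the helper, so the translatable
string is a plain literal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins: the thread says the macro is "ld", which
 * implies a revision number that is printed as a long. */
typedef long revnum_t;
#define REVNUM_T_FMT "ld"

/* Identity stub for gettext's _() so the sketch builds without libintl. */
#define _(msgid) (msgid)

/* Hypothetical helper: the only place where the format macro is used.
 * Returns `buf` on success, NULL if the text did not fit. */
static const char *format_revision(revnum_t rev, char *buf, size_t size)
{
    int n;

    assert(buf != NULL && size > 0);
    n = snprintf(buf, size, "%" REVNUM_T_FMT, rev);
    return (n >= 0 && (size_t)n < size) ? buf : NULL;
}

/* Build the user-visible message. The msgid "Revision: %s" contains no
 * macro, so a tool reading the raw source can extract it as written.
 * Returns 0 on success, -1 if either buffer is too small. */
static int revision_message(revnum_t rev, char *out, size_t out_size)
{
    char num[32];
    int n;

    assert(out != NULL && out_size > 0);
    if (format_revision(rev, num, sizeof num) == NULL)
        return -1;
    n = snprintf(out, out_size, _("Revision: %s"), num);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

The cost is one extra buffer and call per formatted number; the benefit is
that the msgid no longer depends on how the platform spells the
conversion.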



Re: Is it worth having SVN_REVNUM_T_FMT?

Posted by Marc Haisenko <ha...@webport.de>.
On Monday 05 April 2004 15:30, Nicolás Lichtmaier wrote:
> >As SVN_REVNUM_T_FMT is a #define the C preprocessor converts this into
> > just one string: "Revision: ld%" (I'm sure you've misquoted the example
> > ;-) So this should be fine for xgettext, or is xgettext a macro ?
>
> xgettext scans the source files without preprocessing them. And it makes
> sense because any change in the text would render the translations
> obsolete.

Oh, I see... 

How about writing a small wrapper script that does this special preprocessing 
and passing that output to xgettext ? This would of course be a bad hack, 
especially since it wouldn't catch other #defines.

Or even write a wrapper script that calls the cpp preprocessor on a source 
file and feed the preprocessed source to xgettext ? I'm not sure whether this 
could result in unwanted extra strings or something like that...

-- 
Marc Haisenko
Systemspezialist
Webport IT-Services GmbH
mailto: haisenko@webport.de


Re: Is it worth having SVN_REVNUM_T_FMT?

Posted by Nicolás Lichtmaier <ni...@reloco.com.ar>.
>As SVN_REVNUM_T_FMT is a #define the C preprocessor converts this into just 
>one string: "Revision: ld%" (I'm sure you've misquoted the example ;-) So 
>this should be fine for xgettext, or is xgettext a macro ?
>  
>

xgettext scans the source files without preprocessing them. And it makes
sense because any change in the text would render the translations obsolete.
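
The difference between the two views can be illustrated in C itself. This
is only an analogy, not a description of how xgettext is implemented, and
REVNUM_T_FMT, RAW_SOURCE and msgid_visible_in_source are hypothetical names
(the % is placed before the macro here, as the intended form presumably
was). The compiler sees one concatenated literal after macro expansion,
while the # operator captures the argument as it is spelled in the source,
which is roughly what a tool scanning unpreprocessed files has to work
with.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for SVN_REVNUM_T_FMT. */
#define REVNUM_T_FMT "ld"

/* What the compiler sees: the macro is expanded and the adjacent
 * string literals are concatenated into a single format string. */
static const char compiled_msgid[] = "Revision: %" REVNUM_T_FMT;

/* What the raw source contains: # stringifies its argument without
 * macro-expanding it, so the macro name survives as text. */
#define RAW_SOURCE(x) #x
static const char source_text[] = RAW_SOURCE("Revision: %" REVNUM_T_FMT);

/* Returns 1 if `msgid` appears verbatim in `source`, 0 otherwise. */
static int msgid_visible_in_source(const char *source, const char *msgid)
{
    assert(source != NULL && msgid != NULL);
    return strstr(source, msgid) != NULL;
}
```

Here compiled_msgid holds the string the program would pass to gettext at
run time, but that string never appears in source_text, so an extractor
working on the source alone cannot put it in the message catalog.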



Re: Is it worth having SVN_REVNUM_T_FMT?

Posted by Marc Haisenko <ha...@webport.de>.
On Monday 05 April 2004 14:59, Nicolás Lichtmaier wrote:
> The current use of SVN_REVNUM_T_FMT is not compatible with gettext. Why?
> Because xgettext can't process this:
>
> _("Revision: " SVN_REVNUM_T_FMT "%")
>
> At first thought creating a function... like
> get_human_readable_revision, but... Is it worth? SVN_REVNUM_T_FMT only
> has "ld", which is a standard C thing. It will probably never be
> changed? Isn't this a case of overengineering?

As SVN_REVNUM_T_FMT is a #define the C preprocessor converts this into just 
one string: "Revision: ld%" (I'm sure you've misquoted the example ;-) So 
this should be fine for xgettext, or is xgettext a macro ?

-- 
Marc Haisenko
Systemspezialist
Webport IT-Services GmbH
mailto: haisenko@webport.de


RE: Is it worth having SVN_REVNUM_T_FMT?

Posted by Sander Striker <st...@apache.org>.
Hi Ulrich,

Given your background with i18n, would you care to shed some
light on how to use xgettext in this situation?  Isn't there
some option to make xgettext scan after a preprocessor step?
(I can think of some side effects of the preprocessor :( )

Maybe some general advice on how to go about using gettext in SVN?

Thanks,

Sander

> -----Original Message-----
> From: Nicolas Lichtmaier [mailto:nick@reloco.com.ar]
> Sent: Monday, April 05, 2004 2:59 PM
> To: dev@subversion.tigris.org
> Subject: Is it woth to have SVN_REVNUM_T_FMT?
> 
> 
> The current use of SVN_REVNUM_T_FMT is not compatible with gettext. Why? 
> Because xgettext can't process this:
> 
> _("Revision: " SVN_REVNUM_T_FMT "%")
> 
> At first thought creating a function... like 
> get_human_readable_revision, but... Is it worth? SVN_REVNUM_T_FMT only 
> has "ld", which is a standard C thing. It will probably never be 
> changed? Isn't this a case of overengineering?
> 
