You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by "B. Smith-Mannschott" <bs...@gmail.com> on 2009/04/19 17:55:26 UTC

[PATCH] #3404: string function to normalize eol-style

I've been reworking my patch for #3404 and the first piece is ready
for review, so I though I'd post it.

[[[
Issue #3404: Provide string function to normalize eol-style.

* subversion/subversion/include/svn_string.h
  (svn_string_with_normalized_eol_style): new function
* subversion/libsvn_subr/svn_string.c
  (svn_string_with_normalized_eol_style): new function
]]]

Changes relative the corresponding part of my original patch:

* The static function svn_stringbuf_normalize_eol_style has been
folded into svn_string_with_normalized_eol_style(). This allows some
improvements:

- svn_stringbuf_normalize_eol_style no longer mistakenly uses the
public naming convention because it doesn't exist anymore.
- The use of a stringbuf intermediary between the two functions evaporates.
- We don't do as much needless copying. Previously we copied three
times, including the actual normalisation. Now we copy only once or
not at all.

* svn_string_with_normalized_eol_style now handles the rarer CR
eol-style convention in addition to LF and CRLF.

* svn_string_with_normalized_eol_style will now just return the
original svn_string_t if it requires no modification, saving some
work.

// Ben

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1809905

Re: [PATCH] #3404: string function to normalize eol-style

Posted by Branko Cibej <br...@xbc.nu>.
B. Smith-Mannschott wrote:
> On Sun, Apr 19, 2009 at 22:28, Stefan Sperling <st...@elego.de> wrote:
>   
>> On Sun, Apr 19, 2009 at 07:55:26PM +0200, B. Smith-Mannschott wrote:
>>     
>>> I've been reworking my patch for #3404 and the first piece is ready
>>> for review, so I though I'd post it.
>>>
>>> [[[
>>> Issue #3404: Provide string function to normalize eol-style.
>>>
>>> * subversion/subversion/include/svn_string.h
>>>   (svn_string_with_normalized_eol_style): new function
>>> * subversion/libsvn_subr/svn_string.c
>>>   (svn_string_with_normalized_eol_style): new function
>>> ]]]
>>>       
>> How is this different from svn_subst_translate_cstring2() ?
>>
>> Quoting svn_subst.h:
>>
>> /**
>>  * Convenience routine: a variant of svn_subst_translate_stream3() which
>>  * operates on cstrings.
>>  *
>>  * @since New in 1.3.
>>  *
>>  * Return a new string in @a *dst, allocated in @a pool, by copying the
>>  * contents of string @a src, possibly performing line ending and keyword
>>  * translations.
>>  *
>>  * If @a eol_str and @a keywords are @c NULL, behavior is just a byte-for-byte
>>  * copy.
>>  */
>> svn_error_t *
>> svn_subst_translate_cstring2(const char *src,
>>                             const char **dst,
>>                             const char *eol_str,
>>                             svn_boolean_t repair,
>>                             apr_hash_t *keywords,
>>                             svn_boolean_t expand,
>>                             apr_pool_t *pool);
>>
>>     
>
> I was not aware of svn_subst_translate_cstring2(), just as I'm not
> aware of -- oh, I'd say 95% of the functions that make up Subversion.
> The code is still *very* new to me.
>
> So, now I've looked it up, and followed it right down the rabbit hole.
> I've read it, skimmed the things it calls and tried to reason my way
> through the indirection via function pointers...
>
> * svn_subst_translate_cstring2  37
> *** svn_stream_from_stringbuf   18
> ***** svn_stream_create         12
> ***** (read_handler_stringbuf)
> ***** (write_handler_stringbuf)
> *** svn_stream_write             6
> ***** -> write_handler_stringbuf 8
> *** svn_subst_stream_translated 62
> ***** svn_stream_create         12
> ***** create_translation_baton  25
> ***** (translated_stream_read)  52
> ****** translate_chunk         161
> ******** translate_newline      32
> *********** translate_write     12
> ***** (translated_stream_write) 11
> ***** (translated_stream_close) 13
> ----------------------------------
>                                463
>
> Finally, in translate_chunk I found the logic that identifies new
> lines. It's rather hard to follow because of course It's doing keyword
> substitution at the same time and normalizing to an arbitrary
> eol-style, not just LF. Oh, and it'll optionally error out for you
> when it discovers inconsistencies, but I don't need that. In the end,
> I think it behaves as my function is intended to. I could use it, I
> suppose. Perhaps I should. (code reuse...)
>
> Still, it seems like overkill to get about 500 lines of svn_stream and
> translation machinery involved just to achieve eol-normalization of a
> string, which is already entirely in memory.
>   

I tend to sort of agree here ... I've been amazed by how opaque our
stream translation code is before. Perhaps if you abstract the
EOL-normalization out of translate_chunk and put it as a separate
function in svn_string?

The point of the whole exercise being, of course, that it does not
suddenly require twice as many allocations in the stream-translation
call stack.

-- Brane

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1847389

Re: [PATCH] #3404: string function to normalize eol-style

Posted by "B. Smith-Mannschott" <bs...@gmail.com>.
On Sun, Apr 19, 2009 at 22:28, Stefan Sperling <st...@elego.de> wrote:
> On Sun, Apr 19, 2009 at 07:55:26PM +0200, B. Smith-Mannschott wrote:
>> I've been reworking my patch for #3404 and the first piece is ready
>> for review, so I though I'd post it.
>>
>> [[[
>> Issue #3404: Provide string function to normalize eol-style.
>>
>> * subversion/subversion/include/svn_string.h
>>   (svn_string_with_normalized_eol_style): new function
>> * subversion/libsvn_subr/svn_string.c
>>   (svn_string_with_normalized_eol_style): new function
>> ]]]
>
> How is this different from svn_subst_translate_cstring2() ?
>
> Quoting svn_subst.h:
>
> /**
>  * Convenience routine: a variant of svn_subst_translate_stream3() which
>  * operates on cstrings.
>  *
>  * @since New in 1.3.
>  *
>  * Return a new string in @a *dst, allocated in @a pool, by copying the
>  * contents of string @a src, possibly performing line ending and keyword
>  * translations.
>  *
>  * If @a eol_str and @a keywords are @c NULL, behavior is just a byte-for-byte
>  * copy.
>  */
> svn_error_t *
> svn_subst_translate_cstring2(const char *src,
>                             const char **dst,
>                             const char *eol_str,
>                             svn_boolean_t repair,
>                             apr_hash_t *keywords,
>                             svn_boolean_t expand,
>                             apr_pool_t *pool);
>

I was not aware of svn_subst_translate_cstring2(), just as I'm not
aware of -- oh, I'd say 95% of the functions that make up Subversion.
The code is still *very* new to me.

So, now I've looked it up, and followed it right down the rabbit hole.
I've read it, skimmed the things it calls and tried to reason my way
through the indirection via function pointers...

* svn_subst_translate_cstring2  37
*** svn_stream_from_stringbuf   18
***** svn_stream_create         12
***** (read_handler_stringbuf)
***** (write_handler_stringbuf)
*** svn_stream_write             6
***** -> write_handler_stringbuf 8
*** svn_subst_stream_translated 62
***** svn_stream_create         12
***** create_translation_baton  25
***** (translated_stream_read)  52
****** translate_chunk         161
******** translate_newline      32
*********** translate_write     12
***** (translated_stream_write) 11
***** (translated_stream_close) 13
----------------------------------
                               463

Finally, in translate_chunk I found the logic that identifies new
lines. It's rather hard to follow because of course It's doing keyword
substitution at the same time and normalizing to an arbitrary
eol-style, not just LF. Oh, and it'll optionally error out for you
when it discovers inconsistencies, but I don't need that. In the end,
I think it behaves as my function is intended to. I could use it, I
suppose. Perhaps I should. (code reuse...)

Still, it seems like overkill to get about 500 lines of svn_stream and
translation machinery involved just to achieve eol-normalization of a
string, which is already entirely in memory.

// Ben

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=1829142


Re: [PATCH] #3404: string function to normalize eol-style

Posted by Stefan Sperling <st...@elego.de>.
On Sun, Apr 19, 2009 at 07:55:26PM +0200, B. Smith-Mannschott wrote:
> I've been reworking my patch for #3404 and the first piece is ready
> for review, so I though I'd post it.
> 
> [[[
> Issue #3404: Provide string function to normalize eol-style.
> 
> * subversion/subversion/include/svn_string.h
>   (svn_string_with_normalized_eol_style): new function
> * subversion/libsvn_subr/svn_string.c
>   (svn_string_with_normalized_eol_style): new function
> ]]]

How is this different from svn_subst_translate_cstring2() ?

Quoting svn_subst.h:

/**
 * Convenience routine: a variant of svn_subst_translate_stream3() which
 * operates on cstrings.
 *
 * @since New in 1.3.
 *
 * Return a new string in @a *dst, allocated in @a pool, by copying the
 * contents of string @a src, possibly performing line ending and keyword
 * translations.
 *
 * If @a eol_str and @a keywords are @c NULL, behavior is just a byte-for-byte
 * copy.
 */
svn_error_t *
svn_subst_translate_cstring2(const char *src,
                             const char **dst,
                             const char *eol_str,
                             svn_boolean_t repair,
                             apr_hash_t *keywords,
                             svn_boolean_t expand,
                             apr_pool_t *pool);

Stefan