You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Kamesh Jayachandran <ka...@collab.net> on 2006/11/20 12:59:01 UTC

[PATCH][merge-tracking]Data loss due to line buffered SAX stream on mergeinfo retrieval.

Hi All,
Find the patch and log.

You could see this issue with url to url copy over ra_dav if source of 
copy is having multiple mergeinfos.


With regards
Kamesh Jayachandran

Re: [PATCH][merge-tracking]Data loss due to line buffered SAX stream on mergeinfo retrieval.

Posted by Kamesh Jayachandran <ka...@collab.net>.
Found one small issue with this patch and fixed the same.

Attaching the new patch and log.

With regards
Kamesh Jayachandran
Kamesh Jayachandran wrote:
> Hi All,
> Find the patch and log.
>
> You could see this issue with url to url copy over ra_dav if source of 
> copy is having multiple mergeinfos.
>
>
> With regards
> Kamesh Jayachandran
> ------------------------------------------------------------------------
>
> [[[
> Data loss due to line buffered SAX stream on cdata_handler.
>
> * subversion/libsvn_ra_dav/mergeinfo.c
>   (cdata_handler): 
>    This function is called for each line in the data for a given tag 
>    and hence we need to concatenate the current data with the 
>    previously accumulated data.
>
> Patch by: Kamesh Jayachandran <ka...@collab.net>
> ]]]
>   
> ------------------------------------------------------------------------
>
> Index: subversion/libsvn_ra_dav/mergeinfo.c
> ===================================================================
> --- subversion/libsvn_ra_dav/mergeinfo.c	(revision 22315)
> +++ subversion/libsvn_ra_dav/mergeinfo.c	(working copy)
> @@ -127,6 +127,8 @@
>  {
>    struct mergeinfo_baton *mb = baton;
>    apr_size_t nlen = len;
> +  char tmpc;
> +  char *mutable_cdata = (char*)cdata;
>  
>    switch (state)
>      {
> @@ -134,7 +136,13 @@
>        mb->curr_path = apr_pstrndup(mb->pool, cdata, nlen);
>        break;
>      case ELEM_merge_info_info:
> -      mb->curr_info = apr_pstrndup(mb->pool, cdata, nlen);
> +      /* if the data is having multiple lines we need to concatenate the
> +       * current data with the previous one.
> +       */
> +      tmpc = mutable_cdata[nlen];
> +      mutable_cdata[len]='\0';
> +      mb->curr_info = apr_pstrcat(mb->pool, mb->curr_info, cdata, NULL);
> +      mutable_cdata[nlen] = tmpc;
>        break;
>      default:
>        break;
>
>   
> ------------------------------------------------------------------------
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
> For additional commands, e-mail: dev-help@subversion.tigris.org


Re: [PATCH][merge-tracking]Data loss due to line buffered SAX stream on mergeinfo retrieval.

Posted by Kamesh Jayachandran <ka...@collab.net>.
>> I'm assuming that we shouldn't be touching memory past the end of
>> "cdata[len - 1]", as the API semantics of our SAX parser don't imply
>> we have any business with it...something like the patch I provided
>> above seems like a safer and more clear way to go.
>>
>> - Dan
>>   
> Now I have another thought. Why not use string streams? which should 
> be more clean and efficient way to address the problem?
>
> Will post a patch soon.
>
> With regards
> Kamesh Jayachandran
>
Realised 'stringbuf streams' won't fit the bill.
Did it using stringbuf which looks much clean and posted a patch with a 
subject 'ra_dav mergeinfo accumulation over 'stringbuf' rather than a 
'char *''

With regards
Kamesh Jayachandran

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH][merge-tracking]Data loss due to line buffered SAX stream on mergeinfo retrieval.

Posted by Kamesh Jayachandran <ka...@collab.net>.
Daniel Rall wrote:
> On Tue, 21 Nov 2006, Kamesh Jayachandran wrote:
>
>   
>>>> --- subversion/libsvn_ra_dav/mergeinfo.c	(revision 22315)
>>>> +++ subversion/libsvn_ra_dav/mergeinfo.c	(working copy)
>>>> @@ -127,6 +127,8 @@
>>>>         
> ...
>   
>>>> @@ -134,7 +136,13 @@
>>>>       mb->curr_path = apr_pstrndup(mb->pool, cdata, nlen);
>>>>       break;
>>>>     case ELEM_merge_info_info:
>>>> -      mb->curr_info = apr_pstrndup(mb->pool, cdata, nlen);
>>>> +      /* if the data is having multiple lines we need to concatenate the
>>>> +       * current data with the previous one.
>>>> +       */
>>>> +      tmpc = mutable_cdata[nlen];
>>>>         
>>> This will index off the end of the "mutable_cdata" string, which is
>>> only "len" character long, rather than the "len + 1" characters this
>>> would require.
>>>       
>> It is 'len' not 'len + 1'.
>> Say for data 
>> '/trunk/abc:1-15....something_else_we_should_not_bother........................' 
>> len is 16(including the new line).
>>     
>
> Are you trying to represent a piece of memory here?  cdata[16] would
> be the 17th character in the string, which is one character off of the
> end of the memory which we're supposed to have access to in this
> context.
>
>   
>> so setting cdata[16] = '\0' sets the overwrites a char next to '\n'.
>>     
>
> Yeah, one character past the newline.
>
>   
>>> Looks like an attempt to avoid copying "cdata"...
>>>       
>> Yes.
>>     
>
> I took a stab at making this more efficient -- would you test out this
> patch?
>
> Index: subversion/libsvn_ra_dav/mergeinfo.c
> ===================================================================
> --- subversion/libsvn_ra_dav/mergeinfo.c        (revision 22373)
> +++ subversion/libsvn_ra_dav/mergeinfo.c        (working copy)
> @@ -137,9 +137,12 @@
>      case ELEM_merge_info_info:
>        if (mb->curr_info)
>          {
> -          char *to_append = apr_pstrmemdup(mb->pool, cdata, len);
> -          mb->curr_info = apr_pstrcat(mb->pool, mb->curr_info, to_append,
> -                                      NULL);
> +          struct iovec fragments[] =
> +            {
> +              { (void *) mb->curr_info, strlen(mb->curr_info) },
> +              { (void *) cdata, len }
> +            };
> +          mb->curr_info = apr_pstrcatv(mb->pool, fragments, 2, NULL);
>          }
>        else
>          {
>
>
> ...
>   
>> Are you ok if I give a patch which uses a 'memory clobbering+restoration 
>> approach' with consistent 'len or nlen not both'?
>>     
>
> I'm assuming that we shouldn't be touching memory past the end of
> "cdata[len - 1]", as the API semantics of our SAX parser don't imply
> we have any business with it...something like the patch I provided
> above seems like a safer and more clear way to go.
>
> - Dan
>   
Now I have another thought. Why not use string streams? which should be 
more clean and efficient way to address the problem?

Will post a patch soon.

With regards
Kamesh Jayachandran

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH][merge-tracking]Data loss due to line buffered SAX stream on mergeinfo retrieval.

Posted by Daniel Rall <dl...@collab.net>.
On Tue, 21 Nov 2006, Kamesh Jayachandran wrote:

> >>--- subversion/libsvn_ra_dav/mergeinfo.c	(revision 22315)
> >>+++ subversion/libsvn_ra_dav/mergeinfo.c	(working copy)
> >>@@ -127,6 +127,8 @@
...
> >>@@ -134,7 +136,13 @@
> >>       mb->curr_path = apr_pstrndup(mb->pool, cdata, nlen);
> >>       break;
> >>     case ELEM_merge_info_info:
> >>-      mb->curr_info = apr_pstrndup(mb->pool, cdata, nlen);
> >>+      /* if the data is having multiple lines we need to concatenate the
> >>+       * current data with the previous one.
> >>+       */
> >>+      tmpc = mutable_cdata[nlen];
> >
> >This will index off the end of the "mutable_cdata" string, which is
> >only "len" character long, rather than the "len + 1" characters this
> >would require.
>
> It is 'len' not 'len + 1'.
> Say for data 
> '/trunk/abc:1-15....something_else_we_should_not_bother........................' 
> len is 16(including the new line).

Are you trying to represent a piece of memory here?  cdata[16] would
be the 17th character in the string, which is one character off of the
end of the memory which we're supposed to have access to in this
context.

> so setting cdata[16] = '\0' sets the overwrites a char next to '\n'.

Yeah, one character past the newline.

> >Looks like an attempt to avoid copying "cdata"...
>
> Yes.

I took a stab at making this more efficient -- would you test out this
patch?

Index: subversion/libsvn_ra_dav/mergeinfo.c
===================================================================
--- subversion/libsvn_ra_dav/mergeinfo.c        (revision 22373)
+++ subversion/libsvn_ra_dav/mergeinfo.c        (working copy)
@@ -137,9 +137,12 @@
     case ELEM_merge_info_info:
       if (mb->curr_info)
         {
-          char *to_append = apr_pstrmemdup(mb->pool, cdata, len);
-          mb->curr_info = apr_pstrcat(mb->pool, mb->curr_info, to_append,
-                                      NULL);
+          struct iovec fragments[] =
+            {
+              { (void *) mb->curr_info, strlen(mb->curr_info) },
+              { (void *) cdata, len }
+            };
+          mb->curr_info = apr_pstrcatv(mb->pool, fragments, 2, NULL);
         }
       else
         {


...
> Are you ok if I give a patch which uses a 'memory clobbering+restoration 
> approach' with consistent 'len or nlen not both'?

I'm assuming that we shouldn't be touching memory past the end of
"cdata[len - 1]", as the API semantics of our SAX parser don't imply
we have any business with it...something like the patch I provided
above seems like a safer and more clear way to go.

- Dan

Re: [PATCH][merge-tracking]Data loss due to line buffered SAX stream on mergeinfo retrieval.

Posted by Kamesh Jayachandran <ka...@collab.net>.
>   
>> --- subversion/libsvn_ra_dav/mergeinfo.c	(revision 22315)
>> +++ subversion/libsvn_ra_dav/mergeinfo.c	(working copy)
>> @@ -127,6 +127,8 @@
>>  {
>>    struct mergeinfo_baton *mb = baton;
>>    apr_size_t nlen = len;
>> +  char tmpc;
>> +  char *mutable_cdata = (char*)cdata;
>>  
>>    switch (state)
>>      {
>> @@ -134,7 +136,13 @@
>>        mb->curr_path = apr_pstrndup(mb->pool, cdata, nlen);
>>        break;
>>      case ELEM_merge_info_info:
>> -      mb->curr_info = apr_pstrndup(mb->pool, cdata, nlen);
>> +      /* if the data is having multiple lines we need to concatenate the
>> +       * current data with the previous one.
>> +       */
>> +      tmpc = mutable_cdata[nlen];
>>     
>
> This will index off the end of the "mutable_cdata" string, which is
> only "len" character long, rather than the "len + 1" characters this
> would require.
>   
It is 'len' not 'len + 1'.
Say for data 
'/trunk/abc:1-15....something_else_we_should_not_bother........................' 
len is 16(including the new line).
so setting cdata[16] = '\0' sets the overwrites a char next to '\n'.
> Looks like an attempt to avoid copying "cdata"...
>   
Yes.
>   
>> +      mutable_cdata[len]='\0';
>>     
>
> We did it again, only this time we clobbered something else's memory.
> This time, we used "len" rather than "nlen" as our offset variable.
> Which we use doesn't really matter, but we should be consistent.
>
> Spacing is also missing from around the "=" character.
>
>   
Sorry for the confusion.
>> +      mb->curr_info = apr_pstrcat(mb->pool, mb->curr_info, cdata, NULL);
>>     
>
> What happens when mb->curr_info hasn't been set yet?
>   
Yes realised the same and sent the followup patch on the same. Anyway 
you fixed the same.
>   
>> +      mutable_cdata[nlen] = tmpc;
>>        break;
>>     
>
> Now we attempt to restore the data we previously clobbered, again
> changing the offset variable we're using.
>   

Yes I agree that 'inconsistent variable usage is bad'.

Are you ok if I give a patch which uses a 'memory clobbering+restoration 
approach' with consistent 'len or nlen not both'?

With regards
Kamesh Jayachandran

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH][merge-tracking]Data loss due to line buffered SAX stream on mergeinfo retrieval.

Posted by Daniel Rall <dl...@collab.net>.
On Mon, 20 Nov 2006, Kamesh Jayachandran wrote:
...
> You could see this issue with url to url copy over ra_dav if source
> of copy is having multiple mergeinfos.

Nice find, Kamesh!  I (and DannyB, apparently) didn't realize that our
SAX CDATA handlers were invoked once per line.  Given that's the case,
we need to handle merge info with both:

o A single merge source (one line)
o Multiple merge sources (N lines, one per source)

I've committed a similar fix in r22369, which also addresses my
comments on the patch below.


> --- subversion/libsvn_ra_dav/mergeinfo.c	(revision 22315)
> +++ subversion/libsvn_ra_dav/mergeinfo.c	(working copy)
> @@ -127,6 +127,8 @@
>  {
>    struct mergeinfo_baton *mb = baton;
>    apr_size_t nlen = len;
> +  char tmpc;
> +  char *mutable_cdata = (char*)cdata;
>  
>    switch (state)
>      {
> @@ -134,7 +136,13 @@
>        mb->curr_path = apr_pstrndup(mb->pool, cdata, nlen);
>        break;
>      case ELEM_merge_info_info:
> -      mb->curr_info = apr_pstrndup(mb->pool, cdata, nlen);
> +      /* if the data is having multiple lines we need to concatenate the
> +       * current data with the previous one.
> +       */
> +      tmpc = mutable_cdata[nlen];

This will index off the end of the "mutable_cdata" string, which is
only "len" character long, rather than the "len + 1" characters this
would require.

Looks like an attempt to avoid copying "cdata"...

> +      mutable_cdata[len]='\0';

We did it again, only this time we clobbered something else's memory.
This time, we used "len" rather than "nlen" as our offset variable.
Which we use doesn't really matter, but we should be consistent.

Spacing is also missing from around the "=" character.

> +      mb->curr_info = apr_pstrcat(mb->pool, mb->curr_info, cdata, NULL);

What happens when mb->curr_info hasn't been set yet?

> +      mutable_cdata[nlen] = tmpc;
>        break;

Now we attempt to restore the data we previously clobbered, again
changing the offset variable we're using.