Posted to dev@subversion.apache.org by Blair Zajac <bl...@orcaware.com> on 2010/03/02 00:46:50 UTC

Re: svn commit: r917772 - in /subversion/trunk/subversion/bindings/javahl: native/ src/org/apache/subversion/javahl/callback/ src/org/tigris/subversion/javahl/ tests/org/apache/subversion/javahl/

On 03/01/2010 02:46 PM, hwright@apache.org wrote:
> Author: hwright
> Date: Mon Mar  1 22:46:45 2010
> New Revision: 917772
>
> URL: http://svn.apache.org/viewvc?rev=917772&view=rev
> Log:
> JavaHL: Return properties as byte[] throughout the callback interfaces.
>
> We use byte[] in place of String because there could be binary data in the
> property, and the conversion to String would truncate the property at any
> NULL bytes.


Plus the conversion from byte[] to String depends upon the platform's 
default character set.

I see there are a number of Strings constructed from the byte[].  Those
methods should take an additional java.nio.charset.Charset and then pass
its name to the String() constructor.  I don't believe there should be
any Strings constructed without a Charset argument.

Maybe for svn:date we can presume a UTF-8 character set, but for
svn:author and svn:log we shouldn't.
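
To illustrate the default-charset concern, here is a minimal sketch. The
class PropDecodeSketch and its decode() helper are hypothetical names used
only for this example and are not part of JavaHL; it uses the
String(byte[], Charset) constructor rather than the charset-name overload.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class PropDecodeSketch {
    // Hypothetical helper (not part of JavaHL): the caller must name the charset.
    public static String decode(byte[] value, Charset charset) {
        return new String(value, charset);
    }

    public static void main(String[] args) {
        // UTF-8 bytes for "Zo" followed by U+00EB; the last two bytes form one character.
        byte[] raw = {0x5A, 0x6F, (byte) 0xC3, (byte) 0xAB};

        // No charset given: the result depends on the JVM's default charset.
        System.out.println(new String(raw).length());

        // Explicit charset: the result is the same on every platform.
        System.out.println(decode(raw, StandardCharsets.UTF_8).length());      // 3
        System.out.println(decode(raw, StandardCharsets.ISO_8859_1).length()); // 4
    }
}
```

The same four bytes decode to three characters as UTF-8 and four as
ISO-8859-1, which is why leaving the choice to the platform default is risky.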

Regards,
Blair

RE: svn commit: r917772 - in /subversion/trunk/subversion/bindings/javahl: native/ src/org/apache/subversion/javahl/callback/ src/org/tigris/subversion/javahl/ tests/org/apache/subversion/javahl/

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Bert Huijben wrote on Tue, 2 Mar 2010 at 09:39 +0100:
> > Maybe the svn:date we can presume a UTF-8 character set, but the
> > svn:author, svn:log we shouldn't.
> 
> For the svn:* properties we currently define we declared that they
> always use utf-8 and use '\n' as line ending. Clients are responsible
> for handling the conversions. See svn_prop_needs_translation() for
> more details. (Since 1.6 we even validate this on the filesystem or ra
> layer). 
> 

The validation of properties is done in the repos layer, actually.
The FS layer doesn't assume that properties are in UTF-8.  (It does
assume/enforce that pathnames inside the repository are in UTF-8.)

> For other properties and svn:* we haven't defined yet, we can't assume
> anything. Users might have their MP3 collection stored in them ;-)
> (Most clients I know use these same normalization rules on all
> properties they edit. E.g. TortoiseSVN doesn't support editing
> properties with Windows style line endings)
> 
> 	Bert
> 
> 

Re: svn commit: r917772 - in /subversion/trunk/subversion/bindings/javahl: native/ src/org/apache/subversion/javahl/callback/ src/org/tigris/subversion/javahl/ tests/org/apache/subversion/javahl/

Posted by Blair Zajac <bl...@orcaware.com>.
Bert Huijben wrote:
> 
>> -----Original Message-----
>> From: Blair Zajac [mailto:blair@orcaware.com]
>> Sent: dinsdag 2 maart 2010 1:47
>> To: hwright@apache.org
>> Cc: dev@subversion.apache.org
>> Subject: Re: svn commit: r917772 - in
>> /subversion/trunk/subversion/bindings/javahl: native/
>> src/org/apache/subversion/javahl/callback/ src/org/tigris/subversion/javahl/
>> tests/org/apache/subversion/javahl/
>>
>> On 03/01/2010 02:46 PM, hwright@apache.org wrote:
>>> Author: hwright
>>> Date: Mon Mar  1 22:46:45 2010
>>> New Revision: 917772
>>>
>>> URL: http://svn.apache.org/viewvc?rev=917772&view=rev
>>> Log:
>>> JavaHL: Return properties as byte[] throughout the callback interfaces.
>>>
>>> We use byte[] in place of String because there could be binary data in the
>>> property, and the conversion to String would truncate the property at any
>>> NULL bytes.
>>
>> Plus the conversion from byte[] to String depends upon the platform's
>> default character set.
>>
>> I see there's a number of String's constructed from the byte[].  Those
>> methods should take an additional java.nio.charset.Charset and then pass
>> it's name to the String() constructor.  I don't believe there should be
>> any String's constructed without a Charset argument.
>>
>> Maybe the svn:date we can presume a UTF-8 character set, but the
>> svn:author, svn:log we shouldn't.
> 
> For the svn:* properties we currently define we declared that they always use utf-8 and use '\n' as line ending. Clients are responsible for handling the conversions. See svn_prop_needs_translation() for more details. (Since 1.6 we even validate this on the filesystem or ra layer). 

OK.  So I believe we should add "UTF-8" as an additional constructor parameter 
to String for the svn:* properties.
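
A rough sketch of what that could look like, assuming a hypothetical helper
(SvnPropTextSketch.decodeSvnProp is not an existing JavaHL method). Using the
"svn:" prefix as the test and rejecting malformed input instead of silently
replacing it are both assumptions made for the example:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class SvnPropTextSketch {
    /**
     * Hypothetical helper: decode an svn:* property value as UTF-8.
     * Returns null for any other property, so the caller keeps the raw bytes.
     */
    public static String decodeSvnProp(String name, byte[] value) {
        if (!name.startsWith("svn:")) {
            return null; // encoding unknown; do not guess
        }
        try {
            // A strict decoder reports malformed bytes instead of substituting U+FFFD.
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(value))
                    .toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException(name + " is not valid UTF-8", e);
        }
    }

    public static void main(String[] args) {
        byte[] log = {0x66, 0x69, 0x78}; // ASCII bytes for "fix"
        System.out.println(decodeSvnProp("svn:log", log));
        System.out.println(decodeSvnProp("custom:blob", log)); // null: left as bytes
    }
}
```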

Blair

RE: svn commit: r917772 - in /subversion/trunk/subversion/bindings/javahl: native/ src/org/apache/subversion/javahl/callback/ src/org/tigris/subversion/javahl/ tests/org/apache/subversion/javahl/

Posted by Bert Huijben <be...@qqmail.nl>.

> -----Original Message-----
> From: Blair Zajac [mailto:blair@orcaware.com]
> Sent: dinsdag 2 maart 2010 1:47
> To: hwright@apache.org
> Cc: dev@subversion.apache.org
> Subject: Re: svn commit: r917772 - in
> /subversion/trunk/subversion/bindings/javahl: native/
> src/org/apache/subversion/javahl/callback/ src/org/tigris/subversion/javahl/
> tests/org/apache/subversion/javahl/
> 
> On 03/01/2010 02:46 PM, hwright@apache.org wrote:
> > Author: hwright
> > Date: Mon Mar  1 22:46:45 2010
> > New Revision: 917772
> >
> > URL: http://svn.apache.org/viewvc?rev=917772&view=rev
> > Log:
> > JavaHL: Return properties as byte[] throughout the callback interfaces.
> >
> > We use byte[] in place of String because there could be binary data in the
> > property, and the conversion to String would truncate the property at any
> > NULL bytes.
> 
> 
> Plus the conversion from byte[] to String depends upon the platform's
> default character set.
> 
> I see there's a number of String's constructed from the byte[].  Those
> methods should take an additional java.nio.charset.Charset and then pass
> it's name to the String() constructor.  I don't believe there should be
> any String's constructed without a Charset argument.
> 
> Maybe the svn:date we can presume a UTF-8 character set, but the
> svn:author, svn:log we shouldn't.

For the svn:* properties we currently define we declared that they always use utf-8 and use '\n' as line ending. Clients are responsible for handling the conversions. See svn_prop_needs_translation() for more details. (Since 1.6 we even validate this on the filesystem or ra layer). 

For other properties, and for svn:* properties we haven't defined yet, we can't assume anything. Users might have their MP3 collection stored in them ;-)
(Most clients I know apply these same normalization rules to all properties they edit. E.g. TortoiseSVN doesn't support editing properties with Windows-style line endings.)
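
As a sketch of the client-side line-ending conversion mentioned above (the
class and method names are hypothetical, and this is not the code any
particular client uses), a text property value could be normalized to '\n'
before it is stored:

```java
final class LineEndingSketch {
    // Hypothetical helper: convert CRLF pairs and lone CRs to LF.
    public static String toLf(String text) {
        // Replace CRLF first so a pair does not turn into two LFs.
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }

    public static void main(String[] args) {
        String windowsStyle = "first line\r\nsecond line\r\n";
        System.out.println(toLf(windowsStyle).equals("first line\nsecond line\n")); // true
    }
}
```

This only makes sense for values known to be text; binary values should be
passed through untouched.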

	Bert