Posted to dev@subversion.apache.org by Blair Zajac <bl...@orcaware.com> on 2010/03/02 00:46:50 UTC
Re: svn commit: r917772 - in /subversion/trunk/subversion/bindings/javahl:
native/ src/org/apache/subversion/javahl/callback/ src/org/tigris/subversion/javahl/
tests/org/apache/subversion/javahl/
On 03/01/2010 02:46 PM, hwright@apache.org wrote:
> Author: hwright
> Date: Mon Mar 1 22:46:45 2010
> New Revision: 917772
>
> URL: http://svn.apache.org/viewvc?rev=917772&view=rev
> Log:
> JavaHL: Return properties as byte[] throughout the callback interfaces.
>
> We use byte[] in place of String because there could be binary data in the
> property, and the conversion to String would truncate the property at any
> NULL bytes.
Plus the conversion from byte[] to String depends upon the platform's
default character set.
I see there are a number of Strings constructed from the byte[]. Those
methods should take an additional java.nio.charset.Charset and then pass
its name to the String() constructor. I don't believe there should be
any Strings constructed without a Charset argument.
Maybe for svn:date we can presume a UTF-8 character set, but for
svn:author and svn:log we shouldn't.
Regards,
Blair
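The point above about the platform default character set can be illustrated with a minimal sketch. The class name CharsetDemo and the helper decode() are hypothetical and are not part of JavaHL. The sketch only shows that the same bytes yield different Strings under different charsets, which is why a String constructed without a Charset argument is not portable.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    /** Hypothetical helper: decodes a property value with an explicit charset. */
    public static String decode(byte[] value, Charset charset) {
        return new String(value, charset);
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) encoded as UTF-8 is the bytes 0xC3 0xA9.
        byte[] value = {(byte) 0xC3, (byte) 0xA9};

        // With an explicit charset the result is the same on every platform.
        String asUtf8 = decode(value, StandardCharsets.UTF_8);
        String asLatin1 = decode(value, StandardCharsets.ISO_8859_1);

        // new String(value) would use Charset.defaultCharset(), so it could
        // match either result above depending on how the JVM is configured.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("UTF-8 length: " + asUtf8.length());        // 1
        System.out.println("ISO-8859-1 length: " + asLatin1.length()); // 2
    }
}
```

StandardCharsets requires Java 7 or later; on older JVMs Charset.forName("UTF-8") would play the same role.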
RE: svn commit: r917772 - in /subversion/trunk/subversion/bindings/javahl:
native/ src/org/apache/subversion/javahl/callback/ src/org/tigris/subversion/javahl/
tests/org/apache/subversion/javahl/
Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Bert Huijben wrote on Tue, 2 Mar 2010 at 09:39 +0100:
> > Maybe the svn:date we can presume a UTF-8 character set, but the
> > svn:author, svn:log we shouldn't.
>
> For the svn:* properties we currently define we declared that they
> always use utf-8 and use '\n' as line ending. Clients are responsible
> for handling the conversions. See svn_prop_needs_translation() for
> more details. (Since 1.6 we even validate this on the filesystem or ra
> layer).
>
The validation of properties is done in the repos layer, actually.
The FS layer doesn't assume that properties are in UTF-8. (It does
assume/enforce that pathnames inside the repository are in UTF-8.)
> For other properties and svn:* we haven't defined yet, we can't assume
> anything. Users might have their MP3 collection stored in them ;-)
> (Most clients I know use these same normalization rules on all
> properties they edit. E.g. TortoiseSVN doesn't support editing
> properties with Windows style line endings)
>
> Bert
>
>
Re: svn commit: r917772 - in /subversion/trunk/subversion/bindings/javahl:
native/ src/org/apache/subversion/javahl/callback/ src/org/tigris/subversion/javahl/
tests/org/apache/subversion/javahl/
Posted by Blair Zajac <bl...@orcaware.com>.
Bert Huijben wrote:
>
>> -----Original Message-----
>> From: Blair Zajac [mailto:blair@orcaware.com]
>> Sent: Tuesday, 2 March 2010 1:47
>> To: hwright@apache.org
>> Cc: dev@subversion.apache.org
>> Subject: Re: svn commit: r917772 - in
>> /subversion/trunk/subversion/bindings/javahl: native/
>> src/org/apache/subversion/javahl/callback/ src/org/tigris/subversion/javahl/
>> tests/org/apache/subversion/javahl/
>>
>> On 03/01/2010 02:46 PM, hwright@apache.org wrote:
>>> Author: hwright
>>> Date: Mon Mar 1 22:46:45 2010
>>> New Revision: 917772
>>>
>>> URL: http://svn.apache.org/viewvc?rev=917772&view=rev
>>> Log:
>>> JavaHL: Return properties as byte[] throughout the callback interfaces.
>>>
>>> We use byte[] in place of String because there could be binary data in the
>>> property, and the conversion to String would truncate the property at any
>>> NULL bytes.
>>
>> Plus the conversion from byte[] to String depends upon the platform's
>> default character set.
>>
>> I see there's a number of String's constructed from the byte[]. Those
>> methods should take an additional java.nio.charset.Charset and then pass
>> it's name to the String() constructor. I don't believe there should be
>> any String's constructed without a Charset argument.
>>
>> Maybe the svn:date we can presume a UTF-8 character set, but the
>> svn:author, svn:log we shouldn't.
>
> For the svn:* properties we currently define we declared that they always use utf-8 and use '\n' as line ending. Clients are responsible for handling the conversions. See svn_prop_needs_translation() for more details. (Since 1.6 we even validate this on the filesystem or ra layer).
OK. So I believe we should add "UTF-8" as an additional constructor parameter
to String for the svn:* properties.
Blair
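A rough sketch of the proposal above, decoding svn:* values as UTF-8 and leaving everything else as bytes. The class SvnPropertyText and both methods are hypothetical names for illustration, not JavaHL API. Treating every name that starts with "svn:" as text is a simplifying assumption, since the thread notes that svn:* properties not yet defined carry no such guarantee.

```java
import java.nio.charset.StandardCharsets;

public class SvnPropertyText {
    /** Hypothetical check: true for property names in the svn: namespace. */
    public static boolean isSvnProperty(String name) {
        return name.startsWith("svn:");
    }

    /**
     * Returns the value decoded as UTF-8 for svn:* properties.
     * Returns null for any other property, because its bytes have no
     * defined encoding and may be arbitrary binary data.
     */
    public static String textValueOrNull(String name, byte[] value) {
        if (!isSvnProperty(name)) {
            return null;
        }
        return new String(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] log = "Fix typo".getBytes(StandardCharsets.UTF_8);
        System.out.println(textValueOrNull("svn:log", log));                    // Fix typo
        System.out.println(textValueOrNull("user:blob", new byte[] {0, 1, 2})); // null
    }
}
```

Passing a Charset object rather than the name "UTF-8" avoids the checked UnsupportedEncodingException that the String(byte[], String) constructor declares.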
RE: svn commit: r917772 - in /subversion/trunk/subversion/bindings/javahl: native/ src/org/apache/subversion/javahl/callback/ src/org/tigris/subversion/javahl/ tests/org/apache/subversion/javahl/
Posted by Bert Huijben <be...@qqmail.nl>.
> -----Original Message-----
> From: Blair Zajac [mailto:blair@orcaware.com]
> Sent: Tuesday, 2 March 2010 1:47
> To: hwright@apache.org
> Cc: dev@subversion.apache.org
> Subject: Re: svn commit: r917772 - in
> /subversion/trunk/subversion/bindings/javahl: native/
> src/org/apache/subversion/javahl/callback/ src/org/tigris/subversion/javahl/
> tests/org/apache/subversion/javahl/
>
> On 03/01/2010 02:46 PM, hwright@apache.org wrote:
> > Author: hwright
> > Date: Mon Mar 1 22:46:45 2010
> > New Revision: 917772
> >
> > URL: http://svn.apache.org/viewvc?rev=917772&view=rev
> > Log:
> > JavaHL: Return properties as byte[] throughout the callback interfaces.
> >
> > We use byte[] in place of String because there could be binary data in the
> > property, and the conversion to String would truncate the property at any
> > NULL bytes.
>
>
> Plus the conversion from byte[] to String depends upon the platform's
> default character set.
>
> I see there's a number of String's constructed from the byte[]. Those
> methods should take an additional java.nio.charset.Charset and then pass
> it's name to the String() constructor. I don't believe there should be
> any String's constructed without a Charset argument.
>
> Maybe the svn:date we can presume a UTF-8 character set, but the
> svn:author, svn:log we shouldn't.
For the svn:* properties we currently define we declared that they always use utf-8 and use '\n' as line ending. Clients are responsible for handling the conversions. See svn_prop_needs_translation() for more details. (Since 1.6 we even validate this on the filesystem or ra layer).
For other properties and svn:* we haven't defined yet, we can't assume anything. Users might have their MP3 collection stored in them ;-)
(Most clients I know use these same normalization rules on all properties they edit. E.g. TortoiseSVN doesn't support editing properties with Windows style line endings)
Bert
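The client-side conversion described above (UTF-8 plus '\n' line endings for svn:* values) might look roughly like the following. The class SvnPropertyEncoding and its methods are hypothetical and only illustrate the normalization rule; they are not taken from Subversion or any client.

```java
import java.nio.charset.StandardCharsets;

public class SvnPropertyEncoding {
    /** Converts CRLF and lone CR line endings to LF. */
    public static String toLf(String text) {
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }

    /** Hypothetical helper: normalizes line endings, then encodes as UTF-8. */
    public static byte[] toPropertyBytes(String text) {
        return toLf(text).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] value = toPropertyBytes("line one\r\nline two\r\n");
        // Each CRLF pair collapses to one LF byte, so 20 chars become 18 bytes.
        System.out.println(value.length);
    }
}
```

This conversion should only be applied to values known to be text; running it over binary property data would corrupt any 0x0D bytes.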