Posted to slide-user@jakarta.apache.org by Jacob Lund <jl...@qualiware.com> on 2004/01/28 12:00:35 UTC

TXFileStore and local filesystem

Hi All!

Has anyone tried the TXFileStore with filenames that include characters
that are not allowed in the server's file system?

I am using Windows XP and the UI supports UTF-8. My XP is English and the
locale is set to Danish! If I upload a file with Russian characters in the
filename, Slide fails with a 500 error. Slide is set to UTF-8 and running on
Tomcat 5.0.16.

My Slide version is about a week old (from the SLIDE_2_0_RELEASE_BRANCH).

I will dig deeper into this, but I just wanted to hear if anyone else has
experience with this.

Without knowing anything for sure, I think that Java will apply the "DOS"
codepage to all IO operations. If this is true, then the filestore needs
to save all information in an escaped UTF-8 manner.
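A minimal sketch of the codepage hypothesis above. It assumes the file-name conversion behaves like ISO-8859-1, chosen here only because every Java runtime ships it; a Danish Windows XP install would more likely use windows-1252 or a DOS codepage such as 850, and which charset the JVM actually uses for file names is not established in this thread. Danish letters survive the conversion, Cyrillic ones cannot be represented, and String.getBytes silently replaces them with '?':

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CodepageCheck {
    // Assumption: ISO-8859-1 stands in for the legacy 8-bit codepage used for file names.
    static final Charset LEGACY = StandardCharsets.ISO_8859_1;

    /** True if every character of the name can be represented in the legacy codepage. */
    static boolean fitsCodepage(String name) {
        return LEGACY.newEncoder().canEncode(name);
    }

    /** What the name turns into when converted anyway: unmappable characters become '?'. */
    static String roundTrip(String name) {
        return new String(name.getBytes(LEGACY), LEGACY);
    }

    public static void main(String[] args) {
        String danish = "bl\u00e5b\u00e6rgr\u00f8d.txt";  // Danish name with a-ring, ae, o-slash
        String russian = "\u0444\u0430\u0439\u043b.txt";  // Russian word for "file" + ".txt"
        System.out.println(fitsCodepage(danish) + " " + roundTrip(danish).equals(danish));  // true true
        System.out.println(fitsCodepage(russian) + " " + roundTrip(russian));               // false ????.txt
    }
}
```

This matches what is reported elsewhere in the thread: Danish names work, Russian names do not.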

/Jacob


---------------------------------------------------------------------
To unsubscribe, e-mail: slide-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: slide-user-help@jakarta.apache.org


Re: TXFileStore and local filesystem

Posted by Oliver Zeigermann <oz...@c1-fse.de>.
Michael Smith wrote:
> Oliver Zeigermann wrote:
> 
>> Jacob Lund wrote:
>>
>>> No file is created by the filestore!
>>>
>>> German and Danish letters work fine for me too - and they 
>>> are also supported by the codepage.
>>>
>>> To avoid these problems, why not save the files in escaped UTF-8 
>>> codes - that way we always know that it will work?!?
>>
>>
>>
>> Hmmm. Escaped? Like URL encoded? This might blow up the length quite a 
>> bit. As I know of file systems having at most 255 characters per path 
>> (like the ones both of us use) this may be undesirable for the default 
>> application. What about adding a switch for this and make it 
>> configurable?
>>
> 
> It's a slightly more complex problem than this, and needs a solution in 
> at least two places:
>   1) Filenames that aren't allowed
>      a) 'High' unicode characters, etc. These should be escaped somehow.
>      b) Disallowed characters in the low part - for example, windows
>         doesn't permit several characters such as colons and slashes.
>         This is _and must remain_ platform specific - which is painful.
>         These should get escaped too.

URLEncoder does this for us. As far as I remember, it leaves only ASCII
letters, digits and the characters '.', '-', '*' and '_' unchanged, turns
spaces into '+', and encodes everything else as %xx. So, as long as we
tell it to use UTF-8, I do not think we have a problem with this any
more. The call might look like

    path = URLEncoder.encode(path, "UTF-8");
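One caveat with the call above: applied to a whole path, URLEncoder also escapes the '/' separators (as %2F), so the directory structure would be flattened into one long file name. A sketch that encodes each segment separately instead, assuming '/' is the only separator in Slide URIs; PathEscaper and its methods are hypothetical names, not part of Slide:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.util.function.UnaryOperator;

public class PathEscaper {
    // Applies f to every '/'-separated segment, keeping the separators themselves.
    private static String mapSegments(String path, UnaryOperator<String> f) {
        String[] segments = path.split("/", -1);  // -1 keeps empty leading/trailing segments
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < segments.length; i++) {
            if (i > 0) out.append('/');
            out.append(f.apply(segments[i]));
        }
        return out.toString();
    }

    static String encodePath(String path) {
        return mapSegments(path, s -> {
            try {
                return URLEncoder.encode(s, "UTF-8");
            } catch (UnsupportedEncodingException e) {
                throw new IllegalStateException(e);  // every JVM must support UTF-8
            }
        });
    }

    // Reverses encodePath; URLDecoder also turns '+' back into a space.
    static String decodePath(String encoded) {
        return mapSegments(encoded, s -> {
            try {
                return URLDecoder.decode(s, "UTF-8");
            } catch (UnsupportedEncodingException e) {
                throw new IllegalStateException(e);
            }
        });
    }

    public static void main(String[] args) {
        String path = "/files/\u0444\u0430\u0439\u043b.txt";  // Cyrillic file name
        String encoded = encodePath(path);
        System.out.println(encoded);                           // /files/%D1%84%D0%B0%D0%B9%D0%BB.txt
        System.out.println(decodePath(encoded).equals(path));  // true
    }
}
```

Because URLEncoder escapes '+' and '%' themselves, the round trip through URLDecoder is lossless.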


>   2) What to do if we fail anyway. We might not escape everything
>      (i.e. we might just make a decision to disallow certain characters
>      in filenames), or the escaped filename might exceed the maximum
>      length allowed by the filesystem. In this case, we need to
>      propagate some appropriate error up to the application (webdav)
>      level, so we don't just give a 400.

That's right! Same thing with the relational database store. The problem 
is we only have ServiceAccessException, but we could nest others into 
it. How could this exception be handled in the WebDAV layer to generate 
a proper error message and code? I simply do not know...
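One possible shape for this, as a sketch only: the store wraps a specific cause in the generic exception, and the WebDAV layer walks the cause chain to choose a status code. StoreException is a hypothetical stand-in for ServiceAccessException (whose constructors are not shown in this thread), FilenameTooLongException is invented for the example, and the 414 status is a placeholder choice, not something Slide or the WebDAV specification prescribes:

```java
public class ErrorMapping {
    /** Hypothetical stand-in for Slide's generic ServiceAccessException. */
    static class StoreException extends Exception {
        StoreException(String message, Throwable cause) { super(message, cause); }
    }

    /** Hypothetical specific cause raised by a file-based store. */
    static class FilenameTooLongException extends Exception {
        FilenameTooLongException(String message) { super(message); }
    }

    /** Store side (simulated): wrap the specific problem instead of discarding it. */
    static void createResource(String encodedName, int maxLength) throws StoreException {
        if (encodedName.length() > maxLength) {
            throw new StoreException("Can not create resource at " + encodedName,
                    new FilenameTooLongException(encodedName.length() + " > " + maxLength));
        }
        // ... a real store would create the file here ...
    }

    /** WebDAV side: walk the cause chain and pick a status code (placeholder choices). */
    static int statusFor(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof FilenameTooLongException) return 414;
        }
        return 500;  // anything unrecognised stays an internal server error
    }

    public static void main(String[] args) {
        try {
            createResource("x".repeat(300), 255);  // too long for the assumed limit
        } catch (StoreException e) {
            System.out.println(e.getCause().getMessage());  // 300 > 255
            System.out.println(statusFor(e));               // 414
        }
    }
}
```

The standard Throwable(String, Throwable) constructor (available since Java 1.4) keeps the original cause reachable without changing the throws clause of the store interface.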

Oliver




Re: TXFileStore and local filesystem

Posted by Michael Smith <ms...@speedlegal.com>.
Oliver Zeigermann wrote:
> Jacob Lund wrote:
> 
>> No file is created by the filestore!
>>
>> German and Danish letters work fine for me too - and they are 
>> also supported by the codepage.
>>
>> To avoid these problems, why not save the files in escaped UTF-8 codes 
>> - that way we always know that it will work?!?
> 
> 
> Hmmm. Escaped? Like URL encoded? This might blow up the length quite a 
> bit. As I know of file systems having at most 255 characters per path 
> (like the ones both of us use) this may be undesirable for the default 
> application. What about adding a switch for this and make it configurable?
> 

It's a slightly more complex problem than this, and needs a solution in 
at least two places:
   1) Filenames that aren't allowed
      a) 'High' unicode characters, etc. These should be escaped somehow.
      b) Disallowed characters in the low part - for example, windows
         doesn't permit several characters such as colons and slashes.
         This is _and must remain_ platform specific - which is painful.
         These should get escaped too.
   2) What to do if we fail anyway. We might not escape everything
      (i.e. we might just make a decision to disallow certain characters
      in filenames), or the escaped filename might exceed the maximum
      length allowed by the filesystem. In this case, we need to
      propagate some appropriate error up to the application (webdav)
      level, so we don't just give a 400.
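For point 1b, URLEncoder (suggested elsewhere in this thread) covers most of the characters Windows forbids in file names (< > : " / \ | ?), but it leaves '*' unescaped. A sketch of a platform-specific post-processing step, assuming Windows rules; WindowsSafeName is a hypothetical helper, and it does not handle other Windows restrictions such as reserved device names (CON, NUL, ...) or trailing dots and spaces:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class WindowsSafeName {
    /** URL-encodes one path segment, then also escapes '*', which Windows forbids. */
    static String encodeSegment(String segment) {
        String encoded;
        try {
            encoded = URLEncoder.encode(segment, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);  // every JVM must support UTF-8
        }
        // URLEncoder leaves '*' alone; %2A still decodes back to '*' with URLDecoder.
        return encoded.replace("*", "%2A");
    }

    public static void main(String[] args) {
        System.out.println(encodeSegment("a:b*c?.txt"));  // a%3Ab%2Ac%3F.txt
    }
}
```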

Thoughts?

Mike




Re: TXFileStore and local filesystem

Posted by Oliver Zeigermann <oz...@c1-fse.de>.
Jacob Lund wrote:

> No file is created by the filestore!
> 
> German and Danish letters work fine for me too - and they are also supported by the codepage.
> 
> To avoid these problems, why not save the files in escaped UTF-8 codes - that way we always know that it will work?!?

Hmmm. Escaped? Like URL encoded? This might blow up the length quite a 
bit. As I know of file systems having at most 255 characters per path 
(like the ones both of us use) this may be undesirable for the default 
application. What about adding a switch for this and make it configurable?
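To put a number on the length concern: each Cyrillic letter takes 2 bytes in UTF-8, and each byte becomes a 3-character %XX escape, so URL-encoding multiplies such names by six, while plain ASCII names are unchanged. A sketch, assuming the 255-character figure mentioned above applies to a single file name (real limits vary by file system); EncodedLength is a hypothetical helper:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class EncodedLength {
    // Assumption: 255 characters per file name, the figure mentioned above.
    static final int MAX_NAME = 255;

    static String encode(String name) {
        try {
            return URLEncoder.encode(name, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);  // every JVM must support UTF-8
        }
    }

    /** True if the escaped name still fits within the assumed limit. */
    static boolean fits(String name) {
        return encode(name).length() <= MAX_NAME;
    }

    public static void main(String[] args) {
        String ascii = "document.txt";
        String cyrillic = "\u0434\u043e\u043a\u0443\u043c\u0435\u043d\u0442";  // 8 Cyrillic letters
        System.out.println(ascii.length() + " -> " + encode(ascii).length());        // 12 -> 12
        System.out.println(cyrillic.length() + " -> " + encode(cyrillic).length());  // 8 -> 48
        // 42 Cyrillic letters encode to 252 characters; 43 would need 258.
        System.out.println(fits("\u0434".repeat(42)) + " " + fits("\u0434".repeat(43)));  // true false
    }
}
```

So a Cyrillic name longer than 42 letters would already exceed the limit once escaped, which supports making the escaping configurable rather than always on.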

Anyway, to see if this really fixes it, please apply the patch for 
FileResourceManager attached and see if it works.

Thanks again :)

Oliver


Re: TXFileStore and local filesystem

Posted by Michael Smith <ms...@speedlegal.com>.
Jacob Lund wrote:
> Thanks!
> 
> Could you then explain the relationship between UTF-8 and XML. Does it make
> sense to have unescaped UTF-8 encoding in XML or should UTF-8 always be
> escaped when used in XML?
> 

XML frequently uses unicode (more often than not, in fact), though it 
can use other character sets (XML processors all (?) use unicode 
internally). Most often, the unicode is encoded using UTF-8. Using UTF-8 
as-is in XML is sensible and a normal safe approach. Of course, you have 
to do _some_ escaping because it's XML - so things like < and > (which 
are, of course, unicode characters) still need escaping. But that's XML 
escaping (using character references like &gt;), not escaping the UTF-8.
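A minimal sketch of that distinction, assuming a hand-written escaping helper (XmlText.escape is hypothetical; in practice an XML library would do this): only the markup characters are replaced with references, while non-ASCII text such as Danish letters goes into the UTF-8 output unchanged:

```java
import java.nio.charset.StandardCharsets;

public class XmlText {
    /** Replaces only the XML markup characters; every other character is kept unchanged. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<name>"
                + escape("r\u00f8d & bl\u00e5 <draft>") + "</name>";
        byte[] utf8 = xml.getBytes(StandardCharsets.UTF_8);  // the bytes actually written out
        // The two Danish letters are written as raw 2-byte UTF-8 sequences, not as references.
        System.out.println(xml.length() + " characters, " + utf8.length + " bytes");
    }
}
```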

Mike




Re: TXFileStore and local filesystem

Posted by Oliver Zeigermann <oz...@c1-fse.de>.
As defined in the standard, UTF-8 and UTF-16 are the default encodings
that all spec-compliant parsers must recognize. This means that if you
have something like this as a header

<?xml version="1.0"?>

you are free to use UTF-8 encoding in this XML document. You can use
ASCII as well, as UTF-8 is backward compatible with ASCII...
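The backward compatibility can be checked directly: a pure-ASCII document produces exactly the same bytes in US-ASCII and UTF-8, and every byte of a multi-byte UTF-8 sequence has its high bit set, so it can never be mistaken for an ASCII character. A small sketch (class and method names are illustrative):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class AsciiCompat {
    /** True if the text produces identical bytes in US-ASCII and UTF-8. */
    static boolean sameBytes(String text) {
        return Arrays.equals(text.getBytes(StandardCharsets.US_ASCII),
                             text.getBytes(StandardCharsets.UTF_8));
    }

    /** True if every UTF-8 byte of the text has the high bit set (none looks like ASCII). */
    static boolean allHighBit(String text) {
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            if ((b & 0x80) == 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sameBytes("<?xml version=\"1.0\"?><name>report.txt</name>"));  // true
        System.out.println(allHighBit("\u00f8\u0444"));  // true: no byte can be confused with ASCII
        System.out.println(sameBytes("\u00f8"));          // false: US-ASCII cannot represent it
    }
}
```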

Oliver

Jacob Lund wrote:

> Thanks!
> 
> Could you then explain the relationship between UTF-8 and XML. Does it make
> sense to have unescaped UTF-8 encoding in XML or should UTF-8 always be
> escaped when used in XML?
> 
> /Jacob
> 


RE: TXFileStore and local filesystem

Posted by Jacob Lund <jl...@qualiware.com>.
Thanks!

Could you then explain the relationship between UTF-8 and XML. Does it make
sense to have unescaped UTF-8 encoding in XML or should UTF-8 always be
escaped when used in XML?

/Jacob

-----Original Message-----
From: Michael Smith [mailto:msmith@speedlegal.com] 
Sent: 2 February 2004 01:04
To: Slide Users Mailing List
Subject: Re: TXFileStore and local filesystem



Re: TXFileStore and local filesystem

Posted by Michael Smith <ms...@speedlegal.com>.
Jacob Lund wrote:
> Ok! Let me see if I can explain myself - I am not an expert on this so
> please correct me if I am wrong!
> 
> A UTF-8 representation of one character consists of a combination of
> characters. Now Java is a Unicode language and this means that one character
> can represent "any" type of character in the world!

This is incorrect. Your basic reasoning is more or less right, but your 
terminology is incorrect in ways that will tend to confuse your thinking 
(and that of others). You're confusing "character" and "byte" - a better 
way to phrase this is:
	"A UTF-8 representation of one character consists of one or more bytes" 
(note the distinction: a character is an abstract entity, any 
representation of that character is as a series of bytes).

> 
> Basically UTF-8 only makes sense when working on an "old" 7-bit ASCII system
> and you need to use characters not available in the given codepage.

No. UTF-8 a) makes sense in many places, and b) doesn't specifically 
help in this case. There's a UTF-7 that you could use for this, but 
nobody uses UTF-7, and I really don't recommend even bothering to look 
up the details of it.

> 
> Both UTF-8 and UTF-16 use a varying number of bytes to represent one
> character, where Unicode always uses 32 bit characters (maybe it is 24 bit).

This gets somewhat complex.
Unicode does not use any number of bits for a character. Unicode 
specifies characters (as "codepoints") as an abstract integer, with no 
explicit representation.

THEN, you have an 'encoding' of this integer to give an explicit 
representation of that abstract codepoint.

UTF-8 uses a variable number of bytes to represent it: from 1 to 4. The
original design allowed sequences of up to 6 bytes, but Unicode doesn't
define any codepoints that would need more than 4. UTF-8 is very widely
used - for example, the overwhelming majority of XML content uses UTF-8,
and widespread usage on the internet is generally (though definitely not
exclusively) migrating towards UTF-8 for most text content.
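A quick way to see the 1-to-4-byte range, as a sketch: RFC 3629 restricts UTF-8 to code points up to U+10FFFF, which is why 4 bytes is the maximum in practice. The sample code points are arbitrary:

```java
import java.nio.charset.StandardCharsets;

public class Utf8Lengths {
    /** Number of bytes UTF-8 needs for the single code point cp. */
    static int utf8Length(int cp) {
        return new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // 'A', o-slash, euro sign, and MUSICAL SYMBOL G CLEF (outside the BMP)
        int[] samples = {0x41, 0xF8, 0x20AC, 0x1D11E};
        for (int cp : samples) {
            System.out.printf("U+%04X -> %d byte(s)%n", cp, utf8Length(cp));  // 1, 2, 3, 4
        }
    }
}
```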

UTF-16 _generally_ uses a fixed 2 bytes per character. However, this is 
complicated by "surrogate pairs", which are a special sort of escape 
sequence used by unicode to allow access to codepoints outside the BMP 
(Basic Multilingual Plane). It's worth noting here that Java's 'char' 
type (and hence Strings, etc.) use UTF-16, but ignore things like 
surrogates - this is mostly ok, but makes it fairly painful to do really 
complex multilingual stuff.
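A sketch of what a surrogate pair looks like from Java. Note that Character.toChars, Character.isHighSurrogate and String.codePointCount were added in Java 5, after this discussion; on the Java 1.4 runtimes of the time, code would have had to compare char values against the surrogate ranges by hand:

```java
public class Surrogates {
    /** True if s is exactly one code point stored as a UTF-16 surrogate pair. */
    static boolean isSingleSupplementary(String s) {
        return s.length() == 2
                && Character.isHighSurrogate(s.charAt(0))
                && Character.isLowSurrogate(s.charAt(1))
                && s.codePointCount(0, s.length()) == 1;
    }

    public static void main(String[] args) {
        String clef = new String(Character.toChars(0x1D11E));  // a code point outside the BMP
        System.out.println(clef.length());                          // 2 (Java chars)
        System.out.println(clef.codePointCount(0, clef.length()));  // 1 (code point)
        System.out.println(isSingleSupplementary(clef));            // true
        System.out.println(isSingleSupplementary("\u00f8"));        // false: BMP, one char
    }
}
```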

There are two different versions of UTF-16, UTF-16LE and UTF-16BE 
(little endian and big endian). They are generally distinguished by the 
use of an explicit BOM (Byte Order Mark, another 'special' unicode 
character) as the first character of a file. When being used in memory 
(as Java does) in an application, the character is generally stored in 
native endianness for whatever platform is being used.
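The byte-order handling can be seen with Java's built-in charsets: per the java.nio.charset.Charset documentation, the plain UTF-16 charset writes a big-endian BOM when encoding and honours a BOM when decoding, while UTF-16BE and UTF-16LE write no BOM. A small sketch:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf16Bom {
    /** Encodes s with the given charset and renders the bytes as upper-case hex, e.g. "FE FF 00 41". */
    static String hex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex("A", StandardCharsets.UTF_16BE));  // 00 41
        System.out.println(hex("A", StandardCharsets.UTF_16LE));  // 41 00
        System.out.println(hex("A", StandardCharsets.UTF_16));    // FE FF 00 41 (BOM + big-endian)
        // Decoding with plain UTF-16 reads the BOM (here little-endian) and drops it.
        byte[] le = {(byte) 0xFF, (byte) 0xFE, 0x41, 0x00};
        System.out.println(new String(le, StandardCharsets.UTF_16));  // A
    }
}
```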

Some things (notably a lot of Microsoft documentation - I haven't seen 
this usage widely outside of MS software) use "unicode" to mean "the 
UTF-16LE encoding of unicode". This is very confusing. So, for example, 
when things say that NTFS stores filenames in unicode, it actually means 
that they are stored in UTF-16LE. However, frequently this distinction 
does not matter - to many applications, the only important point is that 
unicode is being used, so the full character repertoire of unicode is 
available (sometimes restricted only to the BMP).


There's also UTF-32, which always uses 32 bits per character. It's not 
widely used - mostly because for almost all applications, it's simply 
wasteful of memory.
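A sketch comparing the three encodings for the same short text, assuming the UTF-32BE charset is available (it is not one of the charsets every Java platform must support, but current JDKs ship it):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingSizes {
    static int size(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String text = "Slide \u00f8";  // 7 characters, all in the BMP
        // Assumption: UTF-32BE is available (shipped by current JDKs, not mandated by the spec).
        Charset utf32 = Charset.forName("UTF-32BE");
        System.out.println(size(text, StandardCharsets.UTF_8));     // 8
        System.out.println(size(text, StandardCharsets.UTF_16BE));  // 14
        System.out.println(size(text, utf32));                      // 28
    }
}
```

For mostly-ASCII text, UTF-32 needs four times the space of UTF-8, which is the waste described above.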

> 
> This was my understanding of the UTF standards and unicode - am I wrong
> here?

I hope I've cleared some things up, here.

Mike




Re: TXFileStore and local filesystem

Posted by Michael Smith <ms...@speedlegal.com>.
Julian Reschke wrote:
> Jacob Lund wrote:
> 
>> Ok! Let me see if I can explain myself - I am not an expert on this so
>> please correct me if I am wrong!
>>
>> A UTF-8 representation of one character consists of a combination of
>> characters. Now Java is a Unicode language and this means that one 
>> character
> 
> 
> ...of bytes.
> 
>> can represent "any" type of character in the world!
> 
> 
> Almost. Java's characters have only 16 bits, so there is a class of 
> Unicode characters that need to be represented as a sequence of two Java 
> characters.
> 
>> Basically UTF-8 only makes sense when working on an "old" 7-bit ASCII 
>> system
>> and you need to use characters not available in the given codepage.
> 
> 
> UTF-8 always makes sense when you need backward compatibility with ASCII.
> 
>> Both UTF-8 and UTF-16 use a varying number of bytes to represent one
>> character, where Unicode always uses 32 bit characters (maybe it is 24 
>> bit).
> 
> 
> Unicode doesn't "represent" at all. Unicode is just a definition of code 
> points.
> 
> *Encodings* represent Unicode characters as byte sequences, and UTF-8 
> and UTF-16 are some of the Unicode encodings.
> 
>  > ...
> 
> Julian
> 

Julian puts my points far more succinctly :-)
So, if this is enough, no need to read my rather lengthy email on some 
of the gory details of unicode.

Mike






Re: TXFileStore and local filesystem

Posted by Julian Reschke <ju...@gmx.de>.
Jacob Lund wrote:

> Ok! Let me see if I can explain myself - I am not an expert on this so
> please correct me if I am wrong!
> 
> A UTF-8 representation of one character consists of a combination of
> characters. Now Java is a Unicode language and this means that one character

...of bytes.

> can represent "any" type of character in the world!

Almost. Java's characters have only 16 bits, so there is a class of 
Unicode characters that need to be represented as a sequence of two Java 
characters.

> Basically UTF-8 only makes sense when working on an "old" 7-bit ASCII system
> and you need to use characters not available in the given codepage.

UTF-8 always makes sense when you need backward compatibility with ASCII.

> Both UTF-8 and UTF-16 use a varying number of bytes to represent one
> character, where Unicode always uses 32 bit characters (maybe it is 24 bit).

Unicode doesn't "represent" at all. Unicode is just a definition of code 
points.

*Encodings* represent Unicode characters as byte sequences, and UTF-8 
and UTF-16 are some of the Unicode encodings.

 > ...

Julian

-- 
<green/>bytes GmbH -- http://www.greenbytes.de -- tel:+492512807760



RE: TXFileStore and local filesystem

Posted by Jacob Lund <jl...@qualiware.com>.
Ok! Let me see if I can explain myself - I am not an expert on this so
please correct me if I am wrong!

A UTF-8 representation of one character consists of a combination of
characters. Now Java is a Unicode language and this means that one character
can represent "any" type of character in the world!

Basically UTF-8 only makes sense when working on an "old" 7-bit ASCII system
and you need to use characters not available in the given codepage.

Both UTF-8 and UTF-16 use a varying number of bytes to represent one
character, where Unicode always uses 32 bit characters (maybe it is 24 bit).

This was my understanding of the UTF standards and unicode - am I wrong
here?

/Jacob

-----Original Message-----
From: Michael Smith [mailto:msmith@speedlegal.com] 
Sent: 30 January 2004 01:44
To: Slide Users Mailing List
Subject: Re: TXFileStore and local filesystem



Re: TXFileStore and local filesystem

Posted by Michael Smith <ms...@speedlegal.com>.
Oliver Zeigermann wrote:
> Jacob Lund wrote:
>> The correct solution might be to convert from UTF-8 to Unicode before
>> storing the data and then change the database schema to Unicode char 
>> in all
>> fields containing strings.
> 
> 
> Hmmmm. You might be confusing certain things here. On one side there is 
> Unicode having a number for each character. On the other side there is 
> the representation in bytes. Now, UTF-8 *is* Unicode, but on the other 
> side, i.e. the representation in bytes. Thus it does not make too much 
> sense to compare Unicode with UTF-8. Do you agree?

A lot of Microsoft's documentation confusingly uses "unicode" when it 
actually means "UTF-16" or "UCS-2" (I can never remember what the 
difference between those two is, and I don't know if it matters). I 
suspect rereading Jacob's mail mentally substituting "UTF-16" for 
"unicode" will make it clearer.

Mike




Re: TXFileStore and local filesystem

Posted by Oliver Zeigermann <oz...@c1-fse.de>.
Thanks, patch applied :)

Unfortunately, this will not go into the head branch soon as I had 
started the merging process shortly before that patch. It will be there 
next week, though...

Oliver


Jacob Lund wrote:

> Sorry missed the file :)
> 
> -----Original Message-----
> From: Oliver Zeigermann [mailto:ozeigermann@c1-fse.de] 
> Sent: 29 January 2004 10:47
> To: Slide Users Mailing List
> Subject: Re: TXFileStore and local filesystem
> 
> So, I think we have two problems now, which I am in danger of mixing up:
> 
> (1) The filestore has a problem with file names
> (2) The database stores have a problem as well, which is yet unclear to me
> 
> Concerning (1): Could you send the new exception after the patch was 
> applied? At least the file name given at the start of the exception
> message, after "Can not create resource at ", should look different, so
> that I can see what might be going on.
> 
> Concerning (2): Could you describe this a bit more in order to make my 
> rusty mind understand?
> 
> Concerning the Unicode vs. UTF-8 issue: How would you decode a string 
> before storing into the database? Into what? The JDBC method accepts a 
> string, so you will have to pass it one. As I said, you can only 
> decode/encode into/from bytes...
> 
> Oliver
> 
> Jacob Lund wrote:
> 
> 
>>The patch did not make any difference - it still throws the same exception!
>>
>>What I meant about converting from UTF-8 to Unicode is that the database
>>driver can handle Unicode. In the filestore, UTF-8 is converted to the local
>>character set in order to create the files, and this is why the filestore (I
>>think) has a problem. If the database could store the data in Unicode, then
>>there would be no problem. Since Java is using Unicode in strings, the task
>>would simply be to decode the strings before they are stored in the database
>>and then make sure that all text fields in the database are Unicode (or
>>widechar or nchar).
>>
>>Please tell me if I am way off here!
>>
>>/Jacob
>>
>>-----Original Message-----
>>From: Oliver Zeigermann [mailto:ozeigermann@c1-fse.de] 
>>Sent: 29. januar 2004 10:02
>>To: Slide Users Mailing List
>>Subject: Re: TXFileStore and local filesystem
>>
>>Jacob Lund wrote:
>>
>>
>>>No, the filestore works correctly.
>>
>>
>>OK, shall I check in the patch? Did it work for you?
>>
>>
>>
>>>From what I can see, the filestore converts from UTF-8 to the local codepage
>>>before it stores data. This is why UTF-8 works fine for me when I upload files
>>>with Danish letters in the filename, and also why it fails when it stores
>>>files with characters not supported by the codepage.
>>>
>>>Windows XP uses Unicode, but in "DOS mode" it will use the old codepage
>>>types. The only thing that I can imagine is that Java will use this codepage
>>>when it is doing IO operations towards the filesystem. This might be a
>>>problem that only appears on Windows systems.
>>>
>>>I do not think the problem is in filling the data into the database.
>>>Somewhere, Slide converts that data (in this case the URI) to UTF-8 before
>>>it is sent to the client. The data stored in the database is UTF-8, and I
>>>believe that Java is using Unicode. So the solution might be to convert data
>>>fetched from the database back to Unicode as soon as it arrives in the store
>>>class.
>>>
>>>The correct solution might be to convert from UTF-8 to Unicode before
>>>storing the data and then change the database schema to Unicode char in all
>>>fields containing strings.
>>
>>
>>Hmmmm. You might be confusing certain things here. On one side there is 
>>Unicode having a number for each character. On the other side there is 
>>the representation in bytes. Now, UTF-8 *is* Unicode, but on the other 
>>side, i.e. the representation in bytes. Thus it does not make too much 
>>sense to compare Unicode with UTF-8. Do you agree?
>>
>>
>>
>>>I am guessing here since I do not have any idea of how the stores are
>>>structured in Slide. If you want, I would be happy to do some debugging, but
>>>I will need a short introduction to how the datastores are designed in Slide.
>>
>>I know, proper documentation is a major problem. I will try to prepare 
>>something like a short introduction and will post it to the list as soon 
>>as it is done. This may take a while though :(
>>
>>Oliver
>>
>>
>>
>>>/Jacob
>>>
>>>-----Original Message-----
>>>From: Oliver Zeigermann [mailto:ozeigermann@c1-fse.de] 
>>>Sent: 28. januar 2004 16:40
>>>To: Slide Users Mailing List
>>>Subject: Re: TXFileStore and local filesystem
>>>
>>>Jacob Lund wrote:
>>>
>>>
>>>
>>>
>>>>Sorry about that - yes I am talking about the URI!
>>>>
>>>>If I look at a record in the database, each Danish character is stored as
>>>>two "funny looking" characters corresponding to the unescaped UTF-8
>>>>encoded version - so this looks correct! However, when I do a PROPFIND on
>>>>the collection in which I placed this file, I get something like
>>>>/files/%C3%83%C2%B8, which should have represented one Danish character.
>>>>If I convert the above from UTF-8 to my local codepage, I get what is
>>>>stored in the database; if I then convert from UTF-8 to local again, I get
>>>>the correct Danish letter.
>>>
>>>
>>>I could not find anything that might have converted the URI strings. 
>>>They are just plainly filled into the SQL like in
>>>
>>>
>>>>        "select 1 from OBJECT o, URI u where o.URI_ID=u.URI_ID and u.URI_STRING=?");
>>>>    statement.setString(1, uri.toString());
>>>
>>>
>>>So, maybe this is a more general problem...
>>>
>>>
>>>
>>>
>>>>It seems that Slide converts the URIs from the DB to UTF-8, but they are
>>>>already stored as unescaped UTF-8!
>>>
>>>
>>>Does this happen with the file store as well?
>>>
>>>Oliver
>>>
>>>
>>
>>
>>
>>
> 
> 
> 
> 
> ------------------------------------------------------------------------
> 
> /**********************************************************************/
> /******    DROP SLIDE TABLES                                     ******/
> /**********************************************************************/
> 
> IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N'[dbo].[OBJECT]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
> DROP TABLE [dbo].[OBJECT]
> GO
> 
> IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N'[dbo].[BINDING]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
> DROP TABLE [dbo].[BINDING]
> GO
> 
> IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N'[dbo].[PARENT_BINDING]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
> DROP TABLE [dbo].[PARENT_BINDING]
> GO
> 
> IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N'[dbo].[CHILDREN]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
> DROP TABLE [dbo].[CHILDREN]
> GO
> 
> IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N'[dbo].[LINKS]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
> DROP TABLE [dbo].[LINKS]
> GO
> 
> IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N'[dbo].[LOCKS]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
> DROP TABLE [dbo].[LOCKS]
> GO
> 
> IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N'[dbo].[VERSION_CONTENT]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
> DROP TABLE [dbo].[VERSION_CONTENT]
> GO
> 
> IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N'[dbo].[PROPERTIES]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
> DROP TABLE [dbo].[PROPERTIES]
> GO
> 
> IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N'[dbo].[PERMISSIONS]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
> DROP TABLE [dbo].[PERMISSIONS]
> GO
> 
> IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N'[dbo].[VERSION_PREDS]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
> DROP TABLE [dbo].[VERSION_PREDS]
> GO
> 
> IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N'[dbo].[VERSION_LABELS]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
> DROP TABLE [dbo].[VERSION_LABELS]
> GO
> 
> IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N'[dbo].[VERSION_HISTORY]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
> DROP TABLE [dbo].[VERSION_HISTORY]
> GO
> 
> IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N'[dbo].[VERSION]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
> DROP TABLE [dbo].[VERSION]
> GO
> 
> IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N'[dbo].[BRANCH]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
> DROP TABLE [dbo].[BRANCH]
> GO
> 
> IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N'[dbo].[LABEL]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
> DROP TABLE [dbo].[LABEL]
> GO
> 
> IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N'[dbo].[URI]') AND OBJECTPROPERTY(id, N'IsUserTable') = 1)
> DROP TABLE [dbo].[URI]
> GO
> 
> /**********************************************************************/
> /******    DROP EXISTING USER DEFINED DATA TYPES                 ******/
> /**********************************************************************/
> 
> sp_droptype id_type
> GO
> 
> sp_droptype uri_str_type
> GO
> 
> sp_droptype revision_no_type
> GO
> 
> sp_droptype hash_type
> GO
> 
> sp_droptype literal_str_type
> GO
> 
> sp_droptype value_str_type
> GO
> 
> 
> /**********************************************************************/
> /******    ADD USER DEFINED DATA TYPES                           ******/
> /**********************************************************************/
> 
> sp_addtype id_type,           bigint
> GO
> 
> sp_addtype uri_str_type,     "nvarchar(800)"
> GO
> 
> sp_addtype revision_no_type, "nvarchar(20)"
> GO
> 
> sp_addtype hash_type,         bigint
> GO
> 
> sp_addtype literal_str_type, "nvarchar(3000)"
> GO
> 
> sp_addtype value_str_type,   "nvarchar(255)"
> GO
> 
> /**********************************************************************/
> /******    CREATE SLIDE TABLES                                   ******/
> /**********************************************************************/
> 
> CREATE TABLE dbo.URI (
>     URI_ID          id_type               IDENTITY  UNIQUE  NOT NULL,
>     URI_STRING      uri_str_type          UNIQUE  NOT NULL,
>  --   UNIQUE NONCLUSTERED (URI_ID)
> )
> GO
> 
> CREATE INDEX XUID
> 	ON URI(URI_ID) 
> GO
> 
> CREATE INDEX XUSTRING
> 	ON URI(URI_STRING) 
> GO
> 
> CREATE TABLE dbo.OBJECT (
>     URI_ID          id_type               PRIMARY KEY,
>     CLASS_NAME      nvarchar(255)          NOT NULL,
>     CONSTRAINT      FK_OBJECT_URI_ID
>        FOREIGN KEY (URI_ID)
>        REFERENCES   URI (URI_ID)
> )
> GO    
> 
> CREATE TABLE dbo.BINDING (
>     URI_ID          id_type               NOT NULL
>         REFERENCES  URI (URI_ID),
>     NAME            uri_str_type          NOT NULL,
>     CHILD_UURI_ID    id_type              NOT NULL
>         REFERENCES  URI (URI_ID),
>     UNIQUE CLUSTERED (URI_ID, NAME, CHILD_UURI_ID)
> )
> GO
> 
> CREATE TABLE dbo.PARENT_BINDING (
>     URI_ID          id_type               NOT NULL
>         REFERENCES  URI (URI_ID),
>     NAME            uri_str_type          NOT NULL,
>     PARENT_UURI_ID    id_type             NOT NULL
>         REFERENCES  URI (URI_ID),
>     UNIQUE CLUSTERED (URI_ID, NAME, PARENT_UURI_ID)
> ) 
> GO
> 
> CREATE TABLE dbo.LINKS (
>     URI_ID          id_type               NOT NULL  FOREIGN KEY
>         REFERENCES  URI (URI_ID),
>     LINK_TO_ID      id_type               NOT NULL  FOREIGN KEY
>         REFERENCES  URI (URI_ID),
>     --UNIQUE CLUSTERED (URI_ID, LINK_TO_ID)
> )
> GO
> 
> CREATE INDEX XURI_ID
> 	ON LINKS(URI_ID) 
> GO
> 
> 
> CREATE INDEX XLINK_TO_ID
> 	ON LINKS(LINK_TO_ID) 
> GO
> 
> CREATE TABLE dbo.LOCKS (
>     LOCK_ID         id_type               PRIMARY KEY,
>     OBJECT_ID       id_type               NOT NULL  FOREIGN KEY
>        REFERENCES   URI (URI_ID),
>     SUBJECT_ID      id_type               NOT NULL  FOREIGN KEY
>        REFERENCES   URI (URI_ID),
>     TYPE_ID         id_type               NOT NULL  FOREIGN KEY
>        REFERENCES   URI (URI_ID),
>     EXPIRATION_DATE numeric(14, 0)   	  NOT NULL,
>     IS_INHERITABLE  bit                   NOT NULL, 
>     IS_EXCLUSIVE    bit                   NOT NULL,
>     OWNER           nvarchar(255),
>     CONSTRAINT      FK_LOCKS_LOCK_ID
>        FOREIGN KEY (LOCK_ID)
>        REFERENCES   URI (URI_ID)
> )
> GO
> 
> CREATE TABLE dbo.BRANCH ( 
>     BRANCH_ID       id_type               IDENTITY  UNIQUE NOT NULL,
>     BRANCH_STRING   nvarchar(255)          UNIQUE NOT NULL,
>     UNIQUE NONCLUSTERED (BRANCH_ID)
> )
> GO
> 
> CREATE TABLE dbo.LABEL (
>     LABEL_ID        id_type               IDENTITY  UNIQUE NOT NULL,
>     LABEL_STRING    nvarchar(255)          NOT NULL,
>     UNIQUE NONCLUSTERED (LABEL_ID)
> )
> GO
> 
> CREATE TABLE dbo.VERSION (
>     URI_ID          id_type               PRIMARY KEY,
>     IS_VERSIONED    bit                   NOT NULL,    
>     CONSTRAINT      FK_VERSION_URI_ID
>        FOREIGN KEY (URI_ID)
>        REFERENCES   URI (URI_ID)
> )
> GO
> 
> CREATE TABLE dbo.VERSION_HISTORY (
>     VERSION_ID      id_type               IDENTITY  UNIQUE NOT NULL,
>     URI_ID          id_type               NOT NULL  FOREIGN KEY
>        REFERENCES   VERSION (URI_ID),
>     BRANCH_ID       id_type               NOT NULL  FOREIGN KEY
>        REFERENCES   BRANCH (BRANCH_ID),
>     REVISION_NO     nVARCHAR(20)	       	  NOT NULL,
>     --UNIQUE CLUSTERED (URI_ID, BRANCH_ID, REVISION_NO)
> )
> GO
> CREATE INDEX XVERSION_HISTORY1 
> 	ON VERSION_HISTORY(URI_ID, BRANCH_ID, REVISION_NO) 
> GO
> 
> CREATE TABLE dbo.VERSION_PREDS (
>     VERSION_ID         id_type            NOT NULL  FOREIGN KEY 
>         REFERENCES  VERSION_HISTORY (VERSION_ID),
>     PREDECESSOR_ID     id_type            NOT NULL  FOREIGN KEY
>         REFERENCES  VERSION_HISTORY (VERSION_ID),
>     UNIQUE CLUSTERED (VERSION_ID, PREDECESSOR_ID)
> )
> GO
> CREATE INDEX XVERSION_PREDS1 
> 	ON VERSION_PREDS(VERSION_ID, PREDECESSOR_ID) 
> GO
> 
> CREATE TABLE dbo.VERSION_LABELS (
>     VERSION_ID         id_type            NOT NULL  FOREIGN KEY
>         REFERENCES  VERSION_HISTORY (VERSION_ID),
>     LABEL_ID           id_type            NOT NULL  FOREIGN KEY
>         REFERENCES  LABEL (LABEL_ID), 
>     UNIQUE CLUSTERED (VERSION_ID, LABEL_ID)
> )
> GO
> 
> CREATE TABLE dbo.VERSION_CONTENT (
>     VERSION_ID         id_type            PRIMARY KEY,
>     CONTENT            image              NOT NULL,
>     CONSTRAINT FK_VC_VERSION_ID 
>         FOREIGN KEY (VERSION_ID)
>         REFERENCES  VERSION_HISTORY (VERSION_ID),
> )
> GO
> 
> CREATE TABLE dbo.PROPERTIES (
>     VERSION_ID         id_type            NOT NULL  FOREIGN KEY
>         REFERENCES  VERSION_HISTORY (VERSION_ID),    
>     PROPERTY_NAMESPACE nvarchar(50)        NOT NULL, 
>     PROPERTY_NAME      nvarchar(50)        NOT NULL,        
>     PROPERTY_VALUE     nvarchar(255)       NOT NULL,
>     PROPERTY_TYPE      nvarchar(50)        NOT NULL, 
>     IS_PROTECTED       bit                NOT NULL,
>     UNIQUE CLUSTERED (VERSION_ID, PROPERTY_NAMESPACE, PROPERTY_NAME)
> )
> GO
> 
> CREATE TABLE dbo.PERMISSIONS (
>     OBJECT_ID       id_type               NOT NULL  FOREIGN KEY
>        REFERENCES   URI (URI_ID),
>     SUBJECT_ID      id_type               NOT NULL  FOREIGN KEY
>        REFERENCES   URI (URI_ID),
>     ACTION_ID       id_type               NOT NULL  FOREIGN KEY
>        REFERENCES   URI (URI_ID),
>     VERSION_NO      nVARCHAR(20)           NULL,
>     IS_INHERITABLE  bit                   NOT NULL,
>     IS_NEGATIVE     bit                   NOT NULL,
>     -- Both order and sequence would be more suitable, but can not be used
>     SUCCESSION      int                   NOT NULL,
>     UNIQUE CLUSTERED (OBJECT_ID, SUBJECT_ID, ACTION_ID),
>     UNIQUE (OBJECT_ID, SUCCESSION)
> )
> GO
> 
> 
> 
> 
> ------------------------------------------------------------------------
> 





RE: TXFileStore and local filesystem

Posted by Jacob Lund <jl...@qualiware.com>.
Sorry missed the file :)
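Jacob reports in this thread that the only change in his schema is varchar to nvarchar. The Java sketch below merely illustrates why that matters when Danish and Russian names must coexist: it simulates writing a Java String into an 8-bit column and reading it back. It assumes the column's code page behaves like ISO-8859-1; SQL Server's real conversion depends on the column collation, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class NarrowColumnDemo {
    // Simulates writing a Java String into an 8-bit column (code page assumed
    // to be ISO-8859-1) and reading it back. getBytes(Charset) replaces any
    // character the charset cannot represent with '?'.
    static String roundTripThroughSingleByte(String value) {
        byte[] stored = value.getBytes(StandardCharsets.ISO_8859_1);
        return new String(stored, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String danish = "\u00f8";  // o-slash, present in ISO-8859-1
        String russian = "\u0436"; // Cyrillic zhe, absent from ISO-8859-1
        System.out.println(roundTripThroughSingleByte(danish).equals(danish)); // true
        System.out.println(roundTripThroughSingleByte(russian));               // ?
    }
}
```

With an nvarchar column the value stays in Unicode, which is consistent with Jacob's report that both alphabets then work.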

-----Original Message-----
From: Oliver Zeigermann [mailto:ozeigermann@c1-fse.de] 
Sent: 29. januar 2004 10:47
To: Slide Users Mailing List
Subject: Re: TXFileStore and local filesystem

So, I think we now have two problems that I am in danger of mixing up:

(1) The filestore has a problem with file names
(2) The database stores have a problem as well, which is still unclear to me

Concerning (1): Could you send the new exception after the patch was 
applied? At least the file name given in the exception's message after 
"Can not create resource at " should look different, which would help me 
see what might be going on.

Concerning (2): Could you describe this a bit more in order to make my 
rusty mind understand?

Concerning the Unicode vs. UTF-8 issue: How would you decode a string 
before storing it in the database? Into what? The JDBC method accepts a 
string, so you will have to pass it one. As I said, you can only 
decode/encode into/from bytes...

Oliver
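Oliver's point that Unicode assigns numbers to characters while UTF-8 is one byte representation of them can be shown in a few lines of Java. In this hypothetical sketch, a String holds the Danish letter o-slash as one character (code point U+00F8); a charset only comes into play when the String is encoded, giving two bytes in UTF-8 and one in ISO-8859-1. This is also why there is nothing to "decode" before handing a String to a JDBC method such as setString.

```java
import java.nio.charset.StandardCharsets;

public class CodePointsVsBytes {
    // Formats bytes as uppercase hex, e.g. {0xC3, 0xB8} -> "C3B8".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u00f8"; // Danish o-slash: one character, code point U+00F8
        System.out.println(s.length());                                   // 1
        System.out.printf("U+%04X%n", s.codePointAt(0));                  // U+00F8
        // Bytes only appear once a charset is chosen:
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_8)));      // C3B8
        System.out.println(hex(s.getBytes(StandardCharsets.ISO_8859_1))); // F8
        // Decoding with the same charset gives back an equal String.
        String back = new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
        System.out.println(back.equals(s));                               // true
    }
}
```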

Jacob Lund wrote:

> The patch did not make any difference - it still throws the same exception!
> 
> What I meant about converting from UTF-8 to Unicode is that the database
> driver can handle Unicode. In the filestore, UTF-8 is converted to the local
> character set in order to create the files, and this is why the filestore
> (I think) has a problem. If the database could store the data in Unicode,
> there would be no problem. Since Java uses Unicode in strings, the task
> would simply be to decode the strings before they are stored in the
> database and then make sure that all text fields in the database are
> Unicode (or widechar or nchar).
> 
> Please tell me if I am way off here!
> 
> /Jacob
> 
> -----Original Message-----
> From: Oliver Zeigermann [mailto:ozeigermann@c1-fse.de] 
> Sent: 29. januar 2004 10:02
> To: Slide Users Mailing List
> Subject: Re: TXFileStore and local filesystem
> 
> Jacob Lund wrote:
> 
>>No, the filestore works correctly.
> 
> 
> OK, shall I check in the patch? Did it work for you?
> 
> 
>>From what I can see, the filestore converts from UTF-8 to the local
>>codepage before it stores data. This is why UTF-8 works fine for me when I
>>upload files with Danish letters in the filename, and also why it fails
>>when it stores files with characters not supported by the codepage.
>>
>>Windows XP uses Unicode, but in "DOS mode" it uses the old codepage
>>types. The only thing I can imagine is that Java uses this codepage when
>>it does IO operations on the filesystem. This might be a problem that only
>>appears on Windows systems.
>>
>>I do not think the problem lies in filling the data into the database.
>>Somewhere in Slide, that data (in this case the URI) is converted to UTF-8
>>before it is sent to the client. The data stored in the database is UTF-8,
>>and I believe that Java uses Unicode. So the solution might be to convert
>>data fetched from the database back to Unicode as soon as it arrives at
>>the store class.
>>
>>The correct solution might be to convert from UTF-8 to Unicode before
>>storing the data and then change the database schema to use Unicode
>>character types in all fields containing strings.
> 
> 
> Hmmmm. You might be confusing certain things here. On one side there is
> Unicode, which assigns a number to each character. On the other side there
> is the representation of those numbers in bytes. Now, UTF-8 *is* Unicode,
> but on the byte side: it is one way of encoding those numbers as bytes.
> Thus it does not make much sense to compare Unicode with UTF-8. Do you
> agree?
> 
> 
>>I am guessing here since I do not have any idea of how the stores are
>>structured in Slide. If you want, I would be happy to do some debugging,
>>but I will need a short introduction to how the data stores are designed
>>in Slide.
> 
> I know, proper documentation is a major problem. I will try to prepare 
> something like a short introduction and will post it to the list as soon 
> as it is done. This may take a while though :(
> 
> Oliver
> 
> 
>>/Jacob
>>
>>-----Original Message-----
>>From: Oliver Zeigermann [mailto:ozeigermann@c1-fse.de] 
>>Sent: 28. januar 2004 16:40
>>To: Slide Users Mailing List
>>Subject: Re: TXFileStore and local filesystem
>>
>>Jacob Lund wrote:
>>
>>
>>
>>>Sorry about that - yes I am talking about the URI!
>>>
>>>If I look at a record in the database, each Danish character is stored as
>>>two "funny looking" characters corresponding to the unescaped UTF-8
>>>encoded version - so this looks correct! However, when I do a PROPFIND on
>>>the collection in which I placed this file, I get something like
>>>/files/%C3%83%C2%B8, which should have represented one Danish character.
>>>If I convert the above from UTF-8 to my local codepage, I get what is
>>>stored in the database; if I then convert from UTF-8 to local again, I get
>>>the correct Danish letter.
>>
>>
>>I could not find anything that might have converted the URI strings. 
>>They are just plainly filled into the SQL like in
>>
>>
>>>        "select 1 from OBJECT o, URI u where o.URI_ID=u.URI_ID and u.URI_STRING=?");
>>>    statement.setString(1, uri.toString());
>>
>>
>>So, maybe this is a more general problem...
>>
>>
>>
>>>It seems that Slide converts the URIs from the DB to UTF-8, but they are
>>>already stored as unescaped UTF-8!
>>
>>
>>Does this happen with the file store as well?
>>
>>Oliver
>>
>>
> 
> 
> 
> 




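Jacob's PROPFIND result quoted in this thread, /files/%C3%83%C2%B8 for a single Danish letter, is what one extra wrong decoding step would produce. The Java sketch below assumes that step is decoding UTF-8 bytes as ISO-8859-1 (windows-1252 maps these particular bytes to the same characters); it does not show where in Slide such a step might happen, and all names are hypothetical. Note that URLEncoder performs form encoding (a space becomes "+"), which is close to, but not the same as, URI path escaping.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class DoubleEncodingDemo {
    // Hypothetical faulty step: UTF-8 bytes are decoded as ISO-8859-1,
    // turning one non-ASCII character into two "funny looking" ones.
    static String misdecode(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Undoes one application of misdecode.
    static String repair(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    // Percent-escapes the string's UTF-8 bytes (form encoding, as URLEncoder does).
    static String percentEncodeUtf8(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // every JVM supports UTF-8
        }
    }

    public static void main(String[] args) {
        String original = "\u00f8";                          // Danish o-slash
        String stored = misdecode(original);                 // U+00C3 U+00B8
        System.out.println(percentEncodeUtf8(original));     // %C3%B8
        System.out.println(percentEncodeUtf8(stored));       // %C3%83%C2%B8
        System.out.println(repair(stored).equals(original)); // true
    }
}
```

Undoing the wrong step once recovers the original letter, consistent with Jacob's report that one more UTF-8 conversion of the stored value gives the correct character.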
Re: TXFileStore and local filesystem

Posted by Oliver Zeigermann <oz...@c1-fse.de>.
No rights needed. First create the report. In the next step, *add* the 
attachment using the "Create a new attachment" link next to "Attachments:" 
in the top section of the page.

Bugzilla is confusing sometimes...

Oliver

Jacob Lund wrote:

> I just created an account on bugzilla!
> 
> I can see how to create a bug report but I cannot see how to attach a file!
> 
> Am I dumber than average or do I need some special rights to attach files?
> 
> /Jacob 
> 
> -----Original Message-----
> From: Oliver Zeigermann [mailto:ozeigermann@c1-fse.de] 
> Sent: 29. januar 2004 13:12
> To: Slide Users Mailing List
> Subject: Re: TXFileStore and local filesystem
> 
> The old problem with attachments :( It is missing...
> 
> Could you try to create a new bugzilla entry and add the attachment there?
> 
> Thanks :)
> 
> Oliver
> 
> Jacob Lund wrote:
> 
> 
>>First of all - the patch you just checked in for the txfilestore works fine :-)
>>
>>Some of my problems with the SQLServerAdapter was my fault - forgot to set
>>encoding to UTF-8 in slide.properties.
>>
>>However, to get the SQL store working with Russian and Danish characters at
>>the same time, I had to change the database schema. It turns out that Slide
>>does send the Unicode characters to the database, but the database schema
>>used 8-bit char types in the string fields.
>>
>>I have attached the new schema - all I did was change varchar to nvarchar.
>>Now it works fine :-)
>>
>>/Jacob
>>
>>-----Original Message-----
>>From: Oliver Zeigermann [mailto:ozeigermann@c1-fse.de] 
>>Sent: 29. januar 2004 10:47
>>To: Slide Users Mailing List
>>Subject: Re: TXFileStore and local filesystem
>>
>>So, I think we now have two problems that I am in danger of mixing up:
>>
>>(1) The filestore has a problem with file names
>>(2) The database stores have a problem as well, which is still unclear to me
>>
>>Concerning (1): Could you send the new exception after the patch was 
>>applied? At least the file name given in the exception's message after 
>>"Can not create resource at " should look different, which would help me 
>>see what might be going on.
>>
>>Concerning (2): Could you describe this a bit more in order to make my 
>>rusty mind understand?
>>
>>Concerning the Unicode vs. UTF-8 issue: How would you decode a string 
>>before storing it in the database? Into what? The JDBC method accepts a 
>>string, so you will have to pass it one. As I said, you can only 
>>decode/encode into/from bytes...
>>
>>Oliver
>>
>>Jacob Lund wrote:
>>
>>
>>
>>>The patch did not make any difference - it still throws the same exception!
>>>
>>>What I meant about converting from UTF-8 to Unicode is that the database
>>>driver can handle Unicode. In the filestore, UTF-8 is converted to the local
>>>character set in order to create the files, and this is why the filestore
>>>(I think) has a problem. If the database could store the data in Unicode,
>>>there would be no problem. Since Java uses Unicode in strings, the task
>>>would simply be to decode the strings before they are stored in the
>>>database and then make sure that all text fields in the database are
>>>Unicode (or widechar or nchar).
>>>
>>>Please tell me if I am way off here!
>>>
>>>/Jacob
>>>
>>>-----Original Message-----
>>>From: Oliver Zeigermann [mailto:ozeigermann@c1-fse.de] 
>>>Sent: 29. januar 2004 10:02
>>>To: Slide Users Mailing List
>>>Subject: Re: TXFileStore and local filesystem
>>>
>>>Jacob Lund wrote:
>>>
>>>
>>>
>>>>No, the filestore works correctly.
>>>
>>>
>>>OK, shall I check in the patch? Did it work for you?
>>>
>>>
>>>
>>>
>>>>From what I can see, the filestore converts from UTF-8 to the local
>>>>codepage before it stores data. This is why UTF-8 works fine for me when I
>>>>upload files with Danish letters in the filename, and also why it fails
>>>>when it stores files with characters not supported by the codepage.
>>>>
>>>>Windows XP uses Unicode, but in "DOS mode" it uses the old codepage
>>>>types. The only thing I can imagine is that Java uses this codepage when
>>>>it does IO operations on the filesystem. This might be a problem that only
>>>>appears on Windows systems.
>>>>
>>>>I do not think the problem lies in filling the data into the database.
>>>>Somewhere in Slide, that data (in this case the URI) is converted to UTF-8
>>>>before it is sent to the client. The data stored in the database is UTF-8,
>>>>and I believe that Java uses Unicode. So the solution might be to convert
>>>>data fetched from the database back to Unicode as soon as it arrives at
>>>>the store class.
>>>>
>>>>The correct solution might be to convert from UTF-8 to Unicode before
>>>>storing the data and then change the database schema to use Unicode
>>>>character types in all fields containing strings.
>>>
>>>
>>>Hmmmm. You might be confusing certain things here. On one side there is
>>>Unicode, which assigns a number to each character. On the other side there
>>>is the representation of those numbers in bytes. Now, UTF-8 *is* Unicode,
>>>but on the byte side: it is one way of encoding those numbers as bytes.
>>>Thus it does not make much sense to compare Unicode with UTF-8. Do you
>>>agree?
>>>
>>>
>>>
>>>
>>>>I am guessing here since I do not have any idea of how the stores are
>>>>structured in Slide. If you want, I would be happy to do some debugging,
>>>>but I will need a short introduction to how the data stores are designed
>>>>in Slide.
>>>
>>>I know, proper documentation is a major problem. I will try to prepare 
>>>something like a short introduction and will post it to the list as soon 
>>>as it is done. This may take a while though :(
>>>
>>>Oliver
>>>
>>>
>>>
>>>
>>>>/Jacob
>>>>
>>>>-----Original Message-----
>>>>From: Oliver Zeigermann [mailto:ozeigermann@c1-fse.de] 
>>>>Sent: 28. januar 2004 16:40
>>>>To: Slide Users Mailing List
>>>>Subject: Re: TXFileStore and local filesystem
>>>>
>>>>Jacob Lund wrote:
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>>Sorry about that - yes I am talking about the URI!
>>>>>
>>>>>If I look at a record in the database, each Danish character is stored as
>>>>>two "funny looking" characters corresponding to the unescaped UTF-8
>>>>>encoded version - so this looks correct! However, when I do a PROPFIND on
>>>>>the collection in which I placed this file, I get something like
>>>>>/files/%C3%83%C2%B8, which should have represented one Danish character.
>>>>>If I convert the above from UTF-8 to my local codepage, I get what is
>>>>>stored in the database; if I then convert from UTF-8 to local again, I get
>>>>>the correct Danish letter.
>>>>
>>>>
>>>>I could not find anything that might have converted the URI strings. 
>>>>They are just plainly filled into the SQL like in
>>>>
>>>>
>>>>>        "select 1 from OBJECT o, URI u where o.URI_ID=u.URI_ID and u.URI_STRING=?");
>>>>>    statement.setString(1, uri.toString());
>>>>
>>>>
>>>>So, maybe this is a more general problem...
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>>It seems that Slide converts the URIs from the DB to UTF-8, but they are
>>>>>already stored as unescaped UTF-8!
>>>>
>>>>
>>>>Does this happen with the file store as well?
>>>>
>>>>Oliver
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
> 
> 
> 
> 




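The hypothesis quoted in this thread is that Danish file names survive while Russian ones fail because the name passes through a single-byte Windows codepage on its way to the filesystem. Whether a particular JVM actually does this is not verified here; the Java sketch below only shows how to test whether a name is representable in a given charset, using CharsetEncoder.canEncode. ISO-8859-1 stands in for the Danish Windows codepage (an assumption), and the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CodePageCheck {
    // True if every character of the name is representable in the charset,
    // i.e. converting the name to that codepage would not lose characters.
    static boolean fitsCodePage(String name, Charset codePage) {
        // A fresh encoder per call, since CharsetEncoder is not thread-safe.
        return codePage.newEncoder().canEncode(name);
    }

    public static void main(String[] args) {
        Charset assumedCodePage = StandardCharsets.ISO_8859_1; // stand-in only
        // Danish word for "lake" plus an extension:
        System.out.println(fitsCodePage("s\u00f8.txt", assumedCodePage));                 // true
        // Russian word for "file" plus an extension:
        System.out.println(fitsCodePage("\u0444\u0430\u0439\u043b.txt", assumedCodePage)); // false
    }
}
```

A store could run such a check before creating a file and fall back to an escaped name when it returns false; that is a design option, not something Slide is known to do.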

RE: TXFileStore and local filesystem

Posted by Jacob Lund <jl...@qualiware.com>.
I just created an account on bugzilla!

I can see how to create a bug report but I cannot see how to attach a file!

Am I dumber than average or do I need some special rights to attach files?

/Jacob 

-----Original Message-----
From: Oliver Zeigermann [mailto:ozeigermann@c1-fse.de] 
Sent: 29. januar 2004 13:12
To: Slide Users Mailing List
Subject: Re: TXFileStore and local filesystem

The old problem with attachments :( It is missing...

Could you try to create a new bugzilla entry and add the attachment there?

Thanks :)

Oliver

Jacob Lund wrote:

> First of all - the patch you just checked in for the txfilestore works fine :-)
> 
> Some of my problems with the SQLServerAdapter was my fault - forgot to set
> encoding to UTF-8 in slide.properties.
> 
> However, to get the SQL store working with Russian and Danish characters at
> the same time, I had to change the database schema. It turns out that Slide
> does send the Unicode characters to the database, but the database schema
> used 8-bit char types in the string fields.
> 
> I have attached the new schema - all I did was change varchar to nvarchar.
> Now it works fine :-)
> 
> /Jacob
> 
> -----Original Message-----
> From: Oliver Zeigermann [mailto:ozeigermann@c1-fse.de] 
> Sent: 29. januar 2004 10:47
> To: Slide Users Mailing List
> Subject: Re: TXFileStore and local filesystem
> 
> So, I think we now have two problems that I am in danger of mixing up:
> 
> (1) The filestore has a problem with file names
> (2) The database stores have a problem as well, which is still unclear to me
> 
> Concerning (1): Could you send the new exception after the patch was 
> applied? At least the file name given in the exception's message after 
> "Can not create resource at " should look different, which would help me 
> see what might be going on.
> 
> Concerning (2): Could you describe this a bit more in order to make my 
> rusty mind understand?
> 
> Concerning the Unicode vs. UTF-8 issue: How would you decode a string 
> before storing it in the database? Into what? The JDBC method accepts a 
> string, so you will have to pass it one. As I said, you can only 
> decode/encode into/from bytes...
> 
> Oliver
> 
> Jacob Lund wrote:
> 
> 
>>The patch did not make any difference - it still throws the same exception!
>>
>>What I meant about converting from UTF-8 to Unicode is that the database
>>driver can handle Unicode. In the filestore, UTF-8 is converted to the local
>>character set in order to create the files, and this is why the filestore (I
>>think) has a problem. If the database could store the data in Unicode, then
>>there would be no problem. Since Java uses Unicode in strings, the task
>>would simply be to decode the strings before they are stored in the database
>>and then make sure that all text fields in the database are Unicode (or
>>widechar or nchar).
>>
>>Please tell me if I am way off here!
>>
>>/Jacob
>>
>>-----Original Message-----
>>From: Oliver Zeigermann [mailto:ozeigermann@c1-fse.de] 
>>Sent: 29. januar 2004 10:02
>>To: Slide Users Mailing List
>>Subject: Re: TXFileStore and local filesystem
>>
>>Jacob Lund wrote:
>>
>>
>>>No, the filestore works correctly.
>>
>>
>>OK, shall I check in the patch? Did it work for you?
>>
>>
>>
>>>From what I can see, the filestore converts from UTF-8 to the local codepage
>>>before it stores data. This is why UTF-8 works fine for me when I upload
>>>files with Danish letters in the filename, and also why it fails when it
>>>stores files with characters not supported by the codepage.
>>>
>>>Windows XP uses Unicode, but in "DOS mode" it will use the old codepage
>>>types. The only thing I can imagine is that Java uses this codepage
>>>when it is doing IO operations on the filesystem. This might be
>>>a problem that only appears on Windows systems.
>>>
>>>I do not think the problem is in how the data is filled into the database.
>>>Somewhere, Slide converts that data (in this case the URI) to UTF-8 before
>>>it is sent to the client. The data stored in the database is UTF-8, and I
>>>believe that Java is using Unicode. So the solution might be to convert
>>>data fetched from the database back to Unicode as soon as it arrives at
>>>the store class.
>>>
>>>The correct solution might be to convert from UTF-8 to Unicode before
>>>storing the data and then change the database schema to Unicode char in
>>>all fields containing strings.
>>
>>
>>Hmmmm. You might be confusing certain things here. On one side there is 
>>Unicode having a number for each character. On the other side there is 
>>the representation in bytes. Now, UTF-8 *is* Unicode, but on the other 
>>side, i.e. the representation in bytes. Thus it does not make too much 
>>sense to compare Unicode with UTF-8. Do you agree?
>>
>>
>>
>>>I am guessing here, since I do not have any idea of how the stores are
>>>structured in Slide. If you want, I would be happy to do some debugging,
>>>but I will need a short introduction to how the datastores are designed
>>>in Slide.
>>
>>I know, proper documentation is a major problem. I will try to prepare 
>>something like a short introduction and will post it to the list as soon 
>>as it is done. This may take a while though :(
>>
>>Oliver
>>
>>
>>
>>>/Jacob
>>>
>>>-----Original Message-----
>>>From: Oliver Zeigermann [mailto:ozeigermann@c1-fse.de] 
>>>Sent: 28. januar 2004 16:40
>>>To: Slide Users Mailing List
>>>Subject: Re: TXFileStore and local filesystem
>>>
>>>Jacob Lund wrote:
>>>
>>>>Sorry about that - yes, I am talking about the URI!
>>>>
>>>>If I look at a record in the database, each Danish character is stored as
>>>>two "funny looking" characters corresponding to the unescaped UTF-8
>>>>encoded version - so this looks correct! However, when I do a PROPFIND on
>>>>the collection in which I placed this file, I get something like
>>>>/files/%C3%83%C2%B8 - and this should have been representing one Danish
>>>>character. If I take the above and convert from UTF-8 to my local charset,
>>>>I get what is stored in the database. If I then convert from UTF-8 to local
>>>>again, I get the correct Danish letter.
>>>
>>>
>>>I could not find anything that might have converted the URI strings. 
>>>They are just plainly filled into the SQL, like in
>>>
>>>>                      "select 1 from OBJECT o, URI u where o.URI_ID=u.URI_ID and u.URI_STRING=?");
>>>>              statement.setString(1, uri.toString());
>>>
>>>So, maybe this is a more general problem...
>>>
>>>>It seems that Slide converts the URIs from the DB to UTF-8, but they are
>>>>already stored in unescaped UTF-8!
>>>
>>>
>>>Does this happen with the file store as well?
>>>
>>>Oliver
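The %C3%83%C2%B8 that Jacob saw in the PROPFIND response is consistent with the Danish letter ø (U+00F8) being encoded to UTF-8 twice. The Java sketch below assumes the intermediate step misread the UTF-8 bytes C3 B8 as ISO-8859-1 (giving the two characters "Ã¸"); this is a guess at where the error happens, not a trace of Slide's actual code, and the class and method names are made up for illustration. It also shows the repair Jacob describes, undoing one level of misreading. The Charset overload of URLEncoder.encode needs Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class DoubleEncoding {

    // Correct path: the UTF-8 bytes of the name, percent-escaped once.
    static String escapeOnce(String name) {
        return URLEncoder.encode(name, StandardCharsets.UTF_8);
    }

    // Assumed buggy path: the UTF-8 bytes are misread as ISO-8859-1 characters,
    // and that garbled string is UTF-8 encoded and escaped again.
    static String escapeTwice(String name) {
        String misread = new String(name.getBytes(StandardCharsets.UTF_8),
                                    StandardCharsets.ISO_8859_1);
        return URLEncoder.encode(misread, StandardCharsets.UTF_8);
    }

    // Repair: ISO-8859-1 maps every byte to exactly one char, so re-encoding
    // recovers the original UTF-8 bytes, which are then decoded correctly.
    static String repair(String misread) {
        return new String(misread.getBytes(StandardCharsets.ISO_8859_1),
                          StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String oSlash = "\u00F8"; // Danish letter o-slash, U+00F8
        System.out.println(escapeOnce(oSlash));                      // %C3%B8
        System.out.println(escapeTwice(oSlash));                     // %C3%83%C2%B8
        System.out.println(repair("\u00C3\u00B8").equals(oSlash));   // true
    }
}
```

URLEncoder performs HTML form encoding (a space becomes +), so it only approximates URI path escaping; for this character the two agree.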


Re: TXFileStore and local filesystem

Posted by Oliver Zeigermann <oz...@c1-fse.de>.
The old problem with attachments :( It is missing...

Could you try to create a new bugzilla entry and add the attachment there?

Thanks :)

Oliver



RE: TXFileStore and local filesystem

Posted by Jacob Lund <jl...@qualiware.com>.
First of all - the patch you just checked in for the txfilestore works fine
:-) 

Some of my problems with the SQLServerAdapter was my fault - forgot to set
encoding to UTF-8 in slide.properties.

However, to get the SQL store working with Russian and Danish characters at
the same time, I had to change the database schema. It turns out that Slide
does send the Unicode characters to the database, but the database schema
uses 8-bit characters in the string fields.

I have attached the new schema - all I did was change varchar to nvarchar.
Now it works fine :-)

/Jacob
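A rough model of why the varchar columns kept the Danish letters but not the Russian ones: an 8-bit column can only hold characters defined in its code page. In this Java sketch ISO-8859-1 stands in for the column's code page (an assumption; in SQL Server the code page of a varchar column comes from its collation), String.getBytes replaces each unmappable character with '?', and UTF-16 stands in for nvarchar. The class and method names are illustrative.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EightBitColumn {

    // Simulates storing a string in a column with the given encoding:
    // encode to bytes and read it back. String.getBytes replaces characters
    // the charset cannot represent with the charset's replacement ('?').
    static String roundTrip(String value, Charset columnEncoding) {
        return new String(value.getBytes(columnEncoding), columnEncoding);
    }

    public static void main(String[] args) {
        String danish = "\u00E6\u00F8\u00E5";  // Danish letters U+00E6 U+00F8 U+00E5
        String russian = "\u0416\u0436";       // Cyrillic U+0416 U+0436
        Charset eightBit = StandardCharsets.ISO_8859_1; // stand-in for a varchar code page

        System.out.println(roundTrip(danish, eightBit).equals(danish)); // true
        System.out.println(roundTrip(russian, eightBit));               // ??

        // A Unicode column (modelled as UTF-16) keeps both scripts intact.
        String both = danish + russian;
        System.out.println(roundTrip(both, StandardCharsets.UTF_16).equals(both)); // true
    }
}
```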



Re: TXFileStore and local filesystem

Posted by Oliver Zeigermann <oz...@c1-fse.de>.
So, I think we now have two problems that I am in danger of mixing up:

(1) The filestore has a problem with file names
(2) The database stores have a problem as well, which is still unclear to me

Concerning (1): Could you send the new exception after the patch was 
applied? At least the file name given in the exception's header after 
"Can not create resource at " should look different, for me to see what 
might be going on.

Concerning (2): Could you describe this a bit more in order to make my 
rusty mind understand?

Concerning the Unicode vs. UTF-8 issue: How would you decode a string 
before storing into the database? Into what? The JDBC method accepts a 
string, so you will have to pass it one. As I said, you can only 
decode/encode into/from bytes...

Oliver
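Oliver's point in code form: a Java String holds Unicode characters (as UTF-16 code units), and UTF-8 only exists as a byte representation. A String-based JDBC call such as PreparedStatement.setString therefore has nothing left to decode; whether the characters survive is up to the driver and the column type. The class and method names in this minimal sketch are illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StringsAndBytes {

    // Encoding: characters -> bytes. UTF-8 is one possible byte representation.
    static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Decoding: bytes -> characters. Requires knowing which encoding produced the bytes.
    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String oSlash = "\u00F8";      // one character: U+00F8
        byte[] utf8 = toUtf8(oSlash);  // two bytes: 0xC3 0xB8
        System.out.println(oSlash.length() + " char, " + utf8.length + " bytes"); // 1 char, 2 bytes
        System.out.println(Arrays.toString(utf8));                              // [-61, -72]
        System.out.println(fromUtf8(utf8).equals(oSlash));                      // true
    }
}
```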



RE: TXFileStore and local filesystem

Posted by Jacob Lund <jl...@qualiware.com>.
The patch did not make any difference - it still throws the same exception!

What I meant about converting from UTF-8 to Unicode is that the database
driver can handle Unicode. In the filestore, UTF-8 is converted to the local
character set in order to create the files, and this is why the filestore (I
think) has a problem. If the database could store the data in Unicode, then
there would be no problem. Since Java uses Unicode in strings, the task
would simply be to decode the strings before they are stored in the database
and then make sure that all text fields in the database are Unicode (or
widechar or nchar).

Please tell me if I am way off here!

/Jacob
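Jacob's diagnosis suggests checking whether a name fits the local code page before using it as a file name. The Java sketch below uses CharsetEncoder.canEncode for that check and, following the escaped-UTF-8 idea from the start of the thread, falls back to a percent-escaped name otherwise. This is a hypothetical policy, not how the TXFileStore works; ISO-8859-1 stands in for a Danish Windows code page, the class and method names are made up, and URLEncoder.encode(String, Charset) needs Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class SafeFileName {

    // True if every character of the name exists in the given code page.
    static boolean representable(String name, Charset codePage) {
        return codePage.newEncoder().canEncode(name);
    }

    // Hypothetical policy: keep the name if the code page can hold it,
    // otherwise store an ASCII-only, percent-escaped UTF-8 form.
    static String storageName(String name, Charset codePage) {
        if (representable(name, codePage)) {
            return name;
        }
        return URLEncoder.encode(name, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Stand-in for a Danish Windows code page (assumption).
        Charset localCodePage = StandardCharsets.ISO_8859_1;
        System.out.println(storageName("r\u00F8d.txt", localCodePage)); // name kept unchanged
        System.out.println(storageName("\u0416.txt", localCodePage));   // %D0%96.txt
    }
}
```

As written, a representable name that happens to look like an escape (for example a literal "%D0%96.txt") would collide with an escaped one, so a real scheme would need to escape consistently or mark escaped names. Escaping also lengthens names, which matters under the 255-character path limits mentioned earlier in the thread.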

-----Original Message-----
From: Oliver Zeigermann [mailto:ozeigermann@c1-fse.de] 
Sent: 29. januar 2004 10:02
To: Slide Users Mailing List
Subject: Re: TXFileStore and local filesystem

Jacob Lund wrote:
> No, the filestore works correctly.

OK, shall I check in the patch? Did it work for you?

>>>From what I can see the filestore converts from UTF-8 to local before it
> stores data. This I why UTF-8 works fine for me when I upload files with
> Danish letters in the filename, and also why if fails when it stores files
> with characters not supported by the codepage.
> 
> Windows XP use Unicode, but in "dos mode" it will use the old codepage
> types. The only thing that I can imagine is that java will use this
codepage
> when it is doing IO operations towards the filesystem. This problem might
be
> a problem that only appears on windows systems.
> 
> I do not think that the problem is in the fill data into the database that
> has a problem. Some place in slide it will convert that data (in this case
> the uri) to UTF-8 before it is send to the client. The data stored in the
> database is UTF-8, and I believe that java is using Unicode. So the
solution
> might be to convert data fetched from the database back to Unicode as soon
> as it arrives to the store class.
> 
> The correct solution might be to convert from UTF-8 to Unicode before
> storing the data and then change the database scheme to Unicode char in
all
> fields containing strings.

Hmmmm. You might be confusing certain things here. On one side there is 
Unicode having a number for each character. On the other side there is 
the representation in bytes. Now, UTF-8 *is* Unicode, but on the other 
side, i.e. the representation in bytes. Thus it does not make too much 
sense to compare Unicode with UTF-8. Do you agree?

> I am guessing here since I do not have any idea of how the stores are
> structured in slide. I you want I would be happy to do some debugging, but
I
> will need a short introduction to how the datastores are designed in
slide.

I know, proper documentation is a major problem. I will try to prepare 
something like a short introduction and will post it to the list as soon 
as it is done. This may take a while though :(

Oliver

> /Jacob
> 
> -----Original Message-----
> From: Oliver Zeigermann [mailto:ozeigermann@c1-fse.de] 
> Sent: 28. januar 2004 16:40
> To: Slide Users Mailing List
> Subject: Re: TXFileStore and local filesystem
> 
> Jacob Lund wrote:
> 
> 
>>Sorry about that - yes, I am talking about the URI!
>>
>>If I look at a record in the database, each Danish character is stored as
>>two "funny looking" characters corresponding to the unescaped UTF-8 encoded
>>version - so this looks correct! However, when I do a PROPFIND on the
>>collection in which I placed this file, I get something like
>>/files/%C3%83%C2%B8 - and this should have represented one Danish
>>character. If I take the above and convert from UTF-8 to my locale, I get
>>what is stored in the database. If I then convert from UTF-8 to my locale
>>again, I get the correct Danish letter.
> 
> 
> I could not find anything that might have converted the URI strings. 
> They are just plainly filled into the SQL like in
> 
>>                        "select 1 from OBJECT o, URI u where o.URI_ID=u.URI_ID and u.URI_STRING=?");
>>                statement.setString(1, uri.toString());
> 
> 
> So, maybe this is a more general problem...
> 
> 
>>It seems that Slide converts the URIs from the db to UTF-8, but they are
>>already stored as unescaped UTF-8!
> 
> 
> Does this happen with the file store as well?
> 
> Oliver
> 




Re: TXFileStore and local filesystem

Posted by Oliver Zeigermann <oz...@c1-fse.de>.
Jacob Lund wrote:
> No, the filestore works correctly.

OK, shall I check in the patch? Did it work for you?

> From what I can see, the filestore converts from UTF-8 to the local codepage
> before it stores data. This is why UTF-8 works fine for me when I upload
> files with Danish letters in the filename, and also why it fails when it
> stores files with characters not supported by the codepage.
> 
> Windows XP uses Unicode, but in "DOS mode" it uses the old codepage types.
> The only explanation I can imagine is that Java uses this codepage when it
> does IO operations on the filesystem. This might be a problem that only
> appears on Windows systems.
> 
> I do not think the problem is in how the data is filled into the database.
> Somewhere in Slide, that data (in this case the URI) is converted to UTF-8
> before it is sent to the client. The data stored in the database is UTF-8,
> and I believe that Java uses Unicode. So the solution might be to convert
> data fetched from the database back to Unicode as soon as it arrives at the
> store class.
> 
> The correct solution might be to convert from UTF-8 to Unicode before
> storing the data and then change the database schema to use Unicode char
> types in all fields containing strings.

Hmmmm. You might be confusing certain things here. On one side there is
Unicode, which assigns a number to each character. On the other side there
is the representation of those numbers in bytes. Now, UTF-8 *is* Unicode,
but it belongs to the other side, i.e. it is a representation in bytes.
Thus it does not make much sense to compare Unicode with UTF-8. Do you agree?

> I am guessing here since I do not have any idea of how the stores are
> structured in Slide. If you want, I would be happy to do some debugging, but
> I will need a short introduction to how the datastores are designed in Slide.

I know, proper documentation is a major problem. I will try to prepare 
something like a short introduction and will post it to the list as soon 
as it is done. This may take a while though :(

Oliver

> /Jacob
> 
> -----Original Message-----
> From: Oliver Zeigermann [mailto:ozeigermann@c1-fse.de] 
> Sent: 28. januar 2004 16:40
> To: Slide Users Mailing List
> Subject: Re: TXFileStore and local filesystem
> 
> Jacob Lund wrote:
> 
> 
>>Sorry about that - yes, I am talking about the URI!
>>
>>If I look at a record in the database, each Danish character is stored as
>>two "funny looking" characters corresponding to the unescaped UTF-8 encoded
>>version - so this looks correct! However, when I do a PROPFIND on the
>>collection in which I placed this file, I get something like
>>/files/%C3%83%C2%B8 - and this should have represented one Danish
>>character. If I take the above and convert from UTF-8 to my locale, I get
>>what is stored in the database. If I then convert from UTF-8 to my locale
>>again, I get the correct Danish letter.
> 
> 
> I could not find anything that might have converted the URI strings. 
> They are just plainly filled into the SQL like in
> 
>>                        "select 1 from OBJECT o, URI u where o.URI_ID=u.URI_ID and u.URI_STRING=?");
>>                statement.setString(1, uri.toString());
> 
> 
> So, maybe this is a more general problem...
> 
> 
>>It seems that Slide converts the URIs from the db to UTF-8, but they are
>>already stored as unescaped UTF-8!
> 
> 
> Does this happen with the file store as well?
> 
> Oliver
> 





RE: TXFileStore and local filesystem

Posted by Jacob Lund <jl...@qualiware.com>.
No, the filestore works correctly.

From what I can see, the filestore converts from UTF-8 to the local codepage
before it stores data. This is why UTF-8 works fine for me when I upload
files with Danish letters in the filename, and also why it fails when it
stores files with characters not supported by the codepage.
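Whether a given filename fits a codepage can be checked directly with `java.nio.charset.CharsetEncoder.canEncode`. The sketch below assumes windows-1252 as a stand-in for the Western European codepage of a Danish-locale Windows machine; the actual codepage on Jacob's system is an assumption, and the class name is hypothetical. Danish letters are representable in that codepage, Cyrillic ones are not:

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

final class CodepageCheck {
    // True if every character of name can be represented in the given charset.
    static boolean representable(String name, String charsetName) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        return encoder.canEncode(name);
    }

    public static void main(String[] args) {
        String danish = "bl\u00E5b\u00E6r.txt";          // Danish letters a-ring and ae
        String russian = "\u0444\u0430\u0439\u043B.txt";  // four Cyrillic letters
        for (String cs : new String[] {"windows-1252", "UTF-8"}) {
            System.out.println(cs + ": danish=" + representable(danish, cs)
                    + ", russian=" + representable(russian, cs));
        }
        // The JVM's default charset depends on the JVM version and OS settings.
        System.out.println("default charset: " + Charset.defaultCharset());
    }
}
```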

Windows XP uses Unicode, but in "DOS mode" it uses the old codepage types.
The only explanation I can imagine is that Java uses this codepage when it
does IO operations on the filesystem. This might be a problem that only
appears on Windows systems.
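If Java does convert file names through such a codepage (Jacob's hypothesis, not confirmed in the thread), unmappable characters are not rejected but silently replaced: `String.getBytes(Charset)` substitutes the charset's replacement byte, `?`. A minimal sketch of that behaviour, again assuming windows-1252 and a hypothetical helper name:

```java
import java.nio.charset.Charset;

final class LossyConversion {
    // Encode name in the given charset and decode it again. String.getBytes(Charset)
    // silently substitutes the charset's replacement byte for unmappable characters.
    static String throughCodepage(String name, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        return new String(name.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        String russian = "\u0444\u0430\u0439\u043B.txt"; // four Cyrillic letters
        String danish = "bl\u00E5b\u00E6r.txt";          // Danish letters a-ring and ae
        System.out.println(throughCodepage(russian, "windows-1252"));               // ????.txt
        System.out.println(throughCodepage(danish, "windows-1252").equals(danish)); // true
    }
}
```

Since `?` is not a legal character in Windows file names, a name that went through such a lossy conversion could plausibly produce an "invalid name" error like the one in the stack trace posted in this thread; this is a possible explanation, not a confirmed one.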

I do not think the problem is in how the data is filled into the database.
Somewhere in Slide, that data (in this case the URI) is converted to UTF-8
before it is sent to the client. The data stored in the database is UTF-8,
and I believe that Java uses Unicode. So the solution might be to convert
data fetched from the database back to Unicode as soon as it arrives at the
store class.
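Converting data back as it arrives at the store could look like the following sketch. It assumes the symptom Jacob describes, i.e. UTF-8 bytes that were decoded as ISO-8859-1 somewhere (so ø arrives as the two characters Ã¸); `repair` is a hypothetical helper, not Slide code:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class MojibakeRepair {
    // Hypothetical helper: if s looks like UTF-8 bytes that were wrongly
    // decoded as ISO-8859-1, decode those bytes again as UTF-8; otherwise return s.
    static String repair(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return s; // cannot have come from ISO-8859-1 decoding
            }
        }
        byte[] raw = s.getBytes(StandardCharsets.ISO_8859_1); // recover original bytes
        CharsetDecoder strict = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return strict.decode(ByteBuffer.wrap(raw)).toString();
        } catch (CharacterCodingException e) {
            return s; // not valid UTF-8, so leave the string alone
        }
    }

    public static void main(String[] args) {
        System.out.println(repair("/files/\u00C3\u00B8").equals("/files/\u00F8")); // true
        System.out.println(repair("/files/\u00F8").equals("/files/\u00F8"));       // true
    }
}
```

This is a heuristic: a string that legitimately contains such Latin-1 sequences would also be "repaired", so fixing the place where the wrong decoding happens is the more robust option.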

The correct solution might be to convert from UTF-8 to Unicode before
storing the data and then change the database schema to use Unicode char
types in all fields containing strings.

I am guessing here since I do not have any idea of how the stores are
structured in Slide. If you want, I would be happy to do some debugging, but
I will need a short introduction to how the datastores are designed in Slide.

/Jacob

-----Original Message-----
From: Oliver Zeigermann [mailto:ozeigermann@c1-fse.de] 
Sent: 28. januar 2004 16:40
To: Slide Users Mailing List
Subject: Re: TXFileStore and local filesystem

Jacob Lund wrote:

> Sorry about that - yes, I am talking about the URI!
> 
> If I look at a record in the database, each Danish character is stored as
> two "funny looking" characters corresponding to the unescaped UTF-8 encoded
> version - so this looks correct! However, when I do a PROPFIND on the
> collection in which I placed this file, I get something like
> /files/%C3%83%C2%B8 - and this should have represented one Danish
> character. If I take the above and convert from UTF-8 to my locale, I get
> what is stored in the database. If I then convert from UTF-8 to my locale
> again, I get the correct Danish letter.

I could not find anything that might have converted the URI strings. 
They are just plainly filled into the SQL like in

>                         "select 1 from OBJECT o, URI u where o.URI_ID=u.URI_ID and u.URI_STRING=?");
>                 statement.setString(1, uri.toString());

So, maybe this is a more general problem...

> It seems that Slide converts the URIs from the db to UTF-8, but they are
> already stored as unescaped UTF-8!

Does this happen with the file store as well?

Oliver




Re: TXFileStore and local filesystem

Posted by Oliver Zeigermann <oz...@c1-fse.de>.
Jacob Lund wrote:

> Sorry about that - yes, I am talking about the URI!
> 
> If I look at a record in the database, each Danish character is stored as
> two "funny looking" characters corresponding to the unescaped UTF-8 encoded
> version - so this looks correct! However, when I do a PROPFIND on the
> collection in which I placed this file, I get something like
> /files/%C3%83%C2%B8 - and this should have represented one Danish
> character. If I take the above and convert from UTF-8 to my locale, I get
> what is stored in the database. If I then convert from UTF-8 to my locale
> again, I get the correct Danish letter.

I could not find anything that might have converted the URI strings. 
They are just plainly filled into the SQL like in

>                         "select 1 from OBJECT o, URI u where o.URI_ID=u.URI_ID and u.URI_STRING=?");
>                 statement.setString(1, uri.toString());

So, maybe this is a more general problem...

> It seems that Slide converts the URIs from the db to UTF-8, but they are
> already stored as unescaped UTF-8!

Does this happen with the file store as well?

Oliver




RE: TXFileStore and local filesystem

Posted by Jacob Lund <jl...@qualiware.com>.
Sorry about that - yes, I am talking about the URI!

If I look at a record in the database, each Danish character is stored as
two "funny looking" characters corresponding to the unescaped UTF-8 encoded
version - so this looks correct! However, when I do a PROPFIND on the
collection in which I placed this file, I get something like
/files/%C3%83%C2%B8 - and this should have represented one Danish
character. If I take the above and convert from UTF-8 to my locale, I get
what is stored in the database. If I then convert from UTF-8 to my locale
again, I get the correct Danish letter.

It seems that Slide converts the URIs from the db to UTF-8, but they are
already stored as unescaped UTF-8!
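The `%C3%83%C2%B8` in the PROPFIND result is exactly what double encoding produces. In the sketch below (class and method names hypothetical), ø is encoded to UTF-8 (`C3 B8`), those bytes are mistakenly read back as ISO-8859-1 (giving the two characters `Ã¸`), and the result is encoded to UTF-8 a second time:

```java
import java.nio.charset.StandardCharsets;

final class DoubleEncoding {
    // Percent-encode every byte of the UTF-8 form of s (for display only).
    static String percentUtf8(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append('%').append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String oSlash = "\u00F8"; // Danish o with stroke
        System.out.println(percentUtf8(oSlash)); // %C3%B8 (encoded once)

        // The mistake: UTF-8 bytes read back as ISO-8859-1 give two characters...
        String misread = new String(oSlash.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);
        // ...which are then encoded to UTF-8 a second time.
        System.out.println(percentUtf8(misread)); // %C3%83%C2%B8
    }
}
```

The correct single encoding of ø is `%C3%B8`; the extra `83` and `C2` bytes are the signature of UTF-8 applied twice, which fits the observation that decoding twice recovers the right letter.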

/Jacob

-----Original Message-----
From: Oliver Zeigermann [mailto:ozeigermann@c1-fse.de] 
Sent: 28. januar 2004 15:25
To: Slide Users Mailing List
Subject: Re: TXFileStore and local filesystem

Oh, well, you seem to be one step ahead...

Jacob Lund wrote:
> I just tried the SQLServerRDBMSAdapter and it seems to have some problems
> too. Has UTF-8 been tested with the RDBMS adapters?
> 
> All tables in the db use normal char and not Unicode char. I guess that
> means that Slide should store data as UTF-8. Looking into the content of
> the records, it does look like UTF-8 encoded data. Is this correct?

Are you talking about the URIs? Or about content? Content is stored in 
BLOBs, i.e. with no en/decoding at all, as it is considered binary.

What problems are you experiencing?

Oliver




Re: TXFileStore and local filesystem

Posted by Oliver Zeigermann <oz...@c1-fse.de>.
Oh, well, you seem to be one step ahead...

Jacob Lund wrote:
> I just tried the SQLServerRDBMSAdapter and it seems to have some problems
> too. Has UTF-8 been tested with the RDBMS adapters?
> 
> All tables in the db use normal char and not Unicode char. I guess that
> means that Slide should store data as UTF-8. Looking into the content of
> the records, it does look like UTF-8 encoded data. Is this correct?

Are you talking about the URIs? Or about content? Content is stored in 
BLOBs, i.e. with no en/decoding at all, as it is considered binary.
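Why binary storage sidesteps the problem can be shown with a short sketch (hypothetical class name): bytes kept as `byte[]` are stored unchanged, whereas bytes forced through a `String` lose any sequence that is not valid in the chosen encoding:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class BinaryRoundTrip {
    // True if the bytes survive being decoded to a String and encoded back as UTF-8.
    static boolean survivesAsText(byte[] data) {
        String text = new String(data, StandardCharsets.UTF_8); // invalid input becomes U+FFFD
        return Arrays.equals(data, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] binary = {(byte) 0xFF, (byte) 0xC3, 0x28, 0x00, 0x41}; // not valid UTF-8
        byte[] text = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(survivesAsText(binary)); // false
        System.out.println(survivesAsText(text));   // true
    }
}
```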

What problems are you experiencing?

Oliver




RE: TXFileStore and local filesystem

Posted by Jacob Lund <jl...@qualiware.com>.
I just tried the SQLServerRDBMSAdapter and it seems to have some problems
too. Has UTF-8 been tested with the RDBMS adapters?

All tables in the db use normal char and not Unicode char. I guess that
means that Slide should store data as UTF-8. Looking into the content of
the records, it does look like UTF-8 encoded data. Is this correct?

/Jacob

-----Original Message-----
From: Jacob Lund [mailto:jl@qualiware.com] 
Sent: 28. januar 2004 13:03
To: 'Slide Users Mailing List'
Subject: RE: TXFileStore and local filesystem

No file is created by the filestore!

German and Danish letters work fine for me too - and they are also
supported by the codepage.

To avoid these problems, why not save the file names as escaped UTF-8
codes - that way we always know that it will work?!?

/Jacob

Here is the stack trace:

28 Jan 2004 12:57:25 - org.apache.slide.store.txfile.AbstractTxFileStoreService - WARNING - Thread Thread[http8080-Processor23,5,main] marked transaction branch http8080-Processor23-1075291045101-264 for rollback
org.apache.slide.store.txfile.rm.ResourceManagerSystemException: http8080-Processor23-1075291045101-264: Can not create resource at '/files/+?-++?-Á+?-¦+?-©+æ-Ç+æ-â+?-Á+?-++æ-ï.txt.def.xml' (ERR_SYSTEM)
Caused by: java.io.IOException: The filename, directory name, or volume label syntax is incorrect
        at java.io.WinNTFileSystem.createFileExclusively(Native Method)
        at java.io.File.createNewFile(File.java:828)
        at org.apache.slide.store.txfile.rm.impl.FileResourceManager.createFile(FileResourceManager.java:209)
        at org.apache.slide.store.txfile.rm.impl.FileResourceManager.createResource(FileResourceManager.java:770)
        at org.apache.slide.store.txfile.XMLResourceDescriptor.create(XMLResourceDescriptor.java:201)
        at org.apache.slide.store.txfile.TxXMLFileDescriptorsStore.createObject(TxXMLFileDescriptorsStore.java:126)
        at org.apache.slide.store.AbstractStore.createObject(AbstractStore.java:597)
        at org.apache.slide.store.ExtendedStore.createObject(ExtendedStore.java:476)
        at org.apache.slide.structure.StructureImpl.create(StructureImpl.java:362)
        at org.apache.slide.webdav.method.PutMethod.executeRequest(PutMethod.java:333)
        at org.apache.slide.webdav.method.AbstractWebdavMethod.run(AbstractWebdavMethod.java:355)
        at org.apache.slide.webdav.WebdavServlet.service(WebdavServlet.java:204)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:853)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:284)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:204)
        at org.apache.slide.webdav.filter.LogFilter.doFilter(LogFilter.java:182)

-----Original Message-----
From: Oliver Zeigermann [mailto:ozeigermann@c1-fse.de] 
Sent: 28. januar 2004 12:30
To: Slide Users Mailing List
Subject: Re: TXFileStore and local filesystem

Hi Jacob!

I can imagine there might be problems with the filename. I do not know 
how to create filenames with Russian characters (or any other characters not
on my keyboard), but German special characters "öäüÖÄÜß" work fine. So
please report any progress and try to find out where it fails. Are files 
being created? If so, what do the filenames look like?

Thanks for helping and cheers,

Oliver

Jacob Lund wrote:
> Hi All!
> 
> Has anyone tried the TXFileStore with filenames that include characters
> that are not allowed in the server file system?
> 
> I am using Windows XP and the UI supports UTF-8. My XP is English and the
> locale is set to Danish! If I upload a file with Russian characters in the
> filename, then Slide fails with a 500 error. Slide is set to UTF-8 and
> running on TC 5.0.16.
> 
> My Slide version is about a week old (from the SLIDE_2_0_RELEASE_BRANCH).
> 
> I will dig deeper into this, but I just wanted to hear if anyone else has
> experience with this.
> 
> Without knowing anything for sure, I think that Java will apply the "DOS"
> codepage to all IO operations. If this is true, then the filestore needs
> to save all information in an escaped UTF-8 manner.
> 
> /Jacob
> 


RE: TXFileStore and local filesystem

Posted by Jacob Lund <jl...@qualiware.com>.
No file is created by the filestore!

German and Danish letters work fine for me too - and they are also supported by the codepage.

To avoid these problems, why not save the file names as escaped UTF-8 codes - that way we always know that it will work?!?
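The escaped-UTF-8 proposal could be sketched as follows. This is a hypothetical scheme, not something Slide implements: ASCII letters, digits, `.`, `-` and `_` are kept, and every other UTF-8 byte (including `%` itself, so the mapping stays reversible) becomes `%XX`. The resulting names contain only characters that are legal in Windows file names (reserved device names such as CON are not handled):

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical escaping scheme for file names; not part of Slide.
final class EscapedFileNames {
    // Keep ASCII letters, digits, '.', '-' and '_'; percent-encode every other UTF-8 byte.
    static String escape(String name) {
        StringBuilder sb = new StringBuilder();
        for (byte b : name.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean safe = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9') || c == '.' || c == '-' || c == '_';
            if (safe) {
                sb.append((char) c);
            } else {
                sb.append('%').append(String.format("%02X", c));
            }
        }
        return sb.toString();
    }

    // Inverse of escape(): turn each %XX back into a byte, then decode as UTF-8.
    static String unescape(String escaped) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < escaped.length(); i++) {
            char ch = escaped.charAt(i);
            if (ch == '%') {
                out.write(Integer.parseInt(escaped.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                out.write(ch); // only ASCII characters occur outside escapes
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String russian = "\u0444\u0430\u0439\u043B.txt"; // four Cyrillic letters
        String escaped = escape(russian);
        System.out.println(escaped); // %D1%84%D0%B0%D0%B9%D0%BB.txt
        System.out.println(russian.length() + " -> " + escaped.length() + " chars"); // 8 -> 28 chars
        System.out.println(unescape(escaped).equals(russian)); // true
    }
}
```

The cost is length: each Cyrillic letter grows from one character to six (two UTF-8 bytes, three characters each), which is why the 255-character limit mentioned elsewhere in this thread matters for such a scheme.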

/Jacob

Here is the stack trace:

28 Jan 2004 12:57:25 - org.apache.slide.store.txfile.AbstractTxFileStoreService - WARNING - Thread Thread[http8080-Processor23,5,main] marked transaction branch http8080-Processor23-1075291045101-264 for rollback
org.apache.slide.store.txfile.rm.ResourceManagerSystemException: http8080-Processor23-1075291045101-264: Can not create resource at '/files/├?┬╗├?┬Á├?┬│├?┬©├æ┬Ç├æ┬â├?┬Á├?┬╝├æ┬ï.txt.def.xml' (ERR_SYSTEM)
Caused by: java.io.IOException: The filename, directory name, or volume label syntax is incorrect
        at java.io.WinNTFileSystem.createFileExclusively(Native Method)
        at java.io.File.createNewFile(File.java:828)
        at org.apache.slide.store.txfile.rm.impl.FileResourceManager.createFile(FileResourceManager.java:209)
        at org.apache.slide.store.txfile.rm.impl.FileResourceManager.createResource(FileResourceManager.java:770)
        at org.apache.slide.store.txfile.XMLResourceDescriptor.create(XMLResourceDescriptor.java:201)
        at org.apache.slide.store.txfile.TxXMLFileDescriptorsStore.createObject(TxXMLFileDescriptorsStore.java:126)
        at org.apache.slide.store.AbstractStore.createObject(AbstractStore.java:597)
        at org.apache.slide.store.ExtendedStore.createObject(ExtendedStore.java:476)
        at org.apache.slide.structure.StructureImpl.create(StructureImpl.java:362)
        at org.apache.slide.webdav.method.PutMethod.executeRequest(PutMethod.java:333)
        at org.apache.slide.webdav.method.AbstractWebdavMethod.run(AbstractWebdavMethod.java:355)
        at org.apache.slide.webdav.WebdavServlet.service(WebdavServlet.java:204)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:853)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:284)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:204)
        at org.apache.slide.webdav.filter.LogFilter.doFilter(LogFilter.java:182)

-----Original Message-----
From: Oliver Zeigermann [mailto:ozeigermann@c1-fse.de] 
Sent: 28. januar 2004 12:30
To: Slide Users Mailing List
Subject: Re: TXFileStore and local filesystem

Hi Jacob!

I can imagine there might be problems with the filename. I do not know 
how to create filenames with Russian characters (or any other characters not
on my keyboard), but German special characters "öäüÖÄÜß" work fine. So
please report any progress and try to find out where it fails. Are files 
being created? If so, what do the filenames look like?

Thanks for helping and cheers,

Oliver

Jacob Lund wrote:
> Hi All!
> 
> Has anyone tried the TXFileStore with filenames that include characters
> that are not allowed in the server file system?
> 
> I am using Windows XP and the UI supports UTF-8. My XP is English and the
> locale is set to Danish! If I upload a file with Russian characters in the
> filename, then Slide fails with a 500 error. Slide is set to UTF-8 and
> running on TC 5.0.16.
> 
> My Slide version is about a week old (from the SLIDE_2_0_RELEASE_BRANCH).
> 
> I will dig deeper into this, but I just wanted to hear if anyone else has
> experience with this.
> 
> Without knowing anything for sure, I think that Java will apply the "DOS"
> codepage to all IO operations. If this is true, then the filestore needs
> to save all information in an escaped UTF-8 manner.
> 
> /Jacob
> 


Re: TXFileStore and local filesystem

Posted by Oliver Zeigermann <oz...@c1-fse.de>.
Hi Jacob!

I can imagine there might be problems with the filename. I do not know 
how to create filenames with Russian characters (or any other characters not
on my keyboard), but German special characters "öäüÖÄÜß" work fine. So
please report any progress and try to find out where it fails. Are files 
being created? If so, what do the filenames look like?
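To see what the filenames look like reliably, it helps to look at code points rather than at console or log output, which is itself rendered through a codepage (the `?` characters in the stack trace may come from that rendering). A hypothetical debugging helper:

```java
import java.io.File;

final class CodePointDump {
    // Render each code point as U+XXXX so that names can be compared exactly,
    // independent of how the console displays them.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // With a directory argument, dump every entry; otherwise show two samples.
        if (args.length > 0) {
            String[] names = new File(args[0]).list();
            if (names == null) {
                System.err.println("not a readable directory: " + args[0]);
                return;
            }
            for (String name : names) {
                System.out.println(dump(name));
            }
        } else {
            System.out.println(dump("\u00F8"));       // U+00F8
            System.out.println(dump("\u00C3\u00B8")); // U+00C3 U+00B8
        }
    }
}
```

A correctly stored ø shows up as a single `U+00F8`; a pair such as `U+00C3 U+00B8` indicates UTF-8 bytes that were decoded with the wrong charset.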

Thanks for helping and cheers,

Oliver

Jacob Lund wrote:
> Hi All!
> 
> Has anyone tried the TXFileStore with filenames that include characters
> that are not allowed in the server file system?
> 
> I am using Windows XP and the UI supports UTF-8. My XP is English and the
> locale is set to Danish! If I upload a file with Russian characters in the
> filename, then Slide fails with a 500 error. Slide is set to UTF-8 and
> running on TC 5.0.16.
> 
> My Slide version is about a week old (from the SLIDE_2_0_RELEASE_BRANCH).
> 
> I will dig deeper into this, but I just wanted to hear if anyone else has
> experience with this.
> 
> Without knowing anything for sure, I think that Java will apply the "DOS"
> codepage to all IO operations. If this is true, then the filestore needs
> to save all information in an escaped UTF-8 manner.
> 
> /Jacob
> 