Posted to slide-user@jakarta.apache.org by Jacob Lund <jl...@qualiware.com> on 2004/01/28 12:00:35 UTC
TXFileStore and local filesystem
Hi All!
Has anyone tried the TXFileStore with filenames that include characters
that are not allowed in the server file system?
I am using Windows XP and the UI supports UTF-8. My XP is English and the
locale is set to Danish! If I upload a file with Russian characters in the
filename then Slide fails with a 500 error. Slide is set to UTF-8 and running
on TC 5.0.16.
My Slide version is about a week old (from the SLIDE_2_0_RELEASE_BRANCH).
I will dig deeper into this, but I just wanted to hear if anyone else has
experience with this.
Without knowing anything for sure, I think that Java will apply the "DOS"
codepage to all IO operations. If this is true then the filestore needs
to save all information in an escaped UTF-8 manner.
/Jacob
---------------------------------------------------------------------
To unsubscribe, e-mail: slide-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: slide-user-help@jakarta.apache.org
Re: TXFileStore and local filesystem
Posted by Oliver Zeigermann <oz...@c1-fse.de>.
Michael Smith wrote:
> Oliver Zeigermann wrote:
>
>> Jacob Lund wrote:
>>
>>> No file is created by the filestore!
>>>
>>> German and Danish letters work fine for me too - and they
>>> are also supported by the codepage.
>>>
>>> To avoid these problems, why not save the files in escaped UTF-8
>>> codes - that way we always know that it will work?!?
>>
>>
>>
>> Hmmm. Escaped? Like URL encoded? This might blow up the length quite a
>> bit. As I know of file systems having at most 255 characters per path
>> (like the ones both of us use) this may be undesirable for the default
>> application. What about adding a switch for this and make it
>> configurable?
>>
>
> It's a slightly more complex problem than this, and needs a solution in
> at least two places:
> 1) Filenames that aren't allowed
> a) 'High' unicode characters, etc. These should be escaped somehow.
> b) Disallowed characters in the low part - for example, windows
> doesn't permit several characters such as colons and slashes.
> This is _and must remain_ platform specific - which is painful.
> These should get escaped too.
URLEncoder does this for us. As far as I remember, its output uses only
7-bit ASCII without any special characters; everything else is encoded as
%xx. So, as long as we tell it to use UTF-8, I do not think we have a
problem with this any more. The call might look like
    path = URLEncoder.encode(path, "UTF-8");
> 2) What to do if we fail anyway. We might not escape everything
> (i.e. we might just make a decision to disallow certain characters
> in filenames), or the escaped filename might exceed the maximum
> length allowed by the filesystem. In this case, we need to
> propagate some appropriate error up to the application (webdav)
> level, so we don't just give a 400.
That's right! Same thing with the relational database store. The problem
is we only have ServiceAccessException, but we could nest others into
it. How could this exception be handled in the WebDAV layer to generate
a proper error message and code? I simply do not know...
Oliver
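
As a minimal sketch of the URLEncoder idea above (the class and method names are hypothetical and not part of Slide), the following escapes a single file name and reverses it again. It assumes the encoder is applied per path segment, since URLEncoder would also turn "/" into %2F if handed a whole path.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Illustrative sketch only: the class and method names are hypothetical, not Slide code.
final class UrlEscapedFileName {

    // Percent-encode one path segment, using UTF-8 for non-ASCII characters.
    static String escape(String segment) {
        try {
            return URLEncoder.encode(segment, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    // Reverse of escape(): recover the original name from the stored one.
    static String unescape(String stored) {
        try {
            return URLDecoder.decode(stored, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String name = "\u0444\u0430\u0439\u043B.txt"; // four Cyrillic letters plus ".txt"
        String stored = escape(name);
        System.out.println(stored);
        System.out.println(name.length() + " chars become " + stored.length());
        // '*' is left untouched and ' ' becomes '+'.
        System.out.println(escape("a*b c"));
    }
}
```

Two things are worth noting. The 8-character Russian name grows to 28 characters, which is the length blow-up mentioned earlier in the thread. Also, URLEncoder leaves "*" unescaped, and Windows does not accept that character in file names, so URLEncoder alone may not cover every platform-specific case.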
Re: TXFileStore and local filesystem
Posted by Michael Smith <ms...@speedlegal.com>.
Oliver Zeigermann wrote:
> Jacob Lund wrote:
>
>> No file is created by the filestore!
>>
>> German and Danish letters work fine for me too - and they are
>> also supported by the codepage.
>>
>> To avoid these problems, why not save the files in escaped UTF-8 codes
>> - that way we always know that it will work?!?
>
>
> Hmmm. Escaped? Like URL encoded? This might blow up the length quite a
> bit. As I know of file systems having at most 255 characters per path
> (like the ones both of us use) this may be undesirable for the default
> application. What about adding a switch for this and make it configurable?
>
It's a slightly more complex problem than this, and needs a solution in
at least two places:
1) Filenames that aren't allowed
   a) 'High' Unicode characters, etc. These should be escaped somehow.
   b) Disallowed characters in the low part - for example, Windows
      doesn't permit several characters such as colons and slashes.
      This is _and must remain_ platform specific - which is painful.
      These should get escaped too.
2) What to do if we fail anyway. We might not escape everything
   (i.e. we might just make a decision to disallow certain characters
   in filenames), or the escaped filename might exceed the maximum
   length allowed by the filesystem. In this case, we need to
   propagate some appropriate error up to the application (WebDAV)
   level, so we don't just give a 400.
Thoughts?
Mike
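
The two points above could be sketched roughly as follows. Everything here is an assumption made for illustration: the class, the exception type, the 255-character limit (taken from the figure mentioned earlier in the thread) and the list of characters Windows rejects. It is not how Slide does it, and reserved Windows names such as CON or trailing dots are not handled.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch only: every name here is hypothetical; none of it is Slide API.
final class SafeFileName {

    // Assumed per-name limit, taken from the 255 figure mentioned in the thread.
    static final int MAX_NAME_LENGTH = 255;

    // Printable ASCII characters that Windows rejects in file names.
    private static final String WINDOWS_FORBIDDEN = "<>:\"/\\|?*";

    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Hypothetical error type a store could raise (or nest in another exception)
    // so that the WebDAV layer can report something more specific.
    static final class UnstorableNameException extends RuntimeException {
        UnstorableNameException(String message) {
            super(message);
        }
    }

    // Point 1: escape 'high' characters and platform-forbidden ASCII as %XX.
    // Point 2: fail with a specific error if the result is still not storable.
    static String escape(String name) {
        StringBuilder out = new StringBuilder();
        for (byte b : name.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            boolean safe = v > 0x20 && v < 0x7F
                    && v != '%'
                    && WINDOWS_FORBIDDEN.indexOf(v) < 0;
            if (safe) {
                out.append((char) v);
            } else {
                out.append('%').append(HEX[v >> 4]).append(HEX[v & 0x0F]);
            }
        }
        if (out.length() > MAX_NAME_LENGTH) {
            throw new UnstorableNameException(
                    "escaped name is " + out.length() + " characters, limit is " + MAX_NAME_LENGTH);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("report: draft?.txt"));
        System.out.println(escape("\u00F8l.txt"));
    }
}
```

An unchecked exception is used only to keep the sketch short; a real store would more likely nest such an error in whatever checked exception its interface already declares.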
Re: TXFileStore and local filesystem
Posted by Oliver Zeigermann <oz...@c1-fse.de>.
Jacob Lund wrote:
> No file is created by the filestore!
>
> German and Danish letters work fine for me too - and they are also supported by the codepage.
>
> To avoid these problems, why not save the files in escaped UTF-8 codes - that way we always know that it will work?!?
Hmmm. Escaped? Like URL encoded? This might blow up the length quite a
bit. Since I know of file systems that allow at most 255 characters per path
(like the ones both of us use), this may be undesirable as the default
behavior. What about adding a switch for this and making it configurable?
Anyway, to see if this really fixes it, please apply the attached patch for
FileResourceManager and see if it works.
Thanks again :)
Oliver
Re: TXFileStore and local filesystem
Posted by Michael Smith <ms...@speedlegal.com>.
Jacob Lund wrote:
> Thanks!
>
> Could you then explain the relationship between UTF-8 and XML. Does it make
> sense to have unescaped UTF-8 encoding in XML or should UTF-8 always be
> escaped when used in XML?
>
XML frequently uses unicode (more often than not, in fact), though it
can use other character sets (XML processors all (?) use unicode
internally). Most often, the unicode is encoded using UTF-8. Using UTF-8
as-is in XML is sensible and a normal safe approach. Of course, you have
to do _some_ escaping because it's XML - so things like < and > (which
are, of course, Unicode characters) still need escaping. But that's XML
escaping (using entity references like &gt;), not escaping the UTF-8.
Mike
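
To illustrate the distinction drawn above, here is a small sketch (hypothetical class name, element text only, not attribute values) that escapes the XML markup characters and leaves all other Unicode characters, such as Danish letters, untouched.

```java
// Illustrative sketch only: escapes element text, not attribute values.
final class XmlTextEscape {

    // Replace the characters that are markup in XML; pass everything else through.
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '<':
                    out.append("&lt;");
                    break;
                case '>':
                    out.append("&gt;");
                    break;
                case '&':
                    out.append("&amp;");
                    break;
                default:
                    // Non-ASCII characters need no escaping when the document is UTF-8.
                    out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("r\u00F8dgr\u00F8d < fl\u00F8de & sukker"));
    }
}
```

The Danish letters come out unchanged; only "<", ">" and "&" are rewritten.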
Re: TXFileStore and local filesystem
Posted by Oliver Zeigermann <oz...@c1-fse.de>.
The XML standard defines UTF-8 and UTF-16 as the character encodings that
all spec-compliant parsers must recognize. This means that if you
have a header like
<?xml version="1.0"?>
without an encoding declaration, you are free to use UTF-8 encoding in
this XML document. You can use ASCII as well, as UTF-8 is backward
compatible with ASCII...
Oliver
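
A quick way to check the point about the default encoding is to feed a parser UTF-8 bytes with no encoding declaration. This is only a sketch using the JDK's built-in DOM parser; the class and method names are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Illustrative sketch only: names are hypothetical.
final class XmlDefaultEncoding {

    // Parse raw bytes and return the text content of the root element.
    static String rootText(byte[] xmlBytes) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        // No encoding="..." in the header, so the parser falls back to UTF-8.
        String xml = "<?xml version=\"1.0\"?><name>bl\u00E5b\u00E6r</name>";
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        System.out.println(rootText(bytes).equals("bl\u00E5b\u00E6r"));
    }
}
```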
Jacob Lund wrote:
> Thanks!
>
> Could you then explain the relationship between UTF-8 and XML. Does it make
> sense to have unescaped UTF-8 encoding in XML or should UTF-8 always be
> escaped when used in XML?
RE: TXFileStore and local filesystem
Posted by Jacob Lund <jl...@qualiware.com>.
Thanks!
Could you then explain the relationship between UTF-8 and XML. Does it make
sense to have unescaped UTF-8 encoding in XML or should UTF-8 always be
escaped when used in XML?
/Jacob
Re: TXFileStore and local filesystem
Posted by Michael Smith <ms...@speedlegal.com>.
Jacob Lund wrote:
> Ok! Let me see if I can explain myself - I am not an expert on this so
> please correct me if I am wrong!
>
> A UTF-8 representation of one character consists of a combination of
> characters. Now JAVA is a Unicode language and this means that one character
> can represent "any" type of character in the world!
This is incorrect. Your basic reasoning is more or less right, but your
terminology is incorrect in ways that will tend to confuse your thinking
(and that of others). You're confusing "character" and "byte" - a better
way to phrase this is:
"A UTF-8 representation of one character consists of one or more bytes"
(note the distinction: a character is an abstract entity, any
representation of that character is as a series of bytes).
>
> Basically UTF-8 only makes sense when working on an "old" 7-bit ASCII system
> and you need to use characters not available in the given codepage.
No. UTF-8 a) makes sense in many places, and b) doesn't specifically
help in this case. There's a UTF-7 that you could use for this, but
nobody uses UTF-7, and I really don't recommend even bothering to look
up the details of it.
>
> Both UTF-8 and UTF-16 use a varying number of bytes to represent one
> character, where Unicode always uses 32 bit characters (maybe it is 24 bit).
This gets somewhat complex.
Unicode does not use any number of bits for a character. Unicode
specifies characters (as "codepoints") as an abstract integer, with no
explicit representation.
THEN, you have an 'encoding' of this integer to give an explicit
representation of that abstract codepoint.
UTF-8 uses a variable number of bytes to represent it (from 1 to 4 - the
encoding originally allowed for up to 6 bytes, but Unicode
doesn't actually use more than 4). UTF-8 is very widely used - for
example, the overwhelming majority of XML content uses UTF-8, and
widespread usage on the internet is generally (though definitely not
exclusively) migrating towards UTF-8 for most text content.
UTF-16 _generally_ uses a fixed 2 bytes per character. However, this is
complicated by "surrogate pairs", which are a special sort of escape
sequence used by unicode to allow access to codepoints outside the BMP
(Basic Multilingual Plane). It's worth noting here that Java's 'char'
type (and hence Strings, etc.) use UTF-16, but ignore things like
surrogates - this is mostly ok, but makes it fairly painful to do really
complex multilingual stuff.
There are two different versions of UTF-16: UTF-16LE and UTF-16BE
(little endian and big endian). They are generally distinguished by the
use of an explicit BOM (Byte Order Mark, another 'special' Unicode
character) as the first character of a file. When being used in memory
(as Java does) in an application, the character is generally stored in
native endianness for whatever platform is being used.
Some things (notably a lot of Microsoft documentation - I haven't seen
this usage widely outside of MS software) use "Unicode" to mean "the
UTF-16LE encoding of Unicode". This is very confusing. So, for example,
when things say that NTFS stores filenames in Unicode, it actually means
that they are stored in UTF-16LE. However, frequently this distinction
does not matter - to many applications, the only important point is that
Unicode is being used, so the full character repertoire of Unicode is
available (sometimes restricted only to the BMP).
There's also UTF-32, which always uses 32 bits per character. It's not
widely used - mostly because for almost all applications, it's simply
wasteful of memory.
>
> This was my understanding of the UTF standards and unicode - am I wrong
> here?
I hope I've cleared some things up, here.
Mike
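
The byte counts described above can be checked with a short sketch like the one below. The class name and the sample characters are chosen for illustration only.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch only: names and sample characters are arbitrary choices.
final class EncodingSizes {

    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes when encoded as big-endian UTF-16 without a BOM.
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String[] samples = {
            "A",            // U+0041, plain ASCII
            "\u00F8",       // U+00F8, Danish o with stroke
            "\u0444",       // U+0444, Cyrillic ef
            "\u20AC",       // U+20AC, euro sign
            "\uD834\uDD1E"  // U+1D11E, outside the BMP (a surrogate pair in Java)
        };
        for (String s : samples) {
            System.out.printf("U+%04X: UTF-8 %d bytes, UTF-16 %d bytes%n",
                    s.codePointAt(0), utf8Length(s), utf16Length(s));
        }
    }
}
```

The UTF-8 sizes for these samples are 1, 2, 2, 3 and 4 bytes; UTF-16 needs 2 bytes for each of the first four and 4 bytes for the last one.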
Re: TXFileStore and local filesystem
Posted by Michael Smith <ms...@speedlegal.com>.
Julian Reschke wrote:
> Jacob Lund wrote:
>
>> Ok! Let me see if I can explain myself - I am not an expert on this so
>> please correct me if I am wrong!
>>
>> A UTF-8 representation of one character consists of a combination of
>> characters. Now Java is a Unicode language and this means that one
>> character
>
>
> ...of bytes.
>
>> can represent "any" type of character in the world!
>
>
> Almost. Java's characters have only 16 bits, so there is a class of
> Unicode characters that need to be represented as a sequence of two Java
> characters.
>
>> Basically UTF-8 only makes sense when working on an "old" 7-bit ASCII
>> system
>> and you need to use characters not available in the given codepage.
>
>
> UTF-8 always makes sense when you need backward compatibility with ASCII.
>
>> Both UTF-8 and UTF-16 use a varying number of bytes to represent one
>> character, where Unicode always uses 32-bit characters (maybe it is 24
>> bits).
>
>
> Unicode doesn't "represent" at all. Unicode is just a definition of code
> points.
>
> *Encodings* represent Unicode characters as byte sequences, and UTF-8
> and UTF-16 are some of the Unicode encodings.
>
> > ...
>
> Julian
>
Julian puts my points far more succinctly :-)
So, if this is enough, no need to read my rather lengthy email on some
of the gory details of unicode.
Mike
Re: TXFileStore and local filesystem
Posted by Julian Reschke <ju...@gmx.de>.
Jacob Lund wrote:
> Ok! Let me see if I can explain myself - I am not an expert on this so
> please correct me if I am wrong!
>
> A UTF-8 representation of one character consists of a combination of
> characters. Now Java is a Unicode language and this means that one character
...of bytes.
> can represent "any" type of character in the world!
Almost. Java's characters have only 16 bits, so there is a class of
Unicode characters that need to be represented as a sequence of two Java
characters.
> Basically UTF-8 only makes sense when working on an "old" 7-bit ASCII system
> and you need to use characters not available in the given codepage.
UTF-8 always makes sense when you need backward compatibility with ASCII.
> Both UTF-8 and UTF-16 use a varying number of bytes to represent one
> character, where Unicode always uses 32-bit characters (maybe it is 24 bits).
Unicode doesn't "represent" at all. Unicode is just a definition of code
points.
*Encodings* represent Unicode characters as byte sequences, and UTF-8
and UTF-16 are some of the Unicode encodings.
> ...
Julian
--
<green/>bytes GmbH -- http://www.greenbytes.de -- tel:+492512807760
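
The remark about 16-bit Java characters can be seen in a few lines. This is only a sketch with a made-up class name; note that the code point methods it uses were added in Java 5, which appeared after this thread was written.

```java
// Illustrative sketch only: the class name is hypothetical.
final class CodePointsVsChars {

    // Number of 16-bit Java chars in the string.
    static int charCount(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+1D11E lies outside the Basic Multilingual Plane.
        String clef = new String(Character.toChars(0x1D11E));
        System.out.println("chars: " + charCount(clef));
        System.out.println("code points: " + codePointCount(clef));
        System.out.println("first char is a high surrogate: "
                + Character.isHighSurrogate(clef.charAt(0)));
    }
}
```

One Unicode character here occupies two Java chars, which is why code that counts or slices by char can split such a character in the middle.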
RE: TXFileStore and local filesystem
Posted by Jacob Lund <jl...@qualiware.com>.
Ok! Let me see if I can explain myself - I am not an expert on this so
please correct me if I am wrong!
A UTF-8 representation of one character consists of a combination of
characters. Now Java is a Unicode language and this means that one character
can represent "any" type of character in the world!
Basically UTF-8 only makes sense when working on an "old" 7-bit ASCII system
and you need to use characters not available in the given codepage.
Both UTF-8 and UTF-16 use a varying number of bytes to represent one
character, where Unicode always uses 32-bit characters (maybe it is 24 bits).
This was my understanding of the UTF standards and Unicode - am I wrong
here?
/Jacob
Re: TXFileStore and local filesystem
Posted by Michael Smith <ms...@speedlegal.com>.
Oliver Zeigermann wrote:
> Jacob Lund wrote:
>> The correct solution might be to convert from UTF-8 to Unicode before
>> storing the data and then change the database schema to Unicode char
>> in all fields containing strings.
>
>
> Hmmmm. You might be confusing certain things here. On one side there is
> Unicode having a number for each character. On the other side there is
> the representation in bytes. Now, UTF-8 *is* Unicode, but on the other
> side, i.e. the representation in bytes. Thus it does not make too much
> sense to compare Unicode with UTF-8. Do you agree?
A lot of Microsoft's documentation confusingly uses "unicode" when it
actually means "UTF-16" or "UCS-2" (I can never remember what the
difference between those two is, and I don't know if it matters). I
suspect rereading Jacob's mail mentally substituting "UTF-16" for
"unicode" will make it clearer.
Mike
Re: TXFileStore and local filesystem
Posted by Oliver Zeigermann <oz...@c1-fse.de>.
Thanks, patch applied :)
Unfortunately, this will not go into the head branch soon as I had
started the merging process shortly before that patch. It will be there
next week, though...
Oliver
Jacob Lund wrote:
> Sorry missed the file :)
>
> -----Original Message-----
> From: Oliver Zeigermann [mailto:ozeigermann@c1-fse.de]
> Sent: 29. januar 2004 10:47
> To: Slide Users Mailing List
> Subject: Re: TXFileStore and local filesystem
>
> So, I think we have two problems now, which I am in danger of mixing up:
>
> (1) The filestore has a problem with file names
> (2) The database stores have a problem as well, which is as yet unclear to me
>
> Concerning (1): Could you send the new exception after the patch was
> applied? At least the file name in the exception's header, following
> "Can not create resource at ", should now look different, which would
> help me see what might be going on.
>
> Concerning (2): Could you describe this a bit more in order to make my
> rusty mind understand?
>
> Concerning the Unicode vs. UTF-8 issue: How would you decode a string
> before storing into the database? Into what? The JDBC method accepts a
> string, so you will have to pass it one. As I said, you can only
> decode/encode into/from bytes...
>
> Oliver
>
> Jacob Lund wrote:
>
>
>>The patch did not make any difference - it still throws the same exception!
>>
>>What I meant about converting from UTF-8 to Unicode is that the database
>>driver can handle Unicode. In the filestore UTF-8 is converted to the local
>>character set in order to create the files, and this is why the filestore (I
>>think) has a problem. If the database could store the data in Unicode then
>>there would be no problem. Since Java uses Unicode in strings, the task
>>would simply be to decode the strings before they are stored in the database
>>and then make sure that all text fields in the database are Unicode (or
>>widechar or nchar).
>>
>>Please tell me if I am way off here!
>>
>>/Jacob
>>
>>-----Original Message-----
>>From: Oliver Zeigermann [mailto:ozeigermann@c1-fse.de]
>>Sent: 29. januar 2004 10:02
>>To: Slide Users Mailing List
>>Subject: Re: TXFileStore and local filesystem
>>
>>Jacob Lund wrote:
>>
>>
>>>No, the filestore works correctly.
>>
>>
>>OK, shall I check in the patch? Did it work for you?
>>
>>
>>
>>>From what I can see the filestore converts from UTF-8 to the local codepage
>>>before it stores data. This is why UTF-8 works fine for me when I upload
>>>files with Danish letters in the filename, and also why it fails when it
>>>stores files with characters not supported by the codepage.
>>>
>>>Windows XP uses Unicode, but in "DOS mode" it will use the old codepage
>>>types. The only thing that I can imagine is that Java will use this codepage
>>>when it is doing IO operations towards the filesystem. This might be
>>>a problem that only appears on Windows systems.
>>>
>>>I do not think that the problem lies in how the data is filled into the
>>>database. Some place in Slide it will convert that data (in this case
>>>the URI) to UTF-8 before it is sent to the client. The data stored in the
>>>database is UTF-8, and I believe that Java is using Unicode. So the solution
>>>might be to convert data fetched from the database back to Unicode as soon
>>>as it arrives at the store class.
>>>
>>>The correct solution might be to convert from UTF-8 to Unicode before
>>>storing the data and then change the database schema to Unicode char in all
>>>fields containing strings.
>>
>>
>>Hmmmm. You might be confusing certain things here. On one side there is
>>Unicode having a number for each character. On the other side there is
>>the representation in bytes. Now, UTF-8 *is* Unicode, but on the other
>>side, i.e. the representation in bytes. Thus it does not make too much
>>sense to compare Unicode with UTF-8. Do you agree?
>>
>>
>>
>>>I am guessing here since I do not have any idea of how the stores are
>>>structured in Slide. If you want I would be happy to do some debugging, but I
>>>will need a short introduction to how the datastores are designed in Slide.
>>
>>I know, proper documentation is a major problem. I will try to prepare
>>something like a short introduction and will post it to the list as soon
>>as it is done. This may take a while though :(
>>
>>Oliver
>>
>>
>>
>>>/Jacob
>>>
>>>-----Original Message-----
>>>From: Oliver Zeigermann [mailto:ozeigermann@c1-fse.de]
>>>Sent: 28. januar 2004 16:40
>>>To: Slide Users Mailing List
>>>Subject: Re: TXFileStore and local filesystem
>>>
>>>Jacob Lund wrote:
>>>
>>>
>>>
>>>
>>>>Sorry about that - yes, I am talking about the URI!
>>>>
>>>>If I look at a record in the database, each Danish character is stored as
>>>>two "funny looking" characters corresponding to the unescaped UTF-8
>>>>encoded version - so this looks correct! However, when I do a PROPFIND on
>>>>the collection in which I placed this file, I get something like
>>>>/files/%C3%83%C2%B8 - and this should have represented one Danish
>>>>character. If I take the above and convert from UTF-8 to my local
>>>>codepage, then I get what is stored in the database. If I then convert
>>>>from UTF-8 to local again, then I get the correct Danish letter.
>>>
>>>
>>>I could not find anything that might have converted the URI strings.
>>>They are just plainly filled into the SQL like in
>>>
>>>
>>>
>>>
>>>    "select 1 from OBJECT o, URI u where
>>>     o.URI_ID=u.URI_ID and u.URI_STRING=?");
>>>    statement.setString(1, uri.toString());
>>>
>>>
>>>So, maybe this is a more general problem...
>>>
>>>
>>>
>>>
>>>>It seems that Slide converts the URIs from the DB to UTF-8, but they are
>>>>already stored in unescaped UTF-8!
>>>
>>>
>>>Does this happen with the file store as well?
>>>
>>>Oliver
>>>
>>>
> ------------------------------------------------------------------------
>
> /**********************************************************************/
> /****** DROP SLIDE TABLES ******/
> /**********************************************************************/
>
> IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N'[dbo].[OBJECT]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
> DROP TABLE [dbo].[OBJECT]
> GO
>
> IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N'[dbo].[BINDING]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
> DROP TABLE [dbo].[BINDING]
> GO
>
> IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N'[dbo].[PARENT_BINDING]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
> DROP TABLE [dbo].[PARENT_BINDING]
> GO
>
> IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N'[dbo].[CHILDREN]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
> DROP TABLE [dbo].[CHILDREN]
> GO
>
> IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N'[dbo].[LINKS]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
> DROP TABLE [dbo].[LINKS]
> GO
>
> IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N'[dbo].[LOCKS]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
> DROP TABLE [dbo].[LOCKS]
> GO
>
> IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N'[dbo].[VERSION_CONTENT]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
> DROP TABLE [dbo].[VERSION_CONTENT]
> GO
>
> IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N'[dbo].[PROPERTIES]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
> DROP TABLE [dbo].[PROPERTIES]
> GO
>
> IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N'[dbo].[PERMISSIONS]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
> DROP TABLE [dbo].[PERMISSIONS]
> GO
>
> IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N'[dbo].[VERSION_PREDS]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
> DROP TABLE [dbo].[VERSION_PREDS]
> GO
>
> IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N'[dbo].[VERSION_LABELS]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
> DROP TABLE [dbo].[VERSION_LABELS]
> GO
>
> IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N'[dbo].[VERSION_HISTORY]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
> DROP TABLE [dbo].[VERSION_HISTORY]
> GO
>
> IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N'[dbo].[VERSION]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
> DROP TABLE [dbo].[VERSION]
> GO
>
> IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N'[dbo].[BRANCH]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
> DROP TABLE [dbo].[BRANCH]
> GO
>
> IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N'[dbo].[LABEL]') and OBJECTPROPERTY(id, N'IsUserTable') = 1)
> DROP TABLE [dbo].[LABEL]
> GO
>
> IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N'[dbo].[URI]') AND OBJECTPROPERTY(id, N'IsUserTable') = 1)
> DROP TABLE [dbo].[URI]
> GO
>
> /**********************************************************************/
> /****** DROP EXISTING USER DEFINED DATA TYPES ******/
> /**********************************************************************/
>
> sp_droptype id_type
> GO
>
> sp_droptype uri_str_type
> GO
>
> sp_droptype revision_no_type
> GO
>
> sp_droptype hash_type
> GO
>
> sp_droptype literal_str_type
> GO
>
> sp_droptype value_str_type
> GO
>
>
> /**********************************************************************/
> /****** ADD USER DEFINED DATA TYPES ******/
> /**********************************************************************/
>
> sp_addtype id_type, bigint
> GO
>
> sp_addtype uri_str_type, "nvarchar(800)"
> GO
>
> sp_addtype revision_no_type, "nvarchar(20)"
> GO
>
> sp_addtype hash_type, bigint
> GO
>
> sp_addtype literal_str_type, "nvarchar(3000)"
> GO
>
> sp_addtype value_str_type, "nvarchar(255)"
> GO
>
> /**********************************************************************/
> /****** CREATE SLIDE TABLES ******/
> /**********************************************************************/
>
> CREATE TABLE dbo.URI (
> URI_ID id_type IDENTITY UNIQUE NOT NULL,
> URI_STRING uri_str_type UNIQUE NOT NULL,
> -- UNIQUE NONCLUSTERED (URI_ID)
> )
> GO
>
> CREATE INDEX XUID
> ON URI(URI_ID)
> GO
>
> CREATE INDEX XUSTRING
> ON URI(URI_STRING)
> GO
>
> CREATE TABLE dbo.OBJECT (
> URI_ID id_type PRIMARY KEY,
> CLASS_NAME nvarchar(255) NOT NULL,
> CONSTRAINT FK_OBJECT_URI_ID
> FOREIGN KEY (URI_ID)
> REFERENCES URI (URI_ID)
> )
> GO
>
> CREATE TABLE dbo.BINDING (
> URI_ID id_type NOT NULL
> REFERENCES URI (URI_ID),
> NAME uri_str_type NOT NULL,
> CHILD_UURI_ID id_type NOT NULL
> REFERENCES URI (URI_ID),
> UNIQUE CLUSTERED (URI_ID, NAME, CHILD_UURI_ID)
> )
> GO
>
> CREATE TABLE dbo.PARENT_BINDING (
> URI_ID id_type NOT NULL
> REFERENCES URI (URI_ID),
> NAME uri_str_type NOT NULL,
> PARENT_UURI_ID id_type NOT NULL
> REFERENCES URI (URI_ID),
> UNIQUE CLUSTERED (URI_ID, NAME, PARENT_UURI_ID)
> )
> GO
>
> CREATE TABLE dbo.LINKS (
> URI_ID id_type NOT NULL FOREIGN KEY
> REFERENCES URI (URI_ID),
> LINK_TO_ID id_type NOT NULL FOREIGN KEY
> REFERENCES URI (URI_ID),
> --UNIQUE CLUSTERED (URI_ID, LINK_TO_ID)
> )
> GO
>
> CREATE INDEX XURI_ID
> ON LINKS(URI_ID)
> GO
>
>
> CREATE INDEX XLINK_TO_ID
> ON LINKS(LINK_TO_ID)
> GO
>
> CREATE TABLE dbo.LOCKS (
> LOCK_ID id_type PRIMARY KEY,
> OBJECT_ID id_type NOT NULL FOREIGN KEY
> REFERENCES URI (URI_ID),
> SUBJECT_ID id_type NOT NULL FOREIGN KEY
> REFERENCES URI (URI_ID),
> TYPE_ID id_type NOT NULL FOREIGN KEY
> REFERENCES URI (URI_ID),
> EXPIRATION_DATE numeric(14, 0) NOT NULL,
> IS_INHERITABLE bit NOT NULL,
> IS_EXCLUSIVE bit NOT NULL,
> OWNER nvarchar(255),
> CONSTRAINT FK_LOCKS_LOCK_ID
> FOREIGN KEY (LOCK_ID)
> REFERENCES URI (URI_ID)
> )
> GO
>
> CREATE TABLE dbo.BRANCH (
> BRANCH_ID id_type IDENTITY UNIQUE NOT NULL,
> BRANCH_STRING nvarchar(255) UNIQUE NOT NULL,
> UNIQUE NONCLUSTERED (BRANCH_ID)
> )
> GO
>
> CREATE TABLE dbo.LABEL (
> LABEL_ID id_type IDENTITY UNIQUE NOT NULL,
> LABEL_STRING nvarchar(255) NOT NULL,
> UNIQUE NONCLUSTERED (LABEL_ID)
> )
> GO
>
> CREATE TABLE dbo.VERSION (
> URI_ID id_type PRIMARY KEY,
> IS_VERSIONED bit NOT NULL,
> CONSTRAINT FK_VERSION_URI_ID
> FOREIGN KEY (URI_ID)
> REFERENCES URI (URI_ID)
> )
> GO
>
> CREATE TABLE dbo.VERSION_HISTORY (
> VERSION_ID id_type IDENTITY UNIQUE NOT NULL,
> URI_ID id_type NOT NULL FOREIGN KEY
> REFERENCES VERSION (URI_ID),
> BRANCH_ID id_type NOT NULL FOREIGN KEY
> REFERENCES BRANCH (BRANCH_ID),
> REVISION_NO nVARCHAR(20) NOT NULL,
> --UNIQUE CLUSTERED (URI_ID, BRANCH_ID, REVISION_NO)
> )
> GO
> CREATE INDEX XVERSION_HISTORY1
> ON VERSION_HISTORY(URI_ID, BRANCH_ID, REVISION_NO)
> GO
>
> CREATE TABLE dbo.VERSION_PREDS (
> VERSION_ID id_type NOT NULL FOREIGN KEY
> REFERENCES VERSION_HISTORY (VERSION_ID),
> PREDECESSOR_ID id_type NOT NULL FOREIGN KEY
> REFERENCES VERSION_HISTORY (VERSION_ID),
> UNIQUE CLUSTERED (VERSION_ID, PREDECESSOR_ID)
> )
> GO
> CREATE INDEX XVERSION_PREDS1
> ON VERSION_PREDS(VERSION_ID, PREDECESSOR_ID)
> GO
>
> CREATE TABLE dbo.VERSION_LABELS (
> VERSION_ID id_type NOT NULL FOREIGN KEY
> REFERENCES VERSION_HISTORY (VERSION_ID),
> LABEL_ID id_type NOT NULL FOREIGN KEY
> REFERENCES LABEL (LABEL_ID),
> UNIQUE CLUSTERED (VERSION_ID, LABEL_ID)
> )
> GO
>
> CREATE TABLE dbo.VERSION_CONTENT (
> VERSION_ID id_type PRIMARY KEY,
> CONTENT image NOT NULL,
> CONSTRAINT FK_VC_VERSION_ID
> FOREIGN KEY (VERSION_ID)
> REFERENCES VERSION_HISTORY (VERSION_ID),
> )
> GO
>
> CREATE TABLE dbo.PROPERTIES (
> VERSION_ID id_type NOT NULL FOREIGN KEY
> REFERENCES VERSION_HISTORY (VERSION_ID),
> PROPERTY_NAMESPACE nvarchar(50) NOT NULL,
> PROPERTY_NAME nvarchar(50) NOT NULL,
> PROPERTY_VALUE nvarchar(255) NOT NULL,
> PROPERTY_TYPE nvarchar(50) NOT NULL,
> IS_PROTECTED bit NOT NULL,
> UNIQUE CLUSTERED (VERSION_ID, PROPERTY_NAMESPACE, PROPERTY_NAME)
> )
> GO
>
> CREATE TABLE dbo.PERMISSIONS (
> OBJECT_ID id_type NOT NULL FOREIGN KEY
> REFERENCES URI (URI_ID),
> SUBJECT_ID id_type NOT NULL FOREIGN KEY
> REFERENCES URI (URI_ID),
> ACTION_ID id_type NOT NULL FOREIGN KEY
> REFERENCES URI (URI_ID),
> VERSION_NO nVARCHAR(20) NULL,
> IS_INHERITABLE bit NOT NULL,
> IS_NEGATIVE bit NOT NULL,
> -- Both order and sequence would be more suitable, but can not be used
> SUCCESSION int NOT NULL,
> UNIQUE CLUSTERED (OBJECT_ID, SUBJECT_ID, ACTION_ID),
> UNIQUE (OBJECT_ID, SUCCESSION)
> )
> GO
>
>
>
>
> ------------------------------------------------------------------------
>
RE: TXFileStore and local filesystem
Posted by Jacob Lund <jl...@qualiware.com>.
Sorry, I missed the file :)
-----Original Message-----
From: Oliver Zeigermann [mailto:ozeigermann@c1-fse.de]
Sent: 29. januar 2004 10:47
To: Slide Users Mailing List
Subject: Re: TXFileStore and local filesystem
So, I think we have two problems now, which I am in danger of mixing up:
(1) The filestore has a problem with file names
(2) The database stores have a problem as well, which is still unclear to me
Concerning (1): Could you send the new exception after the patch was
applied? At least the file name given in the exception's head, following
"Can not create resource at ", should look different, so that I can see
what might be going on.
Concerning (2): Could you describe this a bit more in order to make my
rusty mind understand?
Concerning the Unicode vs. UTF-8 issue: How would you decode a string
before storing into the database? Into what? The JDBC method accepts a
string, so you will have to pass it one. As I said, you can only
decode/encode into/from bytes...
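
To make the point about bytes concrete, here is a minimal Java sketch. It
is not Slide code: the class and method names are made up for illustration,
and ISO-8859-1 is only assumed as a stand-in for the local Windows codepage.
It shows that a String is encoded into bytes exactly once when escaped
correctly, and that reading those UTF-8 bytes back with the wrong charset
before escaping again yields the /files/%C3%83%C2%B8 pattern reported
earlier in this thread for a single Danish letter.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class DoubleEncodingDemo {

    // Percent-escape a string via its UTF-8 byte representation.
    static String escape(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be supported by every Java runtime.
            throw new IllegalStateException(e);
        }
    }

    // Simulate the suspected mistake: the UTF-8 bytes of a string are
    // read back as if they were single-byte (ISO-8859-1) characters.
    static String misdecode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "\u00f8"; // the Danish letter o-with-stroke

        // Encoded once: two bytes, two escapes.
        System.out.println(escape(original));            // %C3%B8

        // Encoded twice: each of the two bytes became a character of its
        // own and was then encoded to UTF-8 again.
        System.out.println(escape(misdecode(original))); // %C3%83%C2%B8
    }
}
```

If this is what happens, the fix would be to remove one of the two
conversions rather than to add a third one.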
Oliver
Jacob Lund wrote:
> The patch did not make any difference - it still throws the same
> exception!
>
> What I meant about converting from UTF-8 to Unicode is that the database
> driver can handle Unicode. In the filestore, UTF-8 is converted to the
> local character set in order to create the files, and this is why the
> filestore (I think) has a problem. If the database could store the data
> in Unicode then there would be no problem. Since Java is using Unicode in
> strings, the task would simply be to decode the strings before they are
> stored in the database and then make sure that all text fields in the
> database are Unicode (or widechar or nchar).
>
> Please tell me if I am way off here!
>
> /Jacob
>
> -----Original Message-----
> From: Oliver Zeigermann [mailto:ozeigermann@c1-fse.de]
> Sent: 29. januar 2004 10:02
> To: Slide Users Mailing List
> Subject: Re: TXFileStore and local filesystem
>
> Jacob Lund wrote:
>
>>No, the filestore works correctly.
>
>
> OK, shall I check in the patch? Did it work for you?
>
>
>>From what I can see, the filestore converts from UTF-8 to the local
>>codepage before it stores data. This is why UTF-8 works fine for me when I
>>upload files with Danish letters in the filename, and also why it fails
>>when it stores files with characters not supported by the codepage.
>>
>>Windows XP uses Unicode, but in "DOS mode" it will use the old codepage
>>types. The only thing that I can imagine is that Java will use this
>>codepage when it is doing IO operations towards the filesystem. This
>>might be a problem that only appears on Windows systems.
>>
>>I do not think that the problem lies in how the data is filled into the
>>database. Some place in Slide it will convert that data (in this case
>>the URI) to UTF-8 before it is sent to the client. The data stored in the
>>database is UTF-8, and I believe that Java is using Unicode. So the
>>solution might be to convert data fetched from the database back to
>>Unicode as soon as it arrives at the store class.
>>
>>The correct solution might be to convert from UTF-8 to Unicode before
>>storing the data and then change the database schema to Unicode char in
>>all fields containing strings.
>
>
> Hmmmm. You might be confusing certain things here. On one side there is
> Unicode having a number for each character. On the other side there is
> the representation in bytes. Now, UTF-8 *is* Unicode, but on the other
> side, i.e. the representation in bytes. Thus it does not make too much
> sense to compare Unicode with UTF-8. Do you agree?
>
>
>>I am guessing here since I do not have any idea of how the stores are
>>structured in Slide. If you want, I would be happy to do some debugging,
>>but I will need a short introduction to how the datastores are designed
>>in Slide.
>
> I know, proper documentation is a major problem. I will try to prepare
> something like a short introduction and will post it to the list as soon
> as it is done. This may take a while though :(
>
> Oliver
>
>
>>/Jacob
>>
>>-----Original Message-----
>>From: Oliver Zeigermann [mailto:ozeigermann@c1-fse.de]
>>Sent: 28. januar 2004 16:40
>>To: Slide Users Mailing List
>>Subject: Re: TXFileStore and local filesystem
>>
>>Jacob Lund wrote:
>>
>>
>>
>>>Sorry about that - yes, I am talking about the URI!
>>>
>>>If I look at a record in the database, each Danish character is stored as
>>>two "funny looking" characters corresponding to the unescaped UTF-8
>>>encoded version - so this looks correct! However, when I do a PROPFIND on
>>>the collection in which I placed this file, I get something like
>>>/files/%C3%83%C2%B8 - and this should have represented one Danish
>>>character. If I take the above and convert from UTF-8 to my local
>>>codepage, then I get what is stored in the database. If I then convert
>>>from UTF-8 to local again, then I get the correct Danish letter.
>>
>>
>>I could not find anything that might have converted the URI strings.
>>They are just plainly filled into the SQL like in
>>
>>
>>
>>    "select 1 from OBJECT o, URI u where
>>     o.URI_ID=u.URI_ID and u.URI_STRING=?");
>>    statement.setString(1, uri.toString());
>>
>>
>>So, maybe this is a more general problem...
>>
>>
>>
>>>It seems that Slide converts the URIs from the DB to UTF-8, but they are
>>>already stored in unescaped UTF-8!
>>
>>
>>Does this happen with the file store as well?
>>
>>Oliver
>>
>>
Re: TXFileStore and local filesystem
Posted by Oliver Zeigermann <oz...@c1-fse.de>.
No rights needed. First create the report. In the next step *add* the
attachment using the "Attachments: Create a new attachment" link in the
top section of the page.
Bugzilla is confusing sometimes...
Oliver
Jacob Lund wrote:
> I just created an account on bugzilla!
>
> I can see how to create a bug report but I cannot see how to attach a file!
>
> Am I dumber than average or do I need some special rights to attach files?
>
> /Jacob
>
> -----Original Message-----
> From: Oliver Zeigermann [mailto:ozeigermann@c1-fse.de]
> Sent: 29. januar 2004 13:12
> To: Slide Users Mailing List
> Subject: Re: TXFileStore and local filesystem
>
> The old problem with attachments :( It is missing...
>
> Could you try to create a new bugzilla entry and add the attachment there?
>
> Thanks :)
>
> Oliver
>
> Jacob Lund wrote:
>
>
>>First of all - the patch you just checked in for the txfilestore works
>>fine :-)
>>
>>Some of my problems with the SQLServerAdapter were my fault - I forgot to
>>set the encoding to UTF-8 in slide.properties.
>>
>>However, to get the SQL store working with Russian and Danish characters at
>>the same time I had to change the database schema. It turns out that Slide
>>does send the Unicode characters to the database, but the database schema
>>uses 8-bit char in the string fields.
>>
>>I have attached the new schema - all I did was change varchar to nvarchar.
>>Now it works fine :-)
>>
>>/Jacob
>>
>>-----Original Message-----
>>From: Oliver Zeigermann [mailto:ozeigermann@c1-fse.de]
>>Sent: 29. januar 2004 10:47
>>To: Slide Users Mailing List
>>Subject: Re: TXFileStore and local filesystem
>>
>>So, I think we have two problems now, which I am in danger of mixing up:
>>
>>(1) The filestore has a problem with file names
>>(2) The database stores have a problem as well, which is still unclear to me
>>
>>Concerning (1): Could you send the new exception after the patch was
>>applied? At least the file name given in the exception's head, following
>>"Can not create resource at ", should look different, so that I can see
>>what might be going on.
>>
>>Concerning (2): Could you describe this a bit more in order to make my
>>rusty mind understand?
>>
>>Concerning the Unicode vs. UTF-8 issue: How would you decode a string
>>before storing into the database? Into what? The JDBC method accepts a
>>string, so you will have to pass it one. As I said, you can only
>>decode/encode into/from bytes...
>>
>>Oliver
>>
>>Jacob Lund wrote:
>>
>>
>>
>>>The patch did not make any difference - it still throws the same
>>>exception!
>>>
>>>What I meant about converting from UTF-8 to Unicode is that the database
>>>driver can handle Unicode. In the filestore, UTF-8 is converted to the
>>>local character set in order to create the files, and this is why the
>>>filestore (I think) has a problem. If the database could store the data
>>>in Unicode then there would be no problem. Since Java is using Unicode in
>>>strings, the task would simply be to decode the strings before they are
>>>stored in the database and then make sure that all text fields in the
>>>database are Unicode (or widechar or nchar).
>>>
>>>Please tell me if I am way off here!
>>>
>>>/Jacob
>>>
>>>-----Original Message-----
>>>From: Oliver Zeigermann [mailto:ozeigermann@c1-fse.de]
>>>Sent: 29. januar 2004 10:02
>>>To: Slide Users Mailing List
>>>Subject: Re: TXFileStore and local filesystem
>>>
>>>Jacob Lund wrote:
>>>
>>>
>>>
>>>>No, the filestore works correctly.
>>>
>>>
>>>OK, shall I check in the patch? Did it work for you?
>>>
>>>
>>>
>>>
>>>>From what I can see, the filestore converts from UTF-8 to the local
>>>>codepage before it stores data. This is why UTF-8 works fine for me when
>>>>I upload files with Danish letters in the filename, and also why it fails
>>>>when it stores files with characters not supported by the codepage.
>>>>
>>>>Windows XP uses Unicode, but in "DOS mode" it will use the old codepage
>>>>types. The only thing that I can imagine is that Java will use this
>>>>codepage when it is doing IO operations towards the filesystem. This
>>>>might be a problem that only appears on Windows systems.
>>>>
>>>>I do not think that the problem lies in how the data is filled into the
>>>>database. Some place in Slide it will convert that data (in this case
>>>>the URI) to UTF-8 before it is sent to the client. The data stored in the
>>>>database is UTF-8, and I believe that Java is using Unicode. So the
>>>>solution might be to convert data fetched from the database back to
>>>>Unicode as soon as it arrives at the store class.
>>>>
>>>>The correct solution might be to convert from UTF-8 to Unicode before
>>>>storing the data and then change the database schema to Unicode char in
>>>>all fields containing strings.
>>>
>>>
>>>Hmmmm. You might be confusing certain things here. On one side there is
>>>Unicode having a number for each character. On the other side there is
>>>the representation in bytes. Now, UTF-8 *is* Unicode, but on the other
>>>side, i.e. the representation in bytes. Thus it does not make too much
>>>sense to compare Unicode with UTF-8. Do you agree?
>>>
>>>
>>>
>>>
>>>>I am guessing here since I do not have any idea of how the stores are
>>>>structured in Slide. If you want, I would be happy to do some debugging,
>>>>but I will need a short introduction to how the datastores are designed
>>>>in Slide.
>>>
>>>I know, proper documentation is a major problem. I will try to prepare
>>>something like a short introduction and will post it to the list as soon
>>>as it is done. This may take a while though :(
>>>
>>>Oliver
>>>
>>>
>>>
>>>
>>>>/Jacob
>>>>
>>>>-----Original Message-----
>>>>From: Oliver Zeigermann [mailto:ozeigermann@c1-fse.de]
>>>>Sent: 28. januar 2004 16:40
>>>>To: Slide Users Mailing List
>>>>Subject: Re: TXFileStore and local filesystem
>>>>
>>>>Jacob Lund wrote:
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>>Sorry about that - yes, I am talking about the URI!
>>>>>
>>>>>If I look at a record in the database, each Danish character is stored
>>>>>as two "funny looking" characters corresponding to the unescaped UTF-8
>>>>>encoded version - so this looks correct! However, when I do a PROPFIND
>>>>>on the collection in which I placed this file, I get something like
>>>>>/files/%C3%83%C2%B8 - and this should have represented one Danish
>>>>>character. If I take the above and convert from UTF-8 to my local
>>>>>codepage, then I get what is stored in the database. If I then convert
>>>>>from UTF-8 to local again, then I get the correct Danish letter.
>>>>
>>>>
>>>>I could not find anything that might have converted the URI strings.
>>>>They are just plainly filled into the SQL like in
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>    "select 1 from OBJECT o, URI u where
>>>>     o.URI_ID=u.URI_ID and u.URI_STRING=?");
>>>>    statement.setString(1, uri.toString());
>>>>
>>>>
>>>>So, maybe this is a more general problem...
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>>It seems that Slide converts the URIs from the DB to UTF-8, but they are
>>>>>already stored in unescaped UTF-8!
>>>>
>>>>
>>>>Does this happen with the file store as well?
>>>>
>>>>Oliver
>>>>
>>>>
RE: TXFileStore and local filesystem
Posted by Jacob Lund <jl...@qualiware.com>.
I just created an account on bugzilla!
I can see how to create a bug report but I cannot see how to attach a file!
Am I dumber than average or do I need some special rights to attach files?
/Jacob
-----Original Message-----
From: Oliver Zeigermann [mailto:ozeigermann@c1-fse.de]
Sent: 29. januar 2004 13:12
To: Slide Users Mailing List
Subject: Re: TXFileStore and local filesystem
The old problem with attachments :( It is missing...
Could you try to create a new bugzilla entry and add the attachment there?
Thanks :)
Oliver
Jacob Lund wrote:
> First of all - the patch you just checked in for the txfilestore works
> fine :-)
>
> Some of my problems with the SQLServerAdapter were my fault - I forgot to
> set the encoding to UTF-8 in slide.properties.
>
> However, to get the SQL store working with Russian and Danish characters at
> the same time I had to change the database schema. It turns out that Slide
> does send the Unicode characters to the database, but the database schema
> uses 8-bit char in the string fields.
>
> I have attached the new schema - all I did was change varchar to nvarchar.
> Now it works fine :-)
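
For the record, a minimal Java sketch of why 8-bit string fields could not
hold both alphabets at once. It is not Slide code: the class name and the
sample characters are made up for illustration, and ISO-8859-1 is only
assumed as a stand-in for a Western 8-bit codepage. A single-byte charset
covers the Danish letter but has no slot for the Cyrillic one, whereas a
Unicode encoding such as UTF-16 covers both. SQL Server's nvarchar columns
store Unicode data, which fits the observation that switching to nvarchar
made both work.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CodePageCheck {

    // True if every character of the text can be represented in the charset.
    static boolean fits(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String danish = "\u00f8";   // Danish o-with-stroke
        String russian = "\u0436";  // Cyrillic zhe
        String mixed = danish + russian;

        // A Western single-byte charset holds the Danish letter only.
        System.out.println(fits(danish, StandardCharsets.ISO_8859_1));  // true
        System.out.println(fits(russian, StandardCharsets.ISO_8859_1)); // false

        // A Unicode encoding holds both in the same string.
        System.out.println(fits(mixed, StandardCharsets.UTF_16LE));     // true
    }
}
```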
>
> /Jacob
>
> -----Original Message-----
> From: Oliver Zeigermann [mailto:ozeigermann@c1-fse.de]
> Sent: 29. januar 2004 10:47
> To: Slide Users Mailing List
> Subject: Re: TXFileStore and local filesystem
>
> So, I think we have two problems now, which I am in danger of mixing up:
>
> (1) The filestore has a problem with file names
> (2) The database stores have a problem as well, which is still unclear to me
>
> Concerning (1): Could you send the new exception after the patch was
> applied? At least the file name given in the exception's head, following
> "Can not create resource at ", should look different, so that I can see
> what might be going on.
>
> Concerning (2): Could you describe this a bit more in order to make my
> rusty mind understand?
>
> Concerning the Unicode vs. UTF-8 issue: How would you decode a string
> before storing into the database? Into what? The JDBC method accepts a
> string, so you will have to pass it one. As I said, you can only
> decode/encode into/from bytes...
>
> Oliver
>
> Jacob Lund wrote:
>
>
>>The patch did not make any difference - it still throws the same
>>exception!
>>
>>What I meant about converting from UTF-8 to Unicode is that the database
>>driver can handle Unicode. In the filestore, UTF-8 is converted to the
>>local character set in order to create the files, and this is why the
>>filestore (I think) has a problem. If the database could store the data
>>in Unicode then there would be no problem. Since Java is using Unicode in
>>strings, the task would simply be to decode the strings before they are
>>stored in the database and then make sure that all text fields in the
>>database are Unicode (or widechar or nchar).
>>
>>Please tell me if I am way off here!
>>
>>/Jacob
>>
>>-----Original Message-----
>>From: Oliver Zeigermann [mailto:ozeigermann@c1-fse.de]
>>Sent: 29. januar 2004 10:02
>>To: Slide Users Mailing List
>>Subject: Re: TXFileStore and local filesystem
>>
>>Jacob Lund wrote:
>>
>>
>>>No, the filestore works correctly.
>>
>>
>>OK, shall I check in the patch? Did it work for you?
>>
>>
>>
>>>From what I can see, the filestore converts from UTF-8 to the local
>>>codepage before it stores data. This is why UTF-8 works fine for me when I
>>>upload files with Danish letters in the filename, and also why it fails
>>>when it stores files with characters not supported by the codepage.
>>>
>>>Windows XP uses Unicode, but in "DOS mode" it will use the old codepage
>>>types. The only thing that I can imagine is that Java will use this
>>>codepage when it is doing IO operations towards the filesystem. This
>>>might be a problem that only appears on Windows systems.
>>>
>>>I do not think that the problem lies in how the data is filled into the
>>>database. Some place in Slide it will convert that data (in this case
>>>the URI) to UTF-8 before it is sent to the client. The data stored in the
>>>database is UTF-8, and I believe that Java is using Unicode. So the
>>>solution might be to convert data fetched from the database back to
>>>Unicode as soon as it arrives at the store class.
>>>
>>>The correct solution might be to convert from UTF-8 to Unicode before
>>>storing the data and then change the database schema to Unicode char in
>>>all fields containing strings.
>>
>>
>>Hmmmm. You might be confusing certain things here. On one side there is
>>Unicode having a number for each character. On the other side there is
>>the representation in bytes. Now, UTF-8 *is* Unicode, but on the other
>>side, i.e. the representation in bytes. Thus it does not make too much
>>sense to compare Unicode with UTF-8. Do you agree?
>>
>>
>>
>>>I am guessing here since I do not have any idea of how the stores are
>>>structured in slide. I you want I would be happy to do some debugging,
but
>>
>>I
>>
>>
>>>will need a short introduction to how the datastores are designed in
>>
>>slide.
>>
>>I know, proper documentation is a major problem. I will try to prepare
>>something like a short introduction and will post it to the list as soon
>>as it is done. This may take a while though :(
>>
>>Oliver
>>
>>
>>
>>>/Jacob
>>>
>>>-----Original Message-----
>>>From: Oliver Zeigermann [mailto:ozeigermann@c1-fse.de]
>>>Sent: 28. januar 2004 16:40
>>>To: Slide Users Mailing List
>>>Subject: Re: TXFileStore and local filesystem
>>>
>>>Jacob Lund wrote:
>>>
>>>
>>>
>>>
>>>>Sorry about that - yes I am talking about the URI!
>>>>
>>>>If I look in a record in the database, each Danish character is stored
as
>>>>two "funny looking" characters corresponding to the unescaped UTF-8
>>>
>>>encoded
>>>
>>>
>>>
>>>>version - so this looks correct! However when I do a propfind on the
>>>>collection I which I place this file, then I get something like this
>>>>/files/%C3%83%C2%B8 - and this should have been representing one Danish
>>>>character. If I take the above and convert from UTF8 to my local, then I
>>>
>>>get
>>>
>>>
>>>
>>>>what is store in the database - If I then convert from UTF8 to local
>
> again
>
>>>>the I get the correct Danish letter.
>>>
>>>
>>>I could not find anything that might have converted the URI strings.
>>>They are just plainly filled into the SQL like in
>>>
>>>
>>>
>>>
>>>> "select 1 from OBJECT o, URI u where
>>>
>>>o.URI_ID=u.URI_ID and u.URI_STRING=?");
>>>
>>>
>>>
>>>> statement.setString(1, uri.toString());
>>>
>>>
>>>So, maybe this is a more general problem...
>>>
>>>
>>>
>>>
>>>>I seem that slide converts the URI's from the db to UTF8, but they are
>>>>already stored in unescaped UTF-8!
>>>
>>>
>>>Does this happen with the file store as well?
>>>
>>>Oliver
Re: TXFileStore and local filesystem
Posted by Oliver Zeigermann <oz...@c1-fse.de>.
The old problem with attachments :( It is missing...
Could you try to create a new bugzilla entry and add the attachment there?
Thanks :)
Oliver
Jacob Lund wrote:
> First of all - the patch you just checked in for the txfilestore works fine
> :-)
>
> Some of my problems with the SQLServerAdapter was my fault - forgot to set
> encoding to UTF-8 in slide.properties.
>
> However to get the SQL store working with Russian and Danish characters at
> the same time I had to change the database scheme. It turns out that slide
> does send the Unicode characters to the database but the database scheme
> user 8bit char in the string fields.
>
> I have attached the new scheme - all I did was change varchar to nvarchar.
> Now it works fine :-)
>
> /Jacob
>
> -----Original Message-----
> From: Oliver Zeigermann [mailto:ozeigermann@c1-fse.de]
> Sent: 29. januar 2004 10:47
> To: Slide Users Mailing List
> Subject: Re: TXFileStore and local filesystem
>
> So, I think we have two problems now, I am endangered to mix up:
>
> (1) The filestore has a problem with file names
> (2) The dabase stores have a problem as well, which is yet unclear to me
>
> Concerning (1): Could you send the new exception after the patch was
> applied? At least the file name given in the exceptions head followed by
> "Can not create resource at " should look different for me to see what
> might be be going on.
>
> Concerning (2): Could you describe this a bit more in order to make my
> rusty mind understand?
>
> Concerning the Unicode vs. UTF-8 issue: How would you decode a string
> before storing into the database? Into what? The JDBC method accepts a
> string, so you will have to pass it one. As I said, you can only
> decode/encode into/from bytes...
>
> Oliver
>
> Jacob Lund wrote:
>
>
>>The patch did not make any difference - it still throws the same
>
> exception!
>
>>What I meant about converting from UTF-8 to Unicode is that the database
>>driver can handle Unicode. In the filestore UTF-8 is converted to local
>>character set in order to create the files and this is why the filestore
>
> (I
>
>>think) has a problem. If the database could store the data in Unicode then
>>there would be no problem. Since java is using Unicode in strings the task
>>would simply be to decode the strings before they are stored in the
>
> database
>
>>and then make sure that all text fields in the database are Unicode (or
>>widechar or nchar).
>>
>>Please tell me if I am way off here!
>>
>>/Jacob
>>
>>-----Original Message-----
>>From: Oliver Zeigermann [mailto:ozeigermann@c1-fse.de]
>>Sent: 29. januar 2004 10:02
>>To: Slide Users Mailing List
>>Subject: Re: TXFileStore and local filesystem
>>
>>Jacob Lund wrote:
>>
>>
>>>No, the filestore works correctly.
>>
>>
>>OK, shall I check in the patch? Did it work for you?
>>
>>
>>
>>>>>From what I can see the filestore converts from UTF-8 to local before it
>>>stores data. This I why UTF-8 works fine for me when I upload files with
>>>Danish letters in the filename, and also why if fails when it stores files
>>>with characters not supported by the codepage.
>>>
>>>Windows XP use Unicode, but in "dos mode" it will use the old codepage
>>>types. The only thing that I can imagine is that java will use this
>>
>>codepage
>>
>>
>>>when it is doing IO operations towards the filesystem. This problem might
>>
>>be
>>
>>
>>>a problem that only appears on windows systems.
>>>
>>>I do not think that the problem is in the fill data into the database that
>>>has a problem. Some place in slide it will convert that data (in this case
>>>the uri) to UTF-8 before it is send to the client. The data stored in the
>>>database is UTF-8, and I believe that java is using Unicode. So the
>>
>>solution
>>
>>
>>>might be to convert data fetched from the database back to Unicode as soon
>>>as it arrives to the store class.
>>>
>>>The correct solution might be to convert from UTF-8 to Unicode before
>>>storing the data and then change the database scheme to Unicode char in
>>
>>all
>>
>>
>>>fields containing strings.
>>
>>
>>Hmmmm. You might be confusing certain things here. On one side there is
>>Unicode having a number for each character. On the other side there is
>>the representation in bytes. Now, UTF-8 *is* Unicode, but on the other
>>side, i.e. the representation in bytes. Thus it does not make too much
>>sense to compare Unicode with UTF-8. Do you agree?
>>
>>
>>
>>>I am guessing here since I do not have any idea of how the stores are
>>>structured in slide. I you want I would be happy to do some debugging, but
>>
>>I
>>
>>
>>>will need a short introduction to how the datastores are designed in
>>
>>slide.
>>
>>I know, proper documentation is a major problem. I will try to prepare
>>something like a short introduction and will post it to the list as soon
>>as it is done. This may take a while though :(
>>
>>Oliver
>>
>>
>>
>>>/Jacob
>>>
>>>-----Original Message-----
>>>From: Oliver Zeigermann [mailto:ozeigermann@c1-fse.de]
>>>Sent: 28. januar 2004 16:40
>>>To: Slide Users Mailing List
>>>Subject: Re: TXFileStore and local filesystem
>>>
>>>Jacob Lund wrote:
>>>
>>>
>>>
>>>
>>>>Sorry about that - yes I am talking about the URI!
>>>>
>>>>If I look in a record in the database, each Danish character is stored as
>>>>two "funny looking" characters corresponding to the unescaped UTF-8
>>>
>>>encoded
>>>
>>>
>>>
>>>>version - so this looks correct! However when I do a propfind on the
>>>>collection I which I place this file, then I get something like this
>>>>/files/%C3%83%C2%B8 - and this should have been representing one Danish
>>>>character. If I take the above and convert from UTF8 to my local, then I
>>>
>>>get
>>>
>>>
>>>
>>>>what is store in the database - If I then convert from UTF8 to local
>
> again
>
>>>>the I get the correct Danish letter.
>>>
>>>
>>>I could not find anything that might have converted the URI strings.
>>>They are just plainly filled into the SQL like in
>>>
>>>
>>>
>>>
>>>> "select 1 from OBJECT o, URI u where
>>>
>>>o.URI_ID=u.URI_ID and u.URI_STRING=?");
>>>
>>>
>>>
>>>> statement.setString(1, uri.toString());
>>>
>>>
>>>So, maybe this is a more general problem...
>>>
>>>
>>>
>>>
>>>>I seem that slide converts the URI's from the db to UTF8, but they are
>>>>already stored in unescaped UTF-8!
>>>
>>>
>>>Does this happen with the file store as well?
>>>
>>>Oliver
RE: TXFileStore and local filesystem
Posted by Jacob Lund <jl...@qualiware.com>.
First of all - the patch you just checked in for the txfilestore works fine
:-)
Some of my problems with the SQLServerAdapter were my own fault - I forgot to
set the encoding to UTF-8 in slide.properties.
However, to get the SQL store working with Russian and Danish characters at
the same time I had to change the database schema. It turns out that slide
does send the Unicode characters to the database, but the database schema
uses 8-bit char types in the string fields.
I have attached the new schema - all I did was change varchar to nvarchar.
Now it works fine :-)
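To illustrate why an 8-bit column can hold Danish but not Russian text, here is a minimal Java sketch. It is not Slide code: the class name CodepageDemo is made up for the example, ISO-8859-1 is only a stand-in for an 8-bit varchar column, and UTF-16LE is only an assumed stand-in for a Unicode column type such as nvarchar.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CodepageDemo {
    // True if every character of text can be represented in the given charset.
    public static boolean fits(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    // Encode and decode with the same charset. String.getBytes replaces
    // characters the charset cannot represent, so information may be lost.
    public static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        Charset eightBit = StandardCharsets.ISO_8859_1; // stand-in for varchar (assumption)
        Charset unicode = StandardCharsets.UTF_16LE;    // stand-in for nvarchar (assumption)
        String danish = "\u00f8l";                      // Danish word with o-slash
        String russian = "\u0444\u0430\u0439\u043b";    // Russian word in Cyrillic

        System.out.println(fits(danish, eightBit));     // true
        System.out.println(fits(russian, eightBit));    // false
        System.out.println(roundTrip(russian, eightBit));                 // ????
        System.out.println(roundTrip(russian, unicode).equals(russian));  // true
    }
}
```

The Danish text survives the 8-bit round trip and the Russian text does not, which matches the behaviour described above for the varchar schema.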
/Jacob
-----Original Message-----
From: Oliver Zeigermann [mailto:ozeigermann@c1-fse.de]
Sent: 29. januar 2004 10:47
To: Slide Users Mailing List
Subject: Re: TXFileStore and local filesystem
So, I think we have two problems now, I am endangered to mix up:
(1) The filestore has a problem with file names
(2) The dabase stores have a problem as well, which is yet unclear to me
Concerning (1): Could you send the new exception after the patch was
applied? At least the file name given in the exceptions head followed by
"Can not create resource at " should look different for me to see what
might be be going on.
Concerning (2): Could you describe this a bit more in order to make my
rusty mind understand?
Concerning the Unicode vs. UTF-8 issue: How would you decode a string
before storing into the database? Into what? The JDBC method accepts a
string, so you will have to pass it one. As I said, you can only
decode/encode into/from bytes...
Oliver
Jacob Lund wrote:
> The patch did not make any difference - it still throws the same exception!
>
> What I meant about converting from UTF-8 to Unicode is that the database
> driver can handle Unicode. In the filestore UTF-8 is converted to local
> character set in order to create the files and this is why the filestore (I
> think) has a problem. If the database could store the data in Unicode then
> there would be no problem. Since java is using Unicode in strings the task
> would simply be to decode the strings before they are stored in the database
> and then make sure that all text fields in the database are Unicode (or
> widechar or nchar).
>
> Please tell me if I am way off here!
>
> /Jacob
>
> -----Original Message-----
> From: Oliver Zeigermann [mailto:ozeigermann@c1-fse.de]
> Sent: 29. januar 2004 10:02
> To: Slide Users Mailing List
> Subject: Re: TXFileStore and local filesystem
>
> Jacob Lund wrote:
>
>>No, the filestore works correctly.
>
>
> OK, shall I check in the patch? Did it work for you?
>
>
>>>>From what I can see the filestore converts from UTF-8 to local before it
>>stores data. This I why UTF-8 works fine for me when I upload files with
>>Danish letters in the filename, and also why if fails when it stores files
>>with characters not supported by the codepage.
>>
>>Windows XP use Unicode, but in "dos mode" it will use the old codepage
>>types. The only thing that I can imagine is that java will use this
>
> codepage
>
>>when it is doing IO operations towards the filesystem. This problem might
>
> be
>
>>a problem that only appears on windows systems.
>>
>>I do not think that the problem is in the fill data into the database that
>>has a problem. Some place in slide it will convert that data (in this case
>>the uri) to UTF-8 before it is send to the client. The data stored in the
>>database is UTF-8, and I believe that java is using Unicode. So the
>
> solution
>
>>might be to convert data fetched from the database back to Unicode as soon
>>as it arrives to the store class.
>>
>>The correct solution might be to convert from UTF-8 to Unicode before
>>storing the data and then change the database scheme to Unicode char in
>
> all
>
>>fields containing strings.
>
>
> Hmmmm. You might be confusing certain things here. On one side there is
> Unicode having a number for each character. On the other side there is
> the representation in bytes. Now, UTF-8 *is* Unicode, but on the other
> side, i.e. the representation in bytes. Thus it does not make too much
> sense to compare Unicode with UTF-8. Do you agree?
>
>
>>I am guessing here since I do not have any idea of how the stores are
>>structured in slide. I you want I would be happy to do some debugging, but
>
> I
>
>>will need a short introduction to how the datastores are designed in
>
> slide.
>
> I know, proper documentation is a major problem. I will try to prepare
> something like a short introduction and will post it to the list as soon
> as it is done. This may take a while though :(
>
> Oliver
>
>
>>/Jacob
>>
>>-----Original Message-----
>>From: Oliver Zeigermann [mailto:ozeigermann@c1-fse.de]
>>Sent: 28. januar 2004 16:40
>>To: Slide Users Mailing List
>>Subject: Re: TXFileStore and local filesystem
>>
>>Jacob Lund wrote:
>>
>>
>>
>>>Sorry about that - yes I am talking about the URI!
>>>
>>>If I look in a record in the database, each Danish character is stored as
>>>two "funny looking" characters corresponding to the unescaped UTF-8
>>
>>encoded
>>
>>
>>>version - so this looks correct! However when I do a propfind on the
>>>collection I which I place this file, then I get something like this
>>>/files/%C3%83%C2%B8 - and this should have been representing one Danish
>>>character. If I take the above and convert from UTF8 to my local, then I
>>
>>get
>>
>>
>>>what is store in the database - If I then convert from UTF8 to local
again
>>>the I get the correct Danish letter.
>>
>>
>>I could not find anything that might have converted the URI strings.
>>They are just plainly filled into the SQL like in
>>
>>
>>
>>> "select 1 from OBJECT o, URI u where
>>
>>o.URI_ID=u.URI_ID and u.URI_STRING=?");
>>
>>
>>> statement.setString(1, uri.toString());
>>
>>
>>So, maybe this is a more general problem...
>>
>>
>>
>>>I seem that slide converts the URI's from the db to UTF8, but they are
>>>already stored in unescaped UTF-8!
>>
>>
>>Does this happen with the file store as well?
>>
>>Oliver
Re: TXFileStore and local filesystem
Posted by Oliver Zeigermann <oz...@c1-fse.de>.
So, I think we have two problems now, which I am in danger of mixing up:
(1) The filestore has a problem with file names
(2) The database stores have a problem as well, which is not yet clear to me
Concerning (1): Could you send the new exception after the patch was
applied? At least the file name given in the exception's head, following
"Can not create resource at ", should look different, so that I can see what
might be going on.
Concerning (2): Could you describe this a bit more in order to make my
rusty mind understand?
Concerning the Unicode vs. UTF-8 issue: How would you decode a string
before storing into the database? Into what? The JDBC method accepts a
string, so you will have to pass it one. As I said, you can only
decode/encode into/from bytes...
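A small Java sketch of that point, using the /files/%C3%83%C2%B8 URI quoted further down in this mail. It only simulates one possible cause, namely UTF-8 bytes being decoded as ISO-8859-1 at some step and then encoded as UTF-8 again. Whether that is what actually happens inside slide or the JDBC driver is an assumption, and the class and method names are made up for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class DoubleEncodingDemo {
    // Percent-encode text as UTF-8, the way it would appear in a URI.
    public static String escape(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be supported by every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Simulated fault: the UTF-8 bytes of the text are decoded with the wrong
    // charset (ISO-8859-1), so each byte turns into a character of its own.
    public static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String oSlash = "\u00f8"; // the Danish letter o-slash
        System.out.println(escape(oSlash));            // %C3%B8
        System.out.println(escape(misdecode(oSlash))); // %C3%83%C2%B8
    }
}
```

The second line of output is the doubly encoded form reported from the propfind, which suggests that somewhere bytes are being decoded with a charset other than the one they were encoded with.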
Oliver
Jacob Lund wrote:
> The patch did not make any difference - it still throws the same exception!
>
> What I meant about converting from UTF-8 to Unicode is that the database
> driver can handle Unicode. In the filestore UTF-8 is converted to local
> character set in order to create the files and this is why the filestore (I
> think) has a problem. If the database could store the data in Unicode then
> there would be no problem. Since java is using Unicode in strings the task
> would simply be to decode the strings before they are stored in the database
> and then make sure that all text fields in the database are Unicode (or
> widechar or nchar).
>
> Please tell me if I am way off here!
>
> /Jacob
>
> -----Original Message-----
> From: Oliver Zeigermann [mailto:ozeigermann@c1-fse.de]
> Sent: 29. januar 2004 10:02
> To: Slide Users Mailing List
> Subject: Re: TXFileStore and local filesystem
>
> Jacob Lund wrote:
>
>>No, the filestore works correctly.
>
>
> OK, shall I check in the patch? Did it work for you?
>
>
>>>>From what I can see the filestore converts from UTF-8 to local before it
>>stores data. This I why UTF-8 works fine for me when I upload files with
>>Danish letters in the filename, and also why if fails when it stores files
>>with characters not supported by the codepage.
>>
>>Windows XP use Unicode, but in "dos mode" it will use the old codepage
>>types. The only thing that I can imagine is that java will use this
>
> codepage
>
>>when it is doing IO operations towards the filesystem. This problem might
>
> be
>
>>a problem that only appears on windows systems.
>>
>>I do not think that the problem is in the fill data into the database that
>>has a problem. Some place in slide it will convert that data (in this case
>>the uri) to UTF-8 before it is send to the client. The data stored in the
>>database is UTF-8, and I believe that java is using Unicode. So the
>
> solution
>
>>might be to convert data fetched from the database back to Unicode as soon
>>as it arrives to the store class.
>>
>>The correct solution might be to convert from UTF-8 to Unicode before
>>storing the data and then change the database scheme to Unicode char in
>
> all
>
>>fields containing strings.
>
>
> Hmmmm. You might be confusing certain things here. On one side there is
> Unicode having a number for each character. On the other side there is
> the representation in bytes. Now, UTF-8 *is* Unicode, but on the other
> side, i.e. the representation in bytes. Thus it does not make too much
> sense to compare Unicode with UTF-8. Do you agree?
>
>
>>I am guessing here since I do not have any idea of how the stores are
>>structured in slide. I you want I would be happy to do some debugging, but
>
> I
>
>>will need a short introduction to how the datastores are designed in
>
> slide.
>
> I know, proper documentation is a major problem. I will try to prepare
> something like a short introduction and will post it to the list as soon
> as it is done. This may take a while though :(
>
> Oliver
>
>
>>/Jacob
>>
>>-----Original Message-----
>>From: Oliver Zeigermann [mailto:ozeigermann@c1-fse.de]
>>Sent: 28. januar 2004 16:40
>>To: Slide Users Mailing List
>>Subject: Re: TXFileStore and local filesystem
>>
>>Jacob Lund wrote:
>>
>>
>>
>>>Sorry about that - yes I am talking about the URI!
>>>
>>>If I look in a record in the database, each Danish character is stored as
>>>two "funny looking" characters corresponding to the unescaped UTF-8
>>
>>encoded
>>
>>
>>>version - so this looks correct! However when I do a propfind on the
>>>collection I which I place this file, then I get something like this
>>>/files/%C3%83%C2%B8 - and this should have been representing one Danish
>>>character. If I take the above and convert from UTF8 to my local, then I
>>
>>get
>>
>>
>>>what is store in the database - If I then convert from UTF8 to local again
>>>the I get the correct Danish letter.
>>
>>
>>I could not find anything that might have converted the URI strings.
>>They are just plainly filled into the SQL like in
>>
>>
>>
>>> "select 1 from OBJECT o, URI u where
>>
>>o.URI_ID=u.URI_ID and u.URI_STRING=?");
>>
>>
>>> statement.setString(1, uri.toString());
>>
>>
>>So, maybe this is a more general problem...
>>
>>
>>
>>>I seem that slide converts the URI's from the db to UTF8, but they are
>>>already stored in unescaped UTF-8!
>>
>>
>>Does this happen with the file store as well?
>>
>>Oliver
RE: TXFileStore and local filesystem
Posted by Jacob Lund <jl...@qualiware.com>.
The patch did not make any difference - it still throws the same exception!
What I meant about converting from UTF-8 to Unicode is that the database
driver can handle Unicode. In the filestore, UTF-8 is converted to the local
character set in order to create the files, and this is why the filestore (I
think) has a problem. If the database could store the data in Unicode then
there would be no problem. Since java is using Unicode in strings the task
would simply be to decode the strings before they are stored in the database
and then make sure that all text fields in the database are Unicode (or
widechar or nchar).
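For comparison, the escaping idea raised earlier in this thread (storing file names in escaped UTF-8 so that only ASCII reaches the filesystem) could be sketched in Java as below. This is not what the TXFileStore patch does. The class and method names are hypothetical, and java.net.URLEncoder is used only as one readily available escaping scheme.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class EscapedFileNameDemo {
    // Turn an arbitrary Unicode name into an ASCII-only name.
    public static String toFileName(String name) {
        try {
            return URLEncoder.encode(name, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be supported by every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Recover the original name from the escaped file name.
    public static String fromFileName(String fileName) {
        try {
            return URLDecoder.decode(fileName, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String name = "\u0444\u0430\u0439\u043b.txt"; // Cyrillic name plus ".txt"
        String escaped = toFileName(name);
        System.out.println(escaped);                            // %D1%84%D0%B0%D0%B9%D0%BB.txt
        System.out.println(name.length() + " -> " + escaped.length()); // 8 -> 28
        System.out.println(fromFileName(escaped).equals(name)); // true
    }
}
```

The escaped name no longer depends on the codepage, but it grows from 8 to 28 characters, which is the path length concern mentioned before. URLEncoder also leaves '*' unescaped and turns a space into '+', so a real implementation would need its own escaping rules.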
Please tell me if I am way off here!
/Jacob
-----Original Message-----
From: Oliver Zeigermann [mailto:ozeigermann@c1-fse.de]
Sent: 29. januar 2004 10:02
To: Slide Users Mailing List
Subject: Re: TXFileStore and local filesystem
Jacob Lund wrote:
> No, the filestore works correctly.
OK, shall I check in the patch? Did it work for you?
> From what I can see the filestore converts from UTF-8 to local before it
> stores data. This is why UTF-8 works fine for me when I upload files with
> Danish letters in the filename, and also why it fails when it stores files
> with characters not supported by the codepage.
>
> Windows XP uses Unicode, but in "dos mode" it will use the old codepage
> types. The only thing that I can imagine is that java will use this codepage
> when it is doing IO operations towards the filesystem. This problem might be
> a problem that only appears on windows systems.
>
> I do not think that the problem lies in filling the data into the database.
> Somewhere in slide the data (in this case the uri) is converted to UTF-8
> before it is sent to the client. The data stored in the
> database is UTF-8, and I believe that java is using Unicode. So the solution
> might be to convert data fetched from the database back to Unicode as soon
> as it arrives at the store class.
>
> The correct solution might be to convert from UTF-8 to Unicode before
> storing the data and then change the database schema to Unicode char in all
> fields containing strings.
Hmmmm. You might be confusing certain things here. On one side there is
Unicode having a number for each character. On the other side there is
the representation in bytes. Now, UTF-8 *is* Unicode, but on the other
side, i.e. the representation in bytes. Thus it does not make too much
sense to compare Unicode with UTF-8. Do you agree?
> I am guessing here since I do not have any idea of how the stores are
> structured in slide. If you want I would be happy to do some debugging, but I
> will need a short introduction to how the datastores are designed in slide.
I know, proper documentation is a major problem. I will try to prepare
something like a short introduction and will post it to the list as soon
as it is done. This may take a while though :(
Oliver
> /Jacob
>
> -----Original Message-----
> From: Oliver Zeigermann [mailto:ozeigermann@c1-fse.de]
> Sent: 28. januar 2004 16:40
> To: Slide Users Mailing List
> Subject: Re: TXFileStore and local filesystem
>
> Jacob Lund wrote:
>
>
>>Sorry about that - yes I am talking about the URI!
>>
>>If I look in a record in the database, each Danish character is stored as
>>two "funny looking" characters corresponding to the unescaped UTF-8
>
> encoded
>
>>version - so this looks correct! However when I do a propfind on the
>>collection I which I place this file, then I get something like this
>>/files/%C3%83%C2%B8 - and this should have been representing one Danish
>>character. If I take the above and convert from UTF8 to my local, then I
>
> get
>
>>what is store in the database - If I then convert from UTF8 to local again
>>the I get the correct Danish letter.
>
>
> I could not find anything that might have converted the URI strings.
> They are just plainly filled into the SQL like in
>
>
>     "select 1 from OBJECT o, URI u where o.URI_ID=u.URI_ID and u.URI_STRING=?");
>     statement.setString(1, uri.toString());
>
>
> So, maybe this is a more general problem...
>
>
>>I seem that slide converts the URI's from the db to UTF8, but they are
>>already stored in unescaped UTF-8!
>
>
> Does this happen with the file store as well?
>
> Oliver
Re: TXFileStore and local filesystem
Posted by Oliver Zeigermann <oz...@c1-fse.de>.
Jacob Lund wrote:
> No, the filestore works correctly.
OK, shall I check in the patch? Did it work for you?
> From what I can see the filestore converts from UTF-8 to local before it
> stores data. This is why UTF-8 works fine for me when I upload files with
> Danish letters in the filename, and also why it fails when it stores files
> with characters not supported by the codepage.
>
> Windows XP uses Unicode, but in "dos mode" it will use the old codepage
> types. The only thing that I can imagine is that java will use this codepage
> when it is doing IO operations towards the filesystem. This problem might be
> a problem that only appears on windows systems.
>
> I do not think that the problem lies in filling the data into the database.
> Somewhere in slide the data (in this case the uri) is converted to UTF-8
> before it is sent to the client. The data stored in the
> database is UTF-8, and I believe that java is using Unicode. So the solution
> might be to convert data fetched from the database back to Unicode as soon
> as it arrives at the store class.
>
> The correct solution might be to convert from UTF-8 to Unicode before
> storing the data and then change the database schema to Unicode char in all
> fields containing strings.
Hmmmm. You might be confusing certain things here. On one side there is
Unicode, which assigns a number to each character. On the other side there is
the representation in bytes. Now, UTF-8 *is* Unicode, but it belongs to that
other side: it is one way to represent those numbers as bytes. Thus it does not
make too much sense to compare Unicode with UTF-8. Do you agree?
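To make the distinction concrete, here is a minimal sketch (not Slide code; the class name CodePointVsBytes and the helper hex are made up for illustration). It prints the Unicode number of the Danish letter "ø" and then two different byte representations of that same number.

```java
import java.nio.charset.StandardCharsets;

public class CodePointVsBytes {

    // Hypothetical helper: renders bytes as upper-case hex separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u00F8"; // the Danish letter o-slash

        // The character itself: one Unicode code point.
        System.out.println(Integer.toHexString(s.codePointAt(0))); // f8

        // Two byte representations of the same code point.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_8)));    // C3 B8
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16BE))); // 00 F8
    }
}
```

A Java String holds characters, not bytes, so "converting a String from UTF-8 to Unicode" only has a meaning at the point where bytes are decoded into a String or a String is encoded into bytes.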
> I am guessing here since I do not have any idea of how the stores are
> structured in Slide. If you want, I would be happy to do some debugging, but I
> will need a short introduction to how the datastores are designed in Slide.
I know, proper documentation is a major problem. I will try to prepare
something like a short introduction and will post it to the list as soon
as it is done. This may take a while though :(
Oliver
> /Jacob
RE: TXFileStore and local filesystem
Posted by Jacob Lund <jl...@qualiware.com>.
No, the filestore works correctly.
From what I can see, the filestore converts from UTF-8 to the local codepage
before it stores data. This is why UTF-8 works fine for me when I upload files
with Danish letters in the filename, and also why it fails when it stores files
with characters not supported by the codepage.
Windows XP uses Unicode, but in "DOS mode" it will use the old codepage
types. The only thing that I can imagine is that Java will use this codepage
when it is doing IO operations towards the filesystem. This might be
a problem that only appears on Windows systems.
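One way to test this guess outside of Slide is to ask whether a given filename can be expressed in a particular codepage at all. The sketch below is only an illustration: the class name CodepageCheck and the method representable are hypothetical, and ISO-8859-1 is used as a stand-in for the Danish Windows codepage because it is guaranteed to be present in every JVM. Whether the JVM really applies the codepage to file IO is not settled in this thread.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CodepageCheck {

    // Hypothetical helper: true if every character of the name can be
    // encoded in the given codepage.
    static boolean representable(String name, Charset codepage) {
        return codepage.newEncoder().canEncode(name);
    }

    public static void main(String[] args) {
        // Assumption: ISO-8859-1 stands in for the Western European codepage.
        Charset codepage = StandardCharsets.ISO_8859_1;

        // Danish letters (a-ring, ae) are part of the codepage.
        System.out.println(representable("bl\u00E5b\u00E6r.txt", codepage)); // true

        // Cyrillic letters are not.
        System.out.println(representable("\u0444\u0430\u0439\u043B.txt", codepage)); // false

        // The charset this JVM uses by default; it depends on the platform.
        System.out.println(Charset.defaultCharset());
    }
}
```

If the guess is right, names for which representable returns false are the ones that would fail on such a system, which matches the observation that Danish and German names work while Russian names do not.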
I do not think that the problem is in filling the data into the database.
Some place in Slide converts that data (in this case
the URI) to UTF-8 before it is sent to the client. The data stored in the
database is UTF-8, and I believe that Java is using Unicode. So the solution
might be to convert data fetched from the database back to Unicode as soon
as it arrives at the store class.
The correct solution might be to convert from UTF-8 to Unicode before
storing the data and then change the database schema to Unicode char in all
fields containing strings.
I am guessing here since I do not have any idea of how the stores are
structured in Slide. If you want, I would be happy to do some debugging, but I
will need a short introduction to how the datastores are designed in Slide.
/Jacob
Re: TXFileStore and local filesystem
Posted by Oliver Zeigermann <oz...@c1-fse.de>.
Jacob Lund wrote:
> Sorry about that - yes, I am talking about the URI!
>
> If I look at a record in the database, each Danish character is stored as
> two "funny looking" characters corresponding to the unescaped UTF-8 encoded
> version - so this looks correct! However, when I do a PROPFIND on the
> collection in which I placed this file, I get something like
> /files/%C3%83%C2%B8 - and this should have represented one Danish
> character. If I take the above and convert from UTF-8 to my local codepage, then I get
> what is stored in the database. If I then convert from UTF-8 to local again,
> then I get the correct Danish letter.
I could not find anything that might have converted the URI strings.
They are just plainly filled into the SQL like in
> "select 1 from OBJECT o, URI u where o.URI_ID=u.URI_ID and u.URI_STRING=?");
> statement.setString(1, uri.toString());
So, maybe this is a more general problem...
> It seems that Slide converts the URIs from the db to UTF-8, but they are
> already stored in unescaped UTF-8!
Does this happen with the file store as well?
Oliver
RE: TXFileStore and local filesystem
Posted by Jacob Lund <jl...@qualiware.com>.
Sorry about that - yes, I am talking about the URI!
If I look at a record in the database, each Danish character is stored as
two "funny looking" characters corresponding to the unescaped UTF-8 encoded
version - so this looks correct! However, when I do a PROPFIND on the
collection in which I placed this file, I get something like
/files/%C3%83%C2%B8 - and this should have represented one Danish
character. If I take the above and convert from UTF-8 to my local codepage, then I get
what is stored in the database. If I then convert from UTF-8 to local again,
then I get the correct Danish letter.
It seems that Slide converts the URIs from the db to UTF-8, but they are
already stored in unescaped UTF-8!
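The observed /files/%C3%83%C2%B8 is what one gets when the UTF-8 bytes of "ø" are mistaken for single-byte characters and then encoded as UTF-8 a second time. The following is a minimal sketch that reproduces the effect. It is not taken from Slide: the class and method names are made up, and ISO-8859-1 is assumed as the single-byte charset involved.

```java
import java.nio.charset.StandardCharsets;

public class DoubleEncodingDemo {

    // Simulates the fault: UTF-8 bytes are wrongly decoded as ISO-8859-1,
    // so each byte turns into one "funny looking" character.
    static String mojibake(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Percent-encodes every UTF-8 byte (even ASCII ones, to keep the demo short).
    static String percentEncodeUtf8(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%%%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Reverses the fault: takes the characters back to bytes and decodes them as UTF-8.
    static String repair(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u00F8"; // the Danish letter o-slash

        System.out.println(percentEncodeUtf8(original));           // %C3%B8
        System.out.println(percentEncodeUtf8(mojibake(original))); // %C3%83%C2%B8
        System.out.println(repair(mojibake(original)).equals(original)); // true
    }
}
```

The second line of output is exactly the sequence seen in the PROPFIND response, which suggests that somewhere between the database and the response the already UTF-8 encoded bytes are read as single-byte characters and encoded once more.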
/Jacob
Re: TXFileStore and local filesystem
Posted by Oliver Zeigermann <oz...@c1-fse.de>.
Oh, well, you seem to be one step ahead...
Jacob Lund wrote:
> I just tried the SQLServerRDBMSAdapter and it seems to have some problems
> too. Has UTF-8 been tested with the RDBMS adapters?
>
> All tables in the db use normal char and not Unicode char. I guess that
> means that Slide should store data as UTF-8. Looking into the content of
> the records, it does look like UTF-8 encoded data. Is this correct?
Are you talking about the URIs? Or about content? Content is stored in
BLOBs, i.e. with no en/decoding at all, as it is considered binary.
What problems are you experiencing?
Oliver
RE: TXFileStore and local filesystem
Posted by Jacob Lund <jl...@qualiware.com>.
I just tried the SQLServerRDBMSAdapter and it seems to have some problems
too. Has UTF-8 been tested with the RDBMS adapters?
All tables in the db use normal char and not Unicode char. I guess that
means that Slide should store data as UTF-8. Looking into the content of
the records, it does look like UTF-8 encoded data. Is this correct?
/Jacob
RE: TXFileStore and local filesystem
Posted by Jacob Lund <jl...@qualiware.com>.
No file is created by the filestore!
German letters and Danish letters work fine for me too - and they are also supported by the codepage.
To avoid these problems, why not save the files under escaped UTF-8 names - that way we always know that it will work?!?
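A minimal sketch of what such escaping could look like, assuming a URL-style %XX scheme over the UTF-8 bytes of the name. This is not how TXFileStore works; the class EscapedFileName and both methods are hypothetical. Only ASCII letters, digits, '.', '-' and '_' are kept as-is, so the result is plain ASCII on any file system.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class EscapedFileName {

    // Escapes every UTF-8 byte outside [A-Za-z0-9._-] as %XX.
    static String escape(String name) {
        StringBuilder sb = new StringBuilder();
        for (byte b : name.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean safe = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9') || c == '.' || c == '-' || c == '_';
            if (safe) {
                sb.append((char) c);
            } else {
                sb.append('%').append(String.format("%02X", c));
            }
        }
        return sb.toString();
    }

    // Reverses escape(); assumes the input was produced by escape().
    static String unescape(String escaped) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < escaped.length(); i++) {
            char ch = escaped.charAt(i);
            if (ch == '%') {
                out.write(Integer.parseInt(escaped.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                out.write(ch);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String name = "\u043B.txt"; // one Cyrillic letter plus extension
        String escaped = escape(name);
        System.out.println(escaped);                         // %D0%BB.txt
        System.out.println(unescape(escaped).equals(name));  // true
    }
}
```

Note the cost: each Cyrillic letter grows from one character to six, so long names reach a file system's path length limit much sooner.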
/Jacob
Here is the stack trace:
28 Jan 2004 12:57:25 - org.apache.slide.store.txfile.AbstractTxFileStoreService - WARNING - Thread Thread[http8080-Processor23,5,main] marked transaction branch http8080-Processor23-1075291045101-264 for rollback
org.apache.slide.store.txfile.rm.ResourceManagerSystemException: http8080-Processor23-1075291045101-264: Can not create resource at '/files/├?┬╗├?┬Á├?┬│├?┬©├æ┬Ç├æ┬â├?┬Á├?┬╝├æ┬ï.txt.def.xml' (ERR_SYSTEM)
Caused by: java.io.IOException: The filename, directory name, or volume label syntax is incorrect
    at java.io.WinNTFileSystem.createFileExclusively(Native Method)
    at java.io.File.createNewFile(File.java:828)
    at org.apache.slide.store.txfile.rm.impl.FileResourceManager.createFile(FileResourceManager.java:209)
    at org.apache.slide.store.txfile.rm.impl.FileResourceManager.createResource(FileResourceManager.java:770)
    at org.apache.slide.store.txfile.XMLResourceDescriptor.create(XMLResourceDescriptor.java:201)
    at org.apache.slide.store.txfile.TxXMLFileDescriptorsStore.createObject(TxXMLFileDescriptorsStore.java:126)
    at org.apache.slide.store.AbstractStore.createObject(AbstractStore.java:597)
    at org.apache.slide.store.ExtendedStore.createObject(ExtendedStore.java:476)
    at org.apache.slide.structure.StructureImpl.create(StructureImpl.java:362)
    at org.apache.slide.webdav.method.PutMethod.executeRequest(PutMethod.java:333)
    at org.apache.slide.webdav.method.AbstractWebdavMethod.run(AbstractWebdavMethod.java:355)
    at org.apache.slide.webdav.WebdavServlet.service(WebdavServlet.java:204)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:853)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:284)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:204)
    at org.apache.slide.webdav.filter.LogFilter.doFilter(LogFilter.java:182)
Re: TXFileStore and local filesystem
Posted by Oliver Zeigermann <oz...@c1-fse.de>.
Hi Jacob!
I can imagine there might be problems with the filename. I do not know
how to create filenames with Russian characters (or any other character not on
my keyboard), but German special characters "öäüÖÄÜß" work fine. So,
please report any progress and try to find out where it fails. Are files
being created? If so, what do the filenames look like?
Thanks for helping and cheers,
Oliver
Jacob Lund wrote:
> Hi All!
>
> Has anyone tried the TXFile store with filenames that include characters
> that are not allowed in the server file system?
>
> I am using Windows XP and the UI supports UTF-8. My XP is English and the locale
> is set to Danish! If I upload a file with Russian characters in the filename
> then Slide fails with a 500 error. Slide is set to UTF-8 and running on TC
> 5.0.16.
>
> My Slide version is about a week old (from the SLIDE_2_0_RELEASE_BRANCH)
>
> I will dig deeper into this, but I just wanted to hear if anyone else has
> experience with this.
>
> Without knowing anything for sure, I think that Java will apply the "DOS"
> codepage to all IO operations. If this is true then the filestore needs
> to save all information in an escaped UTF-8 manner.
>
> /Jacob