Posted to users@jackrabbit.apache.org by ttemprano <tt...@Toyota.com.ve> on 2011/02/16 20:42:22 UTC

Storing latin characters on repository: bad results!

Hi everyone...

I believe I've made good progress on what I consider a pretty steep
learning curve (Jackrabbit).

This may be more of an issue with HTML than JCR.

The problem arises when saving information to the repository with Latin
"special" characters like ñ and accented vowels: á é í ó ú.

For example, I'm creating a document repository that has categories of
documents.

In the end, those categories are nt:folder nodes.

Preferably, the category name should be the node path. Let's say I create a
fictional category called "ácéntós y eñe".

I create the node and save... So far so good. However, when I retrieve the
path with node.getPath() I get this: /ácéntós y eñe

In fact, if I print the path on console it shows the same.

This happens for the path of the node and in properties of Authorizables, e.g.
Group authorizables with custom properties such as "group name".

I'm using Jackrabbit 2.2 on Tomcat 6 and JSP.

Thank you.
Tomás.


-- 
View this message in context: http://jackrabbit.510166.n4.nabble.com/Storing-latin-characters-on-repository-bad-results-tp3309595p3309595.html
Sent from the Jackrabbit - Users mailing list archive at Nabble.com.

Re: AW: Storing latin characters on repository: bad results!

Posted by ttemprano <tt...@Toyota.com.ve>.

Jukka Zitting-6 wrote:
> 
> Hi,
> 
> On 02/17/2011 09:13 AM, Seidel. Robert wrote:
>> I didn't test node paths and property names with other than ASCII
>> characters, but the values of properties are working correctly
>> (tested with german umlauts).
> 
> Jackrabbit stores all character data (names, values, etc.) as Unicode, 
> so issues like the one experienced by Tomás are most likely due to 
> character set conversion problems in the client code either before the 
> data is added to the repository or after it has been read.
> 
> -- 
> Jukka Zitting
> 
> 

Hi.

I stepped through the code and in fact the string was getting stored
incorrectly.

I changed the encoding of the JSP to ISO-8859-1 and it works as expected
now.
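
For anyone hitting the same thing, here is a rough sketch of how a charset
mismatch between a page and the server can garble a form field. It is only
an assumption about the cause, since the JSP itself is not shown in this
thread: the browser encodes the field with the page's charset, and the
server decodes it with whatever charset it assumes. The class and method
names (FormCharsetDemo, submit, receive) are made up for illustration and
are not Jackrabbit or Tomcat APIs.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class; not part of Jackrabbit or Tomcat.
final class FormCharsetDemo {

    // Simulates a browser URL-encoding a form field using the page's charset.
    static String submit(String value, String pageCharset) {
        try {
            return URLEncoder.encode(value, pageCharset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Simulates the server decoding the field using the charset it assumes.
    static String receive(String encoded, String serverCharset) {
        try {
            return URLDecoder.decode(encoded, serverCharset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // The category name from the thread, written with escapes so the
        // example does not depend on the encoding of this source file.
        String name = "\u00e1c\u00e9nt\u00f3s y e\u00f1e";

        // Mismatch: page charset UTF-8, server assumes ISO-8859-1.
        String garbled = receive(submit(name, "UTF-8"), "ISO-8859-1");
        // Match: both sides use ISO-8859-1.
        String intact = receive(submit(name, "ISO-8859-1"), "ISO-8859-1");

        System.out.println("mismatch keeps the name intact: " + garbled.equals(name));
        System.out.println("match keeps the name intact:    " + intact.equals(name));
    }
}
```

Making both sides agree on UTF-8 instead of ISO-8859-1 should work just as
well and would also cover characters outside Latin-1.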

Thank you everyone!

Expect some more questions from me as I step into the dark access control
world of Jackrabbit ;)

Re: AW: Storing latin characters on repository: bad results!

Posted by Jukka Zitting <jz...@adobe.com>.
Hi,

On 02/17/2011 09:13 AM, Seidel. Robert wrote:
> I didn't test node paths and property names with other than ASCII
> characters, but the values of properties are working correctly
> (tested with german umlauts).

Jackrabbit stores all character data (names, values, etc.) as Unicode, 
so issues like the one experienced by Tomás are most likely due to 
character set conversion problems in the client code either before the 
data is added to the repository or after it has been read.

-- 
Jukka Zitting

AW: Storing latin characters on repository: bad results!

Posted by "Seidel. Robert" <Ro...@aeb.de>.
Hi Tomás,

the characters you get are UTF-8 bytes read as ANSI (a single-byte code page).
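
A minimal sketch of that effect, assuming ISO-8859-1 as a stand-in for
"ANSI" (windows-1252 behaves the same for the characters in this example).
MojibakeDemo, misread and repair are hypothetical names used only here.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for illustration only.
final class MojibakeDemo {

    // Encodes the text as UTF-8, then wrongly decodes those bytes as
    // ISO-8859-1, so each two-byte sequence turns into two characters.
    static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Reverses misread(): recovers the original bytes and decodes them as
    // UTF-8. This only works while the garbled string has not been
    // converted again or truncated in between.
    static String repair(String garbled) {
        byte[] original = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "/ácéntós y eñe" written with escapes to stay independent of the
        // source file encoding.
        String path = "/\u00e1c\u00e9nt\u00f3s y e\u00f1e";
        String garbled = misread(path);

        // Each accented letter becomes two characters, so the string grows.
        System.out.println(path.length() + " -> " + garbled.length());
        System.out.println("repaired equals original: " + repair(garbled).equals(path));
    }
}
```

If repair() gives back readable text, that is a strong hint the data was
decoded with the wrong charset somewhere in the client code.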

I didn't test node paths and property names with non-ASCII characters, but the values of properties are working correctly (tested with German umlauts).

Regards, Robert


Re: Storing latin characters on repository: bad results!

Posted by Alexander Klimetschek <ak...@adobe.com>.
On 16.02.11 20:42, "ttemprano" <tt...@Toyota.com.ve> wrote:
>The problem arises when saving information to the repository with latin
>"special" characters like ñ and tildes: á é í ó ú.
>...
>Preferably, the category name should be the node path. Lets say I create a
>fictional category called "ácéntós y eñe".
>
>I create the node and save... So far so good, however, when I retrieve the
>path with node.getPath() I get this: /ácéntós y eñe

JCR node and property names are defined [0] to consist of

ValidChar ::= XmlChar – InvalidChar

InvalidChar ::= '/' | ':' | '[' | ']' | '|' | '*'

and XmlChar is defined by the XML spec [1] to be "any Unicode character,
excluding the surrogate blocks, FFFE, and FFFF." [2].
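
As a rough illustration of the two productions above, the following sketch
checks a candidate name character by character. It covers only the
character-level rule quoted here; the spec may impose further constraints
on local names that this ignores. JcrNameChars and its methods are
hypothetical helpers, not part of the JCR or Jackrabbit API.

```java
// Hypothetical helper for illustration; not a JCR or Jackrabbit class.
final class JcrNameChars {

    // InvalidChar ::= '/' | ':' | '[' | ']' | '|' | '*'
    private static final String INVALID = "/:[]|*";

    // Char production from the XML 1.0 spec:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // ValidChar ::= XmlChar - InvalidChar, applied to every code point.
    // Note: an empty string passes this check, which is only about characters.
    static boolean hasOnlyValidChars(String name) {
        for (int i = 0; i < name.length(); ) {
            int cp = name.codePointAt(i);
            if (!isXmlChar(cp) || INVALID.indexOf(cp) >= 0) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        // The category name from the thread, written with escapes.
        System.out.println(hasOnlyValidChars("\u00e1c\u00e9nt\u00f3s y e\u00f1e"));
        System.out.println(hasOnlyValidChars("a/b"));
    }
}
```

By this rule the accented category name is fine as a node name, while
anything containing a slash or colon is not.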

So the problem most likely happens when you actually print the path to a
console or log that is not UTF-8 (in my experience, loggers and log files
are typically not UTF-8 by default).
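
One way to check this, sketched under the assumption that plain Java
console output is being used (the thread does not say which logger is
involved): print through a stream with an explicit UTF-8 encoding, or print
an ASCII-only escaped form that no console can garble. Utf8Console, utf8
and escapeNonAscii are hypothetical names for this example.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical helper for illustration only.
final class Utf8Console {

    // Wraps a stream so that text is always written as UTF-8 bytes,
    // regardless of the platform default encoding.
    static PrintStream utf8(OutputStream out) {
        try {
            return new PrintStream(out, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Renders every character outside printable ASCII as a backslash-u
    // escape with four hex digits, so the output shows what the string
    // really contains even on a console with the wrong encoding.
    static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x20 || c > 0x7E) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the value returned by node.getPath().
        String path = "/\u00e1c\u00e9nt\u00f3s y e\u00f1e";
        utf8(System.out).println(path);
        System.out.println(escapeNonAscii(path));
    }
}
```

If the escaped form shows one escape per accented letter, the string in
memory is fine and only the display is off. Two escapes per letter would
mean the string was already garbled before printing.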

[0] http://www.day.com/specs/jcr/2.0/3_Repository_Model.html#3.2.2%20Local%20Names
[1] this is required because of the jcr to xml document/sysview mapping
[2] http://www.w3.org/TR/xml/#NT-Char

Regards,
Alex

-- 
Alexander Klimetschek
Developer // Adobe (Day) // Berlin - Basel