Posted to users@jackrabbit.apache.org by Paco Avila <pa...@git.es> on 2006/12/05 14:24:46 UTC

repository portability

Is a repository created on Windows portable to Linux? The repository
files are stored in local XML files. The XML in
"workspaces/default/data" seems to be portable (the files are UTF-8 encoded),
but the files stored in "workspaces/default/blobs" seem quite different.
At least, the TEXT files.
-- 
Paco Avila <pa...@git.es>


Re: repository portability

Posted by Michael Neale <mi...@gmail.com>.
You should be able to export and import using the JCR API between platforms,
and it should take care of everything (though it may not be the most efficient
way for large amounts of data).

On 12/6/06, Tobias Bocanegra <to...@day.com> wrote:
>
> > More or less.... The text data files are platform dependant encoding.
> which ones?
>
>
> --
> -----------------------------------------< tobias.bocanegra@day.com >---
> Tobias Bocanegra, Day Management AG, Barfuesserplatz 6, CH - 4001 Basel
> T +41 61 226 98 98, F +41 61 226 98 97
> -----------------------------------------------< http://www.day.com >---
>

Re: repository portability

Posted by Julian Reschke <ju...@gmx.de>.
Tobias Bocanegra wrote:
> well, your problem is not jackrabbit related, rather to the default
> platform encoding for your system:
> 
> you read/write the data, using the default encoding:
> 
> nodoPruebaName.getBytes()
> 
> which is not platform independent. it's better to use:
> 
> nodoPruebaName.getBytes("utf-8")
> 
> or even better, use the same encoding you store in the jcr:encoding 
> property.
> regards, toby

+1.

Using the system default encoding in getBytes() (and related IO methods) 
is almost always a potential bug.

When in doubt, always use UTF-8 and, as Tobias said, make sure that the 
encoding property reflects the actual content encoding.
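
As a minimal sketch of the advice above, the following Java class encodes and decodes a string with an explicit charset instead of the platform default. The class and method names are hypothetical and chosen only for illustration, and StandardCharsets assumes Java 7 or later (on older JVMs the getBytes("utf-8") form quoted above does the same job).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative and not from Jackrabbit.
final class EncodingRoundTrip {

    // Encode with an explicit charset so the bytes are identical on every platform.
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decode with the same explicit charset that was used for encoding.
    static String decode(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A string with one Latin-1 character and three Greek characters.
        String text = "caf\u00e9 \u03b1\u03b2\u03b3";
        byte[] portable = encode(text);
        // Calling getBytes() with no argument would use this platform default instead.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("round trip ok: " + text.equals(decode(portable)));
    }
}
```

The bytes produced by encode() are what would be written to the binary property, so a repository copied to another machine decodes to the same characters as long as the reader uses the same charset.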

Best regards, Julian

Re: V1.1.1 source download file incomplete?

Posted by Lei Zhou <Le...@pointalliance.com>.
Hi Jukka,

> Would it be useful to have full javadocs of each release available on
> the Jackrabbit web site? We can do that if there's demand

It would be nice to have, but don't bother if it takes a lot of effort to 
maintain.


> Many of the query classes are generated from JavaCC grammar files by
> the build system. Run "maven jar" and look for target/generated-src.

I actually would really like to have a binary jackrabbit JAR file that 
includes all jar dependencies and doesn't rely on any other classpath 
except for finding the common JDK packages.

I ask for this because I'm having a problem implementing Jackrabbit 
deployment Model 2 on WebSphere Portal Server: there is already an 
IBM JCR implementation on the server's root classpath, and I 
can't configure a JNDI resource environment reference for Jackrabbit since 
the portal server would always find IBM's implementation of 
javax.jcr.Repository.

If I could have all dependencies packaged into one 
jackrabbit-1.1.1.jar file, I would be able to put this jar file in the shared 
library and instantiate the repository implementation as a singleton. 
Would this be possible? 

Thanks,
Lei

Re: V1.1.1 source download file incomplete?

Posted by Alexandru Popescu <th...@gmail.com>.
On 12/7/06, Jukka Zitting <ju...@gmail.com> wrote:
> Hi,
>
> On 12/7/06, Lei Zhou <Le...@pointalliance.com> wrote:
> > I just downloaded the jackrabbit-core-1.1.1-src.jar file from the website
> > (http://jackrabbit.apache.org/downloads.cgi), it seems that some class
> > files are missing. For example, there is only one class definition in
> > package org.apache.jackrabbit.core.query.lucene.fulltext. But from the
> > binary jar file I could see a lot more.
>
> Many of the query classes are generated from JavaCC grammar files by
> the build system. Run "maven jar" and look for target/generated-src.
>
> > I also noticed that the available class definitions in the download
> > jackrabbit-core-1.1.1-src.jar seem to match what is available on the
> > JavaDoc web pages. is there any other place that I can download the
> > complete code?
>
> The javadocs at http://jackrabbit.apache.org/api-1/ is based on the
> 1.0 release. We haven't updated the Javadocs there as the main reason
> for having them on the web site is as documentation of the public
> interfaces in org.apache.jackrabbit.api.
>
> Would it be useful to have full javadocs of each release available on
> the Jackrabbit web site? We can do that if there's demand.
>

Hmm... I guess this would be pretty cool. Then you can set your
Eclipse IDE to look for the javadocs online according to the jar
version.

If it is not too complex then you have my +1.

./alex
--
.w( the_mindstorm )p.

> BR,
>
> Jukka Zitting
>

Re: V1.1.1 source download file incomplete?

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On 12/7/06, Lei Zhou <Le...@pointalliance.com> wrote:
> I just downloaded the jackrabbit-core-1.1.1-src.jar file from the website
> (http://jackrabbit.apache.org/downloads.cgi), it seems that some class
> files are missing. For example, there is only one class definition in
> package org.apache.jackrabbit.core.query.lucene.fulltext. But from the
> binary jar file I could see a lot more.

Many of the query classes are generated from JavaCC grammar files by
the build system. Run "maven jar" and look for target/generated-src.

> I also noticed that the available class definitions in the download
> jackrabbit-core-1.1.1-src.jar seem to match what is available on the
> JavaDoc web pages. is there any other place that I can download the
> complete code?

The javadocs at http://jackrabbit.apache.org/api-1/ are based on the
1.0 release. We haven't updated the Javadocs there as the main reason
for having them on the web site is as documentation of the public
interfaces in org.apache.jackrabbit.api.

Would it be useful to have full javadocs of each release available on
the Jackrabbit web site? We can do that if there's demand.

BR,

Jukka Zitting

V1.1.1 source download file incomplete?

Posted by Lei Zhou <Le...@pointalliance.com>.
Hi,
I just downloaded the jackrabbit-core-1.1.1-src.jar file from the website 
(http://jackrabbit.apache.org/downloads.cgi), and it seems that some class 
files are missing. For example, there is only one class definition in the 
package org.apache.jackrabbit.core.query.lucene.fulltext, but in the 
binary jar file I can see many more. 
I also noticed that the class definitions available in the downloaded 
jackrabbit-core-1.1.1-src.jar seem to match what is available on the 
JavaDoc web pages. Is there any other place where I can download the 
complete code? 
Thanks,
Lei

Re: repository portability

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On 12/7/06, Paco Avila <pa...@git.es> wrote:
> El jue, 07-12-2006 a las 11:10 +0100, Paco Avila escribió:
> > Ok, I was not sure about it. I thougth that jcr:encoding was used by
> > Jackrabbit to encode and store the data when the jcr:mimeType was
> > "text/*".
>
> Or used by Lucene to index the data with the proper encoding.

Correct. In fact the jcr:encoding property *is* used by the
repository, it gets passed to the configured TextFilter instances by
the NodeIndexer class, and is used for example by the
TextPlainTextIndexer as the encoding parameter of the
InputStreamReader it instantiates to access the character content of
the binary stream.
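
To illustrate the mechanism described above, here is a rough sketch of decoding a binary stream with an InputStreamReader and an encoding name. It uses only JDK classes; the class and method names are hypothetical, and this is not Jackrabbit's actual indexing code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; it only mirrors the idea of passing an encoding name to a reader.
final class StreamDecoding {

    // Reads all characters from a binary stream, decoding with the named charset.
    // The encoding argument plays the role of the jcr:encoding value.
    static String readText(InputStream in, String encoding) {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, encoding)) {
            char[] buffer = new char[1024];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
        } catch (IOException e) {
            // Also covers UnsupportedEncodingException for unknown charset names.
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        byte[] stored = "\u03b1\u03b2\u03b3".getBytes(StandardCharsets.UTF_8);
        String decoded = readText(new ByteArrayInputStream(stored), "UTF-8");
        System.out.println("decoded " + decoded.length() + " characters");
    }
}
```

If the encoding name does not match the bytes actually stored, the reader still returns characters, just the wrong ones, which is why the property has to reflect the real content encoding.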

BR,

Jukka Zitting

Re: repository portability

Posted by Paco Avila <pa...@git.es>.
On Thu, 07-12-2006 at 11:10 +0100, Paco Avila wrote:
> On Thu, 07-12-2006 at 10:06 +0100, Tobias Bocanegra wrote:
> > > > But I have a last question... The JSR-170 says: "The jcr:encoding
> > > > indicates the character set encoding used. If this resource does not
> > > > contains character data then this property will not be present". I'm not
> > > > sure about the meaning of this paragraph: in the test program I store
> > > > String in the nt:resource node and should have this property because of
> > > > encoding issues. But what about storing binary data like an JPG image?
> > > > It this property ignored or it shouldn't be present? Should be used only
> > > > when the jcr:mimeType property is "text/*"?
> > >
> > > The term "encoding" (as used for jcr:encoding) is meaningless for binary
> > > content. It describes how to reconstruct a sequence from characters for
> > > a sequence of bytes.
> > >
> > > So, yes, it's usually only meaningful for jcr:mimeType values matching
> > > "text/*".
> > 
> > please note, that the jcr:encoding property is not used by the
> > repository itself for encoding data. it's thought as a helper property
> > that applications may need when they want to store the encoding.
> 
> Ok, I was not sure about it. I thougth that jcr:encoding was used by
> Jackrabbit to encode and store the data when the jcr:mimeType was
> "text/*".

Or used by Lucene to index the data with the proper encoding.

-- 
Paco Avila <pa...@git.es>


Re: repository portability

Posted by Paco Avila <pa...@git.es>.
On Thu, 07-12-2006 at 10:06 +0100, Tobias Bocanegra wrote:
> > > But I have a last question... The JSR-170 says: "The jcr:encoding
> > > indicates the character set encoding used. If this resource does not
> > > contains character data then this property will not be present". I'm not
> > > sure about the meaning of this paragraph: in the test program I store
> > > String in the nt:resource node and should have this property because of
> > > encoding issues. But what about storing binary data like an JPG image?
> > > It this property ignored or it shouldn't be present? Should be used only
> > > when the jcr:mimeType property is "text/*"?
> >
> > The term "encoding" (as used for jcr:encoding) is meaningless for binary
> > content. It describes how to reconstruct a sequence from characters for
> > a sequence of bytes.
> >
> > So, yes, it's usually only meaningful for jcr:mimeType values matching
> > "text/*".
> 
> please note, that the jcr:encoding property is not used by the
> repository itself for encoding data. it's thought as a helper property
> that applications may need when they want to store the encoding.

OK, I was not sure about it. I thought that jcr:encoding was used by
Jackrabbit to encode and store the data when the jcr:mimeType was
"text/*".

Many thanks for this tip!

-- 
Paco Avila <pa...@git.es>


Re: repository portability

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On 12/7/06, Tobias Bocanegra <to...@day.com> wrote:
> of course the application is free to choose what information to store
> in this property. for example, an zipped xml-file could have:
>   jcr:mimeType "text/xml"
>   jcr:encoding "gzip"
> although this might not make sense :-)

This is more in line with the HTTP Content-Encoding header than the
charset parameter of the MIME type as suggested by JSR 170. I think
that the Content-Encoding makes more sense, as there already is a
defined way to embed character encoding information in the MIME type.
For example:

    jcr:mimeType "text/plain; charset=UTF-8"
    jcr:encoding "gzip"

But since this contradicts JSR 170, I would argue that this should not
be done. The preferred alternatives being

    jcr:mimeType "text/plain; charset=UTF-8"
    jcr:encoding <unset>

and

    jcr:mimeType "text/plain"
    jcr:encoding "UTF-8"

Note that this way any content encodings need to be decoded before
storing the content in the jcr:data property.
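
An application that has to read both of the preferred forms above could resolve the charset along these lines. This is only a sketch: the class name is hypothetical, the parameter parsing is deliberately simple, and checking the jcr:encoding value before the MIME type parameter is an assumption, not something JSR 170 prescribes.

```java
import java.util.Locale;

// Hypothetical helper; the lookup order is an assumption made for this sketch.
final class CharsetResolver {

    // Returns the declared charset name, or null if neither value declares one.
    static String resolve(String mimeType, String encoding) {
        // Form 2: jcr:mimeType "text/plain" with jcr:encoding "UTF-8".
        if (encoding != null && !encoding.isEmpty()) {
            return encoding;
        }
        if (mimeType == null) {
            return null;
        }
        // Form 1: jcr:mimeType "text/plain; charset=UTF-8" with jcr:encoding unset.
        String[] parts = mimeType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = param.substring("charset=".length()).trim();
                // Strip optional double quotes around the parameter value.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(resolve("text/plain; charset=UTF-8", null));
        System.out.println(resolve("text/plain", "UTF-8"));
        System.out.println(resolve("image/jpeg", null));
    }
}
```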

BR,

Jukka Zitting

Re: repository portability

Posted by Tobias Bocanegra <to...@day.com>.
> Tobias Bocanegra schrieb:
> >> > But I have a last question... The JSR-170 says: "The jcr:encoding
> >> > indicates the character set encoding used. If this resource does not
> >> > contains character data then this property will not be present". I'm
> >> not
> >> > sure about the meaning of this paragraph: in the test program I store
> >> > String in the nt:resource node and should have this property because of
> >> > encoding issues. But what about storing binary data like an JPG image?
> >> > It this property ignored or it shouldn't be present? Should be used
> >> only
> >> > when the jcr:mimeType property is "text/*"?
> >>
> >> The term "encoding" (as used for jcr:encoding) is meaningless for binary
> >> content. It describes how to reconstruct a sequence from characters for
> >> a sequence of bytes.
> >>
> >> So, yes, it's usually only meaningful for jcr:mimeType values matching
> >> "text/*".
> >
> > please note, that the jcr:encoding property is not used by the
> > repository itself for encoding data. it's thought as a helper property
> > that applications may need when they want to store the encoding.
> >
> > of course the application is free to choose what information to store
> > in this property. for example, an zipped xml-file could have:
> >  jcr:mimeType "text/xml"
> >  jcr:encoding "gzip"
> > although this might not make sense :-)
>
> No,
>
> that in fact does not make any sense, so don't do it.
>
> In case the binary content can be decoded into characters, the
> jcr:encoding property should hold the name of the character encoding
> (registry at <http://www.iana.org/assignments/character-sets>). In this
> case, it can be used to construct the encoding parameter for an MIME
> Content-Type header (for instance, when serving the content through HTTP).

well, it should have been named jcr:charset or jcr:charsetEncoding then :-)
but of course the spec is very clear of how to use this property as
paco quoted above:

"The jcr:encoding indicates the character set encoding used. If this
resource does not contains character data then this property will not
be present"

regards, toby
-- 
-----------------------------------------< tobias.bocanegra@day.com >---
Tobias Bocanegra, Day Management AG, Barfuesserplatz 6, CH - 4001 Basel
T +41 61 226 98 98, F +41 61 226 98 97
-----------------------------------------------< http://www.day.com >---

Re: repository portability

Posted by Julian Reschke <ju...@gmx.de>.
Tobias Bocanegra wrote:
>> > But I have a last question... The JSR-170 says: "The jcr:encoding
>> > indicates the character set encoding used. If this resource does not
>> > contains character data then this property will not be present". I'm 
>> not
>> > sure about the meaning of this paragraph: in the test program I store
>> > String in the nt:resource node and should have this property because of
>> > encoding issues. But what about storing binary data like an JPG image?
>> > It this property ignored or it shouldn't be present? Should be used 
>> only
>> > when the jcr:mimeType property is "text/*"?
>>
>> The term "encoding" (as used for jcr:encoding) is meaningless for binary
>> content. It describes how to reconstruct a sequence from characters for
>> a sequence of bytes.
>>
>> So, yes, it's usually only meaningful for jcr:mimeType values matching
>> "text/*".
> 
> please note, that the jcr:encoding property is not used by the
> repository itself for encoding data. it's thought as a helper property
> that applications may need when they want to store the encoding.
> 
> of course the application is free to choose what information to store
> in this property. for example, an zipped xml-file could have:
>  jcr:mimeType "text/xml"
>  jcr:encoding "gzip"
> although this might not make sense :-)

No,

that in fact does not make any sense, so don't do it.

In case the binary content can be decoded into characters, the 
jcr:encoding property should hold the name of the character encoding 
(registry at <http://www.iana.org/assignments/character-sets>). In this 
case, it can be used to construct the charset parameter for a MIME 
Content-Type header (for instance, when serving the content through HTTP).
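
A minimal sketch of that construction, assuming the jcr:mimeType value carries no parameters of its own and that a missing jcr:encoding is passed in as null. The class and method names are hypothetical.

```java
// Hypothetical helper; assumes mimeType has no charset parameter already.
final class ContentTypeHeader {

    // Builds a Content-Type header value from the two property values.
    // A null or empty encoding means the property is absent (binary content).
    static String build(String mimeType, String encoding) {
        if (encoding == null || encoding.isEmpty()) {
            return mimeType;
        }
        return mimeType + "; charset=" + encoding;
    }

    public static void main(String[] args) {
        System.out.println(build("text/plain", "UTF-8"));
        System.out.println(build("image/jpeg", null));
    }
}
```

For binary content such as a JPG image the property is absent, so the header is just the MIME type.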

Best regards, Julian



Re: repository portability

Posted by Tobias Bocanegra <to...@day.com>.
> > But I have a last question... The JSR-170 says: "The jcr:encoding
> > indicates the character set encoding used. If this resource does not
> > contains character data then this property will not be present". I'm not
> > sure about the meaning of this paragraph: in the test program I store
> > String in the nt:resource node and should have this property because of
> > encoding issues. But what about storing binary data like an JPG image?
> > It this property ignored or it shouldn't be present? Should be used only
> > when the jcr:mimeType property is "text/*"?
>
> The term "encoding" (as used for jcr:encoding) is meaningless for binary
> content. It describes how to reconstruct a sequence from characters for
> a sequence of bytes.
>
> So, yes, it's usually only meaningful for jcr:mimeType values matching
> "text/*".

please note, that the jcr:encoding property is not used by the
repository itself for encoding data. it's thought as a helper property
that applications may need when they want to store the encoding.

of course the application is free to choose what information to store
in this property. for example, an zipped xml-file could have:
  jcr:mimeType "text/xml"
  jcr:encoding "gzip"
although this might not make sense :-)

regards, toby
-- 
-----------------------------------------< tobias.bocanegra@day.com >---
Tobias Bocanegra, Day Management AG, Barfuesserplatz 6, CH - 4001 Basel
T +41 61 226 98 98, F +41 61 226 98 97
-----------------------------------------------< http://www.day.com >---

Re: repository portability

Posted by Paco Avila <pa...@git.es>.
On Wed, 06-12-2006 at 20:30 +0100, Julian Reschke wrote:
> Paco Avila wrote:
> > Thanks to Tobias & Julian, now it works.
> > 
> > But I have a last question... The JSR-170 says: "The jcr:encoding
> > indicates the character set encoding used. If this resource does not
> > contains character data then this property will not be present". I'm not
> > sure about the meaning of this paragraph: in the test program I store
> > String in the nt:resource node and should have this property because of
> > encoding issues. But what about storing binary data like an JPG image?
> > It this property ignored or it shouldn't be present? Should be used only
> > when the jcr:mimeType property is "text/*"?
> 
> The term "encoding" (as used for jcr:encoding) is meaningless for binary 
> content. It describes how to reconstruct a sequence from characters for 
> a sequence of bytes.
> 
> So, yes, it's usually only meaningful for jcr:mimeType values matching 
> "text/*".

Many thanks!
-- 
Paco Avila <pa...@git.es>


Re: repository portability

Posted by Julian Reschke <ju...@gmx.de>.
Paco Avila wrote:
> Thanks to Tobias & Julian, now it works.
> 
> But I have a last question... The JSR-170 says: "The jcr:encoding
> indicates the character set encoding used. If this resource does not
> contains character data then this property will not be present". I'm not
> sure about the meaning of this paragraph: in the test program I store
> String in the nt:resource node and should have this property because of
> encoding issues. But what about storing binary data like an JPG image?
> It this property ignored or it shouldn't be present? Should be used only
> when the jcr:mimeType property is "text/*"?

The term "encoding" (as used for jcr:encoding) is meaningless for binary 
content. It describes how to reconstruct a sequence of characters from 
a sequence of bytes.

So, yes, it's usually only meaningful for jcr:mimeType values matching 
"text/*".

Best regards, Julian

Re: repository portability

Posted by Paco Avila <pa...@git.es>.
On Wed, 06-12-2006 at 14:44 +0100, Tobias Bocanegra wrote:
> well, your problem is not jackrabbit related, rather to the default
> platform encoding for your system:
> 
> you read/write the data, using the default encoding:
> 
> nodoPruebaName.getBytes()
> 
> which is not platform independent. it's better to use:
> 
> nodoPruebaName.getBytes("utf-8")
> 
> or even better, use the same encoding you store in the jcr:encoding property.
> regards, toby

Thanks to Tobias & Julian, now it works.

But I have one last question... JSR-170 says: "The jcr:encoding
indicates the character set encoding used. If this resource does not
contain character data then this property will not be present". I'm not
sure about the meaning of this paragraph: in the test program I store a
String in the nt:resource node, so it should have this property because of
encoding issues. But what about storing binary data like a JPG image?
Is this property ignored, or should it not be present? Should it be used only
when the jcr:mimeType property is "text/*"?
-- 
Paco Avila <pa...@git.es>


Re: repository portability

Posted by Tobias Bocanegra <to...@day.com>.
well, your problem is not jackrabbit related, rather to the default
platform encoding for your system:

you read/write the data, using the default encoding:

nodoPruebaName.getBytes()

which is not platform independent. it's better to use:

nodoPruebaName.getBytes("utf-8")

or even better, use the same encoding you store in the jcr:encoding property.
regards, toby


On 12/6/06, Paco Avila <pa...@git.es> wrote:
> El mié, 06-12-2006 a las 09:34 +0100, Tobias Bocanegra escribió:
> > > More or less.... The text data files are platform dependant encoding.
> > which ones?
>
> I create an repository under Windows with a "nt:file" node. The content
> is a string with some latin-1 and greek chars. I make an backup (a tar
> archive) and restore it on Linux. But, the greek chars are not readed
> correctly.
>
> Attached is the repository definition, the test code (read and write)
> and the repository backuped from Windows.
>
> Note: I make a tar from the repository directory, not an repository
> export.
> --
> Paco Avila <pa...@git.es>
>
>
>


-- 
-----------------------------------------< tobias.bocanegra@day.com >---
Tobias Bocanegra, Day Management AG, Barfuesserplatz 6, CH - 4001 Basel
T +41 61 226 98 98, F +41 61 226 98 97
-----------------------------------------------< http://www.day.com >---

Re: repository portability

Posted by Paco Avila <pa...@git.es>.
On Wed, 06-12-2006 at 09:34 +0100, Tobias Bocanegra wrote:
> > More or less.... The text data files are platform dependant encoding.
> which ones?

I created a repository under Windows with an "nt:file" node. The content
is a string with some Latin-1 and Greek characters. I made a backup (a tar
archive) and restored it on Linux, but the Greek characters are not read
correctly.

Attached are the repository definition, the test code (read and write),
and the repository backed up from Windows.

Note: I made a tar of the repository directory, not a repository
export.
-- 
Paco Avila <pa...@git.es>

Re: repository portability

Posted by Tobias Bocanegra <to...@day.com>.
> More or less.... The text data files are platform dependant encoding.
which ones?


-- 
-----------------------------------------< tobias.bocanegra@day.com >---
Tobias Bocanegra, Day Management AG, Barfuesserplatz 6, CH - 4001 Basel
T +41 61 226 98 98, F +41 61 226 98 97
-----------------------------------------------< http://www.day.com >---

Re: repository portability

Posted by Paco Avila <pa...@git.es>.
On Tue, 05-12-2006 at 14:36 +0100, Tobias Bocanegra wrote:
> the files in the blobs directory are not encoded but are the contents
> of the binary properties. the repository data is platform independent.

More or less... The text data files use a platform-dependent encoding.
-- 
Paco Avila <pa...@git.es>


Re: repository portability

Posted by Tobias Bocanegra <to...@day.com>.
the files in the blobs directory are not encoded but are the contents
of the binary properties. the repository data is platform independent.

regards, toby

On 12/5/06, Paco Avila <pa...@git.es> wrote:
> Is an repository created in Windows portable to Linux? The repository
> files are stored in XML local files. The XML in the
> "worspaces/default/data" seems to be portable (they are UTF-8 encoded),
> but the files stored in "worspaces/default/blobs" seems quite diferents.
> At least, the TEXT files.
> --
> Paco Avila <pa...@git.es>
>
>


-- 
-----------------------------------------< tobias.bocanegra@day.com >---
Tobias Bocanegra, Day Management AG, Barfuesserplatz 6, CH - 4001 Basel
T +41 61 226 98 98, F +41 61 226 98 97
-----------------------------------------------< http://www.day.com >---