Posted to dev@jackrabbit.apache.org by Tobias Bocanegra <to...@day.com> on 2007/02/21 11:58:16 UTC
General Packaging mechanism
hi all,
2 weeks ago we promised to contribute our package mechanism for jcr
content to the jackrabbit project [JCR-733]. after a lengthy
(internal) discussion we decided to completely re-develop a new
content archiver that is based on a filesystem-like abstraction of the
content in jcr.
a content archive (.car) will:
- have a hierarchy that corresponds to the hierarchy in the repo
- have a filesystem based serialization of the items
- have a sophisticated serialization for non-nt:file based nodes
- be compressed
- contain meta-information about used node-types, namespaces and mapping rules
- allow exporting/importing to any jcr-repository
- use a standard format for the archive (i.e. zip/jar)
i will add the respective contrib project shortly and develop the
first batch of code together with some documentation.
comments welcome.
regards, toby
[JCR-733] http://issues.apache.org/jira/browse/JCR-733
Very Simple Example
-------------------
Repository Structure:
+ parent [nt:folder]
  + file1.txt [nt:file]
    + jcr:content [nt:resource]
      - jcr:mimeType "text/plain"
      - jcr:data
  + tests [nt:folder]
    + file2.txt [nt:file]
      ...
Filesystem/Archive structure:
/META-INF
  ...(nodetype and namespace infos)...
/content
  /parent
    /file1.txt
    /tests
      /file2.txt
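
A minimal sketch of how such a layout could be produced with the JDK zip classes. It is not the contributed packaging code: the class name CarLayoutSketch, the helper methods, and the META-INF/namespaces.txt entry are invented for illustration, and it assumes that tests sits below parent.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class CarLayoutSketch {

    // Hypothetical helper: adds one file entry with the given text content.
    static void addFile(ZipOutputStream zout, String path, String text) throws IOException {
        zout.putNextEntry(new ZipEntry(path));
        zout.write(text.getBytes(StandardCharsets.UTF_8));
        zout.closeEntry();
    }

    // Builds an in-memory archive that mirrors the example hierarchy.
    static byte[] buildArchive() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ZipOutputStream zout = new ZipOutputStream(bytes);
        // Stand-in for the node type and namespace information;
        // the entry name is invented for this sketch.
        addFile(zout, "META-INF/namespaces.txt", "nt=http://www.jcp.org/jcr/nt/1.0\n");
        addFile(zout, "content/parent/file1.txt", "Hello, world.\n");
        addFile(zout, "content/parent/tests/file2.txt", "Hello again.\n");
        zout.close();
        return bytes.toByteArray();
    }

    // Lists the entry names in the order they were stored.
    static List<String> listEntries(byte[] archive) throws IOException {
        List<String> names = new ArrayList<String>();
        ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(archive));
        ZipEntry e;
        while ((e = zin.getNextEntry()) != null) {
            names.add(e.getName());
        }
        zin.close();
        return names;
    }

    public static void main(String[] args) throws IOException {
        for (String name : listEntries(buildArchive())) {
            System.out.println(name);
        }
    }
}
```

The sketch only covers nt:file nodes, which map directly to file entries. How other node types are serialized is left open here.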
--
-----------------------------------------< tobias.bocanegra@day.com >---
Tobias Bocanegra, Day Management AG, Barfuesserplatz 6, CH - 4001 Basel
T +41 61 226 98 98, F +41 61 226 98 97
-----------------------------------------------< http://www.day.com >---
Re: General Packaging mechanism
Posted by Tobias Bocanegra <to...@day.com>.
correct.
i created a small test class that writes to and reads from a jar file,
and everything works ok, no matter which platform encoding i use on
my system.
so, i see no problem using the zip-io from jdk1.4.
regards, toby
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class JarTest {
    public static void main(String[] args) throws Exception {
        System.out.println("System file.encoding: " +
                System.getProperty("file.encoding"));

        // write three entries whose names contain non-ASCII characters
        byte[] testBuffer = "Hello, world.\n".getBytes();
        FileOutputStream out = new FileOutputStream("test.jar");
        ZipOutputStream zout = new ZipOutputStream(out);
        ZipEntry e = new ZipEntry("\u03b1 - first.txt");   // Greek alpha
        zout.putNextEntry(e);
        zout.write(testBuffer);
        zout.closeEntry();
        e = new ZipEntry("\u03b2 - second.txt");           // Greek beta
        zout.putNextEntry(e);
        zout.write(testBuffer);
        zout.closeEntry();
        e = new ZipEntry("\u263a - smile.txt");            // smiley
        zout.putNextEntry(e);
        zout.write(testBuffer);
        zout.closeEntry();
        zout.close();
        out.close();

        // reopen and print the entry names as read back
        FileInputStream in = new FileInputStream("test.jar");
        ZipInputStream zin = new ZipInputStream(in);
        while ((e = zin.getNextEntry()) != null) {
            System.out.println(e.getName());
        }
        zin.close();
        in.close();
    }
}
Re: General Packaging mechanism
Posted by Nicolas <nt...@gmail.com>.
Hi,
This link might help I think:
http://www.peterbuettner.de/develop/javasnippets/zipOnlyAscii/index.html
The important excerpt: "After investigation of the native code i found, that
they interpret the names in the zip as utf-8 encoded. The bytes are
transformed into a String nevertheless if they are valid utf-8 or not."
So if the packaging application produces UTF-8 encoded filenames, there won't be
any issue.
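
One way to check this claim on a given JDK is to look at the raw archive bytes. The sketch below is only an illustration (the class name EntryNameBytes and its helpers are invented): it writes one entry and searches the archive for the UTF-8 encoding of the entry name.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class EntryNameBytes {

    // Writes a single empty entry and returns the raw archive bytes.
    static byte[] archiveWithEntry(String name) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ZipOutputStream zout = new ZipOutputStream(bytes);
        zout.putNextEntry(new ZipEntry(name));
        zout.closeEntry();
        zout.close();
        return bytes.toByteArray();
    }

    // Naive byte search: true if needle occurs anywhere in haystack.
    static boolean contains(byte[] haystack, byte[] needle) {
        outer:
        for (int i = 0; i + needle.length <= haystack.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return true;
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        String name = "\u03b1 - first.txt";
        byte[] archive = archiveWithEntry(name);
        // The entry name is stored in the local header and the central directory.
        System.out.println("UTF-8 name found: "
                + contains(archive, name.getBytes(StandardCharsets.UTF_8)));
        // ISO-8859-1 cannot represent the alpha and substitutes '?'.
        System.out.println("ISO-8859-1 name found: "
                + contains(archive, name.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

This says nothing about archives produced by other tools, which may store names in a different encoding.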
Hope it helps.
Nicolas
On 2/21/07, Julian Reschke <ju...@gmx.de> wrote:
>
> Tobias Bocanegra schrieb:
> >> Unrelated to that...:
> >>
> >> > - use a standard format for the archive (i.e. zip/jar)
> >>
> >> If you use ZIP/JAR as format, how are you going to handle non-ASCII
> >> characters in filenames in a portable way?
> >
> > all non-valid filesystem characters are escaped using url-escaping %xx
> > or %uXXXX. actually i haven't looked at how non-ascii characters are
> > handled in a jar file, but obviously it works, since i can include
> > such a file in a jar file:
> > ...
>
> My understanding is that the JAR/ZIP format is silent on filename
> encoding. So a producer of these files will have to select an encoding,
> and the recipient needs to select the same one. In general, this is not
> going to work unless everybody agrees to use UTF-8.
>
> > [tripod@sulu test]$ touch "到日本来.txt"
> > [tripod@sulu test]$ ll
> > total 0
> > -rw-rw-r-- 1 tripod tripod 0 Feb 21 14:44 到日本来.txt
> > [tripod@sulu test]$ cd ..
> > [tripod@sulu jcr-car]$ jar cvf test.jar test/
> > added manifest
> > adding: test/(in = 0) (out= 0)(stored 0%)
> > adding: test/到日本来.txt(in = 0) (out= 0)(stored 0%)
> > [tripod@sulu jcr-car]$ jar tf test.jar
> > META-INF/
> > META-INF/MANIFEST.MF
> > test/
> > test/到日本来.txt
>
> I think it's using the platform encoding, and you happened to try a
> character (can't tell from your mail) that can be represented in that
> encoding. Try a mix of special characters (Euro sign, Hebrew, Arabic) in
> one filename, and retry :-)
>
> Best regards, Julian
>
> (P.S.: we had trouble using ZIP as a content container format two years
> ago for the reasons above; maybe the situation has improved but I really
> doubt that)
>
>
--
a+
Nico
my blog! http://www.deviant-abstraction.net !!
Re: General Packaging mechanism
Posted by Julian Reschke <ju...@gmx.de>.
Tobias Bocanegra schrieb:
>> Unrelated to that...:
>>
>> > - use a standard format for the archive (i.e. zip/jar)
>>
>> If you use ZIP/JAR as format, how are you going to handle non-ASCII
>> characters in filenames in a portable way?
>
> all non-valid filesystem characters are escaped using url-escaping %xx
> or %uXXXX. actually i haven't looked at how non-ascii characters are
> handled in a jar file, but obviously it works, since i can include
> such a file in a jar file:
> ...
My understanding is that the JAR/ZIP format is silent on filename
encoding. So a producer of these files will have to select an encoding,
and the recipient needs to select the same one. In general, this is not
going to work unless everybody agrees to use UTF-8.
> [tripod@sulu test]$ touch "到日本来.txt"
> [tripod@sulu test]$ ll
> total 0
> -rw-rw-r-- 1 tripod tripod 0 Feb 21 14:44 到日本来.txt
> [tripod@sulu test]$ cd ..
> [tripod@sulu jcr-car]$ jar cvf test.jar test/
> added manifest
> adding: test/(in = 0) (out= 0)(stored 0%)
> adding: test/到日本来.txt(in = 0) (out= 0)(stored 0%)
> [tripod@sulu jcr-car]$ jar tf test.jar
> META-INF/
> META-INF/MANIFEST.MF
> test/
> test/到日本来.txt
I think it's using the platform encoding, and you happened to try a
character (can't tell from your mail) that can be represented in that
encoding. Try a mix of special characters (Euro sign, Hebrew, Arabic) in
one filename, and retry :-)
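
The suggested experiment can be sketched in Java as follows. This is an illustration only: the class name MixedNameRoundTrip and its helpers are invented, and the Charset-taking constructors of ZipOutputStream and ZipInputStream need Java 7 or later, so they are not available on jdk1.4. The sketch round-trips one name that mixes the Euro sign, a Hebrew letter and an Arabic letter with UTF-8 on both sides, and shows at the string level what a consumer gets when it decodes the same bytes with a different encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class MixedNameRoundTrip {

    // Euro sign, Hebrew alef and Arabic alef in one file name.
    static final String NAME = "\u20ac-\u05d0-\u0627.txt";

    // Writes one empty entry, encoding its name with the given charset.
    static byte[] write(String name, Charset cs) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ZipOutputStream zout = new ZipOutputStream(bytes, cs);
        zout.putNextEntry(new ZipEntry(name));
        zout.closeEntry();
        zout.close();
        return bytes.toByteArray();
    }

    // Reads the first entry name, decoding it with the given charset.
    static String readFirstName(byte[] archive, Charset cs) throws IOException {
        ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(archive), cs);
        try {
            ZipEntry e = zin.getNextEntry();
            return e == null ? null : e.getName();
        } finally {
            zin.close();
        }
    }

    // What a consumer sees if it decodes the producer's bytes differently.
    static String reinterpret(String name, Charset producer, Charset consumer) {
        return new String(name.getBytes(producer), consumer);
    }

    public static void main(String[] args) throws IOException {
        Charset utf8 = StandardCharsets.UTF_8;
        String back = readFirstName(write(NAME, utf8), utf8);
        System.out.println("UTF-8 on both sides preserved the name: " + NAME.equals(back));

        String garbled = reinterpret(NAME, utf8, StandardCharsets.ISO_8859_1);
        System.out.println("UTF-8 bytes read as ISO-8859-1 preserved the name: "
                + NAME.equals(garbled));
    }
}
```

The second check deliberately works on plain strings rather than on a zip file, because how a given zip reader reacts to a mismatched encoding depends on the tool and on flags in the archive.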
Best regards, Julian
(P.S.: we had trouble using ZIP as a content container format two years
ago for the reasons above; maybe the situation has improved but I really
doubt that)
Re: General Packaging mechanism
Posted by Tobias Bocanegra <to...@day.com>.
> Unrelated to that...:
>
> > - use a standard format for the archive (i.e. zip/jar)
>
> If you use ZIP/JAR as format, how are you going to handle non-ASCII
> characters in filenames in a portable way?
all non-valid filesystem characters are escaped using url-escaping %xx
or %uXXXX. actually i haven't looked at how non-ascii characters are
handled in a jar file, but obviously it works, since i can include
such a file in a jar file:
[tripod@sulu test]$ touch "到日本来.txt"
[tripod@sulu test]$ ll
total 0
-rw-rw-r-- 1 tripod tripod 0 Feb 21 14:44 到日本来.txt
[tripod@sulu test]$ cd ..
[tripod@sulu jcr-car]$ jar cvf test.jar test/
added manifest
adding: test/(in = 0) (out= 0)(stored 0%)
adding: test/到日本来.txt(in = 0) (out= 0)(stored 0%)
[tripod@sulu jcr-car]$ jar tf test.jar
META-INF/
META-INF/MANIFEST.MF
test/
test/到日本来.txt
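
The %xx / %uXXXX escaping mentioned above could look roughly like the following sketch. The mail does not say which characters count as non-valid, so the ILLEGAL set, the class name NameEscaper and the asciiOnly switch are assumptions made for illustration, not the actual implementation.

```java
public class NameEscaper {

    // Assumed set of characters that are unsafe in file names on common
    // platforms, plus '%' itself so that escaping stays reversible.
    static final String ILLEGAL = "%/\\:*?\"<>|";

    // Escapes unsafe characters as %xx (code <= 0xff) or %uXXXX (above).
    // With asciiOnly set, every character above 0x7e is escaped as well.
    static String escape(String name, boolean asciiOnly) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean unsafe = c < 0x20 || ILLEGAL.indexOf(c) >= 0
                    || (asciiOnly && c > 0x7e);
            if (!unsafe) {
                sb.append(c);
            } else if (c <= 0xff) {
                sb.append(String.format("%%%02x", (int) c));
            } else {
                sb.append(String.format("%%u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    // Reverses escape(); expects well-formed input produced by escape().
    static String unescape(String escaped) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < escaped.length()) {
            char c = escaped.charAt(i);
            if (c != '%') {
                sb.append(c);
                i++;
            } else if (escaped.charAt(i + 1) == 'u') {
                sb.append((char) Integer.parseInt(escaped.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                sb.append((char) Integer.parseInt(escaped.substring(i + 1, i + 3), 16));
                i += 3;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String name = "a:b?\u5230.txt";
        String escaped = escape(name, true);
        System.out.println(escaped);
        System.out.println(name.equals(unescape(escaped)));
    }
}
```

With asciiOnly set, the stored entry names are plain ASCII, which sidesteps the filename-encoding question at the cost of less readable names inside the archive.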
regards, toby
Re: General Packaging mechanism
Posted by Tim Kettering <ti...@vivakos.com>.
isn't .car also used in J2EE for the database connector packages
often used in J2EE servers such as Geronimo?
On Feb 21, 2007, at 8:25 AM, Julian Reschke wrote:
> Tobias Bocanegra schrieb:
>> well, it seems that they now use .sar as the new extension:
>> "In the past SAP developed the tool and named it CAR. The extensions
>> of all compressed files were named ".CAR" as well. In SAP release
>> 4.6C
>> SAP decided to enhance the functionality of the CAR utility a bit.
>> Therefore, the internal structure of the compressed files slightly
>> changed. Because of this, it was necessary to create a new extension.
>> There the new extension ".SAR" was born and the new utility was named
>> SAPCAR."
>> however, finding a suitable, unused extension is almost impossible
>> (http://filext.com/detaillist.php?extdetail=car&Search=Search). using
>> .car seemed to be the most obvious and least known in the j2ee/java
>> world.
>
> Point taken, but it may make sense to consider longer
> extensions, such as "jcrcar".
>
> Unrelated to that...:
>
>> - use a standard format for the archive (i.e. zip/jar)
>
> If you use ZIP/JAR as format, how are you going to handle non-ASCII
> characters in filenames in a portable way?
>
> Best regards, Julian
Re: General Packaging mechanism
Posted by Julian Reschke <ju...@gmx.de>.
Tobias Bocanegra schrieb:
> well, it seems that they now use .sar as the new extension:
>
> "In the past SAP developed the tool and named it CAR. The extensions
> of all compressed files were named ".CAR" as well. In SAP release 4.6C
> SAP decided to enhance the functionality of the CAR utility a bit.
> Therefore, the internal structure of the compressed files slightly
> changed. Because of this, it was necessary to create a new extension.
> There the new extension ".SAR" was born and the new utility was named
> SAPCAR."
>
> however, finding a suitable, unused extension is almost impossible
> (http://filext.com/detaillist.php?extdetail=car&Search=Search). using
> .car seemed to be the most obvious and least known in the j2ee/java
> world.
Point taken, but it may make sense to consider longer
extensions, such as "jcrcar".
Unrelated to that...:
> - use a standard format for the archive (i.e. zip/jar)
If you use ZIP/JAR as format, how are you going to handle non-ASCII
characters in filenames in a portable way?
Best regards, Julian
Re: General Packaging mechanism
Posted by Tobias Bocanegra <to...@day.com>.
well, it seems that they now use .sar as the new extension:
"In the past SAP developed the tool and named it CAR. The extensions
of all compressed files were named ".CAR" as well. In SAP release 4.6C
SAP decided to enhance the functionality of the CAR utility a bit.
Therefore, the internal structure of the compressed files slightly
changed. Because of this, it was necessary to create a new extension.
There the new extension ".SAR" was born and the new utility was named
SAPCAR."
however, finding a suitable, unused extension is almost impossible
(http://filext.com/detaillist.php?extdetail=car&Search=Search). using
.car seemed to be the most obvious and least known in the j2ee/java
world.
regards, toby
On 2/21/07, Julian Reschke <ju...@gmx.de> wrote:
> Tobias Bocanegra schrieb:
> > hi all,
> > 2 weeks ago we promised to contribute our package mechanism for jcr
> > content to the jackrabbit project [JCR-733]. after a lengthy
> > (internal) discussion we decided to completely re-develop a new
> > content archiver that is based on a filesystem-like abstraction of the
> > content in jcr.
> >
> > a content archive (.car) will:
> > ...
>
> Hi Tobias,
>
> you may want to avoid confusion with SAPCAR, which uses the same
> extension (<http://www.easymarketplace.de/SAPCAR.php>).
>
> Best regards, Julian
>
Re: General Packaging mechanism
Posted by Julian Reschke <ju...@gmx.de>.
Tobias Bocanegra schrieb:
> hi all,
> 2 weeks ago we promised to contribute our package mechanism for jcr
> content to the jackrabbit project [JCR-733]. after a lengthy
> (internal) discussion we decided to completely re-develop a new
> content archiver that is based on a filesystem-like abstraction of the
> content in jcr.
>
> a content archive (.car) will:
> ...
Hi Tobias,
you may want to avoid confusion with SAPCAR, which uses the same
extension (<http://www.easymarketplace.de/SAPCAR.php>).
Best regards, Julian