Posted to dev@jackrabbit.apache.org by Tobias Bocanegra <to...@day.com> on 2007/02/21 11:58:16 UTC

General Packaging mechanism

hi all,
2 weeks ago we promised to contribute our package mechanism for jcr
content to the jackrabbit project [JCR-733]. after a lengthy
(internal) discussion we decided to completely re-develop a new
content archiver that is based on a filesystem-like abstraction of the
content in jcr.

a content archive (.car) will:
- have a hierarchy that corresponds to the hierarchy in the repo
- have a filesystem based serialization of the items
- have a sophisticated serialization for non-nt:file based nodes
- be compressed
- contain meta-information about used node-types, namespaces and mapping rules
- allow exporting/importing to any jcr-repository
- use a standard format for the archive (i.e. zip/jar)

i will add the respective contrib project shortly and develop the
first batch of code together with some documentation.

comments welcome.
regards, toby

[JCR-733] http://issues.apache.org/jira/browse/JCR-733

Very Simple Example
-------------------
Repository Structure:

+ parent [nt:folder]
  + file1.txt [nt:file]
    + jcr:content [nt:resource]
      - jcr:mimeType "text/plain"
      - jcr:data
  + tests [nt:folder]
    + file2.txt [nt:file]
      ...

Filesystem/Archive structure:

/META-INF
  ...(nodetype and namespace infos)...
/content
  /parent
    /file1.txt
    /tests
      /file2.txt
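
The layout above could be produced with the standard zip classes roughly as follows. This is only a minimal sketch, not the code of the announced contrib project: the class name CarLayoutSketch, the helper methods and the META-INF/placeholder.txt entry are hypothetical, and it assumes Java 7 or later (try-with-resources, StandardCharsets).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch: none of these names come from the proposed archiver.
final class CarLayoutSketch {

    /** Builds an in-memory zip that mirrors the example layout. */
    static byte[] buildArchive() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(bytes)) {
            // stand-in for the node type and namespace information
            addFile(zout, "META-INF/placeholder.txt", "node types, namespaces\n");
            // nt:file nodes become plain files, nt:folder nodes become
            // path segments; zip entry names carry no leading slash
            addFile(zout, "content/parent/file1.txt", "Hello, world.\n");
            addFile(zout, "content/parent/tests/file2.txt", "Hello again.\n");
        }
        return bytes.toByteArray();
    }

    /** Writes one entry with the given text as its content. */
    static void addFile(ZipOutputStream zout, String path, String text)
            throws IOException {
        zout.putNextEntry(new ZipEntry(path));
        zout.write(text.getBytes(StandardCharsets.UTF_8));
        zout.closeEntry();
    }

    /** Returns the entry names in the order they are stored. */
    static List<String> listEntries(byte[] archive) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zin =
                new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry e;
            while ((e = zin.getNextEntry()) != null) {
                names.add(e.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        for (String name : listEntries(buildArchive())) {
            System.out.println(name);
        }
    }
}
```

The serialization of non-nt:file nodes and the real META-INF content are left out, since the announcement does not describe them.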

-- 
-----------------------------------------< tobias.bocanegra@day.com >---
Tobias Bocanegra, Day Management AG, Barfuesserplatz 6, CH - 4001 Basel
T +41 61 226 98 98, F +41 61 226 98 97
-----------------------------------------------< http://www.day.com >---

Re: General Packaging mechanism

Posted by Tobias Bocanegra <to...@day.com>.
correct.
i created a small test class that writes to and reads from a jar file,
and everything works ok, no matter which platform encoding i use on
my system.
so, i see no problem using the zip-io from jdk1.4.

regards, toby

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class JarTest {

    public static void main(String[] args) throws Exception {
        System.out.println("System file.encoding: "
                + System.getProperty("file.encoding"));

        // write three entries whose names start with a non-ASCII character
        byte[] testBuffer = "Hello, world.\n".getBytes();
        FileOutputStream out = new FileOutputStream("test.jar");
        ZipOutputStream zout = new ZipOutputStream(out);
        ZipEntry e = new ZipEntry("\u03b1 - first.txt");  // Greek alpha
        zout.putNextEntry(e);
        zout.write(testBuffer);
        zout.closeEntry();
        e = new ZipEntry("\u03b2 - second.txt");          // Greek beta
        zout.putNextEntry(e);
        zout.write(testBuffer);
        zout.closeEntry();
        e = new ZipEntry("\u263a - smile.txt");           // smiley
        zout.putNextEntry(e);
        zout.write(testBuffer);
        zout.closeEntry();
        zout.close();
        out.close();

        // reopen the file and print the entry names as they are read back
        FileInputStream in = new FileInputStream("test.jar");
        ZipInputStream zin = new ZipInputStream(in);
        while ((e = zin.getNextEntry()) != null) {
            System.out.println(e.getName());
        }
        zin.close();
        in.close();
    }
}


Re: General Packaging mechanism

Posted by Nicolas <nt...@gmail.com>.
Hi,

This link might help I think:
http://www.peterbuettner.de/develop/javasnippets/zipOnlyAscii/index.html

The important excerpt: "After investigation of the native code i found, that
they interpret the names in the zip as utf-8 encoded. The bytes are
transformed into a String nevertheless if they are valid utf-8 or not."

So if the packaging application produces UTF-8 encoded filenames, there
won't be any issue.
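
On Java 7 or later the encoding of entry names does not have to be left implicit: ZipOutputStream and ZipInputStream both have a constructor that takes a Charset. The following is a minimal sketch of a round trip with UTF-8 named explicitly on both sides. The class name Utf8ZipNames and its two helper methods are made up for illustration, and these constructors are not available in the JDK 1.4 discussed in this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, assuming Java 7+ for the Charset constructors.
final class Utf8ZipNames {

    /** Writes a single entry, encoding its name as UTF-8. */
    static byte[] write(String entryName) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zout =
                new ZipOutputStream(bytes, StandardCharsets.UTF_8)) {
            zout.putNextEntry(new ZipEntry(entryName));
            zout.write("Hello, world.\n".getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
        }
        return bytes.toByteArray();
    }

    /** Reads the first entry name back, decoding it as UTF-8. */
    static String readFirstName(byte[] archive) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(
                new ByteArrayInputStream(archive), StandardCharsets.UTF_8)) {
            ZipEntry e = zin.getNextEntry();
            return e == null ? null : e.getName();
        }
    }

    public static void main(String[] args) throws IOException {
        // Euro sign, Hebrew alef and Arabic alef in one name
        String name = "\u20ac \u05d0 \u0627.txt";
        String readBack = readFirstName(write(name));
        System.out.println("name survived round trip: " + name.equals(readBack));
    }
}
```

This only shows that a Java writer and a Java reader agree with each other; whether other zip tools decode the same names correctly is a separate question.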

Hope it helps.

Nicolas

On 2/21/07, Julian Reschke <ju...@gmx.de> wrote:
>
> Tobias Bocanegra schrieb:
> >> Unrelated to that...:
> >>
> >> > - use a standard format for the archive (i.e. zip/jar)
> >>
> >> If you use ZIP/JAR as format, how are you going to handle non-ASCII
> >> characters in filenames in a portable way?
> >
> > all non-valid filesystem characters are escaped using url-escaping %xx
> > or %uXXXX. actually i haven't looked at how non-ascii characters are
> > handled in a jar file, but obviously it works, since i can include
> > such a file in a jar file:
> > ...
>
> My understanding is that the JAR/ZIP format is silent on filename
> encoding. So a producer of these files will have to select an encoding,
> and the recipient needs to select the same one. In general, this is not
> going to work unless everybody agrees to use UTF-8.
>
> > [tripod@sulu test]$ touch "到日本来.txt"
> > [tripod@sulu test]$ ll
> > total 0
> > -rw-rw-r-- 1 tripod tripod 0 Feb 21 14:44 到日本来.txt
> > [tripod@sulu test]$ cd ..
> > [tripod@sulu jcr-car]$ jar cvf test.jar test/
> > added manifest
> > adding: test/(in = 0) (out= 0)(stored 0%)
> > adding: test/到日本来.txt(in = 0) (out= 0)(stored 0%)
> > [tripod@sulu jcr-car]$ jar tf test.jar
> > META-INF/
> > META-INF/MANIFEST.MF
> > test/
> > test/到日本来.txt
>
> I think it's using the platform encoding, and you happened to try a
> character (can't tell from your mail) that can be represented in that
> encoding. Try a mix of special characters (Euro sign, Hebrew, Arabic) in
> one filename, and retry :-)
>
> Best regards, Julian
>
> (P.S.: we had trouble using ZIP as a content container format two years
> ago for the reasons above; maybe the situation has improved but I really
> doubt that)
>
>


-- 
a+
Nico
my blog! http://www.deviant-abstraction.net !!

Re: General Packaging mechanism

Posted by Julian Reschke <ju...@gmx.de>.
Tobias Bocanegra schrieb:
>> Unrelated to that...:
>>
>> > - use a standard format for the archive (i.e. zip/jar)
>>
>> If you use ZIP/JAR as format, how are you going to handle non-ASCII
>> characters in filenames in a portable way?
> 
> all non-valid filesystem characters are escaped using url-escaping %xx
> or %uXXXX. actually i haven't looked at how non-ascii characters are
> handled in a jar file, but obviously it works, since i can include
> such a file in a jar file:
> ...

My understanding is that the JAR/ZIP format is silent on filename
encoding. So a producer of these files will have to select an encoding,
and the recipient needs to select the same one. In general, this is not
going to work unless everybody agrees to use UTF-8.
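
The effect of a producer and a recipient choosing different encodings can be shown without any zip file at all. The sketch below is an illustration only: the class name NameEncodingMismatch and the method recode are hypothetical. It encodes a file name with one charset and decodes the resulting bytes with another.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of a producer/recipient encoding mismatch.
final class NameEncodingMismatch {

    /** Encodes the name as the producer would, decodes it as the recipient would. */
    static String recode(String name, Charset producer, Charset recipient) {
        return new String(name.getBytes(producer), recipient);
    }

    public static void main(String[] args) {
        String name = "\u20ac\u05d0.txt"; // Euro sign, Hebrew alef

        // both sides agree on UTF-8: the name survives
        System.out.println(name.equals(
                recode(name, StandardCharsets.UTF_8, StandardCharsets.UTF_8)));

        // recipient assumes ISO-8859-1: the name comes back garbled
        System.out.println(name.equals(
                recode(name, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1)));

        // both sides agree on ISO-8859-1, but it cannot represent either
        // character, so getBytes substitutes a replacement byte and the
        // original name is lost
        System.out.println(name.equals(
                recode(name, StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1)));
    }
}
```

The third case corresponds to a platform encoding that simply cannot represent the characters in the name, which agreement between both sides does not fix.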

> [tripod@sulu test]$ touch "到日本来.txt"
> [tripod@sulu test]$ ll
> total 0
> -rw-rw-r-- 1 tripod tripod 0 Feb 21 14:44 到日本来.txt
> [tripod@sulu test]$ cd ..
> [tripod@sulu jcr-car]$ jar cvf test.jar test/
> added manifest
> adding: test/(in = 0) (out= 0)(stored 0%)
> adding: test/到日本来.txt(in = 0) (out= 0)(stored 0%)
> [tripod@sulu jcr-car]$ jar tf test.jar
> META-INF/
> META-INF/MANIFEST.MF
> test/
> test/到日本来.txt

I think it's using the platform encoding, and you happened to try a
character (can't tell from your mail) that can be represented in that
encoding. Try a mix of special characters (Euro sign, Hebrew, Arabic) in
one filename, and retry :-)

Best regards, Julian

(P.S.: we had trouble using ZIP as a content container format two years
ago for the reasons above; maybe the situation has improved but I really
doubt that)


Re: General Packaging mechanism

Posted by Tobias Bocanegra <to...@day.com>.
> Unrelated to that...:
>
> > - use a standard format for the archive (i.e. zip/jar)
>
> If you use ZIP/JAR as format, how are you going to handle non-ASCII
> characters in filenames in a portable way?

all non-valid filesystem characters are escaped using url-escaping %xx
or %uXXXX. actually i haven't looked at how non-ascii characters are
handled in a jar file, but obviously it works, since i can include
such a file in a jar file:

[tripod@sulu test]$ touch "到日本来.txt"
[tripod@sulu test]$ ll
total 0
-rw-rw-r-- 1 tripod tripod 0 Feb 21 14:44 到日本来.txt
[tripod@sulu test]$ cd ..
[tripod@sulu jcr-car]$ jar cvf test.jar test/
added manifest
adding: test/(in = 0) (out= 0)(stored 0%)
adding: test/到日本来.txt(in = 0) (out= 0)(stored 0%)
[tripod@sulu jcr-car]$ jar tf test.jar
META-INF/
META-INF/MANIFEST.MF
test/
test/到日本来.txt
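
The %xx / %uXXXX escaping mentioned above might look roughly like the following sketch. The thread does not say which characters count as non-valid, so the RESERVED set, the asciiOnly switch and the class name CarNameEscaper are all assumptions made for illustration, not the actual rules of the archiver.

```java
// Hypothetical sketch of %xx / %uXXXX escaping; the set of unsafe
// characters is an assumption, the thread does not define it.
final class CarNameEscaper {

    // Assumed unsafe characters. '%' is included so that every '%' in an
    // escaped name introduces an escape sequence.
    private static final String RESERVED = "%/\\:*?\"<>|";

    /**
     * Escapes control characters and RESERVED characters; with asciiOnly
     * set, also escapes everything above 0x7e.
     */
    static String escape(String name, boolean asciiOnly) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean unsafe = c < 0x20 || RESERVED.indexOf(c) >= 0
                    || (asciiOnly && c > 0x7e);
            if (!unsafe) {
                sb.append(c);
            } else if (c < 0x100) {
                sb.append(String.format("%%%02x", (int) c));   // %xx
            } else {
                sb.append(String.format("%%u%04x", (int) c));  // %uXXXX
            }
        }
        return sb.toString();
    }

    /** Reverses escape(); throws on malformed escape sequences. */
    static String unescape(String escaped) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < escaped.length()) {
            char c = escaped.charAt(i);
            if (c != '%') {
                sb.append(c);
                i++;
            } else if (escaped.charAt(i + 1) == 'u') {
                sb.append((char) Integer.parseInt(escaped.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                sb.append((char) Integer.parseInt(escaped.substring(i + 1, i + 3), 16));
                i += 3;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a:b", false));
        System.out.println(escape("\u5230.txt", true));
        System.out.println(unescape(escape("50% off?.txt", false)));
    }
}
```

With asciiOnly set, the archive contains only ASCII entry names, which sidesteps the question of how a zip tool encodes them, at the cost of unreadable names for non-Latin content.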


regards, toby

Re: General Packaging mechanism

Posted by Tim Kettering <ti...@vivakos.com>.
isn't .car also used in J2EE for the database connector packages  
often used in J2EE servers such as Geronimo?

On Feb 21, 2007, at 8:25 AM, Julian Reschke wrote:

> Tobias Bocanegra schrieb:
>> well, it seems that they now use .sar as the new extension:
>> "In the past SAP developed the tool and named it CAR. The extensions
>> of all compressed files were named ".CAR" as well. In SAP release 4.6C
>> SAP decided to enhance the functionality of the CAR utility a bit.
>> Therefore, the internal structure of the compressed files slightly
>> changed. Because of this, it was necessary to create a new extension.
>> There the new extension ".SAR" was born and the new utility was named
>> SAPCAR."
>> however, finding a suitable, not used extension is almost impossible
>> (http://filext.com/detaillist.php?extdetail=car&Search=Search). using
>> .car seemed to be the most obvious and least known in the j2ee/java
>> world.
>
> Point taken, but it may make sense to consider a longer
> extension, such as "jcrcar".
>
> Unrelated to that...:
>
>> - use a standard format for the archive (i.e. zip/jar)
>
> If you use ZIP/JAR as format, how are you going to handle non-ASCII  
> characters in filenames in a portable way?
>
> Best regards, Julian


Re: General Packaging mechanism

Posted by Julian Reschke <ju...@gmx.de>.
Tobias Bocanegra schrieb:
> well, it seems that they now use .sar as the new extension:
> 
> "In the past SAP developed the tool and named it CAR. The extensions
> of all compressed files were named ".CAR" as well. In SAP release 4.6C
> SAP decided to enhance the functionality of the CAR utility a bit.
> Therefore, the internal structure of the compressed files slightly
> changed. Because of this, it was necessary to create a new extension.
> There the new extension ".SAR" was born and the new utility was named
> SAPCAR."
> 
> however, finding a suitable, not used extension is almost impossible
> (http://filext.com/detaillist.php?extdetail=car&Search=Search). using
> .car seemed to be the most obvious and least known in the j2ee/java
> world.

Point taken, but it may make sense to consider a longer
extension, such as "jcrcar".

Unrelated to that...:

> - use a standard format for the archive (i.e. zip/jar)

If you use ZIP/JAR as format, how are you going to handle non-ASCII 
characters in filenames in a portable way?

Best regards, Julian

Re: General Packaging mechanism

Posted by Tobias Bocanegra <to...@day.com>.
well, it seems that they now use .sar as the new extension:

"In the past SAP developed the tool and named it CAR. The extensions
of all compressed files were named ".CAR" as well. In SAP release 4.6C
SAP decided to enhance the functionality of the CAR utility a bit.
Therefore, the internal structure of the compressed files slightly
changed. Because of this, it was necessary to create a new extension.
There the new extension ".SAR" was born and the new utility was named
SAPCAR."

however, finding a suitable, not used extension is almost impossible
(http://filext.com/detaillist.php?extdetail=car&Search=Search). using
.car seemed to be the most obvious and least known in the j2ee/java
world.

regards, toby

On 2/21/07, Julian Reschke <ju...@gmx.de> wrote:
> Tobias Bocanegra schrieb:
> > hi all,
> > 2 weeks ago we promised to contribute our package mechanism for jcr
> > content to the jackrabbit project [JCR-733]. after a lengthy
> > (internal) discussion we decided to completely re-develop a new
> > content archiver that is based on a filesystem-like abstraction of the
> > content in jcr.
> >
> > a content archive (.car) will:
>  > ...
>
> Hi Tobias,
>
> you may want to avoid confusion with SAPCAR, which uses the same
> extension (<http://www.easymarketplace.de/SAPCAR.php>).
>
> Best regards, Julian
>



Re: General Packaging mechanism

Posted by Julian Reschke <ju...@gmx.de>.
Tobias Bocanegra schrieb:
> hi all,
> 2 weeks ago we promised to contribute our package mechanism for jcr
> content to the jackrabbit project [JCR-733]. after a lengthy
> (internal) discussion we decided to completely re-develop a new
> content archiver that is based on a filesystem-like abstraction of the
> content in jcr.
> 
> a content archive (.car) will:
 > ...

Hi Tobias,

you may want to avoid confusion with SAPCAR, which uses the same 
extension (<http://www.easymarketplace.de/SAPCAR.php>).

Best regards, Julian