Posted to oak-dev@jackrabbit.apache.org by Alexander Klimetschek <ak...@adobe.com> on 2012/11/22 19:15:03 UTC

[OT] JCR on a file system

Hi everyone,

I came across [0] and its interesting "Don't fear the filesystem!" section. That brought me to the "stupid" question:

Why does Jackrabbit/Oak not map JCR hierarchies directly to the filesystem?

Things that come to my mind, but I am not sure if you couldn't overcome them with some creativity:
- transactions
- names: JCR allowing almost unrestricted node names, file systems probably not
- batch-write performance (though in-memory buffering in OS should have the same effect)
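
To make the naming point concrete, here is a minimal sketch of one way node names could be made safe for use as file names, assuming a simple percent-escaping scheme. The helper names `escape_name` and `unescape_name` are hypothetical and not part of Jackrabbit or Oak. The sketch ignores case-insensitive file systems and maximum file-name lengths, both of which a real mapping would have to handle.

```python
# Hypothetical helpers for illustration only; not a Jackrabbit/Oak API.
# Characters outside a conservative safe set are escaped as %XX per UTF-8 byte.
# "." is deliberately not in the safe set, so "." and ".." can never appear.
SAFE = frozenset(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789-_"
)


def escape_name(name: str) -> str:
    """Turn an arbitrary node name into a string usable as a file name."""
    out = []
    for byte in name.encode("utf-8"):
        ch = chr(byte)
        if ch in SAFE:
            out.append(ch)
        else:
            out.append("%{:02X}".format(byte))
    return "".join(out)


def unescape_name(escaped: str) -> str:
    """Reverse escape_name, recovering the original node name."""
    raw = bytearray()
    i = 0
    while i < len(escaped):
        if escaped[i] == "%":
            # The two characters after "%" are the hex value of one byte.
            raw.append(int(escaped[i + 1:i + 3], 16))
            i += 3
        else:
            raw.append(ord(escaped[i]))
            i += 1
    return raw.decode("utf-8")
```

With this scheme, `escape_name("jcr:content")` gives `jcr%3Acontent`, and `unescape_name` maps it back to the original name.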

[0] http://incubator.apache.org/kafka/design.html

Cheers,
Alex

Re: [OT] JCR on a file system

Posted by Jörg Hoh <jh...@googlemail.com>.
Hi Alex,

2012/11/22 Alexander Klimetschek <ak...@adobe.com>

>
> Why does Jackrabbit/Oak not map JCR hierarchies directly to the filesystem?
>
>
I worked with the proprietary content "repository" of Day's Communique 3
(the contentbus), which stored pages directly on the filesystem. Besides
the things Jukka pointed out I would like to add:

* ACLs: I think it's very hard to map JCR ACL semantics to filesystem
ACLs (I don't want to discuss here the differences between Windows and *nix
systems regarding access rights, ACLs, RBAC and so on; true
multi-platform support is then a really hard job). Besides that, I don't
think it's a good idea to reuse the user management of your OS and map
system users to JCR users. So in any case you have to do the ACL and
user management on a repository level yourself.
* Dealing with millions or even billions of small files is still a hard job
for a file system, although file systems have improved a lot in the last
10 years.
* Operations people will complain if you need to store these millions of
files on their backup systems, because they need to keep track of every
single file.

So from an operational point of view there are a lot of arguments against an
"every node is a file" approach.


-- 
Cheers,
Jörg Hoh,

http://cqdump.wordpress.com

Re: [OT] JCR on a file system

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Thu, Nov 22, 2012 at 8:15 PM, Alexander Klimetschek
<ak...@adobe.com> wrote:
> Why does Jackrabbit/Oak not map JCR hierarchies directly to the filesystem?

As pointed out in the Kafka document, random access over a file system
is terribly inefficient, which is why splitting fine-grained content,
like what you typically see in a content repository, into separate
files and directories wouldn't perform well on normal file systems.
Doing so would also suffer from the other issues you mentioned, most
notably lack of atomicity or locking.

Instead, and like Kafka also does, storing repository content in big
journal or collection files is a pretty good idea. That's what our
proprietary TarPM does for Jackrabbit 2.x, and you could argue that
the database-backed PMs and Oak MKs are also doing something similar
through the database engine. (Git also does this with its pack
files.) The main difference from the design outlined in Kafka is that
for various reasons (remote access, etc.) we've had to add various
levels of in-memory caching especially in Jackrabbit 2.x. In Oak we've
tried to avoid extra caches, and so far only had to add one (that we
could perhaps avoid with OAK-468). If we can keep to that goal and
further optimize JSON processing at the MK level, it should also be
possible for an Oak stack to work as outlined in the Kafka
document.
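
As a rough illustration of the "one big journal file" idea, here is a minimal sketch of an append-only record file with an in-memory offset index. It is not how TarPM, the Oak MKs or Kafka are actually implemented. The class name `Journal`, the record layout (two 4-byte lengths followed by key and value) and the use of node paths as keys are all assumptions made for the example. Crash recovery for partially written records, compaction and locking are left out.

```python
import os
import struct


class Journal:
    """Append-only record file plus an in-memory index (illustrative only)."""

    # Assumed record layout: key length and value length as big-endian uint32,
    # followed by the UTF-8 key and the raw value bytes.
    HEADER = struct.Struct(">II")

    def __init__(self, path):
        self.path = path
        self.index = {}  # key -> file offset of the latest record for that key
        open(path, "ab").close()  # create the file if it does not exist yet
        self._rebuild_index()

    def _rebuild_index(self):
        # One sequential scan over the file; later records win over earlier ones.
        with open(self.path, "rb") as f:
            while True:
                offset = f.tell()
                header = f.read(self.HEADER.size)
                if len(header) < self.HEADER.size:
                    break
                key_len, value_len = self.HEADER.unpack(header)
                key = f.read(key_len).decode("utf-8")
                f.seek(value_len, os.SEEK_CUR)  # skip the value while indexing
                self.index[key] = offset

    def put(self, key, value):
        # Writes only ever append, so disk access stays sequential.
        key_bytes = key.encode("utf-8")
        with open(self.path, "ab") as f:
            offset = f.seek(0, os.SEEK_END)
            f.write(self.HEADER.pack(len(key_bytes), len(value)))
            f.write(key_bytes)
            f.write(value)
            f.flush()
            os.fsync(f.fileno())
        self.index[key] = offset

    def get(self, key):
        # A read is one seek into the big file, guided by the index.
        offset = self.index.get(key)
        if offset is None:
            return None
        with open(self.path, "rb") as f:
            f.seek(offset)
            key_len, value_len = self.HEADER.unpack(f.read(self.HEADER.size))
            f.seek(key_len, os.SEEK_CUR)
            return f.read(value_len)
```

Updating a node is just another `put` with the same key: the old record stays in the file and the index points at the newest one, which is why such designs usually need some form of garbage collection or compaction later on.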

BR,

Jukka Zitting

Re: [OT] JCR on a file system

Posted by Lukas Kahwe Smith <ml...@pooteeweet.org>.

On 22.11.2012, at 19:15, Alexander Klimetschek <ak...@adobe.com> wrote:

> Hi everyone,
> 
> I came across [0] and its interesting "Don't fear the filesystem!" section. That brought me to the "stupid" question:
> 
> Why does Jackrabbit/Oak not map JCR hierarchies directly to the filesystem?
> 
> Things that come to my mind, but I am not sure if you couldn't overcome them with some creativity:
> - transactions
> - names: JCR allowing almost unrestricted node names, file systems probably not
> - batch-write performance (though in-memory buffering in OS should have the same effect)
> 
> [0] http://incubator.apache.org/kafka/design.html

sure .. or git

regards,
Lukas