Posted to dev@cocoon.apache.org by Stefano Mazzocchi <st...@apache.org> on 2004/02/23 16:47:09 UTC

[RT] rethinking the cache storage system

We were using Jisp and Scott's decision makes it clear that we either:

  - have to maintain Jisp 2.x ourselves

or

  - use something else

Here I would like to ask you a much easier question: do we really need
it? Can't we just put our storage into a bunch of directories and use the
file system as the store? That works very well for file-intensive setups
like mail clients/servers and browser caches, so why shouldn't it work for us?
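
For what it's worth, the "bunch of directories" idea can be sketched in a few lines. This is only an illustration under assumptions, not Cocoon's store: the class name DirectoryStore, its method names, and the choice of SHA-1 with a two-character fan-out directory are all made up here, and the code uses java.nio.file, which is newer than the JDKs discussed in this thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch: one file per cache entry, fanned out over hashed subdirectories. */
final class DirectoryStore {
    private final Path root;

    DirectoryStore(Path root) {
        this.root = root;
    }

    /** Maps a key to root/ab/cdef..., where "abcdef..." is the hex SHA-1 of the key. */
    Path pathFor(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            // The first two hex digits select one of 256 subdirectories,
            // so no single directory has to hold every entry.
            return root.resolve(hex.substring(0, 2)).resolve(hex.substring(2));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 should be available on every JVM", e);
        }
    }

    void store(String key, byte[] value) {
        try {
            Path target = pathFor(key);
            Files.createDirectories(target.getParent());
            // Write to a temporary file in the same directory, then move it into
            // place, to keep the window for half-written entries small.
            Path tmp = Files.createTempFile(target.getParent(), "entry", ".tmp");
            Files.write(tmp, value);
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the stored bytes, or null if the key is absent. */
    byte[] get(String key) {
        try {
            return Files.readAllBytes(pathFor(key));
        } catch (NoSuchFileException e) {
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    void remove(String key) {
        try {
            Files.deleteIfExists(pathFor(key));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        DirectoryStore store = new DirectoryStore(
                Paths.get(System.getProperty("java.io.tmpdir"), "cache-demo"));
        store.store("sitemap:/index.html", "<html/>".getBytes(StandardCharsets.UTF_8));
        System.out.println(store.pathFor("sitemap:/index.html"));
        System.out.println(new String(store.get("sitemap:/index.html"), StandardCharsets.UTF_8));
    }
}
```

Each entry is a plain file, so it can be inspected from the shell. Eviction, locking, and size limits are deliberately left out of this sketch.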

My gut feeling is that having such a critical piece of our
infrastructure so far away from the metal is actually hurting us, both
performance- and complexity-wise.

I would love to use BerkeleyDB, but it's native, incompatibly licensed 
and has terrible Java APIs. And all the problems of binary stores: you 
can't see inside from your shell!

I think that a better use of the file system would yield much more 
performance, since JVM IO is pretty much optimized for file access 
anyway (and uses OS-level caching).

thoughts?

-- 
Stefano.



Re: [RT] rethinking the cache storage system

Posted by Antonio Gallardo <ag...@agssa.net>.
Hi:

IMHO a cache storage system is a very good idea. A piece of history:

1- Initial idea of using jisp:
http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=100781998632619&w=2

2- Is this still valid?: "B-Tree indexed file vs filesystem directory"
http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=100816638410528&w=2

3- "The jisp store is definitely better..."
http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=101120120825438&w=2

Well, I think the mails linked above, plus others on the dev mailing list,
clearly stated that it is better to have a database behind the store than a
plain filesystem. Do we need to test it again?

So I will be glad if people here seriously consider Apache JCS. It is used
in many other Apache projects, it comes from our own workshop, and it can work
with jisp too.

WDYT?

Best Regards,

Antonio Gallardo

Re: [RT] rethinking the cache storage system

Posted by Joerg Heinicke <jo...@gmx.de>.
On 23.02.2004 16:47, Stefano Mazzocchi wrote:
> We were using Jisp and Scott's decision makes it clear that we either:
> 
>  - have to maintain Jisp 2.x ourselves
> 
> or
> 
>  - use something else
> 
> Here I would like to ask you a much easier question: do we really need
> it? Can't we just put our storage into a bunch of directories and use the
> file system as the store? That works very well for file-intensive setups
> like mail clients/servers and browser caches, so why shouldn't it work for us?

And why should we reinvent the wheel? If every Apache project that has used
Jisp until now starts its own simple cache implementation ... Why
not just use Turbine JCS? Has anybody had a closer look in the meantime?
Is it also oversized?

Joerg

Re: [RT] rethinking the cache storage system

Posted by Stefano Mazzocchi <st...@apache.org>.
Geoff Howard wrote:

> Pier Fumagalli wrote:
> 
>> On 23 Feb 2004, at 15:47, Stefano Mazzocchi wrote:
>>
>>>
>>> My gut feeling is that having such a critical piece of our
>>> infrastructure so far away from the metal is actually hurting us, both
>>> performance- and complexity-wise.
>>
>>
>>
>> +1
>>
>>> I would love to use BerkeleyDB, but it's native, incompatibly 
>>> licensed and has terrible Java APIs. And all the problems of binary 
>>> stores: you can't see inside from your shell!
>>
>>
>>
>> It's all right...
>>
>>> I think that a better use of the file system would yield much more 
>>> performance, since JVM IO is pretty much optimized for file access 
>>> anyway (and uses OS-level caching).
>>>
>>> thoughts?
>>
>>
>>
>> I've been looking at the java.nio stuff, especially in the area of 
>> memory mapping some files :-P I can tell you that it's FAST, and 
>> basically does the trick. See a file as a big array in ram, well, but 
>> actually it's only a "fake" array mapped really on the disk, and 
>> cached by kernel...
> 
> 
> 
> I've been thinking of that myself -- do I remember correctly that we've 
> tossed around the idea of making 2.2 jdk1.4 only??

I would be in favor of that, yes. The FreeBSD guys pretty much know that
Java on that operating system is basically a lost battle anyway, and
Linux 2.6 is actually good enough to stand side by side with FreeBSD
even for very big installations. All other OSes have a 1.4 VM anyway.

-- 
Stefano.


Re: [RT] rethinking the cache storage system

Posted by Pier Fumagalli <pi...@betaversion.org>.
On 29 Feb 2004, at 21:58, Antonio Gallardo wrote:

> Pier Fumagalli wrote:
>> On 29 Feb 2004, at 19:25, Geoff Howard wrote:
>>
>>>>> I think that a better use of the file system would yield much more
>>>>> performance, since JVM IO is pretty much optimized for file access
>>>>> anyway (and uses OS-level caching).
>>>>>
>>>>> thoughts?
>>>>
>>>> I've been looking at the java.nio stuff, especially in the area of
>>>> memory mapping some files :-P I can tell you that it's FAST, and
>>>> basically does the trick. See a file as a big array in ram, well, 
>>>> but
>>>> actually it's only a "fake" array mapped really on the disk, and
>>>> cached by kernel...
>>>
>>> I've been thinking of that myself -- do I remember correctly that
>>> we've tossed around the idea of making 2.2 jdk1.4 only??
>>
>> That would be so cool... Now that even IBM has a 1.4 JVM for most
>> platforms (including Linux), well, I don't see why not! :-)
>
> Is a VOTE needed for this?

I suppose so...

> If yes, can you start it?

I believe that whoever volunteers to do the work should do it, and 
currently I have zero time - Cocoon is going in production at work :-)

	Pier


Re: [RT] rethinking the cache storage system

Posted by Antonio Gallardo <ag...@agssa.net>.
Pier Fumagalli wrote:
> On 29 Feb 2004, at 19:25, Geoff Howard wrote:
>
>>>> I think that a better use of the file system would yield much more
>>>> performance, since JVM IO is pretty much optimized for file access
>>>> anyway (and uses OS-level caching).
>>>>
>>>> thoughts?
>>>
>>> I've been looking at the java.nio stuff, especially in the area of
>>> memory mapping some files :-P I can tell you that it's FAST, and
>>> basically does the trick. See a file as a big array in ram, well, but
>>> actually it's only a "fake" array mapped really on the disk, and
>>> cached by kernel...
>>
>> I've been thinking of that myself -- do I remember correctly that
>> we've tossed around the idea of making 2.2 jdk1.4 only??
>
> That would be so cool... Now that even IBM has a 1.4 JVM for most
> platforms (including Linux), well, I don't see why not! :-)

Is a VOTE needed for this? If yes, can you start it?

Best Regards,

Antonio Gallardo

Re: [RT] rethinking the cache storage system

Posted by Pier Fumagalli <pi...@betaversion.org>.
On 29 Feb 2004, at 19:25, Geoff Howard wrote:

>>> I think that a better use of the file system would yield much more 
>>> performance, since JVM IO is pretty much optimized for file access 
>>> anyway (and uses OS-level caching).
>>>
>>> thoughts?
>>
>> I've been looking at the java.nio stuff, especially in the area of 
>> memory mapping some files :-P I can tell you that it's FAST, and 
>> basically does the trick. See a file as a big array in ram, well, but 
>> actually it's only a "fake" array mapped really on the disk, and 
>> cached by kernel...
>
> I've been thinking of that myself -- do I remember correctly that 
> we've tossed around the idea of making 2.2 jdk1.4 only??

That would be so cool... Now that even IBM has a 1.4 JVM for most
platforms (including Linux), well, I don't see why not! :-)

	Pier


Re: [RT] rethinking the cache storage system

Posted by Geoff Howard <co...@leverageweb.com>.
Pier Fumagalli wrote:

> On 23 Feb 2004, at 15:47, Stefano Mazzocchi wrote:
>
>>
>> My gut feeling is that having such a critical piece of our
>> infrastructure so far away from the metal is actually hurting us, both
>> performance- and complexity-wise.
>
>
> +1
>
>> I would love to use BerkeleyDB, but it's native, incompatibly 
>> licensed and has terrible Java APIs. And all the problems of binary 
>> stores: you can't see inside from your shell!
>
>
> It's all right...
>
>> I think that a better use of the file system would yield much more 
>> performance, since JVM IO is pretty much optimized for file access 
>> anyway (and uses OS-level caching).
>>
>> thoughts?
>
>
> I've been looking at the java.nio stuff, especially in the area of 
> memory mapping some files :-P I can tell you that it's FAST, and 
> basically does the trick. See a file as a big array in ram, well, but 
> actually it's only a "fake" array mapped really on the disk, and 
> cached by kernel...


I've been thinking of that myself -- do I remember correctly that we've 
tossed around the idea of making 2.2 jdk1.4 only??

Geoff

Re: [RT] rethinking the cache storage system

Posted by Pier Fumagalli <pi...@betaversion.org>.
On 23 Feb 2004, at 15:47, Stefano Mazzocchi wrote:
>
> My gut feeling is that having such a critical piece of our
> infrastructure so far away from the metal is actually hurting us, both
> performance- and complexity-wise.

+1

> I would love to use BerkeleyDB, but it's native, incompatibly licensed 
> and has terrible Java APIs. And all the problems of binary stores: you 
> can't see inside from your shell!

It's all right...

> I think that a better use of the file system would yield much more 
> performance, since JVM IO is pretty much optimized for file access 
> anyway (and uses OS-level caching).
>
> thoughts?

I've been looking at the java.nio stuff, especially in the area of 
memory mapping some files :-P I can tell you that it's FAST, and 
basically does the trick. See a file as a big array in ram, well, but 
actually it's only a "fake" array mapped really on the disk, and cached 
by kernel...
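
A minimal sketch of the memory-mapping approach, under a few assumptions: FileChannel.map and MappedByteBuffer are the real java.nio calls (present since 1.4), but the class name MappedFileDemo, the helper map(), and the file name are hypothetical, and FileChannel.open with a Path needs a newer JDK than the ones discussed here. It treats the file as an indexable array and reads a value back through a second mapping.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch: treat a file on disk as a big byte array. */
final class MappedFileDemo {

    /** Maps the first size bytes of the file read-write; size must be positive. */
    static MappedByteBuffer map(Path file, int size) {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // Grow the file explicitly so the whole mapped region is backed by it.
            if (channel.size() < size) {
                channel.write(ByteBuffer.allocate(1), size - 1L);
            }
            // The mapping remains valid after the channel is closed.
            return channel.map(FileChannel.MapMode.READ_WRITE, 0, size);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = Paths.get(System.getProperty("java.io.tmpdir"), "mapped-demo.bin");
        MappedByteBuffer buffer = map(file, 4096);
        buffer.putLong(128, 20040223L); // absolute write at byte offset 128
        buffer.force();                 // ask the OS to flush dirty pages to disk
        // A second mapping of the same file sees the value written above.
        System.out.println(map(file, 4096).getLong(128));
    }
}
```

The paging is done by the kernel, so reads that hit the page cache never go through a Java-level copy; how much that helps a cache store would still have to be measured.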

	Pier


Re: [RT] rethinking the cache storage system

Posted by Scott Robert Ladd <co...@coyotegulch.com>.
Stefano Mazzocchi wrote:
> We were using Jisp and Scott's decision makes it clear that we either:
> 
>  - have to maintain Jisp 2.x ourselves

No. I have stated that I am more than willing to maintain and support 
Jisp 2.x, under the libpng license, specifically for Cocoon.

I am not willing to relicense Jisp 3.0.0.

In other words, Apache Cocoon could have its own fork of Jisp (from the 
2.x source base), perhaps under another name, for its own purposes. I'd 
even be willing to donate such a version into the Apache community.

..Scott

-- 
Scott Robert Ladd
Coyote Gulch Productions (http://www.coyotegulch.com)
Software Invention for High-Performance Computing


Re: [RT] rethinking the cache storage system

Posted by Pier Fumagalli <pi...@betaversion.org>.
On 2 Mar 2004, at 05:29, Mircea Toma wrote:

> Stefano Mazzocchi wrote:
>> We were using Jisp and Scott's decision makes it clear that we either:
>>  - have to maintain Jisp 2.x ourselves
>> or
>>  - use something else
>
> How about http://jdbm.sourceforge.net/ ? It implements the B+Tree 
> algorithm. B+Tree is a better indexing mechanism than BTree, because 
> the indexes can be cached in memory. JDBM has a BSD license also.

Decent starting point... I used to work with the guy who wrote it: Alex 
Boisvert (Intalio). Now, if it only implemented the java.util.Map 
interface, and used NIO, THAT would be so cool! :-D
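
To illustrate the java.util.Map wish, here is a rough sketch of how any key/value store could be exposed as a Map by extending AbstractMap. Everything store-specific is an assumption: KeyValueStore, InMemoryStore, and StoreBackedMap are hypothetical names and do not reflect the JDBM API; only AbstractMap and AbstractSet are real JDK classes.

```java
import java.util.AbstractMap;
import java.util.AbstractSet;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical minimal store API; this is not the JDBM or Cocoon interface. */
interface KeyValueStore {
    byte[] fetch(String key);            // returns null when the key is absent
    void save(String key, byte[] value);
    void delete(String key);
    List<String> keys();                 // snapshot of the current keys
}

/** Stand-in backend so the sketch runs; a real one would sit on top of a B+Tree file. */
final class InMemoryStore implements KeyValueStore {
    private final TreeMap<String, byte[]> data = new TreeMap<>();
    @Override public byte[] fetch(String key) { return data.get(key); }
    @Override public void save(String key, byte[] value) { data.put(key, value); }
    @Override public void delete(String key) { data.remove(key); }
    @Override public List<String> keys() { return new ArrayList<>(data.keySet()); }
}

/** Exposes a KeyValueStore through java.util.Map by filling in AbstractMap's hooks. */
final class StoreBackedMap extends AbstractMap<String, byte[]> {
    private final KeyValueStore store;

    StoreBackedMap(KeyValueStore store) { this.store = store; }

    @Override public byte[] get(Object key) {
        return key instanceof String ? store.fetch((String) key) : null;
    }

    @Override public byte[] put(String key, byte[] value) {
        byte[] previous = store.fetch(key);
        store.save(key, value);
        return previous;
    }

    @Override public byte[] remove(Object key) {
        if (!(key instanceof String)) return null;
        byte[] previous = store.fetch((String) key);
        store.delete((String) key);
        return previous;
    }

    @Override public boolean containsKey(Object key) { return get(key) != null; }

    @Override public Set<Map.Entry<String, byte[]>> entrySet() {
        return new AbstractSet<Map.Entry<String, byte[]>>() {
            @Override public int size() { return store.keys().size(); }

            @Override public Iterator<Map.Entry<String, byte[]>> iterator() {
                // Iterate over a snapshot of the keys, so deleting while iterating is safe.
                final Iterator<String> keys = store.keys().iterator();
                return new Iterator<Map.Entry<String, byte[]>>() {
                    private String last;
                    @Override public boolean hasNext() { return keys.hasNext(); }
                    @Override public Map.Entry<String, byte[]> next() {
                        last = keys.next();
                        return new AbstractMap.SimpleImmutableEntry<>(last, store.fetch(last));
                    }
                    @Override public void remove() {
                        if (last == null) throw new IllegalStateException();
                        store.delete(last);
                        last = null;
                    }
                };
            }
        };
    }
}
```

With such an adapter, calling code would only see Map<String, byte[]>, so the backend (JDBM, Jisp, plain files) could be swapped without touching callers.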

	Pier


Re: [RT] rethinking the cache storage system

Posted by Mircea Toma <mi...@airpost.net>.
Stefano Mazzocchi wrote:
> We were using Jisp and Scott's decision makes it clear that we either:
> 
>  - have to maintain Jisp 2.x ourselves
> 
> or
> 
>  - use something else

How about http://jdbm.sourceforge.net/ ? It implements the B+Tree 
algorithm. B+Tree is a better indexing mechanism than BTree, because the 
indexes can be cached in memory. JDBM has a BSD license also.

Mircea