Posted to users@sling.apache.org by Jason Bailey <Ja...@sas.com> on 2015/11/23 14:41:30 UTC

on using MongoMK

First, I would say that the majority of my issues come from the fact that I was attempting to migrate a good-sized amount of content from CRX2 to CRX3. Even so, the issues I have with Oak/MongoDB are pretty significant.

1. Having a remote JCR is not like having a remote data store for some content on a traditional web server. Unlike other platforms, everything is in the JCR: your code, configurations, images. Everything is content, and it's somewhere else. In fact, the impact is big enough that the Oak team has gone to great lengths to facilitate caching, so that after moving everything to a remote data store they could cache it back locally.

2. MongoDB was a poor choice for Oak. The JCR specification describes the ability to have a point-in-time view of the content repository. This is usually referred to as MVCC, and a lot of databases have this capability by default, so you can grab a timestamp/id/session key and maintain a consistent view. MongoDB 2.x doesn't have this. So instead of using something that comes out of the box with the majority of databases, the Oak team had to implement their own MVCC on top of the data (see the first sketch after this list).

3. Additionally, possibly because of that lack of MVCC, building an index in MongoDB blocks the collection it's indexing. So the Oak team appears to have implemented their own index structures instead of using the native indexing. The side effect is that indexing is performed by the Oak implementation, which is on the front end, resulting in far slower indexing on Mongo than you would get with a local store. Also, in CRX2 everything was indexed, which made it far easier to write performant queries but had the side effect of huge repositories. In Oak, nothing is indexed unless you say so, much like a traditional DB table. So telling someone that they can make a query against any property is true, but if you have a large amount of data that query is going to be painfully slow unless you index correctly (see the second sketch after this list).

4. AEM backtracking. If you go to the documentation site for AEM, the documentation that I used to choose Mongo for our architecture isn't there anymore. There has been a culling of references to MongoDB and its usage. Right now, if you were to contact their support team, they only recommend it if your data is too large for the TarMK solution, and their documentation suggests you have an AEM architect review what you're doing before using it.
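
For point 2, here is a minimal sketch of what that point-in-time view looks like at the JCR session level (untested; the class name, credentials and paths are made up). Each Oak session is pinned to a revision, so changes saved by one session only become visible to another session once that session moves forward:

    import javax.jcr.Repository;
    import javax.jcr.RepositoryException;
    import javax.jcr.Session;
    import javax.jcr.SimpleCredentials;

    public class SnapshotDemo {

        public static void demo(Repository repository) throws RepositoryException {
            SimpleCredentials creds =
                    new SimpleCredentials("admin", "admin".toCharArray());
            Session reader = repository.login(creds);
            Session writer = repository.login(creds);
            try {
                writer.getRootNode().addNode("demo").setProperty("title", "hello");
                writer.save();

                // The reader still operates against the revision it was opened
                // on, so the new node is not necessarily visible yet...
                System.out.println("before refresh: " + reader.nodeExists("/demo"));

                // ...until the session is explicitly moved to a newer state.
                reader.refresh(false);
                System.out.println("after refresh: " + reader.nodeExists("/demo"));
            } finally {
                reader.logout();
                writer.logout();
            }
        }
    }

That per-session snapshot is the kind of guarantee Oak has to build itself, since MongoDB 2.x doesn't provide it.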

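And for point 3, an index in Oak is itself just content under /oak:index. A minimal sketch of adding a property index definition through the plain JCR API (untested; the index name, property name and helper class are made up, and the session needs rights on /oak:index):

    import javax.jcr.Node;
    import javax.jcr.PropertyType;
    import javax.jcr.RepositoryException;
    import javax.jcr.Session;
    import javax.jcr.Value;
    import javax.jcr.ValueFactory;

    public class PropertyIndexUtil {

        // Defines an Oak property index so that queries constrained on the
        // given property stop traversing the whole repository.
        public static void createPropertyIndex(Session session, String indexName,
                String propertyName) throws RepositoryException {
            Node oakIndex = session.getNode("/oak:index");
            if (oakIndex.hasNode(indexName)) {
                return; // already defined
            }
            Node def = oakIndex.addNode(indexName, "oak:QueryIndexDefinition");
            def.setProperty("type", "property");
            ValueFactory vf = session.getValueFactory();
            def.setProperty("propertyNames",
                    new Value[] { vf.createValue(propertyName, PropertyType.NAME) });
            def.setProperty("reindex", true); // ask Oak to build the index content
            session.save();
        }
    }

Once such a definition exists, a query like "SELECT * FROM [nt:base] WHERE [myProperty] = 'foo'" can be answered from the index instead of a repository traversal, which is where the painfully slow part comes from.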

I should clarify that the problems I've seen aren't with MongoDB itself; it's the Oak implementation of the JCR on top of MongoDB that appears to be problematic.

-Jason

-----Original Message-----
From: Olaf [mailto:olaf@x100.de] 
Sent: Monday, November 16, 2015 11:19 AM
To: users@sling.apache.org
Subject: RE: Resource class vs CND

Hi Jason,

That sounds interesting! Would you mind providing a bit more detail as to why you would advise against it?

Regards,
Olaf


Re: on using MongoMK

Posted by Robert Munteanu <ro...@apache.org>.
Hi Jason,

Some of the points probably require someone more knowledgeable
regarding Oak, but I would like to add my point of view to a couple.

On Mon, 2015-11-23 at 13:41 +0000, Jason Bailey wrote:
> 1. Having a remote JCR is not like having a remote data store for
> some content on a traditional web server. Unlike other platforms,
> everything is in the JCR: your code, configurations, images.
> Everything is content, and it's somewhere else. In fact, the impact
> is big enough that the Oak team has gone to great lengths to
> facilitate caching, so that after moving everything to a remote data
> store they could cache it back locally.

Well, yes :-) I think that's pretty much how you would expect JCR to
work. Since Oak is an intermediate layer between Mongo and your
application code, it makes sense to cache values locally.

Consider the fact that, even on Linux, the kernel caches recently
accessed files in memory for faster access. It's a common approach to
cache values from a (relatively) slow data source.
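
Just to illustrate that pattern (nothing Oak-specific here, only a toy sketch of a cache in front of a slow lookup, which is conceptually what an intermediate layer does with a remote store):

    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.function.Function;

    // Tiny LRU cache in front of a slow lookup, purely for illustration.
    public class LruCachingLoader<K, V> {

        private final Function<K, V> slowLookup;
        private final Map<K, V> cache;

        public LruCachingLoader(Function<K, V> slowLookup, int maxEntries) {
            this.slowLookup = slowLookup;
            this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                    return size() > maxEntries;
                }
            };
        }

        public synchronized V get(K key) {
            // Serve from memory when possible, fall back to the slow source.
            return cache.computeIfAbsent(key, slowLookup);
        }
    }

Oak's own caches are of course far more involved, but the motivation is the same.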

> 
> 2. MongoDB was a poor choice for Oak. The JCR specification
> describes the ability to have a point-in-time view of the content
> repository. This is usually referred to as MVCC, and a lot of
> databases have this capability by default, so you can grab a
> timestamp/id/session key and maintain a consistent view. MongoDB 2.x
> doesn't have this. So instead of using something that comes out of
> the box with the majority of databases, the Oak team had to
> implement their own MVCC on top of the data.

See my comment on 3 - I am not sure that even an MVCC-aware data
store would be enough to allow using the native indexes, as opposed
to Oak-based ones.

> 
> 3. Additionally, possibly because of that lack of MVCC, building an
> index in MongoDB blocks the collection it's indexing. So the Oak
> team appears to have implemented their own index structures instead
> of using the native indexing. The side effect is that indexing is
> performed by the Oak implementation, which is on the front end,
> resulting in far slower indexing on Mongo than you would get with a
> local store. Also, in CRX2 everything was indexed, which made it far
> easier to write performant queries but had the side effect of huge
> repositories. In Oak, nothing is indexed unless you say so, much
> like a traditional DB table. So telling someone that they can make a
> query against any property is true, but if you have a large amount
> of data that query is going to be painfully slow unless you index
> correctly.

Regarding indexing - the Oak indexes are aware of access control and
versioning, something that does not come natively with other
persistence options, and the indexes must be aware of that. So there
isn't a way of using 'native' indexes with Oak. They are indeed used,
but only internally, and they are not useful to applications which
build on top of Oak.


> 4. AEM backtracking. If you go to the documentation site for AEM,
> the documentation that I used to choose Mongo for our architecture
> isn't there anymore. There has been a culling of references to
> MongoDB and its usage. Right now, if you were to contact their
> support team, they only recommend it if your data is too large for
> the TarMK solution, and their documentation suggests you have an AEM
> architect review what you're doing before using it.

Well, I'm in no position to comment on that, and I doubt anyone on
this list is :-) I will limit my discussion to Sling only.

Thanks,

Robert

> 
> 
> I should clarify that the problems I've seen aren't with MongoDB
> itself; it's the Oak implementation of the JCR on top of MongoDB
> that appears to be problematic.
> 
> -Jason
> 
> -----Original Message-----
> From: Olaf [mailto:olaf@x100.de] 
> Sent: Monday, November 16, 2015 11:19 AM
> To: users@sling.apache.org
> Subject: RE: Resource class vs CND
> 
> Hi Jason,
> 
> That sounds interesting! Would you mind providing a bit more detail
> as to why you would advise against it?
> 
> Regards,
> Olaf
>