Posted to dev@sling.apache.org by Stefan Seifert <ss...@pro-vision.de> on 2015/01/27 17:09:18 UTC

[RT] Sling Resource Providers for NoSQL databases - MongoDB, Couchbase

We are currently evaluating the integration of a Couchbase NoSQL database [1] into a Sling resource tree. As a starting point I had a deeper look at the MongoDB resource provider [2], because the concept is quite similar.

Some thoughts on this:

1. What is the status of the MongoDB provider? Is anyone already using it in production? Looking at the code, the CRUD handling does not seem to be thread-safe, since it uses non-synchronized hash maps.

2. How to map resource paths to NoSQL: the MongoDB provider uses this syntax:
<root_path>/<collection>/<custom_path>
where root_path and the MongoDB database name are configurable via OSGi (multiple entry points are possible), collection has to match an existing collection in MongoDB, and the remaining path is mapped to a property of a document in that collection.
I wonder whether this is the best solution. The collection path segment seems too restrictive to me (resolution fails if the collection does not exist). I would prefer to configure both root_path and collection via OSGi, allowing entry points with an unconstrained tree hierarchy below them.
Couchbase, for example, does not have such a collection concept; it only has "buckets", which are comparable to MongoDB "databases".
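A minimal sketch contrasting the two mapping schemes. The class names are hypothetical and this is not the actual MongoDB provider code. With no configured collection, the first segment below the root is taken as the collection (the current scheme). With a collection configured via OSGi, the whole remainder becomes an unconstrained key (the proposed scheme). The real provider also checks that the collection exists in the database, and this sketch omits that check.

```java
import java.util.Optional;

/** Hypothetical result of mapping a Sling resource path onto a NoSQL location. */
final class NoSqlLocation {
    final String collection; // MongoDB collection, or e.g. a Couchbase bucket
    final String key;        // path below the entry point, stored as a document property

    NoSqlLocation(String collection, String key) {
        this.collection = collection;
        this.key = key;
    }
}

/** Hypothetical mapper illustrating both schemes discussed above. */
final class PathMapper {
    private final String rootPath;             // e.g. "/content/nosql", from OSGi config
    private final String configuredCollection; // null = take the collection from the path

    PathMapper(String rootPath, String configuredCollection) {
        this.rootPath = rootPath;
        this.configuredCollection = configuredCollection;
    }

    Optional<NoSqlLocation> map(String resourcePath) {
        if (!resourcePath.startsWith(rootPath + "/")) {
            return Optional.empty(); // not below this entry point
        }
        String rest = resourcePath.substring(rootPath.length() + 1);
        if (configuredCollection != null) {
            // Proposed scheme: the whole remaining path is a free-form key.
            return Optional.of(new NoSqlLocation(configuredCollection, rest));
        }
        // Current scheme: <root_path>/<collection>/<custom_path>
        int slash = rest.indexOf('/');
        if (slash <= 0) {
            return Optional.empty(); // no custom path below the collection segment
        }
        return Optional.of(new NoSqlLocation(rest.substring(0, slash), rest.substring(slash + 1)));
    }
}
```

With the current scheme, "/content/mongo/users/jdoe/profile" resolves to collection "users" and key "jdoe/profile". The proposed scheme keeps "users/jdoe" as a single key inside the configured bucket.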

3. The resource provider mixes two concerns: the in-memory CRUD handling, which keeps maps of changed/deleted resources, and the mapping to the NoSQL structure. If these two aspects were separated, the former could be reused for all NoSQL databases. The latter would then only be responsible for the flat resource-to-document mapping, which differs for each NoSQL database. Bonus: the thread-safety of the CRUD handling would have to be implemented only once, not once per resource provider.
Additional logic such as mapping typed values to strings, generic value map implementations, automatic tree creation etc. could be shared between all NoSQL providers.
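A minimal sketch of the proposed split, with all names hypothetical. A generic `TransientChangeTracker` holds the changed/deleted maps and is reusable across databases. A small `NoSqlAdapter` interface covers only the flat, database-specific document access. The tracker uses unsynchronized collections, so it assumes a single instance is not shared between threads. Deleting a resource's children is left out for brevity.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Database-specific part (hypothetical): flat path-to-document access only. */
interface NoSqlAdapter {
    Map<String, Object> get(String path);               // null if absent
    void store(String path, Map<String, Object> props);  // insert or update
    void delete(String path);
}

/** Generic part (hypothetical): transient changes, shared by all NoSQL providers. */
final class TransientChangeTracker {
    private final NoSqlAdapter adapter;
    private final Map<String, Map<String, Object>> changed = new HashMap<>();
    private final Set<String> deleted = new HashSet<>();

    TransientChangeTracker(NoSqlAdapter adapter) { this.adapter = adapter; }

    /** Pending changes win over the persisted state. */
    Map<String, Object> read(String path) {
        if (deleted.contains(path)) return null;
        Map<String, Object> pending = changed.get(path);
        return pending != null ? pending : adapter.get(path);
    }

    void write(String path, Map<String, Object> props) {
        deleted.remove(path);
        changed.put(path, new HashMap<>(props)); // defensive copy
    }

    void remove(String path) {
        changed.remove(path);
        deleted.add(path);
    }

    boolean hasChanges() { return !changed.isEmpty() || !deleted.isEmpty(); }

    /** Flushes deletes, then writes, to the database; the two sets are disjoint. */
    void commit() {
        for (String path : deleted) adapter.delete(path);
        for (Map.Entry<String, Map<String, Object>> e : changed.entrySet()) {
            adapter.store(e.getKey(), e.getValue());
        }
        revert();
    }

    void revert() {
        changed.clear();
        deleted.clear();
    }
}

/** Trivial in-memory adapter, e.g. for unit tests of the generic part. */
final class InMemoryAdapter implements NoSqlAdapter {
    final Map<String, Map<String, Object>> documents = new HashMap<>();
    public Map<String, Object> get(String path) { return documents.get(path); }
    public void store(String path, Map<String, Object> props) { documents.put(path, props); }
    public void delete(String path) { documents.remove(path); }
}
```

A Couchbase or MongoDB provider would then only supply its own `NoSqlAdapter` implementation.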

4. An open point is whether to support binary data as well, or to leave it out in the first phase. Storing binary data may be problematic for some NoSQL databases and may require a separate storage concept. The MongoDB resource provider currently does not support binary data.
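To illustrate leaving binaries out in a first phase, here is a minimal sketch of a hypothetical value mapper for a JSON document store such as Couchbase. It also covers the value type mapping from point 3. The class name, the supported types and the ISO-8601 encoding of dates are assumptions, and multi-value (array) properties are omitted.

```java
import java.io.InputStream;
import java.util.Calendar;

/** Hypothetical value mapper for a first phase without binary support. */
final class DocumentValueMapper {
    private DocumentValueMapper() {}

    /** Maps a resource property value to something a JSON document can hold. */
    static Object toDocumentValue(String name, Object value) {
        if (value == null) {
            return null;
        }
        if (value instanceof InputStream || value instanceof byte[]) {
            // First phase: binaries are rejected instead of being stored.
            throw new IllegalArgumentException("Binary property not supported: " + name);
        }
        if (value instanceof Calendar) {
            // JSON has no date type: store an ISO-8601 UTC string (original time zone is lost).
            return ((Calendar) value).toInstant().toString();
        }
        if (value instanceof String || value instanceof Number || value instanceof Boolean) {
            return value; // representable in JSON as-is
        }
        throw new IllegalArgumentException(
                "Unsupported type for " + name + ": " + value.getClass().getName());
    }
}
```

Rejecting binaries explicitly, rather than silently dropping them, makes the limitation visible to callers until a separate binary storage concept exists.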

5. There were plans to create a Solr resource provider for Sling [3][4], which goes roughly in the same direction, but it seems nothing came of it.

WDYT?

stefan

[1] http://www.couchbase.com
[2] https://svn.apache.org/repos/asf/sling/trunk/contrib/extensions/mongodb
[3] https://issues.apache.org/jira/browse/SLING-2795
[4] http://apache-sling.73963.n3.nabble.com/GSoC-2013-Apache-Solr-backend-for-Apache-Sling-tt4023347.html


Re: [RT] Sling Resource Providers for NoSQL databases - MongoDB, Couchbase

Posted by Bertrand Delacretaz <bd...@apache.org>.
On Tue, Jan 27, 2015 at 5:49 PM, Stefan Seifert <ss...@pro-vision.de> wrote:
> ...this is of course a very special use case...

Ok, if a Sling ResourceProvider is sufficient it's probably simpler to
write than an Oak backend.

-Bertrand

RE: [RT] Sling Resource Providers for NoSQL databases - MongoDB, Couchbase

Posted by Stefan Seifert <ss...@pro-vision.de>.
Yes, this is a good point; I have to think about it.

At first sight it feels a bit heavyweight for my use case. I am working in an OSGi microservice scenario without JCR/Oak, currently without the Sling engine but with some Sling subprojects including Launchpad, and I want to use Sling distributed events, job processing and discovery. However, the current implementations need to store some small amounts of data in a shared repository. Couchbase would be well suited for this because the microservices already use it. This is, of course, a very special use case.

stefan

>-----Original Message-----
>From: Bertrand Delacretaz [mailto:bdelacretaz@apache.org]
>Sent: Tuesday, January 27, 2015 5:38 PM
>To: dev
>Subject: Re: [RT] Sling Resource Providers for NoSQL databases - MongoDB,
>Couchbase
>
>Hi,
>
>On Tue, Jan 27, 2015 at 5:09 PM, Stefan Seifert <ss...@pro-vision.de>
>wrote:
>> ...we currently evaluate to integrate a Couchbase NoSQL database [1] into a
>sling resource tree...
>
>As mentioned in a different thread about DynamoDB, it might be
>interesting to consider writing a Couchbase backend for Oak instead.
>
>The advantages are inheriting Oak's indexing and other useful JCR
>features, and maybe that helps with the other issues that you mention,
>transient space etc.
>
>It's probably a bit more complicated, IIUC the current recommended way
>is to implement an Oak DocumentStore, but that should be checked with
>the Oak team.
>
>-Bertrand

Re: [RT] Sling Resource Providers for NoSQL databases - MongoDB, Couchbase

Posted by Bertrand Delacretaz <bd...@apache.org>.
Hi,

On Tue, Jan 27, 2015 at 5:09 PM, Stefan Seifert <ss...@pro-vision.de> wrote:
> ...we currently evaluate to integrate a Couchbase NoSQL database [1] into a sling resource tree...

As mentioned in a different thread about DynamoDB, it might be
interesting to consider writing a Couchbase backend for Oak instead.

The advantages are inheriting Oak's indexing and other useful JCR
features, and maybe that helps with the other issues that you mention,
transient space etc.

It's probably a bit more complicated, IIUC the current recommended way
is to implement an Oak DocumentStore, but that should be checked with
the Oak team.

-Bertrand

RE: [RT] Sling Resource Providers for NoSQL databases - MongoDB, Couchbase

Posted by Stefan Seifert <ss...@pro-vision.de>.
>OK, my next plan is to create a branch with a proof-of-concept Couchbase
>resource provider, including a proposal for a part shared by all NoSQL
>providers (and to create a ticket for further discussion).

ticket and whiteboard branch created:
https://issues.apache.org/jira/browse/SLING-4381

stefan

RE: [RT] Sling Resource Providers for NoSQL databases - MongoDB, Couchbase

Posted by Stefan Seifert <ss...@pro-vision.de>.
>> 1. What is the status of the MongoDB provider? Is anyone already using it
>in production? Looking at the code, the CRUD handling does not seem to be
>thread-safe, since it uses non-synchronized hash maps.
>
>Afaik, there are people using that code slightly modified (not sure what
>the changes are) in production. I think the impl is thread safe, as a
>resource provider by itself does not need to be thread safe. But if it's
>not, we should fix it.

Ah, I see. Each resource resolver gets its own resource provider instances, so a provider does not have to handle concurrency itself, because the resource resolver is not thread-safe either. This keeps the implementation simpler.
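A minimal sketch of what this means, using hypothetical plain classes rather than the Sling API. The factory method's shape is loosely modelled on Sling's `ResourceProviderFactory.getResourceProvider(Map<String, Object> authenticationInfo)`. Each resolver gets a fresh provider, so the plain `HashMap` holding its transient changes is only touched by whichever thread uses that resolver.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical provider: one instance per resource resolver, so plain HashMaps suffice. */
final class PerResolverProvider {
    // Transient changes of a single resolver. Not shared between threads as long as
    // the resolver itself is not shared (resource resolvers are not thread-safe).
    private final Map<String, Map<String, Object>> changed = new HashMap<>();

    void write(String path, Map<String, Object> props) { changed.put(path, props); }

    boolean hasChanges() { return !changed.isEmpty(); }
}

/** Hypothetical factory: hands out a fresh provider instance for every resolver. */
final class PerResolverProviderFactory {
    PerResolverProvider getResourceProvider(Map<String, Object> authenticationInfo) {
        return new PerResolverProvider();
    }
}
```

Only state shared across resolvers, such as the database client or connection pool, would still need to be thread-safe.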


>In general it would be cool to have some more/better NoSQL support in
>Sling to attract devs using these storages.

OK, my next plan is to create a branch with a proof-of-concept Couchbase resource provider, including a proposal for a part shared by all NoSQL providers (and to create a ticket for further discussion).

stefan

Re: [RT] Sling Resource Providers for NoSQL databases - MongoDB, Couchbase

Posted by Carsten Ziegeler <cz...@apache.org>.
On 27.01.15 at 17:09, Stefan Seifert wrote:
> We are currently evaluating the integration of a Couchbase NoSQL database [1] into a Sling resource tree. As a starting point I had a deeper look at the MongoDB resource provider [2], because the concept is quite similar.
> 
> Some thoughts on this:
> 
> 1. What is the status of the MongoDB provider? Is anyone already using it in production? Looking at the code, the CRUD handling does not seem to be thread-safe, since it uses non-synchronized hash maps.

Afaik, there are people using that code slightly modified (not sure what
the changes are) in production. I think the impl is thread safe, as a
resource provider by itself does not need to be thread safe. But if it's
not, we should fix it.

> 
> 
> 3. The resource provider mixes two concerns: the in-memory CRUD handling, which keeps maps of changed/deleted resources, and the mapping to the NoSQL structure. If these two aspects were separated, the former could be reused for all NoSQL databases. The latter would then only be responsible for the flat resource-to-document mapping, which differs for each NoSQL database. Bonus: the thread-safety of the CRUD handling would have to be implemented only once, not once per resource provider.
> Additional logic such as mapping typed values to strings, generic value map implementations, automatic tree creation etc. could be shared between all NoSQL providers.
> 
Yes, I have heard this suggestion from several people already :) So +1 for
refactoring.


> 4. An open point is whether to support binary data as well, or to leave it out in the first phase. Storing binary data may be problematic for some NoSQL databases and may require a separate storage concept. The MongoDB resource provider currently does not support binary data.
> 
I guess as a first step, not supporting binaries is fine.

In general it would be cool to have some more/better NoSQL support in
Sling to attract devs using these storages.

Regards
Carsten


-- 
Carsten Ziegeler
Adobe Research Switzerland
cziegeler@apache.org