Posted to users@jackrabbit.apache.org by Damiano Albani <da...@gmail.com> on 2019/02/28 11:50:00 UTC

[Oak] Using RDBDocumentStore in OSGi environment

Hello,

I have looked at the Oak documentation / code and googled quite a lot but I
couldn't find instructions on how to configure Oak to use an
RDBDocumentStore in an OSGi environment.

I have written a Spring Boot application based on the example available at
https://github.com/apache/jackrabbit-oak/tree/trunk/oak-examples/standalone,
and it works fine with MongoDB.
Well, actually, I see quite a lot of disk I/O from MongoDB when adding lots
of JCR nodes.
That's why I wanted to try a relational database (PostgreSQL)
and see how it compares.

I'm an OSGi newbie so it's probably obvious for someone with more
experience, but here's what I tried in my repository JSON configuration
file:

> ...
> "org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreService": {
>   "customBlobStore": true,
>   "documentStoreType": "rdb"
> },
> ...
>

I see the following in the logs at startup:

> ...
> o.a.j.o.p.d.DocumentNodeStoreService     : Initializing DocumentNodeStore
> with BlobStore [DataStore backed BlobStore
> [org.apache.jackrabbit.oak.blob.cloud.gcs.GcsDataStore]]
> o.a.j.o.p.d.DocumentNodeStoreService     : DataSource use enabled.
> DocumentNodeStoreService would be initialized when DataSource would be
> available (currently available: nodes: null, blobs: null)
> ...
>

I suppose Oak is waiting for a *javax.sql.DataSource* to be "available",
but how am I supposed to define one?
Do I need to use a library like OPS4J Pax JDBC?
I've tried adding the following to the repository JSON config:

> "org.ops4j.datasource": {
>   "osgi.jdbc.driver.class": "org.postgresql.Driver",
>   "dataSourceName": "PostgreSQL,",
>   "user": "sa",
>   "password": "sa"
> },
>

And I also had to modify the *REPOSITORY_BUNDLE_FILTER* property to add
"(Bundle-SymbolicName=org.ops4j.pax*)".
I now see activity related to Pax in the logs, but Oak still refuses to
start up properly.
What am I missing? Thanks for your help!

Best regards,

-- 
Damiano Albani

Re: [Oak] Using RDBDocumentStore in OSGi environment

Posted by zhouxu <zh...@docworks.cn>.
Hello experts!
I'm an Oak newbie. I have written a Spring Boot application that embeds Oak, and I
found that many features (Solr and so on) apparently have to be used within the
OSGi framework. Can you give us some advice? Must we use the OSGi framework? How about
Apache Sling?
Thanks a lot!





Re: [Oak] Using RDBDocumentStore in OSGi environment

Posted by Julian Reschke <ju...@gmx.de>.
On 05.03.2019 11:40, Damiano Albani wrote:
> Hi Julian,
> 
> On Thu, Feb 28, 2019 at 1:34 PM Julian Reschke <ju...@gmx.de>
> wrote:
> 
>>
>> You can use
>> <https://sling.apache.org/documentation/bundles/datasource-providers.html>
>>
>> as datasource provider.
>>
> 
> That's indeed the piece I was looking for, thanks a lot for the tip!
> For the record, in case someone else would ever be interested, here's what
> I did:
> 
> Maven pom.xml
> 
>>    <dependency>
>>        <groupId>org.apache.sling</groupId>
>>        <artifactId>org.apache.sling.datasource</artifactId>
>>        <version>1.0.4</version>
>>    </dependency>
>>
> 
> OAK JSON configuration file:
> 
>>    "org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreService": {
>>      "documentStoreType": "rdb"
>>    },
>>    "org.apache.sling.datasource.DataSourceFactory": {
>>      "datasource.name": "oak",
>>      "driverClassName": "org.postgresql.Driver",
>>      "url": "${oak.postgresql.url}",
>>      "username": "${oak.postgresql.username}",
>>      "password": "${oak.postgresql.password}"
>>    },
>>
> 
> I saw a few interesting configuration flags as well in RDBDocumentStore.java
> <https://github.com/apache/jackrabbit-oak/blob/trunk/oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/rdb/RDBDocumentStore.java#L2145>
> (NOGZIP, NOAPPEND, etc).
> What is the reasoning behind those flags? Are they useful to activate in
> certain conditions / with certain databases?

These were added so that certain features could be switched off when
there's doubt that they work properly. (Don't panic: they weren't needed
after all...)

> By the way, using a set of ON SELECT / INSERT / UPDATE / DELETE DO INSTEAD
> rules, I could even store the *data* column as *jsonb* in PostgreSQL.
> (That should also be possible for the bdata column but I ran into some JSON
> syntax issues so far.)

Yes, DB-specific optimizations are in theory interesting. However, the 
goal was to keep stuff simple and portable.

>> That said: don't expect this to perform better than MongoDB - the
>> RDBDocumentStore essentially emulates the JSON storage of MongoDB inside
>> a relational database.
>>
> 
> Speaking of performance, how can I determine if my Oak setup works as fast
> as it should?
> I understand that performance is very specific to the environment where it
> runs, but what's an average "baseline" performance to expect from Oak?
> In terms of nodes added / updated / deleted per second for example? No
> queries here, simply CRUD operations.
> And what would be the associated MongoDB insert / query / update / delete
> metrics to expect?
> I couldn't find much information related to performance (tuning) on the
> website.


There is no such information there, and I really do not have anything to 
share. Sorry.

Best regards, Julian
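
Since no baseline figures are available, one option is to measure your own setup and compare runs against each other (for example MongoDB versus PostgreSQL on the same hardware). Below is a minimal timing sketch, not anything shipped with Oak: the class name ThroughputSketch and the method measure are hypothetical, and the HashMap workload in main is only a stand-in that you would replace with your real operation, such as adding one JCR node and calling session.save().

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntConsumer;

// Hypothetical helper for rough throughput numbers; not part of Oak.
class ThroughputSketch {

    /** Outcome of one timed run. */
    static final class Result {
        final int operations;
        final long elapsedNanos;

        Result(int operations, long elapsedNanos) {
            this.operations = operations;
            this.elapsedNanos = elapsedNanos;
        }

        /** Operations per second; guards against a zero elapsed time. */
        double opsPerSecond() {
            return operations * 1_000_000_000.0 / Math.max(elapsedNanos, 1L);
        }
    }

    /**
     * Runs the operation 'warmup' times untimed, then 'operations' times timed.
     * The operation receives a running index so it can create distinct items.
     */
    static Result measure(int warmup, int operations, IntConsumer op) {
        for (int i = 0; i < warmup; i++) {
            op.accept(i);
        }
        long start = System.nanoTime();
        for (int i = 0; i < operations; i++) {
            op.accept(warmup + i);
        }
        return new Result(operations, System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // Stand-in workload (assumption): swap in the real repository call here.
        Map<Integer, String> standIn = new HashMap<>();
        Result r = measure(1_000, 100_000, i -> standIn.put(i, "node-" + i));
        System.out.printf("%d ops in %.1f ms (%.0f ops/s)%n",
                r.operations, r.elapsedNanos / 1_000_000.0, r.opsPerSecond());
    }
}
```

The absolute figures from such a loop mean little on their own. They are mainly useful for comparing two configurations under the same workload, and choices such as how many nodes are added per save will likely change the result.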


Re: [Oak] Using RDBDocumentStore in OSGi environment

Posted by Damiano Albani <da...@gmail.com>.
Hi Julian,

On Thu, Feb 28, 2019 at 1:34 PM Julian Reschke <ju...@gmx.de>
wrote:

>
> You can use
> <https://sling.apache.org/documentation/bundles/datasource-providers.html>
>
> as datasource provider.
>

That's indeed the piece I was looking for, thanks a lot for the tip!
For the record, in case someone else would ever be interested, here's what
I did:

Maven pom.xml

>   <dependency>
>       <groupId>org.apache.sling</groupId>
>       <artifactId>org.apache.sling.datasource</artifactId>
>       <version>1.0.4</version>
>   </dependency>
>

OAK JSON configuration file:

>   "org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreService": {
>     "documentStoreType": "rdb"
>   },
>   "org.apache.sling.datasource.DataSourceFactory": {
>     "datasource.name": "oak",
>     "driverClassName": "org.postgresql.Driver",
>     "url": "${oak.postgresql.url}",
>     "username": "${oak.postgresql.username}",
>     "password": "${oak.postgresql.password}"
>   },
>
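
A note on the ${oak.postgresql.*} values in the configuration above: they are placeholders, so the real URL and credentials have to be supplied from somewhere else (application properties, for instance) before the configuration is used. The following is only a sketch of that kind of substitution, assuming simple ${key} syntax. PlaceholderSketch and resolve are hypothetical names, the JDBC URL is a made-up example, and this is not a claim about how the standalone example or Sling resolves the values internally.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of ${key} substitution; not Oak or Sling code.
class PlaceholderSketch {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Replaces each ${key} with its value from props; unknown keys are left as-is. */
    static String resolve(String template, Map<String, String> props) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = props.get(m.group(1));
            String replacement = (value != null) ? value : m.group(0);
            // quoteReplacement keeps '$' and '\' in values from being interpreted
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Example values are assumptions, not taken from the thread.
        Map<String, String> props = Map.of(
                "oak.postgresql.url", "jdbc:postgresql://localhost:5432/oak",
                "oak.postgresql.username", "oak");

        System.out.println(resolve("${oak.postgresql.url}", props));
        // A key with no value stays visible, which makes a missing setting easy to spot.
        System.out.println(resolve("${oak.postgresql.password}", props));
    }
}
```

If a placeholder is still present verbatim in the effective configuration, the corresponding property was probably never set.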

I saw a few interesting configuration flags as well in RDBDocumentStore.java
<https://github.com/apache/jackrabbit-oak/blob/trunk/oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/rdb/RDBDocumentStore.java#L2145>
(NOGZIP, NOAPPEND, etc).
What is the reasoning behind those flags? Are they useful to activate in
certain conditions / with certain databases?

By the way, using a set of ON SELECT / INSERT / UPDATE / DELETE DO INSTEAD
rules, I could even store the *data* column as *jsonb* in PostgreSQL.
(That should also be possible for the bdata column but I ran into some JSON
syntax issues so far.)


> That said: don't expect this to perform better than MongoDB - the
> RDBDocumentStore essentially emulates the JSON storage of MongoDB inside
> a relational database.
>

Speaking of performance, how can I determine if my Oak setup works as fast
as it should?
I understand that performance is very specific to the environment where it
runs, but what's an average "baseline" performance to expect from Oak?
In terms of nodes added / updated / deleted per second for example? No
queries here, simply CRUD operations.
And what would be the associated MongoDB insert / query / update / delete
metrics to expect?
I couldn't find much information related to performance (tuning) on the
website.

Best regards,

-- 
Damiano Albani

Re: [Oak] Using RDBDocumentStore in OSGi environment

Posted by Julian Reschke <ju...@gmx.de>.
On 28.02.2019 12:50, Damiano Albani wrote:
> Hello,
> 
> I have looked at the Oak documentation / code and googled quite a lot but I
> couldn't find instructions on how to configure Oak to use an
> RDBDocumentStore in an OSGi environment.
> 
> I have written a Spring Boot application based on the example available at
> https://github.com/apache/jackrabbit-oak/tree/trunk/oak-examples/standalone,
> and it works fine with MongoDB.
> Well, actually, I see quite a lot of disk I/O from MongoDB when adding lots
> of JCR nodes.
> That's why I wanted to try a relational database (PostgreSQL)
> and see how it compares.
> 
> I'm an OSGi newbie so it's probably obvious for someone with more
> experience, but here's what I tried in my repository JSON configuration
> file:
> 
>> ...
>> "org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreService": {
>>    "customBlobStore": true,
>>    "documentStoreType": "rdb"
>> },
>> ...
>>
> 
> I see the following in the logs at startup:
> 
>> ...
>> o.a.j.o.p.d.DocumentNodeStoreService     : Initializing DocumentNodeStore
>> with BlobStore [DataStore backed BlobStore
>> [org.apache.jackrabbit.oak.blob.cloud.gcs.GcsDataStore]]
>> o.a.j.o.p.d.DocumentNodeStoreService     : DataSource use enabled.
>> DocumentNodeStoreService would be initialized when DataSource would be
>> available (currently available: nodes: null, blobs: null)
>> ...
>>
> 
> I suppose Oak is waiting for a *javax.sql.DataSource* to be "available",
> but how am I supposed to define one?
> Do I need to use a library like OPS4J Pax JDBC?
> I've tried adding the following to the repository JSON config:
> 
>> "org.ops4j.datasource": {
>>    "osgi.jdbc.driver.class": "org.postgresql.Driver",
>>    "dataSourceName": "PostgreSQL,",
>>    "user": "sa",
>>    "password": "sa"
>> },
>>
> 
> And I also had to modify the *REPOSITORY_BUNDLE_FILTER* property to add
> "(Bundle-SymbolicName=org.ops4j.pax*)".
> I now see activity related to Pax in the logs, but Oak still refuses to
> start up properly.
> What am I missing? Thanks for your help!

You can use 
<https://sling.apache.org/documentation/bundles/datasource-providers.html> 
as datasource provider.

That said: don't expect this to perform better than MongoDB - the 
RDBDocumentStore essentially emulates the JSON storage of MongoDB inside 
a relational database.

Best regards, Julian