Posted to oak-dev@jackrabbit.apache.org by Matt Ryan <os...@mvryan.org> on 2017/10/02 23:00:27 UTC

[CompositeDataStore] How to properly create delegate data stores?

Hi,

A CompositeDataStore in practice will generally have at least two other
data stores “inside” it (if there are fewer than two, there’s little point
in using CompositeDataStore).  I refer to these as “delegate” data stores
or just “delegates”.

In my prototype, the addition and removal of delegates was OSGi-lifecycle
aware:
- When the CompositeDataStoreService was being activated, any delegate data
stores implemented in already-active bundles would be created then.
- As additional bundles were activated, if a bundle contained the
implementation for a delegate it would be created then.
- As bundles were deactivated, if a bundle contained the implementation for
a delegate it would be deleted then.

My question then has to do with how to create the delegate data store.

I assumed that creating an instance of the Service class wouldn’t work
because I can envision scenarios where I would want multiple delegate data
stores of the same type - for example, two FileDataStores - and I am of the
understanding that I wouldn’t be able to create more than one of any type
of service.

So instead I simply constructed the delegate data stores using reflection.
Something like this:

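private DataStore createDelegateDataStore(String className, Properties properties, Bundle bundle)
        throws ReflectiveOperationException {
    if (Bundle.ACTIVE == bundle.getState()) {
        // Load the implementation class through the bundle that provides it.
        Class<?> dataStoreClass = bundle.loadClass(className);
        DataStore dataStore = (DataStore) dataStoreClass.getDeclaredConstructor().newInstance();
        // Pass the configuration to the new instance via its setProperties method.
        Method setPropertiesMethod = dataStoreClass.getMethod("setProperties", Properties.class);
        setPropertiesMethod.invoke(dataStore, properties);
        return dataStore;
    }
    return null; // bundle is not active, so no delegate is created
}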

(Of course, handling of exceptions like NoSuchMethodException,
ClassNotFoundException, etc. is not included here for brevity.)

Is this a reasonable way to do it, or should some other approach be used?
What would be the best way to go about doing this?

-MR

Re: [CompositeDataStore] How to properly create delegate data stores?

Posted by Tomek Rękawek <to...@apache.org>.
Hi Matt,

> On 24 Oct 2017, at 21:54, Matt Ryan <os...@mvryan.org> wrote:
> It is still unclear to me how this works in terms of configuration files,
> and how this would work for the CompositeDataStore.  This is how I believe
> it would work for two FileDataStores in the composite:
> 
> FDS config 1:
> 
> path=datastore/ds1
> role=local1
> 
> FDS config 2:
> 
> path=datastore/ds2
> role=local2
> 
> CompositeDataStore config:
> 
> local1:readOnly=false
> local2:readOnly=true
> 
> Something like that anyway.

Yes, I’d see something like this too.

> My questions then are:  How do we store both FileDataStore configuration
> files when both have the same PID?  What is the file name for each one?
> And how do they associate with the FileDataStoreFactory?

For the factory services we use suffixes for the config files:

org.apache.jackrabbit.oak.plugins.blob.datastore.FileDataStoreFactory-local1.cfg
org.apache.jackrabbit.oak.plugins.blob.datastore.FileDataStoreFactory-local2.cfg
org.apache.jackrabbit.oak.plugins.blob.datastore.FileDataStoreFactory-other.cfg

OSGi knows that the […].FileDataStoreFactory is a factory and creates as many instances as needed, binding the provided configurations.
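
To make the naming convention above concrete, here is a minimal, illustrative Java sketch (plain Java, not Oak or OSGi code) that splits a config file name of the form <factoryPid>-<name>.cfg into the factory PID and the instance name. The class and method names are hypothetical, and splitting on the last '-' is an assumption made for this sketch, not a statement of how the configuration installer actually parses file names.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

// Hypothetical helper, for illustration only; not part of Oak or OSGi.
final class FactoryConfigName {

    private FactoryConfigName() {
    }

    // Splits "<factoryPid>-<name>.cfg" into (factoryPid, name).
    // Assumption: the instance name is whatever follows the last '-'.
    static Map.Entry<String, String> parse(String fileName) {
        if (!fileName.endsWith(".cfg")) {
            throw new IllegalArgumentException("Not a .cfg file: " + fileName);
        }
        String base = fileName.substring(0, fileName.length() - ".cfg".length());
        int dash = base.lastIndexOf('-');
        if (dash <= 0 || dash == base.length() - 1) {
            throw new IllegalArgumentException("No factory suffix in: " + fileName);
        }
        return new SimpleImmutableEntry<>(base.substring(0, dash), base.substring(dash + 1));
    }
}
```

With the three file names listed above, the factory PID part would be the same in each case and the instance names would be local1, local2 and other.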

Regards,
Tomek

--
Tomek Rękawek | Adobe Research | www.adobe.com
rekawek@adobe.com

Re: [CompositeDataStore] How to properly create delegate data stores?

Posted by Matt Ryan <os...@mvryan.org>.
Hi

Tomek (and/or whoever else), I’ve put a PR together at [0].  This includes
changes defined in another PR [1] which is still awaiting review.  The new
PR covers the implementation of the CompositeDataStoreService etc.  All it
is doing at this point is responding to service and delegate activations
and mapping the delegates into the CompositeDataStore itself.  It isn’t
actually behaving as a data store yet.

Please let me know if this is heading in the direction you recommended and
what changes need to be made.


[0] - https://github.com/apache/jackrabbit-oak/pull/74
[1] - https://github.com/apache/jackrabbit-oak/pull/71


-MR


On October 26, 2017 at 5:39:29 AM, Tomek Rekawek (rekawek@adobe.com.invalid)
wrote:

Hi Matt,

> On 24 Oct 2017, at 21:54, Matt Ryan <os...@mvryan.org> wrote:
> It is still unclear to me how this works in terms of configuration files,
> and how this would work for the CompositeDataStore. This is how I believe
> it would work for two FileDataStores in the composite:
>
> FDS config 1:
>
> path=datastore/ds1
> role=local1
>
> FDS config 2:
>
> path=datastore/ds2
> role=local2
>
> CompositeDataStore config:
>
> local1:readOnly=false
> local2:readOnly=true
>
> Something like that anyway.

Yes, I’d see something like this too.

> My questions then are: How do we store both FileDataStore configuration
> files when both have the same PID? What is the file name for each one?
> And how do they associate with the FileDataStoreFactory?

For the factory services we use suffixes for the config files:

org.apache.jackrabbit.oak.plugins.blob.datastore.FileDataStoreFactory-local1.cfg

org.apache.jackrabbit.oak.plugins.blob.datastore.FileDataStoreFactory-local2.cfg

org.apache.jackrabbit.oak.plugins.blob.datastore.FileDataStoreFactory-other.cfg


OSGi knows that the […].FileDataStoreFactory is a factory and creates as
many instances as needed, binding the provided configurations.

Regards,
Tomek

-- 
Tomek Rękawek | Adobe Research | www.adobe.com
rekawek@adobe.com

Re: [CompositeDataStore] How to properly create delegate data stores?

Posted by Matt Ryan <os...@mvryan.org>.
Hi Tomek,

Thanks for the pointer on using a factory - I’m looking into that now.

It is still unclear to me how this works in terms of configuration files,
and how this would work for the CompositeDataStore.  This is how I believe
it would work for two FileDataStores in the composite:

FDS config 1:

path=datastore/ds1
role=local1


FDS config 2:

path=datastore/ds2
role=local2


CompositeDataStore config:

local1:readOnly=false
local2:readOnly=true


Something like that anyway.
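
As a rough illustration of the CompositeDataStore configuration proposed above, the following Java sketch parses entries of the form <role>:readOnly=<boolean> into a map from role to read-only flag. The class name, the method name and the entry format itself are assumptions taken from the example in this message; nothing like this exists in Oak. The entries are split by hand because java.util.Properties treats ':' as a key/value separator, so loading "local1:readOnly=false" with Properties would yield the key "local1" and the value "readOnly=false".

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for the "<role>:readOnly=<boolean>" entries proposed above.
final class CompositeConfigSketch {

    private CompositeConfigSketch() {
    }

    // Returns role -> readOnly flag, preserving the order of the input lines.
    static Map<String, Boolean> parseReadOnlyFlags(Iterable<String> lines) {
        Map<String, Boolean> flags = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            int colon = line.indexOf(':');
            int equals = line.indexOf('=', colon + 1);
            if (colon <= 0 || equals < 0) {
                throw new IllegalArgumentException("Malformed entry: " + line);
            }
            String role = line.substring(0, colon);
            String key = line.substring(colon + 1, equals);
            if (!"readOnly".equals(key)) {
                throw new IllegalArgumentException("Unknown setting: " + key);
            }
            flags.put(role, Boolean.parseBoolean(line.substring(equals + 1).trim()));
        }
        return flags;
    }
}
```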

My questions then are:  How do we store both FileDataStore configuration
files when both have the same PID?  What is the file name for each one?
And how do they associate with the FileDataStoreFactory?

Thanks

-MR

On October 23, 2017 at 1:44:57 AM, Tomek Rekawek (rekawek@adobe.com.invalid)
wrote:

Hi Matt,

> On 20 Oct 2017, at 23:02, Matt Ryan <os...@mvryan.org> wrote:
>
> I think I basically understand all of this, except I don’t know how you go
> about configuring two file data stores. What would that look like in
> practice? Normally if I were going to configure a FileDataStore I would
> create a configuration file with the pid
> “org.apache.jackrabbit.oak.plugins.blob.datastore.FileDataStore”. So it is
> unclear to me how I would go about configuring more than one of these.
>
> This is important because being able to have more than one of the same type
> of data store in the composite is a requirement IMO.

Yes, it’ll need implementing the FileDataStoreFactory - an OSGi type that
allows creating multiple service instances with different configurations
[1].

It was similar with the composite node store - we had to implement the
SegmentNodeStoreFactory [2], to create multiple SegmentMK in different
locations.

Regards,
Tomek

[1]
https://cqdump.wordpress.com/2014/08/05/managing-multiple-instances-of-services-osgi-service-factories/
[2]
https://github.com/apache/jackrabbit-oak/blob/trunk/oak-segment-tar/src/main/java/org/apache/jackrabbit/oak/segment/SegmentNodeStoreFactory.java

-- 
Tomek Rękawek | Adobe Research | www.adobe.com
rekawek@adobe.com

Re: [CompositeDataStore] How to properly create delegate data stores?

Posted by Tomek Rekawek <re...@adobe.com.INVALID>.
Hi Matt,

> On 20 Oct 2017, at 23:02, Matt Ryan <os...@mvryan.org> wrote:
> 
> I think I basically understand all of this, except I don’t know how you go
> about configuring two file data stores.  What would that look like in
> practice?  Normally if I were going to configure a FileDataStore I would
> create a configuration file with the pid
> “org.apache.jackrabbit.oak.plugins.blob.datastore.FileDataStore”.  So it is
> unclear to me how I would go about configuring more than one of these.
> 
> This is important because being able to have more than one of the same type
> of data store in the composite is a requirement IMO.

Yes, we’ll need to implement a FileDataStoreFactory - an OSGi service factory that allows creating multiple service instances with different configurations [1].

It was similar with the composite node store - we had to implement the SegmentNodeStoreFactory [2], to create multiple SegmentMK in different locations.
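
A rough way to picture what such a factory does: the container hands the factory one configuration per instance, and the factory creates, replaces or removes an instance for each. The Java sketch below models only that bookkeeping in plain Java. It does not use the OSGi API, and the class, method and property names (InstanceFactorySketch, updated, deleted, "path") are hypothetical, loosely modelled on the shape of org.osgi.service.cm.ManagedServiceFactory.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Plain-Java model of a service factory: one instance per configuration.
// Hypothetical names; this is not the OSGi API.
final class InstanceFactorySketch {

    // Stand-in for a configured data store instance.
    static final class Instance {
        final String path;

        Instance(String path) {
            this.path = path;
        }
    }

    private final Map<String, Instance> instances = new HashMap<>();

    // Called when a configuration is created or changed: (re)create the instance.
    void updated(String pid, Map<String, String> properties) {
        String path = properties.get("path");
        if (path == null) {
            throw new IllegalArgumentException("Missing 'path' for " + pid);
        }
        instances.put(pid, new Instance(path));
    }

    // Called when a configuration is removed: drop the matching instance.
    void deleted(String pid) {
        instances.remove(pid);
    }

    // Read-only view of the instances currently managed by this factory.
    Map<String, Instance> instances() {
        return Collections.unmodifiableMap(instances);
    }
}
```

Two configurations with different paths would therefore yield two independent instances of the same type, which is the behaviour needed for two FileDataStores.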

Regards,
Tomek

[1] https://cqdump.wordpress.com/2014/08/05/managing-multiple-instances-of-services-osgi-service-factories/
[2] https://github.com/apache/jackrabbit-oak/blob/trunk/oak-segment-tar/src/main/java/org/apache/jackrabbit/oak/segment/SegmentNodeStoreFactory.java

--
Tomek Rękawek | Adobe Research | www.adobe.com
rekawek@adobe.com

Re: [CompositeDataStore] How to properly create delegate data stores?

Posted by Matt Ryan <os...@mvryan.org>.
Hi Tomek,

Thanks for the clarification.  I’ve been working on applying the feedback
you provided, studying CompositeNodeStore for guidance, etc.

One question inline below.


On October 4, 2017 at 1:22:22 AM, Tomek Rękawek (tomekr@apache.org) wrote:


> As above, the CompositeDataStore won’t wait for any particular
> implementations, but for the BlobStoreProviders configured with the
> appropriate roles. It knows the role list, so it can tell when all the
> roles are in place.
>
> For instance, we can configure CompositeDataStore with the following role
> list: local1, local2, shared.
>
> Now, in OSGi we’re configuring two FileDataStores, named “local1” and
> “local2” and also an S3DataStore named “shared”.


I think I basically understand all of this, except I don’t know how you go
about configuring two file data stores.  What would that look like in
practice?  Normally if I were going to configure a FileDataStore I would
create a configuration file with the pid
“org.apache.jackrabbit.oak.plugins.blob.datastore.FileDataStore”.  So it is
unclear to me how I would go about configuring more than one of these.

This is important because being able to have more than one of the same type
of data store in the composite is a requirement IMO.


-MR

Re: [CompositeDataStore] How to properly create delegate data stores?

Posted by Tomek Rękawek <to...@apache.org>.
Hello Matt,

Please find my replies inlined.

> On 4 Oct 2017, at 00:13, Matt Ryan <os...@mvryan.org> wrote:
> 
>> 1. Create new BlobStoreProvider interface, with just one method:
>> getBlobStore().
>> 2. Modify all the existing blob store services adding them an optional
>> “role” property (any string).

> One concern I have with this approach is that if we want a data store to be
> usable as a CompositeDataStore delegate, that data store has to make
> specific provisions to do this.  My thinking was that it would be
> preferable to have the CompositeDataStore contain as much of the logic as
> possible.  Ideally a data store should work as a delegate without having to
> make any changes to the data store itself.  (Not sure if we can achieve
> this, but…)

Could you elaborate on what kind of provisions are required for the delegates?

From what I understand, you didn’t plan to rely on OSGi to get the delegate data stores, but to initialise all of them in the CompositeDataStore (“contain as much of the logic as possible”). I’m not sure this is the right approach. It means that the composite data store has to depend on every existing blob store and know its internals. If something changes in any blob store, the composite data store has to be updated as well. For data stores with a rich configuration (S3DataStore) this may get quite complex.

On the other hand, the OSGi-based approach makes the whole thing simpler, less coupled, more extensible and easier to maintain. CompositeDataStore doesn’t need to know any concrete implementation; it relies only on the BlobStore interface. OSGi will take care of providing the already-configured delegates.

>> 3. If the data store service is configured with this role, it should
>> register the BlobStoreProvider service rather than a normal BlobStore.
>> 4. The CompositeDataStoreService should be configured with a list of blob
>> store roles it should wait for.
>> 5. The CompositeDataStoreService has a MANDATORY_MULTIPLE @Reference of
>> type BlobStoreProvider.
>> 6. Once (a) the CompositeDataStoreService is activated and (b) all the blob
>> store providers are there, it’ll register a BlobStore service, which will
>> be picked up by the node store.


> I have concerns about this part also.  Which blob store providers should
> the CompositeDataStoreService wait for?
> 
> For example, should it wait for S3DataStore?  If yes, and if the
> installation doesn’t use the S3 connector, that provider will never show
> up, and therefore the CompositeDataStoreService would never get
> registered.  If it doesn’t wait for S3DataStore but the installation does
> use S3DataStore, what happens if that bundle is unloaded?

As above, the CompositeDataStore won’t wait for any particular implementations, but for the BlobStoreProviders configured with the appropriate roles. It knows the role list, so it can tell when all the roles are in place.

For instance, we can configure CompositeDataStore with the following role list: local1, local2, shared.

Now, in OSGi we’re configuring two FileDataStores, named “local1” and “local2” and also an S3DataStore named “shared”.

CompositeDataStore will be notified about all the data store registrations and as soon as three data stores are in place, it can carry on with its initialisation.
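
The waiting logic described above can be sketched in plain Java as follows. This is only an illustration under stated assumptions: BlobStoreProvider is the interface proposed in this thread (shown here with a stand-in return type), the RoleTracker class is hypothetical, and in a real component the bind/unbind methods would be driven by the OSGi @Reference rather than called by hand.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Proposed in this thread; Object is a stand-in for the BlobStore type.
interface BlobStoreProvider {
    Object getBlobStore();
}

// Hypothetical helper that tracks which configured roles have a provider.
final class RoleTracker {

    private final Set<String> requiredRoles;
    private final Map<String, BlobStoreProvider> bound = new HashMap<>();

    RoleTracker(Set<String> requiredRoles) {
        this.requiredRoles = new LinkedHashSet<>(requiredRoles);
    }

    // Would be called when a provider service appears; the role is assumed
    // to come from the provider's "role" service property.
    synchronized void bind(String role, BlobStoreProvider provider) {
        if (requiredRoles.contains(role)) {
            bound.put(role, provider);
        }
    }

    // Would be called when a provider service goes away.
    synchronized void unbind(String role) {
        bound.remove(role);
    }

    // True once every configured role has a provider bound.
    synchronized boolean isReady() {
        return bound.keySet().containsAll(requiredRoles);
    }
}
```

With the role list local1, local2, shared, isReady() stays false until a provider has been bound for each of the three roles, and becomes false again if one of them is unbound. Providers whose role is not in the list are simply ignored, which is why an installation without S3 does not block the composite unless it is configured to wait for an S3-backed role.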

> Wouldn’t this approach require that every possible data store that can be a
> blob store provider for the composite be included in each installation that
> wants to use the CompositeDataStore?

No. The CompositeDataStore will only reference the BlobStoreProvider interface, not the actual implementations. It will even be possible for a customer to write a completely new blob store implementation and use it as a delegate (as long as it implements BlobStoreProvider). Not that we expect customers to do that, but this kind of decoupling makes it easier to work on the Oak codebase.

Regards,
Tomek

--
Tomek Rękawek | Adobe Research | www.adobe.com
rekawek@adobe.com


Re: [CompositeDataStore] How to properly create delegate data stores?

Posted by Matt Ryan <os...@mvryan.org>.
Hi Tomek,

Thanks for the feedback.  I hadn’t thought about it this way, so I’ll
consider it further.

Some concerns listed below.


On October 3, 2017 at 12:35:32 AM, Tomek Rekawek (rekawek@adobe.com.invalid)
wrote:

> Hello Matt,
>
> I don’t think we should rely on the bundle activation / deactivation, but
> rather on the service registration / deregistration. OSGi allows using
> MANDATORY_MULTIPLE cardinality for a @Reference - in this case, the service
> consumer will be informed every time there is a new service implementing
> the given interface.
>
> Then, if the CompositeDataStoreService thinks that all the required partial
> data stores are there, it can register itself as a BlobStore, using
> BundleContext#registerService.
>
> Also, we probably need some kind of differentiation between the “partial”
> and the “final” data stores (so the node store won’t pick up a partial
> one). For the node stores, we introduced a new property “role”. If the node
> store service is configured with this property in place, it registers a
> NodeStoreProvider rather than a NodeStore (so we are sure the partial node
> store is not used directly).
>
> So, my idea is as follows:
>
> 1. Create new BlobStoreProvider interface, with just one method:
> getBlobStore().
> 2. Modify all the existing blob store services, adding an optional
> “role” property (any string) to them.

One concern I have with this approach is that if we want a data store to be
usable as a CompositeDataStore delegate, that data store has to make
specific provisions to do this.  My thinking was that it would be
preferable to have the CompositeDataStore contain as much of the logic as
possible.  Ideally a data store should work as a delegate without having to
make any changes to the data store itself.  (Not sure if we can achieve
this, but…)


> 3. If the data store service is configured with this role, it should
> register the BlobStoreProvider service rather than a normal BlobStore.
> 4. The CompositeDataStoreService should be configured with a list of blob
> store roles it should wait for.
> 5. The CompositeDataStoreService has a MANDATORY_MULTIPLE @Reference of
> type BlobStoreProvider.
> 6. Once (a) the CompositeDataStoreService is activated and (b) all the blob
> store providers are there, it’ll register a BlobStore service, which will
> be picked up by the node store.

I have concerns about this part also.  Which blob store providers should
the CompositeDataStoreService wait for?

For example, should it wait for S3DataStore?  If yes, and if the
installation doesn’t use the S3 connector, that provider will never show
up, and therefore the CompositeDataStoreService would never get
registered.  If it doesn’t wait for S3DataStore but the installation does
use S3DataStore, what happens if that bundle is unloaded?

Wouldn’t this approach require that every possible data store that can be a
blob store provider for the composite be included in each installation that
wants to use the CompositeDataStore?


Regards,

-MR

Re: [CompositeDataStore] How to properly create delegate data stores?

Posted by Tomek Rekawek <re...@adobe.com.INVALID>.
Hello Matt,

I don’t think we should rely on the bundle activation / deactivation, but rather on the service registration / deregistration. OSGi allows using MANDATORY_MULTIPLE cardinality for a @Reference - in this case, the service consumer will be informed every time there is a new service implementing the given interface.

Then, if the CompositeDataStoreService thinks that all the required partial data stores are there, it can register itself as a BlobStore, using BundleContext#registerService.

Also, we probably need some kind of differentiation between the “partial” and the “final” data stores (so the node store won’t pick up a partial one). For the node stores, we introduced a new property “role”. If the node store service is configured with this property in place, it registers a NodeStoreProvider rather than a NodeStore (so we are sure the partial node store is not used directly).

So, my idea is as follows:

1. Create new BlobStoreProvider interface, with just one method: getBlobStore().
2. Modify all the existing blob store services, adding an optional “role” property (any string) to them.
3. If the data store service is configured with this role, it should register the BlobStoreProvider service rather than a normal BlobStore.
4. The CompositeDataStoreService should be configured with a list of blob store roles it should wait for.
5. The CompositeDataStoreService has a MANDATORY_MULTIPLE @Reference of type BlobStoreProvider.
6. Once (a) the CompositeDataStoreService is activated and (b) all the blob store providers are there, it’ll register a BlobStore service, which will be picked up by the node store.
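
Steps 2 and 3 of the list above boil down to a single decision per data store service: with a role configured it is exposed as a BlobStoreProvider, without one it is exposed as a plain BlobStore. A minimal plain-Java sketch of that decision follows. The RegistrationSketch class and its enum are hypothetical and stand in for the actual service registration, and treating a blank role as "no role" is an assumption of this sketch.

```java
import java.util.Map;

// Hypothetical illustration of the "role" switch; not Oak code.
final class RegistrationSketch {

    // Which service interface the data store would be registered under.
    enum ServiceType { BLOB_STORE, BLOB_STORE_PROVIDER }

    private RegistrationSketch() {
    }

    // A non-blank "role" property marks the store as a partial (delegate) store.
    static ServiceType serviceTypeFor(Map<String, ?> config) {
        Object role = config.get("role");
        boolean hasRole = role != null && !role.toString().trim().isEmpty();
        return hasRole ? ServiceType.BLOB_STORE_PROVIDER : ServiceType.BLOB_STORE;
    }
}
```

Because a store with a role never registers a BlobStore, the node store cannot pick it up directly; only the composite, which consumes providers, sees it.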

It’s similar to what we have in the Composite Node Store [1].

Regards,
Tomek

[1] https://github.com/apache/jackrabbit-oak/blob/trunk/oak-store-composite/src/main/java/org/apache/jackrabbit/oak/composite/CompositeNodeStoreService.java

-- 
Tomek Rękawek | Adobe Research | www.adobe.com
rekawek@adobe.com

> On 3 Oct 2017, at 01:00, Matt Ryan <os...@mvryan.org> wrote:
> 
> Hi,
> 
> A CompositeDataStore in practice will generally have at least two other
> data stores “inside” it (if there are less than two, there’s little point
> in using CompositeDataStore).  I refer to these as “delegate” data stores
> or just “delegates”.
> 
> In my prototype, the addition and removal of delegates was OSGi-lifecycle
> aware:
> - When the CompositeDataStoreService was being activated, any delegate data
> stores implemented in already-active bundles would be created then.
> - As additional bundles were activated, if a bundle contained the
> implementation for a delegate it would be created then.
> - As bundles were deactivated, if a bundle contained the implementation for
> a delegate it would be deleted then.
> 
> My question then has to do with how to create the delegate data store.
> 
> I assumed that creating an instance of the Service class wouldn’t work
> because I can envision scenarios where I would want multiple delegate data
> stores of the same type - for example, two FileDataStores - and I am of the
> understanding that I wouldn’t be able to create more than one of any type
> of service.
> 
> So instead I simply constructed the delegate data stores using reflection.
> Something like this:
> 
> private DataStore createDelegateDataStore(String className, Properties
> properties, Bundle bundle) {
> if (Bundle.ACTIVE == bundle.getState()) {
> Class dataStoreClass = bundle.loadClass(className);
> DataStore dataStore = (DataStore) dataStoreClass.newInstance();
> Method setPropertiesMethod = dataStoreClass.getMethod(“setProperties”,
> Properties.class);
> setPropertiesMethod.invoke(dataStore, properties);
> }
> }
> 
> (Of course handling of exceptions like NoSuchMethodException,
> ClassNotFoundException, etc are not included here for brevity.)
> 
> Is a reasonable way to do this or should some other approach be used?  What
> would be the best way to go about doing this?
> 
> -MR