Posted to dev@openjpa.apache.org by Francesco Chicchiriccò <il...@apache.org> on 2024/01/08 16:13:58 UTC

Multi-tenancy and caching issues

Hi there,
at Syncope we have been implementing multi-tenancy by relying on something like:

* 1 data source per tenant
* 1 entity manager factory per tenant
* 1 transaction manager per tenant
* etc

So far so good.

Now I am experimenting with a different approach, similar to [1], e.g.

* 1 low-level data source per tenant
* 1 data source extending Spring's AbstractRoutingDataSource using the value of a ThreadLocal variable as lookup key
* 1 single entity manager factory configured with the routing data source
* 1 single transaction manager
* etc

It mostly works, but I am having caching issues with concurrent operations working on different tenants. So I was wondering: how can I extend the various OpenJPA caches (query, data, L1, L2, all of them) to hold separate instances per tenant, and to pick the appropriate one based on the same ThreadLocal value I already use above for data sources?
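
For illustration, the ThreadLocal-based lookup described in the list above could look roughly like the sketch below. TenantContext and TenantRouter are hypothetical names, not Spring or OpenJPA classes; with Spring, the same value would typically be returned from AbstractRoutingDataSource#determineCurrentLookupKey(). The router here is only a dependency-free stand-in for the routing data source.

```java
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical holder for the current tenant; not part of Spring or OpenJPA. */
final class TenantContext {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    private TenantContext() {
    }

    /** Returns the tenant bound to the calling thread, failing fast when none is set. */
    static String get() {
        final String tenant = CURRENT.get();
        if (tenant == null) {
            throw new IllegalStateException("No tenant bound to the current thread");
        }
        return tenant;
    }

    /** Runs the task with the given tenant bound, restoring the previous value afterwards. */
    static <T> T callAs(final String tenant, final Supplier<T> task) {
        final String previous = CURRENT.get();
        CURRENT.set(tenant);
        try {
            return task.get();
        } finally {
            if (previous == null) {
                CURRENT.remove();
            } else {
                CURRENT.set(previous);
            }
        }
    }
}

/** Stand-in for the routing data source: picks a target by the current tenant. */
final class TenantRouter<T> {
    private final Map<String, T> targets;

    TenantRouter(final Map<String, T> targets) {
        this.targets = Map.copyOf(targets);
    }

    T current() {
        final String tenant = TenantContext.get();
        final T target = targets.get(tenant);
        if (target == null) {
            throw new IllegalStateException("No target for tenant '" + tenant + "'");
        }
        return target;
    }
}
```

Restoring the previous value in the finally block matters on pooled threads: a tenant left bound by one request would otherwise be picked up by the next request served by the same thread.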

Thanks in advance.
Regards.

[1] https://github.com/Cepr0/sb-multitenant-db-demo

-- 
Francesco Chicchiriccò

Tirasa - Open Source Excellence
http://www.tirasa.net/

Member at The Apache Software Foundation
Syncope, Cocoon, Olingo, CXF, OpenJPA, PonyMail
http://home.apache.org/~ilgrosso/


Re: Multi-tenancy and caching issues

Posted by Francesco Chicchiriccò <il...@apache.org>.
FYI, I've adopted a similar solution: there are still a few things to iron out, but overall it works.

Thank you.
Regards.

On 09/01/24 11:58, Romain Manni-Bucau wrote:
> Don't have everything ready for spring-data but had something like that in
> mind:
>
>
> public class RoutedEMFConf {
>      @Bean
>      @Primary
>      LocalContainerEntityManagerFactoryBean
> mainEntityManagerFactory(final Tenant tenant, final ApplicationContext
> context) {
>          final var emfs = findDelegates(context); // can be other beans
> with qualifiers
>          final var routedEmf =
> EntityManagerFactory.class.cast(Proxy.newProxyInstance(
>                  RoutedEMFConf.class.getClassLoader(),
>                  // opt: use SessionFactoryImplementor.class if you
> need hibernate internals
>                  new Class<?>[]{EntityManagerFactory.class, Marking.class},
>                  (proxy, method, args) -> {
>                      switch (method.getName()) {
>                          case "equals":
>                              return args[0] instanceof Marking; //
> assume there is a single one per app, otherwise complete the impl
>                          case "hashCode":
>                              return 1;
>                          default:
>                              try {
>                                  final var id = tenant.get();
>                                  return
> method.invoke(requireNonNull(emfs.get(id), () -> "No emf for '" + id +
> "'"), args);
>                              } catch (final InvocationTargetException ite) {
>                                  throw ite.getTargetException();
>                              }
>                      }
>                  }
>          ));
>          return new LocalContainerEntityManagerFactoryBean() {
>              @Override
>              protected EntityManagerFactory
> createNativeEntityManagerFactory() throws PersistenceException {
>                  return routedEmf;
>              }
>          };
>      }
>
>      private Map<String, EntityManagerFactory> findDelegates(final
> ListableBeanFactory lbf) {
>          return Stream.of(lbf.getBeanNamesForType(EntityManagerFactory.class))
>                  .filter(it -> !"mainEntityManagerFactory".equals(it))
>                  .collect(toMap(identity(), k -> lbf.getBean(k,
> EntityManagerFactory.class)));
>      }
>
>      public interface Marking {}
>
>      // modelize the tenant lookup but can be a class, interface is not
> always needed
>      public interface Tenant extends Supplier<String> {
>      }
> }
>
> Side note: the delegate must have a valid name (likely make it a spring
> extension registering beans from your conf or "properties" models).
> The missing part is mainly the Tenant impl but guess you already have
> something for that ;) - I assume some security context and meta for login.
>
> Romain Manni-Bucau
> @rmannibucau <https://twitter.com/rmannibucau> |  Blog
> <https://rmannibucau.metawerx.net/> | Old Blog
> <http://rmannibucau.wordpress.com> | Github <https://github.com/rmannibucau> |
> LinkedIn <https://www.linkedin.com/in/rmannibucau> | Book
> <https://www.packtpub.com/application-development/java-ee-8-high-performance>
>
>

-- 
Francesco Chicchiriccò

Tirasa - Open Source Excellence
http://www.tirasa.net/

Member at The Apache Software Foundation
Syncope, Cocoon, Olingo, CXF, OpenJPA, PonyMail
http://home.apache.org/~ilgrosso/


Re: Multi-tenancy and caching issues

Posted by Romain Manni-Bucau <rm...@gmail.com>.
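I don't have everything ready for spring-data, but I had something like this
in mind:


import static java.util.Objects.requireNonNull;
import static java.util.function.Function.identity;
import static java.util.stream.Collectors.toMap;

import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.function.Supplier;
import java.util.stream.Stream;

// assumption: jakarta.persistence on recent stacks, javax.persistence on older ones
import jakarta.persistence.EntityManagerFactory;
import jakarta.persistence.PersistenceException;

import org.springframework.beans.factory.ListableBeanFactory;
import org.springframework.context.ApplicationContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Primary;
import org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean;

public class RoutedEMFConf {
    @Bean
    @Primary
    LocalContainerEntityManagerFactoryBean mainEntityManagerFactory(
            final Tenant tenant, final ApplicationContext context) {
        // can be other beans with qualifiers
        final var emfs = findDelegates(context);
        final var routedEmf = EntityManagerFactory.class.cast(Proxy.newProxyInstance(
                RoutedEMFConf.class.getClassLoader(),
                // opt: use SessionFactoryImplementor.class if you need hibernate internals
                new Class<?>[]{EntityManagerFactory.class, Marking.class},
                (proxy, method, args) -> {
                    switch (method.getName()) {
                        case "equals":
                            // assume there is a single one per app, otherwise complete the impl
                            return args[0] instanceof Marking;
                        case "hashCode":
                            return 1;
                        default:
                            // route every other call to the EMF of the current tenant
                            try {
                                final var id = tenant.get();
                                return method.invoke(
                                        requireNonNull(emfs.get(id), () -> "No emf for '" + id + "'"),
                                        args);
                            } catch (final InvocationTargetException ite) {
                                throw ite.getTargetException();
                            }
                    }
                }
        ));
        return new LocalContainerEntityManagerFactoryBean() {
            @Override
            protected EntityManagerFactory createNativeEntityManagerFactory() throws PersistenceException {
                return routedEmf;
            }
        };
    }

    // collects every EMF bean except the routed one, keyed by bean name
    private Map<String, EntityManagerFactory> findDelegates(final ListableBeanFactory lbf) {
        return Stream.of(lbf.getBeanNamesForType(EntityManagerFactory.class))
                .filter(it -> !"mainEntityManagerFactory".equals(it))
                .collect(toMap(identity(), k -> lbf.getBean(k, EntityManagerFactory.class)));
    }

    public interface Marking {}

    // models the tenant lookup; can be a class, an interface is not always needed
    public interface Tenant extends Supplier<String> {
    }
}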

Side note: each delegate must have a valid bean name (you would likely make
this a spring extension registering the beans from your conf or "properties"
models). The missing part is mainly the Tenant impl, but I guess you already
have something for that ;) - I assume some security context and login metadata.
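
To see the routing idea in isolation, here is a dependency-free sketch of the same java.lang.reflect.Proxy technique. Greeter is a hypothetical stand-in for EntityManagerFactory and RoutingProxies is an illustrative name; unlike the snippet above, this variant answers equals/hashCode/toString on the proxy itself by identity, which is one possible way to "complete the impl".

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical stand-in for EntityManagerFactory so the sketch has no dependencies. */
interface Greeter {
    String greet(String name);
}

final class RoutingProxies {
    private RoutingProxies() {
    }

    /** Builds a proxy of the interface that forwards each call to the current tenant's delegate. */
    static <T> T routed(final Class<T> api, final Map<String, T> delegates, final Supplier<String> tenant) {
        return api.cast(Proxy.newProxyInstance(
                api.getClassLoader(),
                new Class<?>[]{api},
                (proxy, method, args) -> {
                    // Answer Object methods on the proxy itself, so that logging or
                    // map lookups never require a tenant to be bound.
                    if (method.getDeclaringClass() == Object.class) {
                        switch (method.getName()) {
                            case "equals":
                                return proxy == args[0];
                            case "hashCode":
                                return System.identityHashCode(proxy);
                            default:
                                return "RoutedProxy(" + api.getSimpleName() + ")";
                        }
                    }
                    final String id = tenant.get();
                    final T delegate = delegates.get(id);
                    if (delegate == null) {
                        throw new IllegalStateException("No delegate for '" + id + "'");
                    }
                    try {
                        return method.invoke(delegate, args);
                    } catch (final InvocationTargetException ite) {
                        // rethrow the delegate's own exception instead of the reflection wrapper
                        throw ite.getTargetException();
                    }
                }));
    }
}
```

Because the tenant is resolved on every call, the caller holds a single reference while each invocation lands on the delegate of whichever tenant is current at that moment.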

Romain Manni-Bucau
@rmannibucau <https://twitter.com/rmannibucau> |  Blog
<https://rmannibucau.metawerx.net/> | Old Blog
<http://rmannibucau.wordpress.com> | Github <https://github.com/rmannibucau> |
LinkedIn <https://www.linkedin.com/in/rmannibucau> | Book
<https://www.packtpub.com/application-development/java-ee-8-high-performance>


On Tue, 9 Jan 2024 at 11:28, Francesco Chicchiriccò <il...@apache.org>
wrote:

> Thanks Romain, I share your considerations and concerns below, and also
> agree that EMF routing is the way to go.
>
> I probably need to adjust my current exploration so that what we currently
> have in Syncope evolves towards proper EMF routing.
>
> Do you have any sample I could follow about that?
>
> Regards.
>
>
>
> --
> Francesco Chicchiriccò
>
> Tirasa - Open Source Excellence
> http://www.tirasa.net/
>
> Member at The Apache Software Foundation
> Syncope, Cocoon, Olingo, CXF, OpenJPA, PonyMail
> http://home.apache.org/~ilgrosso/
>
>

Re: Multi-tenancy and caching issues

Posted by Francesco Chicchiriccò <il...@apache.org>.
Thanks Romain, I share your considerations and concerns below, and also agree that EMF routing is the way to go.

I probably need to adjust my current exploration so that what we currently have in Syncope evolves towards proper EMF routing.

Do you have any sample I could follow about that?

Regards.

On 09/01/24 10:51, Romain Manni-Bucau wrote:
> Hi Francesco,
>
> As long as you have an EMF router you don't hit pitfall 4: it only happens
> when the routing is done at datasource level. Datasource-level routing also
> means way more side effects, and you start to lose the ability to tune per
> tenant (a common pattern is to tune the cache per tenant "size"/usage; there
> everything would be shared, not isolated, so there is no real way to handle
> anything there).
>
> Note: routed caches can be made to work somehow, but they need a lot of
> reimplementation of the cache, whereas isolation is free when using a routed
> EMF. It can be faked with PartitionedDataCache by overriding the key name
> (appending the tenant), but in terms of supervision I fear it will be way
> harder, and I'm not sure it would be very consumable for people (by design
> you end up raising the leak risk for users and you don't get any benefit
> from that - you don't reduce the overhead, you don't reduce the pool size,
> etc., which are at another level).
>
> In terms of spring-data integration nothing specific is needed either: just
> declare @Bean EMF routedEmf() and you'll get it working transparently, as
> long as a tx - the cache scope of spring - is for a single tenant.
>
> Hope I'm not missing something "key" ;).
>
> Romain Manni-Bucau
> @rmannibucau <https://twitter.com/rmannibucau> |  Blog
> <https://rmannibucau.metawerx.net/> | Old Blog
> <http://rmannibucau.wordpress.com> | Github <https://github.com/rmannibucau> |
> LinkedIn <https://www.linkedin.com/in/rmannibucau> | Book
> <https://www.packtpub.com/application-development/java-ee-8-high-performance>
>
>


-- 
Francesco Chicchiriccò

Tirasa - Open Source Excellence
http://www.tirasa.net/

Member at The Apache Software Foundation
Syncope, Cocoon, Olingo, CXF, OpenJPA, PonyMail
http://home.apache.org/~ilgrosso/


Re: Multi-tenancy and caching issues

Posted by Romain Manni-Bucau <rm...@gmail.com>.
Hi Francesco,

As long as you have an EMF router you don't hit pitfall 4: it only happens
when the routing is done at datasource level. Datasource-level routing also
means way more side effects, and you start to lose the ability to tune per
tenant (a common pattern is to tune the cache per tenant "size"/usage; there
everything would be shared, not isolated, so there is no real way to handle
anything there).
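
As a rough illustration of the per-tenant tuning mentioned above, the sketch below keeps one small LRU cache per tenant, each with its own capacity. LruCache and PerTenantCaches are hypothetical names and are not OpenJPA classes; they only model what "one cache instance per EMF" gives you.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Small LRU cache; a stand-in for the cache each per-tenant EMF would own. */
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;

    private final int capacity;

    LruCache(final int capacity) {
        super(16, 0.75f, true); // access order, so the eldest entry is the least recently used
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(final Map.Entry<K, V> eldest) {
        return size() > capacity;
    }
}

/** One isolated cache per tenant, each sized independently. */
final class PerTenantCaches {
    private final Map<String, Map<String, String>> caches = new ConcurrentHashMap<>();

    void register(final String tenant, final int capacity) {
        caches.put(tenant, Collections.synchronizedMap(new LruCache<String, String>(capacity)));
    }

    Map<String, String> of(final String tenant) {
        final Map<String, String> cache = caches.get(tenant);
        if (cache == null) {
            throw new IllegalArgumentException("Unknown tenant '" + tenant + "'");
        }
        return cache;
    }
}
```

A busy tenant filling its own cache can only evict its own entries here, which is the property that gets lost once a single cache is shared by every tenant.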

Note: routed caches can be made to work somehow, but they need a lot of
reimplementation of the cache, whereas isolation is free when using a routed
EMF. It can be faked with PartitionedDataCache by overriding the key name
(appending the tenant), but in terms of supervision I fear it will be way
harder, and I'm not sure it would be very consumable for people (by design
you end up raising the leak risk for users and you don't get any benefit
from that - you don't reduce the overhead, you don't reduce the pool size,
etc., which are at another level).
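
To make the "appending the tenant to the key" idea concrete, here is a minimal sketch of a single shared cache whose keys carry the tenant. TenantKey and TenantQualifiedCache are hypothetical names; this does not use OpenJPA's PartitionedDataCache API and only illustrates the keying scheme.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical cache key carrying the tenant; OpenJPA's own cache keys look different. */
final class TenantKey {
    private final String tenant;
    private final Class<?> type;
    private final Object id;

    TenantKey(final String tenant, final Class<?> type, final Object id) {
        this.tenant = tenant;
        this.type = type;
        this.id = id;
    }

    @Override
    public boolean equals(final Object o) {
        if (!(o instanceof TenantKey)) {
            return false;
        }
        final TenantKey other = (TenantKey) o;
        return tenant.equals(other.tenant) && type.equals(other.type) && id.equals(other.id);
    }

    @Override
    public int hashCode() {
        return Objects.hash(tenant, type, id);
    }
}

/** Single shared cache whose keys are qualified by the current tenant. */
final class TenantQualifiedCache {
    private final Map<TenantKey, Object> entries = new ConcurrentHashMap<>();
    private final Supplier<String> currentTenant;

    TenantQualifiedCache(final Supplier<String> currentTenant) {
        this.currentTenant = currentTenant;
    }

    /** Returns the cached value for the current tenant, loading it on a miss. */
    <T> T get(final Class<T> type, final Object id, final Function<Object, T> loader) {
        final TenantKey key = new TenantKey(currentTenant.get(), type, id);
        return type.cast(entries.computeIfAbsent(key, k -> loader.apply(id)));
    }

    /** Total entries across all tenants: sizing and eviction remain global. */
    int size() {
        return entries.size();
    }
}
```

The entries no longer collide, but there is still one map, one size and one eviction policy for everybody, which matches the supervision concern raised above.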

In terms of spring-data integration nothing specific is needed either: just
declare @Bean EMF routedEmf() and you'll get it working transparently, as
long as a tx - the cache scope of spring - is for a single tenant.

Hope I'm not missing something "key" ;).

Romain Manni-Bucau
@rmannibucau <https://twitter.com/rmannibucau> |  Blog
<https://rmannibucau.metawerx.net/> | Old Blog
<http://rmannibucau.wordpress.com> | Github <https://github.com/rmannibucau> |
LinkedIn <https://www.linkedin.com/in/rmannibucau> | Book
<https://www.packtpub.com/application-development/java-ee-8-high-performance>


On Tue, 9 Jan 2024 at 10:32, Francesco Chicchiriccò <il...@apache.org>
wrote:

> Hi Romain,
> see my replies embedded below.
>
> Regards.
>
> On 08/01/24 17:43, Romain Manni-Bucau wrote:
> > Hi Francesco,
> >
> > Normally if you have one EMF per tenant there is no leak between them
> since the cache instance is stored in the EMF - used that approach in TomEE.
>
> As I am saying below, this is what we have already in Syncope.
>
> My company is also supporting customers heavily using this particular
> feature: it works, I have no issues with that.
> Someone is also building a SaaS solution on top of that, so runtime tenant
> addition and removal is also fine.
>
> I am exploring this different approach because it would allow us to
> introduce Spring Data JPA, which could have some benefits - see
> https://issues.apache.org/jira/browse/SYNCOPE-1799
>
> > You can check it in
> org.apache.openjpa.datacache.DataCacheManagerImpl#initialize of each emf
> which should be different.
>
> Thanks for the pointer.
>
> > So overall if there is a leak it is likely that it leaks across
> > transactions or at some spring cache level.
>
> I think that things are more subtle: consider the following use case.
>
> We have MyEntity with String @Id.
>
> Suppose we have two tenants: A and B.
>
> 1. Tenant A will make a REST call which creates a MyEntity instance with
> key "key1" under the db for A.
>
> 2. Tenant A will make another REST call which looks for the newly created
> MyEntity instance via:
>
> entityManager.find(MyEntity.class, "key1");
>
> 3. Tenant B makes the same call as (1) with the same key "key1": all is
> fine, a new row is created under the db for B.
>
> 4. Tenant B makes the same call as (2) with the same key "key1": if not
> already evicted, entityManager will return the MyEntity instance for Tenant
> A from the cache.
>
> I need to avoid the pitfalls from (4).
>
> > Side note: the datasource routing pattern is useless if you have an
> entity manager routing pattern and only use JPA to do database work, both
> will more easily conflict than help.
>
> The idea is not to have an entity manager routing pattern, rather to have
> a cache routing pattern on the single entity manager factory; or just to
> configure some predefined partitions.
>
> > If you still want to plug the datacache (query cache), the configuration
> > in the jpa properties can take a custom fully qualified name too.
>
> --
> Francesco Chicchiriccò
>
> Tirasa - Open Source Excellence
> http://www.tirasa.net/
>
> Member at The Apache Software Foundation
> Syncope, Cocoon, Olingo, CXF, OpenJPA, PonyMail
> http://home.apache.org/~ilgrosso/
>
>

Re: Multi-tenancy and caching issues

Posted by Francesco Chicchiriccò <il...@apache.org>.
Hi Romain,
see my replies embedded below.

Regards.

On 08/01/24 17:43, Romain Manni-Bucau wrote:
> Hi Francesco,
>
> Normally if you have one EMF per tenant there is no leak between them since the cache instance is stored in the EMF - used that approach in TomEE.

As I am saying below, this is what we have already in Syncope.

My company is also supporting customers heavily using this particular feature: it works, I have no issues with that.
Someone is also building a SaaS solution on top of that, so runtime tenant addition and removal is also fine.

I am exploring this different approach because it would allow us to introduce Spring Data JPA, which could have some benefits - see
https://issues.apache.org/jira/browse/SYNCOPE-1799

> You can check it in org.apache.openjpa.datacache.DataCacheManagerImpl#initialize of each emf which should be different.

Thanks for the pointer.

> So overall if there is a leak it is likely that it leaks across transactions or at some Spring cache level.

I think that things are more subtle: consider the following use case.

We have MyEntity with String @Id.

Suppose we have two tenants: A and B.

1. Tenant A will make a REST call which creates a MyEntity instance with key "key1" under the db for A.

2. Tenant A will make another REST call which looks for the newly created MyEntity instance via:

entityManager.find(MyEntity.class, "key1");

3. Tenant B makes the same call as (1) with the same key "key1": all is fine, a new row is created under the db for B.

4. Tenant B makes the same call as (2) with the same key "key1": if not already evicted, entityManager will return the MyEntity instance for Tenant A from the cache.

I need to avoid the pitfalls from (4).
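
To make the scenario concrete, here is a minimal plain-Java sketch (no OpenJPA involved). The class SharedCacheSketch, its maps and its method names are all made up for illustration; the only assumption is that the shared cache key is derived from the entity id without any tenant information. It replays steps (1) to (4) and then shows the same lookup with a key that includes the tenant.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a cache shared by a single EMF; this is not OpenJPA code.
final class SharedCacheSketch {
    // One "database" per tenant: tenant -> (id -> row content).
    final Map<String, Map<String, String>> databases = new HashMap<>();
    // Shared cache keyed by entity id only (the key ignores the tenant).
    final Map<String, String> cacheById = new HashMap<>();
    // Shared cache keyed by tenant plus entity id.
    final Map<String, String> cacheByTenantAndId = new HashMap<>();

    // Writes a row into the database of the given tenant.
    void create(String tenant, String id, String content) {
        databases.computeIfAbsent(tenant, t -> new HashMap<>()).put(id, content);
    }

    // Reads a row straight from the database of the given tenant.
    private String load(String tenant, String id) {
        return databases.getOrDefault(tenant, Map.of()).get(id);
    }

    // Lookup through the tenant-unaware cache: a hit ignores the tenant.
    String findUnaware(String tenant, String id) {
        return cacheById.computeIfAbsent(id, k -> load(tenant, id));
    }

    // Lookup through the tenant-aware cache: the tenant is part of the key.
    String findAware(String tenant, String id) {
        return cacheByTenantAndId.computeIfAbsent(tenant + '/' + id, k -> load(tenant, id));
    }

    public static void main(String[] args) {
        SharedCacheSketch s = new SharedCacheSketch();
        s.create("A", "key1", "row of A");               // step 1
        s.findUnaware("A", "key1");                      // step 2, fills the cache
        s.create("B", "key1", "row of B");               // step 3
        System.out.println(s.findUnaware("B", "key1"));  // step 4, prints "row of A"
        System.out.println(s.findAware("B", "key1"));    // prints "row of B"
    }
}
```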

> Side note: the datasource routing pattern is useless if you have an entity manager routing pattern and only use JPA to do database work; the two are more likely to conflict than to help.

The idea is not to have an entity manager routing pattern, but rather to have a cache routing pattern on the single entity manager factory, or just to configure some predefined partitions.
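
As a rough sketch of what I mean by cache routing, in plain Java: TenantContext and RoutingCache are hypothetical names, and TenantContext only mirrors the ThreadLocal already used as lookup key for the routing data source. Plugging such a class into OpenJPA is not shown here; that would have to go through OpenJPA's own cache extension points.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical holder for the current tenant, bound to the calling thread.
final class TenantContext {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    static void set(String tenant) {
        CURRENT.set(tenant);
    }

    static String get() {
        String tenant = CURRENT.get();
        if (tenant == null) {
            throw new IllegalStateException("No tenant bound to the current thread");
        }
        return tenant;
    }

    static void clear() {
        CURRENT.remove();
    }
}

// Hypothetical routing cache: one partition per tenant, selected on every call.
final class RoutingCache<K, V> {
    private final Map<String, Map<K, V>> partitions = new ConcurrentHashMap<>();

    // Returns the partition of the current tenant, creating it on first use.
    private Map<K, V> partition() {
        return partitions.computeIfAbsent(TenantContext.get(), t -> new ConcurrentHashMap<>());
    }

    V get(K key) {
        return partition().get(key);
    }

    void put(K key, V value) {
        partition().put(key, value);
    }

    // Drops everything cached for one tenant, e.g. when the tenant is removed at runtime.
    void evictTenant(String tenant) {
        partitions.remove(tenant);
    }
}
```

With this, the same key "key1" put under tenant A is simply not visible while the thread is bound to tenant B.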

> If you still want to plug in a custom datacache (or query cache), the configuration in the JPA properties can take a custom fully qualified class name too.
>
> Le lun. 8 janv. 2024 à 17:14, Francesco Chicchiriccò <il...@apache.org>
> a écrit :
>
>> Hi there,
>> at Syncope we have been implementing multi-tenancy by relying on something
>> like:
>>
>> * 1 data source per tenant
>> * 1 entity manager factory per tenant
>> * 1 transaction manager per tenant
>> * etc
>>
>> So far so good.
>>
>> Now I am experimenting with a different approach similar to [1], e.g.
>>
>> * 1 low-level data source per tenant
>> * 1 data source extending Spring's AbstractRoutingDataSource using the
>> value of a ThreadLocal variable as lookup key
>> * 1 single entity manager factory configured with the routing data source
>> * 1 single transaction manager
>> * etc
>>
>> It mostly works but I am having caching issues with concurrent operations
>> working on different tenants, so I was wondering: how can I extend the
>> various OpenJPA (query, data, L1, L2, every one) caches to hold back
>> different actual instances per tenant and to use the appropriate one
>> depending on the same ThreadLocal value I have already used above for data
>> sources?
>>
>> Thanks in advance.
>> Regards.
>>
>> [1] https://github.com/Cepr0/sb-multitenant-db-demo

-- 
Francesco Chicchiriccò

Tirasa - Open Source Excellence
http://www.tirasa.net/

Member at The Apache Software Foundation
Syncope, Cocoon, Olingo, CXF, OpenJPA, PonyMail
http://home.apache.org/~ilgrosso/


Re: Multi-tenancy and caching issues

Posted by Romain Manni-Bucau <rm...@gmail.com>.
Hi Francesco,

Normally if you have one EMF per tenant there is no leak between them since
the cache instance is stored in the EMF - I used that approach in TomEE.
You can check it
in org.apache.openjpa.datacache.DataCacheManagerImpl#initialize for each EMF:
the instances should be different.

So overall if there is a leak it is likely that it leaks across
transactions or at some Spring cache level.

Side note: the datasource routing pattern is useless if you have an entity
manager routing pattern and only use JPA to do database work; the two are
more likely to conflict than to help.

If you still want to plug in a custom datacache (or query cache), the
configuration in the JPA properties can take a custom fully qualified class
name too.
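
A minimal sketch of what that could look like, assuming the properties are passed as a map when the EMF is created. The keys openjpa.DataCache and openjpa.QueryCache are the OpenJPA property names; the two class names are hypothetical placeholders for implementations you would have to write yourself.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper building the JPA properties for a single, shared EMF.
final class CachePluginProperties {

    static Map<String, Object> build() {
        Map<String, Object> props = new HashMap<>();
        // Fully qualified name of a custom data cache (hypothetical class).
        props.put("openjpa.DataCache", "com.example.cache.TenantAwareDataCache");
        // Fully qualified name of a custom query cache (hypothetical class).
        props.put("openjpa.QueryCache", "com.example.cache.TenantAwareQueryCache");
        return props;
    }

    public static void main(String[] args) {
        // The map would then be handed over when creating the EMF, e.g. through
        // the JPA property map of the Spring factory bean.
        build().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```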

Romain Manni-Bucau
@rmannibucau <https://twitter.com/rmannibucau> |  Blog
<https://rmannibucau.metawerx.net/> | Old Blog
<http://rmannibucau.wordpress.com> | Github <https://github.com/rmannibucau> |
LinkedIn <https://www.linkedin.com/in/rmannibucau> | Book
<https://www.packtpub.com/application-development/java-ee-8-high-performance>


Le lun. 8 janv. 2024 à 17:14, Francesco Chicchiriccò <il...@apache.org>
a écrit :

> Hi there,
> at Syncope we have been implementing multi-tenancy by relying on something
> like:
>
> * 1 data source per tenant
> * 1 entity manager factory per tenant
> * 1 transaction manager per tenant
> * etc
>
> So far so good.
>
> Now I am experimenting with a different approach similar to [1], e.g.
>
> * 1 low-level data source per tenant
> * 1 data source extending Spring's AbstractRoutingDataSource using the
> value of a ThreadLocal variable as lookup key
> * 1 single entity manager factory configured with the routing data source
> * 1 single transaction manager
> * etc
>
> It mostly works but I am having caching issues with concurrent operations
> working on different tenants, so I was wondering: how can I extend the
> various OpenJPA (query, data, L1, L2, every one) caches to hold back
> different actual instances per tenant and to use the appropriate one
> depending on the same ThreadLocal value I have already used above for data
> sources?
>
> Thanks in advance.
> Regards.
>
> [1] https://github.com/Cepr0/sb-multitenant-db-demo
>
> --
> Francesco Chicchiriccò
>
> Tirasa - Open Source Excellence
> http://www.tirasa.net/
>
> Member at The Apache Software Foundation
> Syncope, Cocoon, Olingo, CXF, OpenJPA, PonyMail
> http://home.apache.org/~ilgrosso/
>
>