Posted to user@predictionio.apache.org by David Jones <da...@resolvedigital.com> on 2016/09/08 23:08:07 UTC

Multitenancy on the Universal Product Recommender

Hi All,

I have a use case where events come in from many separate tenants, and I
want to use the Universal Product Recommender engine. The challenge is
keeping each tenant’s data separate throughout the PIO process.

I can think of three possible ways to solve this issue, but they all have
tradeoffs:

*1) Create Multiple Apps*

You have one app per tenant. When you create events, you use the access key
specific to that tenant. Then you query for recommendations using that same
access key to get recommendations for just that app.

Issue: each engine has to specify an “appName” in engine.json. So now you
need one engine per tenant (i.e. per app), each with the same source code
and only the “appName” changed.

This’ll result in a bunch of duplicated code and you’ll have to train and
deploy each one individually.

There is also no API for creating apps, so something would need to be
built to bridge that gap before a new tenant can be onboarded.
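
To illustrate the duplication described above, here is a minimal sketch of generating one engine.json per tenant from a shared template, so that only the “appName” differs. The template is hypothetical and heavily trimmed (a real UR engine.json has many more fields), and the function names are made up for this example.

```python
import copy
import json

# Hypothetical, trimmed template; a real UR engine.json contains more settings.
TEMPLATE = {
    "id": "default",
    "datasource": {"params": {"appName": "PLACEHOLDER"}},
    "algorithms": [{"name": "ur", "params": {"appName": "PLACEHOLDER"}}],
}


def set_app_name(node, app_name):
    """Recursively overwrite every "appName" key in a JSON-like structure."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "appName":
                node[key] = app_name
            else:
                set_app_name(value, app_name)
    elif isinstance(node, list):
        for item in node:
            set_app_name(item, app_name)


def engine_json_for(tenant_app_name, template=TEMPLATE):
    """Return engine.json text for one tenant without mutating the template."""
    config = copy.deepcopy(template)
    set_app_name(config, tenant_app_name)
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    print(engine_json_for("tenant_12"))
```

Each tenant would still need its own train and deploy step; the sketch only removes the hand-editing of the config.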

*2) Use Channels*

You create one app, but create a channel per tenant. When you create an
event you specify the channel.

Issue: the Universal Recommender engine can be modified to read data from
a single channel, but the channel name cannot be set dynamically at query
time; it has to be hardcoded into DataSource.scala. So you end up in the
same situation: one engine per tenant, each with exactly the same source
code except for a one-line change in DataSource.scala.
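
A minimal sketch of what “specifying the channel” could look like on the event side. It assumes the Event Server takes the channel as a `channel` query parameter next to `accessKey` on `/events.json`; check the PIO channel documentation for your version. The base URL, key, and channel name are placeholders.

```python
from urllib.parse import urlencode


def event_url(base_url, access_key, channel=None):
    """Build the Event Server URL for posting one event.

    Assumption: the channel is selected with a `channel` query parameter.
    """
    params = {"accessKey": access_key}
    if channel is not None:
        params["channel"] = channel
    return f"{base_url}/events.json?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    print(event_url("http://localhost:7070", "KEY", "tenant12"))
```

This covers only event collection; it does not change the problem above of the channel name being fixed inside the engine.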

*3) Use Product Properties*

Provided your user ids are unique over all tenants, you could set a
property on each product with a tenant id.
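
As a sketch, tagging a product with its tenant could be done with a `$set` event on the item. The property name `tenant_id` matches the query further down; the item id and the helper name are hypothetical, and the value is sent as a list of strings on the assumption that the UR expects item properties in that form.

```python
import json


def tenant_property_event(item_id, tenant_id):
    """Build a $set event that attaches a tenant id to one item."""
    return {
        "event": "$set",
        "entityType": "item",
        "entityId": item_id,
        # Assumption: property values are arrays of strings.
        "properties": {"tenant_id": [str(tenant_id)]},
    }


if __name__ == "__main__":
    print(json.dumps(tenant_property_event("product-1", 12), indent=2))
```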

This way you can use one app, one engine, and simply query for
recommendations and supply a significant bias to products that contain the
tenant id property.

For example, to get the top recommendations for user xyz, who is on
tenant_id 12:

{
  "user": "xyz",
  "fields": [
    {
      "name": "tenant_id",
      "values": ["12"],
      "bias": 10
    }
  ]
}

Issues: since the data for all tenants is in one place, you have to train
over all tenants’ data each time. There is also a risk of deleting data
from the wrong tenant should a tenant leave.

-
I was wondering if anyone has done something similar to any of these
options? Perhaps there are other options? Are there any better ones? I’m
thinking option 3) might be the best for our needs.

Thanks,
David.

Re: Multitenancy on the Universal Product Recommender

Posted by Pat Ferrel <pa...@occamsmachete.com>.
As I said below, it’s best to send me a private message; the feature is not in the Apache version. Or make a feature request by creating a Jira issue for PIO.


On Sep 9, 2016, at 10:03 AM, Dipen Patel <pa...@gmail.com> wrote:

Could you please provide links to resources on the PIO version that supports multi-tenancy with lightweight Actors, one per tenant?





Re: Multitenancy on the Universal Product Recommender

Posted by Dipen Patel <pa...@gmail.com>.
Could you please provide links to resources on the PIO version that
supports multi-tenancy with lightweight Actors, one per tenant?

On Thu, Sep 8, 2016 at 7:52 PM, Pat Ferrel <pa...@occamsmachete.com> wrote:

> I’m the maintainer of the Universal Recommender. We have OSS support at
> https://groups.google.com/forum/#!forum/actionml-user
>
> Do you wish to take advantage of the same user being in multiple
> datasets/tenants? The answer below is assuming no.
>
> There are several ways to do this. First, the PIO EventServer is
> multi-tenant: just keep data in separate “apps”, which really should be
> named “datasets”. They are identified by keys generated when you run
> `pio app new <your-app-name>`.
>
> The PredictionServer is not multi-tenant but you can put a separate
> process on different ports. You would train each tenant from a different
> directory containing the UR and the correct engine.json for that
> tenant/dataset. Then deploy it on some port that is specific to the
> tenant/model. This will create somewhat heavyweight processes for each port.
>
> We have a version of PIO that supports multi-tenancy with lightweight
> Actors one per tenant. You deploy with a resource-id and when you make
> queries include the REST resource id in the URI. All engines are on the
> same port running in the same process so it’s very light-weight and
> performant. Otherwise the query works the same. Private message me to hear
> more.
>
> I would not advise the item property method. Unless you know there is no
> overlap in user-ids, it may produce undesired results in the model, and
> these may leak into recommendations. You can solve that with a filter
> (instead of the boost in your query), but there are better ways to solve
> this.

Re: Multitenancy on the Universal Product Recommender

Posted by Pat Ferrel <pa...@occamsmachete.com>.
I’m the maintainer of the Universal Recommender. We have OSS support at https://groups.google.com/forum/#!forum/actionml-user

Do you wish to take advantage of the same user being in multiple datasets/tenants? The answer below assumes no.

There are several ways to do this. First, the PIO EventServer is multi-tenant: just keep data in separate “apps”, which really should be named “datasets”. They are identified by keys generated when you run `pio app new <your-app-name>`.

The PredictionServer is not multi-tenant, but you can run a separate process per tenant, each on a different port. You would train each tenant from a different directory containing the UR and the correct engine.json for that tenant/dataset, then deploy it on a port that is specific to the tenant/model. This will create a somewhat heavyweight process for each port.
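
The one-process-per-port approach above can be sketched as a small planning script. It only builds the list of commands to run in each tenant directory and does not execute anything. The directory layout (`engines/<tenant>`), the starting port, and the function name are assumptions made for this example.

```python
def deployment_plan(tenants, base_port=8001, root="engines"):
    """Assign each tenant its own engine directory and port.

    Tenants are sorted so that port assignment is stable between runs.
    """
    plan = []
    for offset, tenant in enumerate(sorted(tenants)):
        port = base_port + offset
        plan.append({
            "tenant": tenant,
            "dir": f"{root}/{tenant}",  # holds the UR and that tenant's engine.json
            "port": port,
            "commands": ["pio build", "pio train", f"pio deploy --port {port}"],
        })
    return plan


if __name__ == "__main__":
    for entry in deployment_plan(["tenant_b", "tenant_a"]):
        print(entry["dir"], entry["commands"])
```

Queries for a given tenant would then go to that tenant's port. Note that adding a tenant that sorts earlier shifts later ports, so a real setup would likely persist the mapping instead.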

We have a version of PIO that supports multi-tenancy with lightweight Actors, one per tenant. You deploy with a resource-id and, when you make queries, include the REST resource id in the URI. All engines are on the same port, running in the same process, so it’s very lightweight and performant. Otherwise the query works the same. Private message me to hear more.

I would not advise the item property method. Unless you know there is no overlap in user-ids, it may produce undesired results in the model, and these may leak into recommendations. You can solve that with a filter (instead of the boost in your query), but there are better ways to solve this.
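
A minimal sketch of the difference between the boost and the filter form of the query. It assumes that the UR treats a negative `bias` on a field as a hard filter and a positive one as a boost; verify this against the UR query documentation for your version. The helper name is hypothetical.

```python
import json


def tenant_query(user_id, tenant_id, bias):
    """Build a UR query restricted or boosted by the tenant_id item property."""
    return {
        "user": user_id,
        "fields": [
            {"name": "tenant_id", "values": [str(tenant_id)], "bias": bias}
        ],
    }


# Boost: items from tenant 12 are favored, but others can still appear.
boost_query = tenant_query("xyz", 12, bias=10)

# Filter (assumed semantics of a negative bias): only tenant 12 items are returned.
filter_query = tenant_query("xyz", 12, bias=-1)

if __name__ == "__main__":
    print(json.dumps(filter_query, indent=2))
```

A filter limits what is returned, but the model is still trained on all tenants' data together, so it does not by itself address the concern about overlapping user-ids.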

