You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Dan Di Spaltro <da...@gmail.com> on 2016/03/25 08:17:02 UTC

Tenanted SQL

I'd basically like to offer some tables tenanted which, in this case, means
filtered by some tenant.  How would I do that the right way in Calcite?
Obviously the tricky part is during the query execution there's no way to
pass extra variables/markers/whatever using JDBC.  Which is where I'd like
to make the tenant decision (vs connection creation).

So I was wondering if someone has some example code laying around on how to
do that, or just some general tips to get me going in the right direction.
There will be many tenants, and hopefully a reasonable amount of
connections to the db.



-Dan

Re: Tenanted SQL

Posted by Dan Di Spaltro <da...@gmail.com>.
I checked out CURRENT_TIMESTAMP, so there seem to be two paths it can
take.  In the case of a jdbc backend it just passes the function name
during the rel2sql phase.  I see maybe if you are using the reflective
schema it does uses the datacontext (which I think refers to the special
logic you mention above for holding the value) but I just can't seem to
understand all the pieces and how'd they fit together.  Here's what I've
been discovering:

1) CURRENT_PATH doesn't work on something like Postgres, so I noticed it
gets all the way to sending the sql to the backend before failing at the PG
level.  CURRENT_TIMESTAMP does work in PG and get's passed directly in,
similarly.
2) Do views work differently or is there some materialization that happens
earlier, I couldn't quite figure that out.  Specifically, how this relates
to making a special function.
3) Code wise, I started off by using the sql parser, and then manipulating
the tree such that constraints of tenant were imposed, but it feels real
hacky (and fragily) so now I am trying to move to this model which seems
like a better solution.  I basically have the following code working,
defined my own AdapterContext and DataContext, inherited from
CalcitePrepareImpl have an execute function that basically hobbles it all
together. Would I have do something on the implementor side to get the
generated SQL to fill in that particular value at execution time?  I've
tried adding functions, but that seems like it's meant to correspond to an
actual UDF, or am I declaring them incorrectly?
4) Should I use the rules system to accomplish any of this?

And just to step back is this a valid usecase?  More or less I want to
provide people with a SQL interface that feels like you have access to the
whole database when in reality it's just your tenant's slice of your own
data.  Calcite feels like a good fit for something like that.

Thanks for any pointers!

-Dan

On Thu, Apr 7, 2016 at 11:26 PM, Jacques Nadeau <ja...@apache.org> wrote:

> FYI, In Apache Drill we expose USER for exactly this purpose, allow it to
> be used in views and also ensure that it leverages constant reduction and
> or partition pruning where possible. I think we also have a group concept
> but I can't find the docs on that at the moment.
>
> On Thu, Apr 7, 2016 at 10:08 PM, Julian Hyde <jh...@apache.org> wrote:
>
> > CURRENT_TENANT would not be an ordinary function. As I said, it should
> > work something like CURRENT_TIMESTAMP, which is not an ordinary function
> > either.
> >
> > Yeah, I wish the doc was more flushed out, too. :)
> >
> >
> > > On Apr 7, 2016, at 8:50 PM, Dan Di Spaltro <da...@gmail.com>
> > wrote:
> > >
> > > So are you suggesting doing something like this?
> > >
> >
> https://calcite.apache.org/docs/tutorial.html#tables-and-views-in-schemas
> > >
> > > And using a special function `tenant_id = CURRENT_TENANT()` in place of
> > > `gender = \'F\'`
> > >
> > > I really wish some of the further topics at the bottom were more
> flushed
> > > out, specifically how to add functions.
> > >
> > > -Dan
> > >
> > > On Sun, Mar 27, 2016 at 9:11 PM, Julian Hyde <jh...@apache.org> wrote:
> > >
> > >> See comments inline.
> > >>
> > >> Julian
> > >>
> > >>
> > >>> On Mar 25, 2016, at 9:29 AM, Dan Di Spaltro <dan.dispaltro@gmail.com
> >
> > >> wrote:
> > >>>
> > >>> So I definitely understand the data side of the target database
> ("A"),
> > >> that
> > >>> I am virtualizing.
> > >>>
> > >>> I guess more specific questions would be:
> > >>> * How would I expose only two tables from "A" (they would both
> include
> > >> the
> > >>> tenantId field), I'm guess I might override the JdbcSchema using some
> > >>> whitelist.
> > >>
> > >> A whitelist is one approach. Another is to keep the JdbcSchema
> private,
> > >> but create views in another schema that reference those tables.
> Tenants
> > >> would only see the views.
> > >>
> > >>> * Since I want to give everyone the same virtual table space (with
> > >>> different "data"), would I need to look in overriding some of the
> Jdbc
> > >> core
> > >>> implementation?
> > >>
> > >> I don’t think you need to change anything in the JDBC adapter. All of
> > the
> > >> smarts will be in the views.
> > >>
> > >>> * I would need to use the parsed tree and then add the tenantId
> filter
> > >>> * Somehow pass in the tenantId during query time, ideally at the
> > >> statement
> > >>> vs the connection level.
> > >>
> > >> You could put tenantId into the DataContext. In SQL it would be
> accessed
> > >> using a function. This is very similar to how CURRENT_TIMESTAMP
> function
> > >> works.
> > >>
> > >>>
> > >>> Anyways, was just looking for some pointers, as there is a lot of
> code
> > >>> here. And anything would be much appreciated.  I am happy to share
> some
> > >> of
> > >>> the work once it's done.
> > >>
> > >>
> > >>
> > >>
> >
> >
>

Re: Tenanted SQL

Posted by Jacques Nadeau <ja...@apache.org>.
FYI, In Apache Drill we expose USER for exactly this purpose, allow it to
be used in views and also ensure that it leverages constant reduction and
or partition pruning where possible. I think we also have a group concept
but I can't find the docs on that at the moment.

On Thu, Apr 7, 2016 at 10:08 PM, Julian Hyde <jh...@apache.org> wrote:

> CURRENT_TENANT would not be an ordinary function. As I said, it should
> work something like CURRENT_TIMESTAMP, which is not an ordinary function
> either.
>
> Yeah, I wish the doc was more flushed out, too. :)
>
>
> > On Apr 7, 2016, at 8:50 PM, Dan Di Spaltro <da...@gmail.com>
> wrote:
> >
> > So are you suggesting doing something like this?
> >
> https://calcite.apache.org/docs/tutorial.html#tables-and-views-in-schemas
> >
> > And using a special function `tenant_id = CURRENT_TENANT()` in place of
> > `gender = \'F\'`
> >
> > I really wish some of the further topics at the bottom were more flushed
> > out, specifically how to add functions.
> >
> > -Dan
> >
> > On Sun, Mar 27, 2016 at 9:11 PM, Julian Hyde <jh...@apache.org> wrote:
> >
> >> See comments inline.
> >>
> >> Julian
> >>
> >>
> >>> On Mar 25, 2016, at 9:29 AM, Dan Di Spaltro <da...@gmail.com>
> >> wrote:
> >>>
> >>> So I definitely understand the data side of the target database ("A"),
> >> that
> >>> I am virtualizing.
> >>>
> >>> I guess more specific questions would be:
> >>> * How would I expose only two tables from "A" (they would both include
> >> the
> >>> tenantId field), I'm guess I might override the JdbcSchema using some
> >>> whitelist.
> >>
> >> A whitelist is one approach. Another is to keep the JdbcSchema private,
> >> but create views in another schema that reference those tables. Tenants
> >> would only see the views.
> >>
> >>> * Since I want to give everyone the same virtual table space (with
> >>> different "data"), would I need to look in overriding some of the Jdbc
> >> core
> >>> implementation?
> >>
> >> I don’t think you need to change anything in the JDBC adapter. All of
> the
> >> smarts will be in the views.
> >>
> >>> * I would need to use the parsed tree and then add the tenantId filter
> >>> * Somehow pass in the tenantId during query time, ideally at the
> >> statement
> >>> vs the connection level.
> >>
> >> You could put tenantId into the DataContext. In SQL it would be accessed
> >> using a function. This is very similar to how CURRENT_TIMESTAMP function
> >> works.
> >>
> >>>
> >>> Anyways, was just looking for some pointers, as there is a lot of code
> >>> here. And anything would be much appreciated.  I am happy to share some
> >> of
> >>> the work once it's done.
> >>
> >>
> >>
> >>
>
>

Re: Tenanted SQL

Posted by Julian Hyde <jh...@apache.org>.
CURRENT_TENANT would not be an ordinary function. As I said, it should work something like CURRENT_TIMESTAMP, which is not an ordinary function either.

Yeah, I wish the doc was more flushed out, too. :)


> On Apr 7, 2016, at 8:50 PM, Dan Di Spaltro <da...@gmail.com> wrote:
> 
> So are you suggesting doing something like this?
> https://calcite.apache.org/docs/tutorial.html#tables-and-views-in-schemas
> 
> And using a special function `tenant_id = CURRENT_TENANT()` in place of
> `gender = \'F\'`
> 
> I really wish some of the further topics at the bottom were more flushed
> out, specifically how to add functions.
> 
> -Dan
> 
> On Sun, Mar 27, 2016 at 9:11 PM, Julian Hyde <jh...@apache.org> wrote:
> 
>> See comments inline.
>> 
>> Julian
>> 
>> 
>>> On Mar 25, 2016, at 9:29 AM, Dan Di Spaltro <da...@gmail.com>
>> wrote:
>>> 
>>> So I definitely understand the data side of the target database ("A"),
>> that
>>> I am virtualizing.
>>> 
>>> I guess more specific questions would be:
>>> * How would I expose only two tables from "A" (they would both include
>> the
>>> tenantId field), I'm guess I might override the JdbcSchema using some
>>> whitelist.
>> 
>> A whitelist is one approach. Another is to keep the JdbcSchema private,
>> but create views in another schema that reference those tables. Tenants
>> would only see the views.
>> 
>>> * Since I want to give everyone the same virtual table space (with
>>> different "data"), would I need to look in overriding some of the Jdbc
>> core
>>> implementation?
>> 
>> I don’t think you need to change anything in the JDBC adapter. All of the
>> smarts will be in the views.
>> 
>>> * I would need to use the parsed tree and then add the tenantId filter
>>> * Somehow pass in the tenantId during query time, ideally at the
>> statement
>>> vs the connection level.
>> 
>> You could put tenantId into the DataContext. In SQL it would be accessed
>> using a function. This is very similar to how CURRENT_TIMESTAMP function
>> works.
>> 
>>> 
>>> Anyways, was just looking for some pointers, as there is a lot of code
>>> here. And anything would be much appreciated.  I am happy to share some
>> of
>>> the work once it's done.
>> 
>> 
>> 
>> 


Re: Tenanted SQL

Posted by Dan Di Spaltro <da...@gmail.com>.
So are you suggesting doing something like this?
https://calcite.apache.org/docs/tutorial.html#tables-and-views-in-schemas

And using a special function `tenant_id = CURRENT_TENANT()` in place of
 `gender = \'F\'`

I really wish some of the further topics at the bottom were more flushed
out, specifically how to add functions.

-Dan

On Sun, Mar 27, 2016 at 9:11 PM, Julian Hyde <jh...@apache.org> wrote:

> See comments inline.
>
> Julian
>
>
> > On Mar 25, 2016, at 9:29 AM, Dan Di Spaltro <da...@gmail.com>
> wrote:
> >
> > So I definitely understand the data side of the target database ("A"),
> that
> > I am virtualizing.
> >
> > I guess more specific questions would be:
> > * How would I expose only two tables from "A" (they would both include
> the
> > tenantId field), I'm guess I might override the JdbcSchema using some
> > whitelist.
>
> A whitelist is one approach. Another is to keep the JdbcSchema private,
> but create views in another schema that reference those tables. Tenants
> would only see the views.
>
> > * Since I want to give everyone the same virtual table space (with
> > different "data"), would I need to look in overriding some of the Jdbc
> core
> > implementation?
>
> I don’t think you need to change anything in the JDBC adapter. All of the
> smarts will be in the views.
>
> > * I would need to use the parsed tree and then add the tenantId filter
> > * Somehow pass in the tenantId during query time, ideally at the
> statement
> > vs the connection level.
>
> You could put tenantId into the DataContext. In SQL it would be accessed
> using a function. This is very similar to how CURRENT_TIMESTAMP function
> works.
>
> >
> > Anyways, was just looking for some pointers, as there is a lot of code
> > here. And anything would be much appreciated.  I am happy to share some
> of
> > the work once it's done.
>
>
>
>

Re: Tenanted SQL

Posted by Julian Hyde <jh...@apache.org>.
See comments inline.

Julian


> On Mar 25, 2016, at 9:29 AM, Dan Di Spaltro <da...@gmail.com> wrote:
> 
> So I definitely understand the data side of the target database ("A"), that
> I am virtualizing.
> 
> I guess more specific questions would be:
> * How would I expose only two tables from "A" (they would both include the
> tenantId field), I'm guess I might override the JdbcSchema using some
> whitelist.

A whitelist is one approach. Another is to keep the JdbcSchema private, but create views in another schema that reference those tables. Tenants would only see the views.

> * Since I want to give everyone the same virtual table space (with
> different "data"), would I need to look in overriding some of the Jdbc core
> implementation?

I don’t think you need to change anything in the JDBC adapter. All of the smarts will be in the views. 

> * I would need to use the parsed tree and then add the tenantId filter
> * Somehow pass in the tenantId during query time, ideally at the statement
> vs the connection level.

You could put tenantId into the DataContext. In SQL it would be accessed using a function. This is very similar to how CURRENT_TIMESTAMP function works.

> 
> Anyways, was just looking for some pointers, as there is a lot of code
> here. And anything would be much appreciated.  I am happy to share some of
> the work once it's done.




Re: Tenanted SQL

Posted by Dan Di Spaltro <da...@gmail.com>.
So I definitely understand the data side of the target database ("A"), that
I am virtualizing.

I guess more specific questions would be:
* How would I expose only two tables from "A" (they would both include the
tenantId field), I'm guess I might override the JdbcSchema using some
whitelist.
* Since I want to give everyone the same virtual table space (with
different "data"), would I need to look in overriding some of the Jdbc core
implementation?
* I would need to use the parsed tree and then add the tenantId filter
* Somehow pass in the tenantId during query time, ideally at the statement
vs the connection level.

Anyways, was just looking for some pointers, as there is a lot of code
here. And anything would be much appreciated.  I am happy to share some of
the work once it's done.

Thanks!

-Dan

On Fri, Mar 25, 2016 at 12:50 AM, Julian Hyde <jh...@apache.org> wrote:

> Maybe you give each table a tenantId column, and give each tenant a view
> that adds a “WHERE tenantId = XXX”. Thus each tenant only sees its own
> stuff.
>
> It could be a different view for each tenant, or it could be same view
> with XXX a call to a function that retrieves, say, a variable in the
> connect string.
>
> See https://issues.apache.org/jira/browse/CALCITE-505 <
> https://issues.apache.org/jira/browse/CALCITE-505>, which made views of
> this kind updatable. If you insert a record into such a view, the tenantId
> column will automatically be assigned the value XXX.
>
> Julian
>
>
> > On Mar 25, 2016, at 12:17 AM, Dan Di Spaltro <da...@gmail.com>
> wrote:
> >
> > I'd basically like to offer some tables tenanted which, in this case,
> means
> > filtered by some tenant.  How would I do that the right way in Calcite?
> > Obviously the tricky part is during the query execution there's no way to
> > pass extra variables/markers/whatever using JDBC.  Which is where I'd
> like
> > to make the tenant decision (vs connection creation).
> >
> > So I was wondering if someone has some example code laying around on how
> to
> > do that, or just some general tips to get me going in the right
> direction.
> > There will be many tenants, and hopefully a reasonable amount of
> > connections to the db.
> >
> >
> >
> > -Dan
>
>

Re: Tenanted SQL

Posted by Julian Hyde <jh...@apache.org>.
Maybe you give each table a tenantId column, and give each tenant a view that adds a “WHERE tenantId = XXX”. Thus each tenant only sees its own stuff.

It could be a different view for each tenant, or it could be same view with XXX a call to a function that retrieves, say, a variable in the connect string.

See https://issues.apache.org/jira/browse/CALCITE-505 <https://issues.apache.org/jira/browse/CALCITE-505>, which made views of this kind updatable. If you insert a record into such a view, the tenantId column will automatically be assigned the value XXX.

Julian


> On Mar 25, 2016, at 12:17 AM, Dan Di Spaltro <da...@gmail.com> wrote:
> 
> I'd basically like to offer some tables tenanted which, in this case, means
> filtered by some tenant.  How would I do that the right way in Calcite?
> Obviously the tricky part is during the query execution there's no way to
> pass extra variables/markers/whatever using JDBC.  Which is where I'd like
> to make the tenant decision (vs connection creation).
> 
> So I was wondering if someone has some example code laying around on how to
> do that, or just some general tips to get me going in the right direction.
> There will be many tenants, and hopefully a reasonable amount of
> connections to the db.
> 
> 
> 
> -Dan