Posted to dev@calcite.apache.org by Courtney Robinson <co...@hypi.io> on 2022/11/07 21:32:16 UTC

Passing non-primitive objects between EnumerableConverter and the Queryable

Hello all!
We're looking at Calcite and I've watched the video of BOSS 21
with Stamatis and Julian.

All the examples I've seen, including the Lucene example from that video
and others like the ES adapter, end up serialising the values that go
into the generated code.

 ConstantExpression luceneQuery =
     Expressions.constant(((LuceneRel) input).implement().query.toString());

Note the toString() call here. What I haven't been able to figure out is
how to pass the generated Lucene Query object itself rather than a
String form of it.

I looked for some common object and found the DataContext that is
available in the generated code, but from what I can see there's no way
to add a value to it.
I tried Expressions.dynamic but it's not implemented, and I couldn't quite
get Expressions.lambda to work either; it throws an NPE because `body` is
missing?

I want to stick to one question here since I have a few but this is
related.
Is it possible to avoid the code generation here altogether? In the basic
Lucene demo, AbstractEnumerable is extended
and Linq4j.enumerator(searchIndex()) is returned from enumerator().

I guess what I'm trying to ask is if I can use what seems like the simpler
API (returning an enumerator) whilst providing the rules the other example
uses?

Repo here for those who don't know what I'm referring to
https://github.com/zabetak/calcite-tutorial/blob/main/solution/src/main/java/com/github/zabetak/calcite/tutorial/LuceneEnumerable.java#L71


I'm literally extending the Lucene tutorial example, so replies in that
context (getting the generated Lucene Query object passed around) are
welcome, and it's not far off what we're going to need to do later.

-- 
Regards,
Courtney - CEO, Hypi <https://hypi.io/custom-software-development/>
Tel: 020 8123 2413

Re: Passing non-primitive objects between EnumerableConverter and the Queryable

Posted by Courtney Robinson <co...@hypi.io>.
I appreciate the responses.
I'll give that a go based on your suggestions. Thank you.



Re: Passing non-primitive objects between EnumerableConverter and the Queryable

Posted by Julian Hyde <jh...@gmail.com>.
Sorry, rather busy so my answer will be partial.

The easiest thing to put into DataContext is a table. If you can make your ‘object’ look like a table and wire it into the schema/table hierarchy then you can get it from the schema tree.

You can’t officially share objects between executions. But you can use your JSON string as a key to a cache (say a static Guava cache), so that when you deserialize your JSON string you will end up with the same object as when you last executed the query.
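
A minimal sketch of the caching idea above, in Java. It is illustrative only: the class name QueryObjectCache and the deserializer argument are hypothetical, and a plain ConcurrentHashMap stands in for the static Guava cache mentioned in the message so that the example needs nothing beyond the JDK.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical helper: maps a serialized query string to one shared object. */
final class QueryObjectCache {
  // Static, so every plan executed in the same JVM sees the same entries.
  // Assumption: a ConcurrentHashMap replaces the Guava cache from the message.
  private static final Map<String, Object> CACHE = new ConcurrentHashMap<>();

  private QueryObjectCache() {}

  /**
   * Returns the object already built for this string, or builds it once
   * using the caller-supplied (hypothetical) deserializer.
   */
  @SuppressWarnings("unchecked")
  static <T> T resolve(String serialized, Function<String, T> deserializer) {
    return (T) CACHE.computeIfAbsent(serialized, deserializer);
  }
}
```

Generated code would then embed only the string constant and call something like QueryObjectCache.resolve(json, MyQuery::fromJson), where MyQuery::fromJson is a placeholder for whatever parser you have. Unlike a Guava cache, this map never evicts entries, so a real version would probably want a size or time bound.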



Re: Passing non-primitive objects between EnumerableConverter and the Queryable

Posted by Courtney Robinson <co...@hypi.io>.
Okay, I understand the reasoning.
>
> If you need complex non-primitive objects a really good pattern is to pass
> in a string representation of that object (say JSON) and then build the
> objects using ‘root’ as a directory service.
>
I actually thought something like that was possible but I couldn't figure
out how to get anything into the DataContext root parameter.
I was successfully generating code like this
(org.apache.lucene.search.Query)root.get("my_query")
but couldn't figure out how to put query into DataContext root.

In debug I saw that it was an anonymous class from inside
CalciteConnectionImpl (this was with a JDBC-based setup). Knowing what I
know now, I guess that means it wraps the connection params? I'll check out
the code later.

I've since changed and followed the setup from the BOSS 21 lucene video as
I needed more control over the Lex and other options.

What I have now is the other way I had thought of doing it before
sending that email (I forgot to include it).

When I create the planner I create a Context that, when unwrap gets called,
returns a MyDBPlannerCtx:

  public static class MyDBPlannerCtx {
    private final Map<MyDBTable, QueryBuilder> builders = new LinkedHashMap<>();

    public void put(MyDBTable tbl, QueryBuilder builder) {
      builders.put(tbl, builder);
    }

    public QueryBuilder get(MyDBTable tbl) {
      return builders.get(tbl);
    }
  }

In the EnumerableConverter I then just do

  MyDBPlannerCtx ctx = getInput().getCluster().getPlanner().getContext()
      .unwrap(MyDBPlannerCtx.class);
  ctx.put(implementer.table, implementer.query);


Later, the Queryable calls getTable().find(), and in MyDBTable.find I just
do ctx.get(this) to get the Query.

This works, and I get the right Query, but there are a lot of unknowns, which
is why I sent the first message.
Right now, each query in the tests creates a planner and all the objects it
uses, so there's no chance of two connections (when I get to that point)
getting their Query mixed up. The issue I see now is that I don't feel all
these objects should be re-created for each query, but it is not clear what
the lifecycle of everything should be and which objects can safely be kept
and used for multiple queries.

Rather long-winded, but it boils down to the fact that I have two ways of
doing this.

   1. As you said, JSON or similar combined with DataContext root - how do
   I put info into DataContext?
   2. Is it safe to do the plan context as I have it working now, and are
   any of these objects safe to re-use between two queries and/or threads?
      1. CalciteSchema, RelDataTypeFactory, CalciteCatalogReader, SqlValidator
      2. I'm thinking a single instance of these per catalog
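
On the threads part of question 2, here is a hedged sketch of how the planner-context map could at least be made safe for concurrent access if it were ever shared. It is a generic variant of the class above, not something Calcite provides: SharedPlannerCtx is a made-up name, T stands in for MyDBTable and Q for QueryBuilder. It says nothing about whether the Calcite objects listed above can be shared.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Illustrative generic variant of the MyDBPlannerCtx idea.
 * T stands in for MyDBTable and Q for QueryBuilder (both hypothetical here).
 */
final class SharedPlannerCtx<T, Q> {
  // ConcurrentHashMap makes individual put/get/remove calls thread-safe.
  // Unlike LinkedHashMap, it rejects null keys and null values.
  private final Map<T, Q> builders = new ConcurrentHashMap<>();

  /** Registers the query built for a table during planning. */
  void put(T table, Q query) {
    builders.put(table, query);
  }

  /** Looks up the query for a table at execution time; null if absent. */
  Q get(T table) {
    return builders.get(table);
  }

  /** Drops the entry once a query has finished, so it cannot leak into the next. */
  Q remove(T table) {
    return builders.remove(table);
  }
}
```

Because the map is keyed by table alone, two in-flight queries against the same table would still overwrite each other's entry. Keeping one context per planner, as the tests currently do, avoids that.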




Re: Passing non-primitive objects between EnumerableConverter and the Queryable

Posted by Julian Hyde <jh...@gmail.com>.
The goal of Enumerable convention is to create a query plan that is Java source code. Along with that goes the idea that the plan can be run by something like a ‘public static void main’ method, where the only parameters are things you could pass from the command-line.

We’re not literally that strict, but that should give you an idea of what we’re striving for. By not passing objects around we are simplifying things like running the code in a debugger, running the same plan several times and/or in parallel.

The one non-primitive argument is the ‘DataContext root’ parameter. This is a map that contains all of the objects we need. It acts as a directory service, so we can look up any Table objects based on their path (schema1.schema2.myTable).

If you need complex non-primitive objects a really good pattern is to pass in a string representation of that object (say JSON) and then build the objects using ‘root’ as a directory service.
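
To illustrate the pattern in the previous paragraph, here is a small self-contained Java sketch. Everything in it is a stand-in chosen for illustration: the Root interface only mimics the idea of looking objects up by name and is not Calcite's DataContext, and TermQuery with its "field:value" string form is a hypothetical query type, not a Lucene class.

```java
import java.util.HashMap;
import java.util.Map;

final class RootLookupSketch {
  /** Stand-in for the 'root' argument; NOT Calcite's DataContext. */
  interface Root {
    Object get(String name);
  }

  /** Hypothetical query object that can be rebuilt from its string form. */
  static final class TermQuery {
    final String field;
    final String value;

    TermQuery(String field, String value) {
      this.field = field;
      this.value = value;
    }

    /** Parses the "field:value" form produced by toString(). */
    static TermQuery parse(String s) {
      int i = s.indexOf(':');
      if (i < 0) {
        throw new IllegalArgumentException("expected field:value, got " + s);
      }
      return new TermQuery(s.substring(0, i), s.substring(i + 1));
    }

    @Override public String toString() {
      return field + ":" + value;
    }
  }

  /**
   * Roughly what generated code could do: only strings are embedded as
   * constants; the table is looked up by name and the query is rebuilt.
   */
  static String run(Root root, String tableName, String serializedQuery) {
    Object table = root.get(tableName);                 // directory lookup
    TermQuery query = TermQuery.parse(serializedQuery); // rebuild the object
    return "scan " + table + " where " + query;
  }

  public static void main(String[] args) {
    Map<String, Object> objects = new HashMap<>();
    objects.put("schema1.myTable", "IDX");
    System.out.println(run(objects::get, "schema1.myTable", "title:calcite"));
  }
}
```

The point of the design is that the plan itself carries nothing but strings, so it can be re-run or debugged in isolation, while live objects are resolved through the root at execution time.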

Julian
